On 15/08/13 10:21, Knut-Olav Hoven wrote:
Hi!
Hi there - thanks for the detailed report.
Two issues, both related to memory usage: import and delete of large graphs.
I am currently doing some tests with a 128MB heap and a little over 1M
tuples.
I know I can throw a lot of memory at the problem, but sooner or later I
will run out.
There are some fixed-size caches (as you've discovered) - 128M is likely
to be too small for them.
I've noticed that TDB takes the complete result set into memory when calling
"DatasetGraphTDB.deleteAny" before looping over all of the results to delete them.
This is a problem for very large graphs if I try to delete the entire
graph or a large selection.
There is supposed to be a specific implementation of deleteAny, something like
GraphTDB.removeWorker. But there isn't one. Actually, I don't see why
GraphTDB.removeWorker needs to exist if a proper
DatasetGraphTDB.deleteAny existed.
Recorded as JENA-513.
I'll sort this out by moving GraphTDB.removeWorker to
DatasetGraphTDB and using it for deleteAny(...) and from GraphTDB.remove.
The GraphTDB.removeWorker code gets batches of 1000 items, deletes them
and tries again until there is nothing more matching the delete pattern.
Deletes are not done by iterator.
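In outline, the pattern is the sketch below; it uses the current
org.apache.jena package names, and the class name and BATCH_SIZE constant
are just for illustration, not the actual TDB code:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.util.iterator.ExtendedIterator;

    public class BatchDelete {
        private static final int BATCH_SIZE = 1000;

        /** Delete every triple matching (s,p,o), working in fixed-size batches. */
        public static void deleteAny(Graph graph, Node s, Node p, Node o) {
            List<Triple> batch = new ArrayList<>(BATCH_SIZE);
            while (true) {
                // Materialize at most BATCH_SIZE matches and close the iterator
                // before deleting, so the index is never modified under a live scan.
                ExtendedIterator<Triple> it = graph.find(s, p, o);
                try {
                    for (int i = 0; i < BATCH_SIZE && it.hasNext(); i++)
                        batch.add(it.next());
                } finally {
                    it.close();
                }
                if (batch.isEmpty())
                    return;                        // nothing left matching the pattern
                for (Triple t : batch)
                    graph.delete(t);
                batch.clear();
            }
        }
    }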
That said, having iterator remove() support in RecordRangeIterator
and in TupleTable would be excellent regardless of this. When I went
looking for BTree code originally, I found various possibilities, but all were
too closely tied to their usage to be reusable. We could pull out the
B+Tree code into a reusable module.
There are some RecordRangeIterator cases that will not work
with Iterator.remove() ... for example, when the B+Tree is not on the same
machine as the TupleIndex client.
I figured out a way to make the iterators backed by the indexes/nodes, so I can
now delete each item directly from the iterator. I just hope I have covered all
the cases by implementing remove() in RecordRangeIterator and in TupleTable
(connected to all indexes). This was the "easy" part.
The difficult part is the Transaction and Journal, which do not write to
the journal until the transaction is just about to be committed. This
means that many Block objects are kept in memory in the HashMap
"BlockMgrJournal.writeBlocks".
Yes - this is a limitation of the current transaction system. The
blocks may still be accessed, so they can't be written to the journal and
forgotten. There could be a cache that knows where each block is in the
journal and fetches it back (a minor point, but then the journal is jumbled;
if it is written in numerical block order, the writes for flushing back to
the disk are likely more efficient).
My very long term approach would be to use immutable B+Trees, where the
blocks from the changed block up to the root are copied when a block first
changes. This means that transactional data is written once, during the write
transaction. Commit means switching to the new root for all subsequent
transactions. Old trees remain. The hard part is that the old trees need to be
garbage collected. Typically, this is done by a background task writing
a new copy. cf. CouchDB, BDB-JE (?) and Mulgara (not B+Trees but the same
approach), amongst others.
This is a not insignificant rewrite of the B+Tree and BlockMgr code.
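To illustrate the path-copying idea with something far simpler than a B+Tree,
here is a tiny persistent binary search tree (a plain illustration, not TDB
code): every insert copies only the nodes on the path from the root down to
the change and returns a new root, so readers of the old root are never
disturbed and "commit" is just publishing the new root.

    final class PersistentBST {
        static final class Node {
            final int key;
            final Node left, right;
            Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
        }

        /** Insert a key, copying only the nodes on the path to it; shares the rest. */
        static Node insert(Node root, int key) {
            if (root == null) return new Node(key, null, null);
            if (key < root.key) return new Node(root.key, insert(root.left, key), root.right);
            if (key > root.key) return new Node(root.key, root.left, insert(root.right, key));
            return root;   // key already present; share the whole subtree
        }

        public static void main(String[] args) {
            Node v1 = insert(insert(insert(null, 5), 2), 8);
            Node v2 = insert(v1, 3);            // v1 is still a complete, valid tree
            System.out.println(v1 != v2);       // true: two roots, shared unchanged nodes
        }
    }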
If there were a spill cache for BlockMgrJournal, that would be a great
thing to have. It's a much more direct way to get scalable transactions,
and it works without a DB format change.
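Roughly, such a spill cache could look like the sketch below; the names, the
fixed block size and the LRU policy are all assumptions for illustration, not
the BlockMgrJournal API. Blocks beyond a memory budget are pushed to a
temporary file and read back on demand:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    class BlockSpillCache implements AutoCloseable {
        private final int blockSize;
        private final int maxInMemory;
        // access-ordered map: the first entry is the least recently used block
        private final LinkedHashMap<Long, byte[]> inMemory = new LinkedHashMap<>(16, 0.75f, true);
        private final Map<Long, Long> spilled = new HashMap<>();   // block id -> file offset
        private final RandomAccessFile spillFile;
        private long nextOffset = 0;

        BlockSpillCache(File file, int blockSize, int maxInMemory) throws IOException {
            this.blockSize = blockSize;
            this.maxInMemory = maxInMemory;
            this.spillFile = new RandomAccessFile(file, "rw");
        }

        /** Register a (possibly dirty) block; may spill the least recently used one. */
        void put(long blockId, byte[] block) throws IOException {
            inMemory.put(blockId, block);          // block.length is assumed == blockSize
            if (inMemory.size() > maxInMemory) {
                Iterator<Map.Entry<Long, byte[]>> it = inMemory.entrySet().iterator();
                Map.Entry<Long, byte[]> eldest = it.next();
                it.remove();
                long offset = spilled.computeIfAbsent(eldest.getKey(), id -> allocate());
                spillFile.seek(offset);
                spillFile.write(eldest.getValue(), 0, blockSize);
            }
        }

        /** Fetch a block, from memory if possible, otherwise from the spill file. */
        byte[] get(long blockId) throws IOException {
            byte[] block = inMemory.get(blockId);
            if (block != null)
                return block;
            Long offset = spilled.get(blockId);
            if (offset == null)
                return null;                       // block was never registered
            byte[] data = new byte[blockSize];
            spillFile.seek(offset);
            spillFile.readFully(data);
            return data;
        }

        private long allocate() { long o = nextOffset; nextOffset += blockSize; return o; }

        @Override public void close() throws IOException { spillFile.close(); }
    }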
Trying to fix this by writing to the journal directly runs into
another issue in all those unit tests that open multiple transactions. The
problem is that the journal is not replayed onto the database files if
any transactions are open. The reason BlockMgrJournal works
in those tests is that the writeBlocks HashMap is never cleared after a
transaction (so the other transactions hit that map instead of the backing
files).
I also encountered a case during import that led to a corrupt database that
I could not recover. I always got an exception from "ObjectFileStorage.read"
telling me that I had an "Impossibly large object".
Those cases always started with an OutOfMemoryError during import while
writing to the database files. By lowering the caches Node2NodeIdCacheSize
and NodeId2NodeCacheSize and splitting the import files into smaller
batches/transactions, it went fine. It seems to recover by just returning an
empty ByteBuffer instead of throwing the exception, but I guess that would
just cover up a bad state. Maybe some optimization can be done to the part
where the journal is spooled onto the database files, to avoid the
OutOfMemoryError issue altogether and so avoid corrupt databases.
Sorry - if "Impossibly large object" happens the database is
unrecoverable. The problem happened at write time - it's just detected
at read time.
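For reference, the chunked-import workaround can be as small as one write
transaction per input chunk; a minimal sketch with placeholder file names and
database location, using the current org.apache.jena package names:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.tdb.TDBFactory;

    public class BatchedImport {
        public static void main(String[] args) {
            Dataset dataset = TDBFactory.createDataset("/path/to/tdb");                 // placeholder
            List<String> chunks = Arrays.asList("part-1.nt", "part-2.nt", "part-3.nt"); // placeholders

            // One write transaction per chunk keeps the journal (and the heap) small.
            for (String chunk : chunks) {
                dataset.begin(ReadWrite.WRITE);
                try {
                    RDFDataMgr.read(dataset.getDefaultModel(), chunk);
                    dataset.commit();
                } finally {
                    dataset.end();
                }
            }
            dataset.close();
        }
    }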
Should I open some issues in Jira?
Please do.
I can provide some patches for the iterators' remove() functions.
Awesome.
Sincerely,
Knut-Olav Hoven
NRK, Norwegian Broadcasting Corporation
Andy