Hi Stephen,

On 16/08/13 20:20, Stephen Allen wrote:
On Thu, Aug 15, 2013 at 8:47 AM, Andy Seaborne <[email protected]>
wrote:
...
There is supposed to be a specific implementation of deleteAny which is
like GraphTDB.removeWorker.  But there isn't.  Actually, I don't
see why GraphTDB.removeWorker needs to exist if a proper
DatasetGraphTDB.deleteAny existed.

Recorded as JENA-513.

I'll sort this out by moving GraphTDB.removeWorker to
DatasetGraphTDB and using it for deleteAny(...) and from
GraphTDB.remove.

The GraphTDB.removeWorker code gets batches of 1000 items, deletes
them and tries again until there is nothing more matching the
delete pattern.  Deletes are not done by iterator.
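The batch-and-retry pattern can be sketched as below. This is a stdlib-only illustration, not the actual GraphTDB code: the collection stands in for the tuple index, and prefix matching stands in for the delete pattern. The point is that deletion never happens under a live iterator; each pass collects up to a batch of matches, closes the scan, deletes, and re-scans until nothing matches.

```java
import java.util.*;

public class BatchDelete {
    static final int BATCH_SIZE = 1000;

    // Repeatedly collect up to BATCH_SIZE matches, delete them, and loop
    // until a pass finds nothing more matching the pattern.
    static int deleteAny(Collection<String> store, String prefix) {
        int total = 0;
        while (true) {
            List<String> batch = new ArrayList<>();
            for (String item : store) {               // find(pattern)
                if (item.startsWith(prefix)) {
                    batch.add(item);
                    if (batch.size() == BATCH_SIZE)
                        break;                        // end the scan early
                }
            }
            if (batch.isEmpty())
                return total;                         // nothing left matching
            store.removeAll(batch);                   // delete outside the iterator
            total += batch.size();
        }
    }

    public static void main(String[] args) {
        Collection<String> store = new HashSet<>(Arrays.asList("a1", "a2", "b1"));
        System.out.println(deleteAny(store, "a"));    // -> 2
        System.out.println(store);                    // -> [b1]
    }
}
```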


So as an alternative, you can use SPARQL Update combined with
setting the ARQ.spillToDiskThreshold parameter to a reasonable value
(10,000 maybe?).  This will enable stream-to-disk functionality for
the intermediate bindings for DELETE/INSERT/WHERE queries (as well
as several of the SPARQL operators in the WHERE clause, see
JENA-119). This should eliminate memory bounds for the most part
except for the TDB's BlockMgrJournal.
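Conceptually the threshold works like this (a stdlib-only sketch of the idea, not ARQ's DataBag code; the class and method names here are made up): bindings are buffered in memory until the threshold is crossed, after which everything goes to a temporary file.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of spill-to-disk buffering: hold items in memory
// until a threshold, then move everything to a temp file and keep
// appending there.
public class SpillBuffer implements Closeable {
    private final int threshold;
    private final List<String> memory = new ArrayList<>();
    private Path spillFile;            // null until we overflow
    private BufferedWriter writer;

    public SpillBuffer(int threshold) { this.threshold = threshold; }

    public void add(String binding) throws IOException {
        if (writer != null) { writer.write(binding); writer.newLine(); return; }
        memory.add(binding);
        if (memory.size() > threshold) {              // overflow: move to disk
            spillFile = Files.createTempFile("spill", ".tmp");
            writer = Files.newBufferedWriter(spillFile);
            for (String b : memory) { writer.write(b); writer.newLine(); }
            memory.clear();
        }
    }

    // All writes must finish before any reading -- the same limitation
    // the DataBag classes have.
    public List<String> readAll() throws IOException {
        if (writer == null) return memory;
        writer.flush();
        return Files.readAllLines(spillFile);
    }

    public void close() throws IOException {
        if (writer != null) { writer.close(); Files.deleteIfExists(spillFile); }
    }

    public static void main(String[] args) throws IOException {
        try (SpillBuffer sb = new SpillBuffer(2)) {
            for (String b : new String[] { "a", "b", "c", "d" })
                sb.add(b);
            System.out.println(sb.readAll());         // -> [a, b, c, d]
        }
    }
}
```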

JENA-513 is done, so do have a look at the code if you're interested.  Do
you see an advantage of spill-to-disk?  Doesn't it have to materialize the triples?

I would have thought the removeWorker loop should be more efficient
because it works in NodeId space and does not touch the NodeTable.  The
repeated use of find() is efficient because B+Tree branch blocks are
in-memory with very high probability. And there are no temporary files to worry about.

If there were a spill cache for BlockMgrJournal that would be a
great thing to have.  It's a much more direct way to get scalable
transactions and works without a DB format change.


Agreed.  Unfortunately the *DataBag classes require all data to be
written before any reading occurs, which makes them inappropriate.
Can't we just use another disk-backed B+Tree as a temporary store
here instead of the in-memory HashMap?

If we used some other B+Tree implementation, then maybe.

FYI: This is new ...
http://directory.apache.org/mavibot/vision.html

though it's solving a different B+Tree issue.

The requirement is a persistent hash map from block id (8 bytes) to block bytes (up to 8K of data). This is happening underneath the index B+Trees during a transaction.

TDB B+Trees have fixed-size keys and values and are not designed for storing 8K blocks (there is no large-value support: the trees themselves use 8K blocks, so an 8K payload can't fit once the few bytes of per-record overhead are added). They are optimized for range scans.

There is an external hash table in TDB as well but, again, it is not designed for storing 8K units.

In fact, a regular filesystem directory with one file per 8K spilled block would be a good mock-up. That would exploit the OS file cache, there is no need to sync the files, and it is quite easy to build and debug.
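Such a mock-up could be as small as this (a stdlib sketch of the idea, not proposed TDB code): one file per block, named by block id, no fsync, since the journal proper remains the durable copy.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical file-per-block spill store: one file per 8K block in a
// temp directory, named by block id. Never synced -- losing these files
// on a crash is fine because the journal is the durable copy.
public class FileBlockStore {
    private final Path dir;

    public FileBlockStore() throws IOException {
        dir = Files.createTempDirectory("spill-blocks");
    }

    public void put(long blockId, byte[] block) throws IOException {
        // Overwrites any earlier spill of the same block.
        Files.write(dir.resolve("blk-" + blockId), block);
    }

    public byte[] get(long blockId) throws IOException {
        return Files.readAllBytes(dir.resolve("blk-" + blockId));
    }

    public boolean contains(long blockId) {
        return Files.exists(dir.resolve("blk-" + blockId));
    }

    public static void main(String[] args) throws IOException {
        FileBlockStore store = new FileBlockStore();
        byte[] block = new byte[8192];
        block[0] = 42;
        store.put(7L, block);
        System.out.println(store.get(7L)[0]);   // -> 42
    }
}
```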

An advantage of early spill-to-journal with an in-memory tombstone (~10 bytes each, or a B+Tree of tombstones) is that this can be the final write of the data to the journal. An off-journal temporary store means the block has to be written to the off-journal store, then read back and written to the journal. That's extra I/O, and probably real I/O with disk-head movement (read from one place, write to another).
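The bookkeeping the tombstone scheme needs is tiny. A stdlib sketch of the idea (not existing TDB code; the journal is simulated with an in-memory byte stream): the block bytes are appended straight to the journal in a single write, and the heap keeps only a small blockId-to-offset record per spilled block instead of the full 8K of data.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: blocks go straight to the (append-only) journal,
// and the in-memory side keeps only blockId -> {offset, length} -- a few
// bytes per block rather than the 8K block itself.
public class TombstoneIndex {
    private final ByteArrayOutputStream journal = new ByteArrayOutputStream();
    private final Map<Long, int[]> tombstones = new HashMap<>();

    public void spill(long blockId, byte[] block) {
        tombstones.put(blockId, new int[] { journal.size(), block.length });
        journal.write(block, 0, block.length);   // single write: this IS the journal copy
    }

    public byte[] read(long blockId) {
        int[] loc = tombstones.get(blockId);
        return Arrays.copyOfRange(journal.toByteArray(), loc[0], loc[0] + loc[1]);
    }

    public int heapEntries() { return tombstones.size(); }

    public static void main(String[] args) {
        TombstoneIndex idx = new TombstoneIndex();
        idx.spill(1L, new byte[] { 9 });
        idx.spill(2L, new byte[] { 8, 7 });
        System.out.println(idx.read(2L)[1]);     // -> 7
        System.out.println(idx.heapEntries());   // -> 2
    }
}
```

Re-spilling a block simply points its tombstone at a newer journal offset, which matches the append-only behaviour discussed next.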

The only oddity is that the journal is append-only (it does not have to be within the current uncommitted transaction, but append-only files are faster than random-access files). If a block gets spilled, is then updated, and is spilled again, we assume that is sufficiently rare that appending a second, newer copy to the journal, which overwrites the first on playback, is acceptable. Not perfect, but write-in-place into the active journal area would probably cost more (it's more likely to need a disk seek).
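On playback, "newer copy wins" is just last-write-wins over the append order. A stdlib sketch of that replay rule (not the BlockMgrJournal code):

```java
import java.util.*;

// Hypothetical sketch of journal playback: entries are replayed in append
// order, so a block spilled twice has its later copy overwrite the earlier
// one in the result -- no in-place update of the journal file is needed.
public class JournalReplay {
    public static Map<Long, byte[]> replay(List<Map.Entry<Long, byte[]>> entries) {
        Map<Long, byte[]> blocks = new LinkedHashMap<>();
        for (Map.Entry<Long, byte[]> e : entries)
            blocks.put(e.getKey(), e.getValue());   // later entry wins
        return blocks;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, byte[]>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<>(1L, new byte[] { 1 }));  // first spill
        entries.add(new AbstractMap.SimpleEntry<>(1L, new byte[] { 2 }));  // re-spill
        System.out.println(replay(entries).get(1L)[0]);   // -> 2
    }
}
```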

(Yes, the journal on an SSD is a good idea.)

I've actually been running into this issue because now that
streaming SPARQL Update support is available, I find I am generating
and streaming so much data in a single transaction that I need to
devote a not-insignificant amount of heap just for storing the
pending blocks.

        Andy
