On 16/11/17 15:22, Osma Suominen wrote:
...

Please separate this out into another email - what is the problem and does it apply to the current codebase?

Sorry I wasn't clear. This is something you mentioned yourself in another e-mail on 2017-10-06 about how to load large files into Fuseki with TDB2:

This seems to work:

wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt --header 'Content-type: application/n-triples' http://localhost:3030/data

200M BSBM (49Gbytes) loaded at 42K triples/s.

The content length in the Fuseki log is reported wrongly (1002691465 ... int/long error) but the triple count is right.

Now fixed.


The only connection to TDB2 is that with TDB1 transaction sizes were limited, so I guess that the overflow situation never happened. With TDB2 you can now push very large files into Fuseki (yay!), but this exposes the problem. It's a very minor issue at least to me. I'm more interested in the other questions - especially if it's possible to maintain a Fuseki endpoint with a TDB2 store, occasionally pushing new data but not filling the disk doing so.

It was an int/long bug, so it only shows up once the content length goes past 2G (the largest value a Java int can hold).
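
For illustration, a minimal sketch of this kind of int/long truncation. This is not the actual Fuseki code; the class and method names are hypothetical and the lengths are made-up values, chosen only to show that a length below 2G survives a narrowing cast while a larger one wraps to a wrong number.

```java
public class ContentLengthOverflow {
    // Narrowing cast: keeps only the low 32 bits of the long,
    // so any value above Integer.MAX_VALUE comes out wrong.
    static int truncatedLength(long contentLength) {
        return (int) contentLength;
    }

    // True when the length can be stored in an int without loss.
    static boolean fitsInInt(long contentLength) {
        return contentLength >= Integer.MIN_VALUE && contentLength <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long small = 1_000_000_000L;   // hypothetical length below 2G
        long large = 3_000_000_000L;   // hypothetical length above 2G

        System.out.println(truncatedLength(small)); // unchanged
        System.out.println(truncatedLength(large)); // wrapped, no longer the real length

        // Math.toIntExact throws ArithmeticException instead of silently wrapping.
        try {
            Math.toIntExact(large);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Keeping the value as a long end to end avoids the problem; Math.toIntExact is useful where an int really is required and a silent wrap would be worse than an exception.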

(There is an equivalent problem in Apache Commons FileUpload, but fortunately in a place that Jena does not use when called from the UI. It's unfixable in FileUpload. If you say "getString()" then the string is limited to 2G characters because Java strings are char[]s.)

Loading a 2G file into TDB1 takes a few G of RAM at most, and isn't near the size limits for Fuseki. Fuseki uses TDB cautiously and further restricts the delayed work queue.

   Andy


-Osma
