On 16/11/17 15:22, Osma Suominen wrote:
...
Please separate this out into another email - what is the problem and
does it apply to the current codebase?
Sorry I wasn't clear. This is something you mentioned yourself in
another e-mail on 2017-10-06 about how to load large files into Fuseki
with TDB2:
This seems to work:
wget --post-file=/home/afs/Datasets/BSBM/bsbm-200m.nt \
  --header 'Content-type: application/n-triples' http://localhost:3030/data
200M BSBM (49Gbytes) loaded at 42K triples/s.
The content length in the Fuseki log is reported wrongly (1002691465
... int/long error) but the triple count is right.
Now fixed.
The only connection to TDB2 is that with TDB1 transaction sizes were
limited, so I guess that the overflow situation never happened. With
TDB2 you can now push very large files into Fuseki (yay!), but this
exposes the problem. It's a very minor issue at least to me. I'm more
interested in the other questions - especially if it's possible to
maintain a Fuseki endpoint with a TDB2 store, occasionally pushing new
data but not filling the disk doing so.
It was an int/long bug, so it happens once the content length exceeds 2G.
(There is an equivalent problem in Apache Commons FileUpload, but
fortunately in a place that Jena does not use when called from the UI.
It's unfixable in FileUpload: if you say "getString()" then the string is
limited to 2G characters because Java strings are char[]s.)
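
For anyone following along, here is a minimal sketch of the kind of int/long
narrowing described above. It is not the Fuseki code: the class name
ContentLengthDemo and the helper narrow() are made up for illustration, and
the byte counts are arbitrary examples.

```java
// Illustration only: shows what happens when a long byte count is
// squeezed into an int. Not taken from Fuseki or Jena.
class ContentLengthDemo {

    // Hypothetical helper: narrows a byte count to int, as a buggy
    // counter might. The cast silently keeps only the low 32 bits.
    static int narrow(long byteCount) {
        return (int) byteCount;
    }

    public static void main(String[] args) {
        long fits = Integer.MAX_VALUE;     // 2147483647, the largest int
        long tooBig = 3_000_000_000L;      // past the int limit

        System.out.println(narrow(fits));   // value is unchanged
        System.out.println(narrow(tooBig)); // wraps around to a wrong value
        System.out.println(tooBig);         // a long holds it correctly

        // Math.toIntExact refuses to narrow silently and throws instead.
        try {
            Math.toIntExact(tooBig);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

Keeping the counter as a long throughout avoids the wrap-around. The same
int limit bounds Java array lengths, which is why a String cannot grow
past roughly 2G characters.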
A 2G file into TDB1 is a few G of RAM maximum, and isn't near the size
limits for Fuseki. Fuseki uses TDB cautiously and further restricts the
delayed work queue.
Andy
-Osma