Re: Strategies for loading large (>500m triples) datasets

Sarven Capadisli Fri, 02 Mar 2012 12:50:08 -0800

Experimental is all good. I accept what I'm getting myself into ;) I'm trying to step up from tdbloader because of my needs.

Just a quick FYI: I'm currently testing tdbloader3 on a very small sample to make sure it is working as intended. I'll then apply it to my dataset of close to 500m triples (I don't have an exact count yet; hopefully I can get one from tdbstats at some point). But as you know, I have a few roadblocks. I'm hoping to resolve them soon with your help.
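Until tdbstats is available, a rough triple count can be taken straight from the source file. This is a minimal sketch, assuming the input is well-formed N-Triples (one triple per line, optionally gzipped). It counts non-blank, non-comment lines, so duplicate triples are counted every time they appear and the number may differ from what ends up in the store. The function name count_ntriples is hypothetical, not part of Jena or TDB.

```python
import gzip
import sys


def count_ntriples(path):
    """Count triples in an N-Triples file (plain or .gz).

    Assumes well-formed N-Triples: one triple per line. Blank lines and
    lines starting with '#' (comments) are skipped. No syntax validation
    is done, and duplicate triples are counted each time they occur.
    """
    # Pick a text-mode opener based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ntriples.py data.nt.gz
    print(count_ntriples(sys.argv[1]))
```

Streaming line by line keeps memory flat, which matters for a file of several hundred million lines; the trade-off is that it says nothing about whether each line actually parses.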


-Sarven

On 12-03-02 03:41 PM, Paolo Castagna wrote:

For others who want to help test tdbloader3...

We could use the Freebase data dump as a test dataset; it's ~600 million triples.
You can use https://github.com/castagna/freebase2rdf to convert the Freebase
dump into RDF.

Here is how I run tdbloader3, in this case giving the JVM 5 GB of RAM:

java -cp target/jena-tdbloader3-0.1-incubating-SNAPSHOT-jar-with-dependencies.jar \
  -server -d64 -Xmx5120M cmd.tdbloader3 --no-stats --compression \
  --spill-size-auto --loc target/freebase freebase-datadump-rdf.nt.gz

Last but not least, remember that tdbloader3 is still an experiment (and there
are good reasons why it is in the SVN 'Scratch' area). The need to load
ever-growing RDF datasets is, however, real, and rightly so.

Paolo

Paolo Castagna wrote:

Sarven Capadisli wrote:

I've documented some of my experiences here:
https://issues.apache.org/jira/browse/JENA-117#comment-13221016


Thanks Sarven

Re: https://issues.apache.org/jira/browse/JENA-117#comment-13221074

Paolo
