On 12-03-01 06:49 AM, Andy Seaborne wrote:
The dataset (provided by someone else) is split into about 80000
individual RDF/XML files, all in one directory by default. This makes
even straightforward directory listings slow, and means that the
command-line length/maximum number of arguments is exceeded, so I
can't just refer to *.rdf.

What a nuisance! But given it's a good idea to check all the data first,
you might as well parse each file into a single N-Triples file (or a few)
and then load from that.

So I suggest step one is to parse all the files to get an N-Triples
working set.

for each directory :
riot FILE1 FILE2 ... >> SOMEWHERE/data.nt
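The per-directory step above can be sketched with `find` and `xargs`, which split the 80000 file names into argument-limit-sized chunks so no single command line exceeds ARG_MAX (the problem that breaks a plain `riot *.rdf`). This is only a sketch: the directory name is made up, and `cat` stands in for Jena's `riot` so it runs anywhere; substitute `riot` on a machine with Jena installed.

```shell
# Make a small stand-in directory of "RDF/XML" files (names are
# illustrative; in practice this is the directory of ~80000 files).
mkdir -p rdfxml
for i in 1 2 3; do
  printf '<urn:s%s> <urn:p> <urn:o> .\n' "$i" > "rdfxml/f$i.rdf"
done

# find + xargs chunks the file list and invokes the parser as many
# times as needed, all appending to one N-Triples file. `cat` is a
# stand-in for `riot` here so the sketch is self-contained.
: > data.nt                  # start from an empty output file
find rdfxml -name '*.rdf' -print0 | xargs -0 cat >> data.nt

wc -l < data.nt              # one triple per input file
```

`-print0`/`-0` keeps file names with spaces intact; `xargs` handles the chunking, so this scales to any number of files.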

I have tested this approach over the past two days. Although I expected the large N-Triples files to load faster, I can't say I see an improvement over the smaller RDF/XML files. That is why I posted my initial email to the list about batch counts.

Just to put things in perspective:

N-Triples files: 7 MB to 12 GB

RDF/XML files: 7 MB to 15 MB

The load rate is all over the place. Sometimes the files load quickly (with either method), and sometimes not. I haven't been able to identify the root cause.
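One way to make load rates comparable across runs is to split the big N-Triples file into uniform batches, since N-Triples is line-oriented and a line-based split never cuts a triple in half. A minimal sketch (the batch size and file names here are illustrative, not from the original data):

```shell
# Build a tiny stand-in N-Triples file (5 triples, one per line).
printf '<urn:s%s> <urn:p> <urn:o> .\n' 1 2 3 4 5 > data.nt

# Split into fixed-size batches: 2 triples per output file.
# Output files are batch-aa, batch-ab, batch-ac (2 + 2 + 1 lines),
# each a valid N-Triples file that can be loaded and timed separately.
split -l 2 data.nt batch-

wc -l batch-*
```

Loading each `batch-*` file with the same tdbloader invocation and timing it would show whether the per-batch rate really varies, or whether the variance comes from file size differences.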

As I've said, I use tdbloader and I probably won't get a chance to try tdbloader2/3 until I'm through this.

-Sarven
