On Thu, Mar 1, 2012 at 11:27 AM, Rob Styles <[email protected]> wrote:
> Hi Glenn,
>
> How big is the data on disk in total? About 22 GB of RDF/XML?
It's 59 GB in total. The average file size seems to be about 750 KB (my
earlier 300 KB estimate was based on a sample of the first 1000 files,
which seem to be smaller than the rest :/).
> If I were doing this I would convert to ntriples which you can do with
> something like:
>
> find /source -type f -exec rapper -i rdfxml -o ntriples {} \; \
>   >> /destination/big.rdf.nt
>
> Then I'd load directly from that, or split that into a smaller number of
> chunks.
Thanks, that's the second vote for ntriples. I'm still not sure why
the second and subsequent tdbloader sessions were so much slower than
the first.
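For reference, here is a rough sketch of the convert-then-split approach
suggested above. It is not something I have run against the full data set.
The function names, the chunk directory and the chunk size are all
hypothetical; only the rapper options and the /source and /destination paths
come from this thread. Splitting by line count should be safe because
N-Triples holds one triple per line. One thing to check first: if the files
contain blank nodes, the labels rapper generates for each file may collide
once everything is concatenated.

```shell
#!/bin/sh
# Sketch only: helper names and defaults are assumptions, not an agreed recipe.

# Convert every RDF/XML file under $1 into a single N-Triples file at $2.
convert_to_ntriples() {
    src_dir=$1
    out_file=$2
    : > "$out_file"    # start from an empty output file
    # The redirect applies to the whole find run, so each rapper
    # invocation appends its triples to the same file.
    find "$src_dir" -type f \
        -exec rapper -q -i rdfxml -o ntriples {} \; >> "$out_file"
}

# Split the N-Triples file $1 into chunks of $3 lines each under directory $2.
split_ntriples() {
    nt_file=$1
    chunk_dir=$2
    lines_per_chunk=$3
    mkdir -p "$chunk_dir"
    # Produces chunk-aa, chunk-ab, ... and never cuts a triple in half,
    # because split -l only breaks at line boundaries.
    split -l "$lines_per_chunk" "$nt_file" "$chunk_dir/chunk-"
}

# Example usage (chunk directory and chunk size are arbitrary guesses):
#   convert_to_ntriples /source /destination/big.rdf.nt
#   split_ntriples /destination/big.rdf.nt /destination/chunks 10000000
```

The resulting file, or the chunks, could then be handed to tdbloader in a
single session, which would at least sidestep the repeated-session slowdown.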
Regards
Glenn.