Re: Writing a whole lot of RDF to TDB versus Jena

Paolo Castagna Sat, 21 Jan 2012 07:57:21 -0800

Andy Seaborne wrote:
> Paolo has been looking at this - both hash and incremental ids.  Paolo -
> is there any thing in your mapreduce suite to do bulk incremental loading?


No, tdbloader4's [1] use case is loading into an empty database.
With the current ids, I did not find an easy and scalable way to merge
two or more sets of TDB indexes.

As you said, hash ids (instead of file offsets) would make merging two
or more TDB indexes possible (and easier), as well as enabling different
(and much simpler) approaches to parallel loading.
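
To illustrate why, here is a toy sketch (plain Python, not TDB code).
Everything in it is an assumption made for illustration: the names
OffsetStore, hash_id, build_hash_index and merge_sorted_indexes are
hypothetical, the id is simply the first 8 bytes of an MD5 digest, and
hash collisions are ignored. It is not the scheme used in the branch
mentioned below.

```python
import hashlib
from heapq import merge


class OffsetStore:
    """Toy store whose node ids depend on insertion order,
    standing in for ids that are file offsets."""

    def __init__(self):
        self.ids = {}     # term -> id assigned on first sight
        self.spo = set()  # triples encoded with store-local ids

    def node_id(self, term):
        return self.ids.setdefault(term, len(self.ids))

    def add(self, s, p, o):
        self.spo.add((self.node_id(s), self.node_id(p), self.node_id(o)))


def hash_id(term):
    """Hypothetical 64-bit id computed from the term alone."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_hash_index(triples):
    """Build a sorted, duplicate-free SPO index using hash ids."""
    return sorted({(hash_id(s), hash_id(p), hash_id(o)) for s, p, o in triples})


def merge_sorted_indexes(a, b):
    """Merge two sorted indexes, dropping duplicate triples."""
    out = []
    for triple in merge(a, b):
        if not out or out[-1] != triple:
            out.append(triple)
    return out


batch_a = [("ex:alice", "ex:knows", "ex:bob"),
           ("ex:bob", "ex:knows", "ex:carol")]
batch_b = [("ex:carol", "ex:knows", "ex:alice"),
           ("ex:bob", "ex:knows", "ex:carol")]

# Offset-style ids: the same term gets a different id in each store,
# so the two indexes cannot be combined without remapping every id.
store_a, store_b = OffsetStore(), OffsetStore()
for t in batch_a:
    store_a.add(*t)
for t in batch_b:
    store_b.add(*t)

# Hash ids: each batch can be indexed independently (e.g. in parallel)
# and the results combined with a plain sorted merge.
merged = merge_sorted_indexes(build_hash_index(batch_a),
                              build_hash_index(batch_b))
```

With hash ids the merged result is the same index you would get by
loading both batches together, which is what makes independent
(parallel) index building attractive.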

There is a (non-working!) branch [2] where I was trying to see what
it would take to have hash ids in TDB. I have not had time recently to
put any effort into it (but it is still something I'd love to accomplish).

Paolo

 [1] https://github.com/castagna/tdbloader4
 [2] https://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/branches/hash-ids/
