Re: Best practice? - query two tdb datasets at once

Paolo Castagna Sun, 23 Oct 2011 02:13:50 -0700

Bill Roberts wrote:

> but simply copying all the data into a single TDB seems the easiest and
> quickest solution.


Related to this: would it be possible to merge two TDB indexes?

What's the best thing to do if you have large RDF datasets (i.e. large
TDB indexes) and you want to merge them often and fast?

I have been thinking about how to solve the problem of merging two TDB
indexes by working directly with their on-disk format [1]. The problem
is the node table: NodeIds in the node table are offsets into the object
file (i.e. nodes.dat), so a NodeId from one dataset does not identify the
same node in the other. I have a couple of ideas I'd like to test, and I
am still working on it.
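
A rough way to see the node-table problem, under a deliberately simplified model: the object file is a sequence of length-prefixed UTF-8 records, and a NodeId is just the byte offset of a record. Merging dataset B into A then means scanning B's object file, finding or appending each node in A, and building an old-offset to new-offset map that must be applied to every NodeId in B's indexes. The names below (NodeTableMergeSketch, ToyNodeTable, mergeNodes, remapTriples) are hypothetical; this is not TDB's real nodes.dat encoding or API, and it makes no claim about what TDBMerge.java in [1] does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a TDB-style node table; NOT TDB's real on-disk format or API.
public class NodeTableMergeSketch {

    // Toy "nodes.dat": records are [4-byte length][UTF-8 bytes]; a NodeId is the record's byte offset.
    static final class ToyNodeTable {
        final ByteArrayOutputStream objectFile = new ByteArrayOutputStream();
        final Map<String, Long> nodeToId = new HashMap<>(); // in-memory stand-in for a node -> NodeId index

        long getOrAdd(String node) {
            Long existing = nodeToId.get(node);
            if (existing != null) return existing;           // node already stored: reuse its NodeId
            long offset = objectFile.size();                 // new NodeId = current end of the object file
            byte[] bytes = node.getBytes(StandardCharsets.UTF_8);
            objectFile.writeBytes(ByteBuffer.allocate(4).putInt(bytes.length).array());
            objectFile.writeBytes(bytes);
            nodeToId.put(node, offset);
            return offset;
        }

        // Decodes the record that starts at the given NodeId (byte offset).
        String read(long nodeId) {
            byte[] data = objectFile.toByteArray();
            int len = ByteBuffer.wrap(data).getInt((int) nodeId);
            return new String(data, (int) nodeId + 4, len, StandardCharsets.UTF_8);
        }
    }

    // Scans the source object file record by record and returns sourceNodeId -> targetNodeId.
    static Map<Long, Long> mergeNodes(ToyNodeTable target, ToyNodeTable source) {
        byte[] data = source.objectFile.toByteArray();
        ByteBuffer buf = ByteBuffer.wrap(data);
        Map<Long, Long> remap = new HashMap<>();
        int offset = 0;
        while (offset < data.length) {
            int len = buf.getInt(offset);
            String node = new String(data, offset + 4, len, StandardCharsets.UTF_8);
            remap.put((long) offset, target.getOrAdd(node)); // shared nodes map to the existing NodeId
            offset += 4 + len;
        }
        return remap;
    }

    // Rewrites (s, p, o) NodeId triples from the source dataset into the target's NodeId space.
    static List<long[]> remapTriples(List<long[]> sourceTriples, Map<Long, Long> remap) {
        List<long[]> out = new ArrayList<>();
        for (long[] t : sourceTriples) {
            out.add(new long[] { remap.get(t[0]), remap.get(t[1]), remap.get(t[2]) });
        }
        return out;
    }
}
```

In this toy model a node present in both datasets (the same predicate URI, say) ends up with one NodeId in the merged table, while B's triples are rewritten to point at A's offsets. On real data the expensive part is doing the remapping without holding the whole map in memory.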

If we can merge two TDB indexes, we can build TDB indexes in parallel
(using MapReduce, for example) and merge them at the end. People do
similar things when they need to build large Lucene indexes. With Lucene
it's much easier, since it has a notion of "segment" and merging
segments is straightforward.
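
Once the partial datasets share one NodeId space, each of their indexes (triples in SPO order, for instance) is a sorted run, and combining them is essentially a k-way merge with duplicate removal. The sketch below assumes the runs are in-memory lists of equal-length long[] tuples that have already been remapped to shared NodeIds and sorted lexicographically; SortedRunMergeSketch and mergeRuns are hypothetical names, writing the result back out as a B+Tree is left out, and this is not a description of how Lucene or tdbloader3 actually merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of sorted runs of NodeId tuples (e.g. SPO triples).
public class SortedRunMergeSketch {

    // Lexicographic comparison of equal-length NodeId tuples.
    static int compareTuples(long[] a, long[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    // Merges runs that are each sorted by compareTuples; equal tuples are emitted once.
    static List<long[]> mergeRuns(List<List<long[]>> runs) {
        // Heap entries are {runIndex, positionInRun}, ordered by the tuple they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> compareTuples(runs.get(x[0]).get(x[1]), runs.get(y[0]).get(y[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[] { r, 0 });
        }
        List<long[]> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            long[] tuple = runs.get(top[0]).get(top[1]);
            // Skip a tuple equal to the last one emitted (same triple present in several runs).
            if (merged.isEmpty() || compareTuples(merged.get(merged.size() - 1), tuple) != 0) {
                merged.add(tuple);
            }
            // Advance within the run the tuple came from.
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[] { top[0], top[1] + 1 });
            }
        }
        return merged;
    }
}
```

The priority queue keeps the cost at O(n log k) for n tuples spread over k runs, and because the output arrives in sorted order it could in principle feed a bottom-up index build rather than one insert per triple.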

Paolo

[1] https://github.com/castagna/tdbloader3/blob/master/src/test/java/dev/TDBMerge.java
