Andy Seaborne wrote:
> Paolo has been looking at this - both hash and incremental ids. Paolo -
> is there any thing in your mapreduce suite to do bulk incremental loading?
No, tdbloader4's [1] use case is loading into an empty database. With the current ids I did not find an easy and scalable way to merge two or more sets of TDB indexes.

As you said, hash ids (instead of file offsets) would make merging two or more TDB indexes possible (and easier), and would also enable different (and much simpler) approaches to parallel loading.

There is a (non-working!) branch [2] where I was trying to see what it would take to have hash ids in TDB. I have not had time recently to put any effort into it (but it is still something I'd love to accomplish).

Paolo

[1] https://github.com/castagna/tdbloader4
[2] https://svn.apache.org/repos/asf/incubator/jena/Jena2/TDB/branches/hash-ids/
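
To illustrate why hash ids would make merging easier, here is a minimal sketch in Python. It is not TDB code: the function names, the truncated-MD5 id scheme and the in-memory "indexes" are all assumptions made for illustration, and sequential ids are used as a stand-in for file offsets. The point it tries to show is that ids assigned by position differ between independently built stores, while ids derived from the term itself agree, so sorted indexes can simply be merged.

```python
import hashlib
from heapq import merge


def hash_id(term: str) -> int:
    """Derive a 64-bit id from the term itself (hypothetical scheme, not TDB's)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_offset_index(triples):
    """Assign ids in order of first appearance, a stand-in for file offsets."""
    node_table = {}
    index = []
    for triple in triples:
        ids = tuple(node_table.setdefault(t, len(node_table)) for t in triple)
        index.append(ids)
    return node_table, sorted(index)


def build_hash_index(triples):
    """Ids depend only on the term, so separate loaders agree without coordination."""
    return sorted(tuple(hash_id(t) for t in triple) for triple in triples)


def merge_hash_indexes(*indexes):
    """Merge already-sorted indexes into one sorted index, dropping duplicates."""
    merged = []
    for ids in merge(*indexes):
        if not merged or merged[-1] != ids:
            merged.append(ids)
    return merged


# Two datasets loaded independently (for example by two parallel workers).
a = [("ex:alice", "foaf:knows", "ex:bob")]
b = [("ex:bob", "foaf:knows", "ex:alice"), ("ex:alice", "foaf:knows", "ex:bob")]

# Position-based ids: the same term gets a different id in each store,
# so the indexes cannot be merged without rewriting every id.
table_a, _ = build_offset_index(a)
table_b, _ = build_offset_index(b)
ids_disagree = table_a["ex:alice"] != table_b["ex:alice"]

# Hash-based ids: the indexes can be merged directly.
merged = merge_hash_indexes(build_hash_index(a), build_hash_index(b))
```

A real design would also have to deal with hash collisions and with the id width; this sketch ignores both.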
