Hi Rupert,

I have created a new index and everything seems to work OK. I guess the binary data format has not changed between Jena 2.6.3 and 2.7.4.
It took about 78 mins (TDB) + 35 mins (Solr) to process ~80M triples. Moreover, I did not hit any memory issues while completing the whole process with the EntityHub indexing tool. Usually I had to restart the process at least twice because of OutOfMemory exceptions. Considering that I am using a machine with 16GB, it seems there is something wrong...

cheers
Andrea

2012/11/16 Rupert Westenthaler <[email protected]>:
> The TDB database is located under
>
> {indexing-working-dir}/indexing/resources/tdb
>
> If you do have a TDB store with the required data, then you can
> provide it under that directory. Just make sure that the
>
> {indexing-working-dir}/indexing/resources/rdfdata
>
> folder is empty when you start the tool. Otherwise the RDF files in
> that folder would get imported.
>
> On Fri, Nov 16, 2012 at 2:18 PM, Andrea Di Menna <[email protected]> wrote:
>> The first part of the process seems slower on my machine compared to
>> loading triples into a TDB store directly with tdbloader2 (Note: I am using
>> the latest available version of Jena when running tdbloader2 standalone
>> - namely 2.7.4).
>
> Yes, the indexing tool uses
>
> com.hp.hpl.jena:jena:2.6.3
> com.hp.hpl.jena:arq:2.8.5
> com.hp.hpl.jena:tdb:0.8.7
>
> but you could still try to use your datastore. Maybe they have not
> changed the binary format of the files.
>
> If not, let me know and I will try to update the Jena version used by
> the indexing tool.
>
> best
> Rupert
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
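
For anyone repeating this with a TDB store built outside the tool, here is a minimal sketch of a pre-flight check for the directory layout described in this thread. It is not part of the indexing tool: the function name `check_indexing_layout` is hypothetical, and it assumes that a usable TDB store shows up as a non-empty `tdb` directory. It does not verify that the files are in a format the bundled Jena version can read.

```python
from pathlib import Path


def check_indexing_layout(working_dir):
    """Return a list of problems with reusing an existing TDB store.

    Hypothetical helper: only the two directory paths come from the
    thread, everything else is an assumption made for illustration.
    """
    resources = Path(working_dir) / "indexing" / "resources"
    tdb_dir = resources / "tdb"
    rdfdata_dir = resources / "rdfdata"
    problems = []

    # The pre-built TDB store is expected under indexing/resources/tdb.
    if not tdb_dir.is_dir() or not any(tdb_dir.iterdir()):
        problems.append(f"no TDB files found in {tdb_dir}")

    # RDF files left in rdfdata would get imported when the tool starts.
    if rdfdata_dir.is_dir() and any(rdfdata_dir.iterdir()):
        problems.append(f"{rdfdata_dir} is not empty")

    return problems
```

Calling `check_indexing_layout(".")` from the indexing working directory returns an empty list when the layout matches the description above, and otherwise one message per problem found.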
