Hi Jerven,

You have arrived at the right conclusion: it's all about the order of the input triples. An optimal (or near-optimal) insert order is achieved by sorting the triples by predicate, and then by subject or object.
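As a minimal sketch (not part of OWLIM, and assuming simple N-Triples lines with no spaces inside terms; real data would need a proper N-Triples parser), pre-sorting a file by predicate and then subject before loading could look like this:

```python
def sort_key(line: str):
    # N-Triples layout: <subject> <predicate> <object> .
    # Splitting on the first two spaces yields subject and predicate.
    subject, predicate, _rest = line.split(" ", 2)
    return (predicate, subject)

def sort_ntriples(lines):
    """Return non-empty triple lines ordered by predicate, then subject."""
    return sorted((l for l in lines if l.strip()), key=sort_key)

# Hypothetical sample data to illustrate the resulting order.
triples = [
    '<urn:s2> <urn:p1> "b" .',
    '<urn:s1> <urn:p2> "c" .',
    '<urn:s1> <urn:p1> "a" .',
]
for t in sort_ntriples(triples):
    print(t)
```

For datasets of hundreds of millions of triples an external sort (e.g. the Unix `sort` utility on the raw N-Triples file, with a suitable key) would be the practical route, since the data won't fit in memory.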
When the insert data arrives in this near-optimal order, OWLIM's cache is used effectively and page hits prevail, which leads to less disk I/O and less cache churn. That said, I would guess that the 650M dataset (or parts of it) already exhibits close-to-optimal order and thus needs less cache than the 330M dataset; adding more cache memory therefore makes no difference in that scenario (650M, nicely ordered).

Hope that explains the mystery!

Cheers,
Ivan

On Thursday 28 July 2011 17:06:45 Jerven Bolleman wrote:
> Dear OWLIM developers,
>
> So this morning I started a new loading run, this time with a larger
> tupleIndexMemory setting: 15GB instead of the earlier 7GB. This loads
> the slow dataset in about 6 hours 15 minutes. The funny thing is that
> we have 330 million triples that are much slower to insert than
> another set of 650 million triples, leading me to conclude that even
> without reasoning, the kind of triples one inserts, and in what
> pattern, can make a very large difference to loading performance.
> The questions now are: what kind of pattern would be optimal for
> loading performance, and why does the size of tupleIndexMemory make
> such a large difference for the 330 million triple dataset but almost
> no difference for the 650 million triple one?
>
> Regards,
> Jerven

_______________________________________________
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion