Hi Jerven,

You have arrived at the correct conclusion: it is all about the order of the 
input triples. An optimal (or near-optimal) order is achieved when the triples 
are sorted by predicate and then by subject or object.
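
For illustration, here is a minimal sketch of pre-sorting an N-Triples file by 
predicate and then subject before loading. The function names are mine, and it 
assumes one well-formed triple per line (subject and predicate are IRIs or 
blank nodes, so they contain no unescaped whitespace); it is not part of OWLIM 
itself:

```python
def sort_key(line):
    # Split into subject, predicate, and the rest (object + " .").
    parts = line.split(None, 2)
    if len(parts) < 3:
        return ("", "")          # keep blank/malformed lines at the front
    subject, predicate = parts[0], parts[1]
    # Order by predicate first, then subject.
    return (predicate, subject)

def sort_ntriples(in_path, out_path):
    with open(in_path) as f:
        lines = [l for l in f if l.strip()]
    lines.sort(key=sort_key)
    with open(out_path, "w") as f:
        f.writelines(lines)
```

For files too large to sort in memory, the same ordering can be obtained with 
GNU sort, e.g. `sort -k2,2 -k1,1 data.nt > sorted.nt`, which sorts on the 
second whitespace-separated field (the predicate) and then the first (the 
subject).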

When the data is inserted in this optimal order, chances are that OWLIM's 
cache will be used effectively: page hits will prevail, which leads to less 
disk I/O and less cache fill-up.

Having said that, I would guess that the 650M dataset, or parts of it, already 
exhibits a close-to-optimal order and thus uses less cache than the 330M 
dataset. That is why adding more cache memory doesn't make any difference in 
this scenario (650M, nicely ordered).


Hope that explains the mystery!


Cheers,
Ivan


On Thursday 28 July 2011 17:06:45 Jerven Bolleman wrote:
> Dear OWLIM developers,
> 
> So this morning I started a new loading run, this time with a larger
> tupleIndexMemory setting: 15GB instead of the earlier 7GB.
> This loads the slow dataset in about 6 hours 15 minutes. So the funny
> thing is that we have 330 million triples that are much slower to insert
> than another set of 650 million triples, leading me to conclude that,
> even when not using reasoning, the kind of triples one inserts and in
> what kind of pattern can make a very large difference to the loading
> performance. The question now is: what kind of pattern would be optimal
> for loading performance? And why does the size of the tupleIndexMemory
> make such a large difference for the 330 million triple dataset but near
> no difference for the 650 million triple dataset?
> 
> Regards,
> Jerven
> 
_______________________________________________
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion