On 22/01/12 20:41, Marcus Cobden wrote:
On 16/01/2012 21:05, Dave Reynolds wrote:
On 16/01/12 11:42, Marcus Cobden wrote:
I've loaded a TriG graph using ng4j, but I am finding that reasoning
over it is particularly slow.
After converting the same graph to N-Triples and working with only Jena
models, the reasoning is a lot faster.

Underneath, ng4j is using a MultiUnion graph to combine the named
graphs. Could this be causing the slowness?

Possibly. A reasoner will issue a lot of find operations, and for
MultiUnion each find will be distributed to every graph, which does
entail some overhead. If there is any redundancy between the graphs
then there could be duplicated traversals.

You could test if it is MultiUnion or some other aspect of ng4j by
converting the data to an OntModel instead with addSubModel to add
each graph.

It looks like it's not something ng4j specific:

Pre-flattened N-Triples: ~17.14s
ng4j:                    ~395.08s
ont-model:               ~313.46s

I'm running over the BSBM dataset, split into graphs.

How many graphs?

Are the graphs reasonably disjoint or do they contain redundant copies of assertions?

There are 29355 triples before inference, and 40359 after.

This is roughly what my code is doing:

OntModel om = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);

// Add a bunch of submodels read from the filesystem.
//Model m = ModelFactory.createDefaultModel();
//om.addSubModel(m);

Assert.assertEquals(29355, om.size());

Resource config = ModelFactory.createDefaultModel()
        .createResource()
        .addProperty(ReasonerVocabulary.PROPsetRDFSLevel,
                     RDFSRuleReasoner.FULL_RULES);
Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
reasoner.setDerivationLogging(true);

InfModel infm = ModelFactory.createInfModel(reasoner, om);

Assert.assertEquals(40359, infm.size());

All looks reasonable.

Do you have any other suggestions?

Not really, I'm afraid.

The reasoner is just doing find calls on the underlying model. It sounds like the overhead of MultiUnion routing each find to every submodel and then running a uniqueness filter over the concatenated results is costing you about 20x in performance. That is very surprising unless there's a LOT of redundancy between the graphs!

Sounds like some profiling of MultiUnion might be useful if anyone has spare capacity to look at that.

In the meantime, it seems like you should create a single merged graph to do the inference over, even if you do all the other work using the separate graphs.
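
To make the cost difference concrete, here is a toy sketch in plain Java. It is not Jena code and does not reproduce MultiUnion's actual implementation: ToyGraph, findInUnion and demo are hypothetical names invented for illustration, and triples are reduced to strings indexed by subject. It only shows the shape of the problem: a union-style find touches every member graph and then filters duplicates, while a find on a merged copy is a single lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UnionFindSketch {

    /** Toy graph: triples stored as strings, indexed by subject. Not Jena's Graph API. */
    static final class ToyGraph {
        private final Map<String, Set<String>> bySubject = new HashMap<>();

        void add(String s, String p, String o) {
            bySubject.computeIfAbsent(s, k -> new LinkedHashSet<>())
                     .add(s + " " + p + " " + o);
        }

        /** Copies every triple of another graph into this one. */
        void addAll(ToyGraph other) {
            other.bySubject.forEach((s, triples) ->
                bySubject.computeIfAbsent(s, k -> new LinkedHashSet<>()).addAll(triples));
        }

        Set<String> find(String subject) {
            return bySubject.getOrDefault(subject, Collections.emptySet());
        }
    }

    /** Union-style find: one lookup per member graph, then a uniqueness filter. */
    static Set<String> findInUnion(List<ToyGraph> graphs, String subject, int[] lookups) {
        Set<String> unique = new LinkedHashSet<>();
        for (ToyGraph g : graphs) {
            lookups[0]++;                     // count each per-graph dispatch
            unique.addAll(g.find(subject));   // duplicates across graphs collapse here
        }
        return unique;
    }

    /** Returns {unionLookups, mergedLookups, unionResultSize, mergedResultSize}. */
    static int[] demo(int graphCount) {
        List<ToyGraph> graphs = new ArrayList<>();
        for (int i = 0; i < graphCount; i++) {
            ToyGraph g = new ToyGraph();
            g.add("ex:shared", "rdf:type", "ex:Product");     // redundant across graphs
            g.add("ex:item" + i, "rdf:type", "ex:Product");   // unique to this graph
            graphs.add(g);
        }

        int[] unionLookups = {0};
        Set<String> viaUnion = findInUnion(graphs, "ex:shared", unionLookups);

        // Merge once up front; each later find is then a single lookup.
        ToyGraph merged = new ToyGraph();
        for (ToyGraph g : graphs) {
            merged.addAll(g);
        }
        Set<String> viaMerged = merged.find("ex:shared");

        return new int[] {unionLookups[0], 1, viaUnion.size(), viaMerged.size()};
    }

    public static void main(String[] args) {
        int[] r = demo(100);
        System.out.println("union lookups:  " + r[0] + ", results: " + r[2]);
        System.out.println("merged lookups: " + r[1] + ", results: " + r[3]);
    }
}
```

Both paths return the same answer, but the union path pays one dispatch per graph on every find, and a rule reasoner issues a great many finds. Whether this accounts for the whole slowdown in the real MultiUnion is exactly what profiling would need to confirm.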

Dave
