On 06/10/14 09:39, Claude Warren wrote:
I have the following code


         data = TDBFactory.createDataset( dir.getAbsolutePath() );
         data.begin( ReadWrite.WRITE);
         // read 2 files into the model (approx 3.5K triples)
         Model dataM = loadDir( ctxt, "/WEB-INF/resources/rdf/documents");
         // read http://www.w3.org/2009/08/skos-reference/skos.rdf
         Model schemaM = loadDir( ctxt, "/WEB-INF/resources/rdf/schemas" );
         Reasoner reasoner = ReasonerRegistry.getOWLMiniReasoner();
         reasoner = reasoner.bindSchema(schemaM);

         infModel = ModelFactory.createInfModel(reasoner, dataM);
         data.commit();

         // DEBUGGING CODE
         String realPath = ctxt.getRealPath("/WEB-INF/resources/");
         File f2 = new File(realPath, "data.rdf");

         // OOM occurs here
         infModel.write(new FileOutputStream(f2));


So my questions are:


1) Should the inferencing be done within a write transaction, and if so, how
does one ensure that all inferencing will be done within a write
transaction?  In this case I would expect that all inferencing would be
done in the infModel.write() call, but in the general case I may not be
making that call.

Inferencing does not write to the store so no need for a write transaction.
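
For example, something along these lines should do (untested sketch, assuming the document data ended up in the dataset's default model, and reusing the reasoner and f2 from your snippet):

         // The inference itself only needs read access to the data, so a
         // read transaction is enough once the load has been committed.
         data.begin(ReadWrite.READ);
         try {
             InfModel infModel = ModelFactory.createInfModel(reasoner, data.getDefaultModel());
             infModel.write(new FileOutputStream(f2));  // inference is pulled lazily here
         } finally {
             data.end();
         }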

2) Shouldn't the inferencing be writing to the TDB datastore?

No, or at least that's not the current design.

The rule engines were designed (many years ago) for purely in-memory use. They exploit specialist data structures: a RETE network for the forward rules and goal tables for the backward rules (though those are pretty crude). The results of inference are not necessarily materialized as triples, and when they are they don't go into the base graph (the forward engine has a separate deductions graph for this purpose).
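
For example, the forward deductions can be pulled out on their own with something like:

         // Only the forward engine's deductions, held in a separate graph;
         // this is not the complete inference closure.
         Model deductions = infModel.getDeductionsModel();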

Running the rule engines over a TDB store just makes them go slower and doesn't offer any improved scaling.

The general pattern with Jena is to do the inference in-memory and then store the (selected) results in persistent storage.
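
Roughly (untested sketch, variable names are just for illustration):

         // Run the reasoner over plain in-memory models ...
         Model dataM   = ModelFactory.createDefaultModel();   // load the document files into this
         Model schemaM = ModelFactory.createDefaultModel();   // load skos.rdf into this
         Reasoner reasoner = ReasonerRegistry.getOWLMiniReasoner().bindSchema(schemaM);
         InfModel infModel = ModelFactory.createInfModel(reasoner, dataM);

         // ... then materialize the results (all of them, or just the
         // statements you care about) into TDB in one write transaction.
         Dataset tdb = TDBFactory.createDataset(dir.getAbsolutePath());
         tdb.begin(ReadWrite.WRITE);
         try {
             tdb.getDefaultModel().add(infModel);
             tdb.commit();
         } finally {
             tdb.end();
         }

That way the reasoning itself keeps its fast in-memory structures and you only pay the TDB cost for the triples you actually want to persist.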

It is, of course, possible to develop reasoners that work at scale directly over persistent storage. There is a long history of research on deductive databases, as well as various techniques to stretch in-memory inference further with better memory efficiency and spill-to-disk. However, those would be new developments, not retrofits to the existing engines.

Dave
