On 06/10/14 09:39, Claude Warren wrote:
I have the following code:
data = TDBFactory.createDataset( dir.getAbsolutePath() );
data.begin( ReadWrite.WRITE);
// read 2 files into the model (approx 3.5K triples)
Model dataM = loadDir( ctxt, "/WEB-INF/resources/rdf/documents");
// read http://www.w3.org/2009/08/skos-reference/skos.rdf
Model schemaM = loadDir( ctxt, "/WEB-INF/resources/rdf/schemas" );
Reasoner reasoner = ReasonerRegistry.getOWLMiniReasoner();
reasoner = reasoner.bindSchema(schemaM);
infModel = ModelFactory.createInfModel(reasoner, dataM);
data.commit();
// DEBUGGING CODE
String realPath = ctxt.getRealPath("/WEB-INF/resources/");
File f2 = new File(realPath, "data.rdf");
// OOM occurs here
infModel.write(new FileOutputStream(f2));
So my questions are:
1) Should the inferencing be done within a write transaction, and if so,
how does one ensure that all inferencing will be done within a write
transaction? In this case I would expect that all the inferencing would
be done in the infModel.write() call, but in the general case I may not
be making that call.
Inferencing does not write to the store, so there is no need for a write
transaction.
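If dataM is held in the transactional TDB dataset, the read does still
need to sit inside a transaction, but a READ one is enough. Untested
sketch, assuming the data has already been loaded into the dataset's
default model:

    data.begin( ReadWrite.READ );
    try {
        // building the InfModel and serializing it only reads from the store
        Model dataM = data.getDefaultModel();
        InfModel infModel = ModelFactory.createInfModel(reasoner, dataM);
        infModel.write(new FileOutputStream(f2));
    } finally {
        data.end();
    }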
2) Shouldn't the inferencing be writing to the TDB datastore?
No, or at least that's not the current design.
The rule engines were designed (many years ago) for purely in-memory
use. They exploit specialist data structures (a RETE network for the
forward rules, goal tables for the backward rules, though the latter are
pretty crude). The results of inference are not necessarily
materialized as triples, and when they are they don't go into the base
graph (the forward engine has a separate deductions graph for this purpose).
Running the rule engines over a TDB store just makes them go slower and
doesn't offer any improved scaling.
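If what you are after is just the materialized conclusions, you can get
at the forward engine's deductions separately, e.g. (sketch):

    // only the statements the forward rules have added, not the base data
    Model deductions = infModel.getDeductionsModel();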
The general pattern with Jena is to do the inference in-memory, then
store the (selected) results in persistent storage.
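Roughly (untested sketch, reusing schemaM and dir from your code; what
you choose to persist is up to you):

    // reason over plain in-memory models ...
    Model dataM = ModelFactory.createDefaultModel();
    // ... read the RDF files into dataM here ...
    Reasoner reasoner = ReasonerRegistry.getOWLMiniReasoner().bindSchema(schemaM);
    InfModel infModel = ModelFactory.createInfModel(reasoner, dataM);

    // ... then persist the results into TDB inside a write transaction
    Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath());
    ds.begin( ReadWrite.WRITE );
    try {
        ds.getDefaultModel().add(infModel);   // or add only the statements you need
        ds.commit();
    } finally {
        ds.end();
    }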
It is, of course, possible to develop reasoners that work at scale
directly over persistent storage. There is a long history of research in
deductive databases as well as various techniques to stretch in-memory
inference further with more memory efficiency and spill-to-disk.
However, those would be new developments, not retrofits to the existing
engines.
Dave