Hi,

On Mon, 2011-08-29 at 10:26 +0200, Martin Weitzel wrote: 
> (Note: This is a crosspost from the deprecated List@
> http://tech.groups.yahoo.com/group/jena-dev/)

Answered here rather than there since we are trying to migrate to the
Apache lists :)

> My main questions are:
> (Q1) What is the best-practice to persist InfModels without losing the
> update-functionality and without triggering a whole new inference-process on
> load? Is this possible with TDB?

The short answer is that there is no provision in Jena for persisting
reasoner state.

You can compute the complete inference closure and persist that, just
like any other bunch of triples, but if you need to later update the
data then you have to start inference again from scratch.

In your case, if you really do want arbitrary updates then there's no
particular point in persisting the base data in a database; it would be
just as quick to load that small amount of data in from file.

> (Q2) How to maintain the update-functionality when using multiple layers
> of reasoners?

The bottom layer will work as normal. The upper layers will see your
incremental adds/removes but won't see any entailments from the bottom
layers as increments - typically you'll need to trigger a rebind() on
the upper layers, which loses incrementality. Layered reasoners are a
pain.

You *may* be able to achieve your goal of having separate rule groups by
using one reasoner but using guard clauses in your rules to delay one
group until you know the other group has finished.
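For illustration, one shape such a guard can take in the rule syntax
(the property names here are made up, not taken from your rules): each
first-group rule also asserts a marker triple, and the second-group
rules require that marker before they can fire:

```
# Group 1: classify, and mark each individual as processed
[classify: (?o rdf:type eg:Offer) -> (?o eg:status eg:classified)]

# Group 2: the extra eg:status clause delays firing until group 1
# has produced its marker for this individual
[combine: (?o eg:status eg:classified), (?o eg:matches ?r)
          -> (?o eg:assignedTo ?r)]
```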

You *may* be able to maintain separate, not layered, reasoners and
manually move the relevant entailment results from one InfModel to the
next.

But in general if you can avoid layering reasoners then do so.

> (Q3) And first of all: does the InfModel update-functionality work the
> way I am assuming? Will updating the Ontology (adding, removing a
> Statement) correctly trigger the related reasoning actions?

Backward rules (without any table declarations) are always rerun on
demand. There is no state and so they are always incremental in that
sense.
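For example, a pure backward rule set like this (illustrative rules, not
from your file) is simply re-evaluated whenever a query touches
eg:ancestor, so adds and removes to the base data need no state
maintenance:

```
# Backward rules (note the <-): matched on demand at query time
[ancestor:    (?a eg:ancestor ?c) <- (?a eg:parent ?c)]
[ancestorRec: (?a eg:ancestor ?c) <- (?a eg:parent ?b), (?b eg:ancestor ?c)]
```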

Forward rules are incremental on add. When you add a new triple then
partial results from the previous matches will be reused.

However, deletes are not incremental - if you remove a triple then the
forward reasoner will be started over from scratch. 

> (Q4) Are there other related technologies, that I should consider for my
> application? (owlapi, other reasoners*, Joseki/Fuseki, ...)
> 
> I would appreciate any hints!

There are a couple of approaches you could think about.

If you are going to be dealing with a lot of updates, including removes,
and if your queries between updates tend to be few and narrow, then you
may want to switch to pure on demand reasoning - i.e. do everything as
backward rules. That way there is no state to maintain.
You may even be able to replace some of your reasoning by sufficiently
complex SPARQL queries - the combination of path expressions and
sub-queries has made SPARQL pretty powerful these days.
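For instance, a transitive ancestor closure that would otherwise need a
recursive rule can be a single property-path query (the prefix is
illustrative):

```
PREFIX eg: <http://example.org/>

# The one-or-more path (+) walks eg:parent transitively at query time,
# so no inference state has to be maintained between updates.
SELECT ?person ?ancestor WHERE {
  ?person eg:parent+ ?ancestor .
}
```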

Conversely if your typical workload involves a lot of queries and only
occasional updates, and if those updates are usually pure adds then you
do want to maintain reasoning state. In that case one option would be to
separately persist both the base data and the inference closure. When
your application starts up, it can answer queries from the persisted
inference closure straight away. It could start up a background worker
thread which builds an in-memory InfModel from the base data ready to
handle update processing. That way you get quick start up for query
answering but by the time an update comes in you have a working
in-memory InfModel ready and can switch to that for doing the update and
for future query answering.
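A minimal sketch of that switch-over, with the Jena types replaced by a
plain Set of triple strings so it stands alone (the class and names here
are illustrative, not Jena API): queries read from the persisted closure
until the background rebuild publishes the live model in one atomic swap.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

class ClosureThenRebuild {
    // Initially holds the persisted inference closure (read-only snapshot).
    private final AtomicReference<Set<String>> current = new AtomicReference<>();

    ClosureThenRebuild(Set<String> persistedClosure) {
        current.set(persistedClosure);
    }

    // Queries always read whichever snapshot is current.
    boolean contains(String triple) {
        return current.get().contains(triple);
    }

    // Background worker: rebuild the in-memory inference model from the
    // base data (in Jena this would be ModelFactory.createInfModel),
    // then switch all future queries over with a single swap.
    Thread rebuildInBackground(Set<String> baseData) {
        Thread worker = new Thread(() -> {
            Set<String> live = new HashSet<>(baseData);
            live.add(":a :entailed :b"); // stand-in for the computed entailments
            current.set(live);           // atomic switch-over
        });
        worker.start();
        return worker;
    }
}
```

The AtomicReference means readers never see a half-built model: they get
either the old persisted closure or the complete rebuilt one.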

[Note that in your case you seem to have both forward and backward rules
so presumably you would persist only the results of forward inference.]


By the way are the rules in http://pastebin.com/i96sM1W3 the full set of
rules?  If so then your slow performance and 20x growth suggest you
have a lot of offer and requirement conditions, and that the quadratic
set of match statements between them is dominating your data. I don't
understand the details of what you are trying to achieve but you may be
better off having separate phases which first pick out plausible
candidates and then do inference over that reduced plausible set. That
way you may be able to reduce the combinatorics, and with luck the
candidate selection can be done as an on-demand query - that might
reduce your need for persistence of inference state.

Dave

