Hi,

On Fri, 2011-10-28 at 08:16 +0000, Rinne Mikko wrote: 
> Hi!
> 
> New to Jena and the list, please bear with me if this has been explained over 
> and over again. I had so far no luck with the documentation, mailing list 
> archives or googling, so here we go:
> 
> Can ARQ be used to execute multiple parallel SPARQL queries?
> 
> I would like to configure e.g. 100 or 1000 queries and then run them against 
> a single file of triples. I wrote a piece of code to run the queries in 
> sequence and got surprisingly good performance with brute force, but I would 
> expect going through the dataset only once to perform much better.
> 
> If ARQ doesn't support this, is it the Jena forward-chaining RETE 
> engine<http://jena.sourceforge.net/inference/> I should be looking at, and 
> translate the SPARQL queries manually?

Like Paolo I'm not quite sure what you are trying to do but based on
that question let me take a guess ...

It sounds like you have your data, maybe in a memory model, and want to
run a *lot* of queries over that single data set. You suspect that
instead of each query starting over again maybe you could stream the
data once through some sort of query sieve to do all the queries at once
in one pass. Is that about right?

If so, then there is no specific parallel-SPARQL-query support in Jena,
but as you say it might be possible to use the RETE engine, depending on
the specifics of what you are doing.

As an aside, note it is possible to issue SPARQL queries in parallel
(most of the Jena stores are Multiple Reader Single Writer), so on a
multi-core machine you might get extra speed from the brute-force query
approach by spreading the queries across a small number of threads.
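As a rough illustration of that threading pattern, here is a plain-Java
sketch (the triple list and the runQuery body are stand-ins I've made up for
illustration; with Jena you would share one read-only Model and create a
separate QueryExecution per query inside each task):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueries {
    // Stand-in for the shared, read-only dataset. With Jena this would be
    // a single Model/Dataset instance shared across all threads.
    static final List<String> TRIPLES = List.of(
            "ex:a rdf:type ex:Person",
            "ex:b rdf:type ex:Place",
            "ex:c rdf:type ex:Person");

    // Stand-in for running one SPARQL query: here, count matching triples.
    static long runQuery(String pattern) {
        return TRIPLES.stream().filter(t -> t.contains(pattern)).count();
    }

    public static void main(String[] args) throws Exception {
        // A small fixed pool: under MRSW locking, readers don't block
        // each other, so a handful of threads can keep all cores busy.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> queries = List.of("ex:Person", "ex:Place");
        List<Future<Long>> results = pool.invokeAll(
                queries.stream()
                       .map(q -> (Callable<Long>) () -> runQuery(q))
                       .toList());
        for (int i = 0; i < queries.size(); i++) {
            System.out.println(queries.get(i) + " -> " + results.get(i).get());
        }
        pool.shutdown();
    }
}
```

The point is only the shape: one shared immutable dataset, one task per
query, a pool sized to the machine rather than to the number of queries.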

The RETE engine works by keeping tables of partially matched triple
patterns, so that each new triple is matched against the rules
incrementally. That does seem related to what you want.

The problem is that JenaRules is not SPARQL - there are no equivalents
of SPARQL constructs like UNION, ORDER BY or DISTINCT, and the set of
built-in predicates for filtering is different. Furthermore, all you can
do when a rule matches is assert a set of triples as a result
(or call some Java code like print) - you don't have access to a stream
of binding results in the way you do with SPARQL.

However, if your queries are primarily just basic graph patterns and if
the results from your queries can be expressed as new triples then you
could indeed use the RETE engine. 
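To give a feel for the shape such rules take, the rdfs9 rule from the
Jena rules documentation matches a basic graph pattern on the left and
asserts new triples on the right:

```
[rdfs9: (?x rdfs:subClassOf ?y), (?a rdf:type ?x) -> (?a rdf:type ?y)]
```

A query of yours would be translatable if it fits that mould: a
conjunction of triple patterns in, a set of asserted triples out.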

Whether it will gain you any benefit depends on the specifics. If there
is a lot of pattern sharing between your rules then it might. If not,
the overheads of the rule machinery may outweigh the gain from reuse of
partial matches.

I would suggest you try a small experiment first to measure the
cost/gains before committing to it.
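A minimal harness for that experiment could look like this (plain-Java
sketch; runWorkload is a placeholder I've invented - you would swap in a
batch of ARQ queries for one run and a rule-engine pass over the same
data for the other, then compare the timings):

```java
public class TimeIt {
    // Placeholder workload; replace with your real query batch or
    // rule-engine run over the shared dataset.
    static long runWorkload(int nQueries) {
        long acc = 0;
        for (int i = 0; i < nQueries; i++) acc += i;
        return acc;
    }

    // Wall-clock timing of an arbitrary runnable, in milliseconds.
    static long timeMillis(Runnable r) {
        long t0 = System.nanoTime();
        r.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> runWorkload(100));
        System.out.println("100 queries took " + ms + " ms");
    }
}
```

Run both variants on a representative slice of your data before scaling
up to the full 100 or 1000 queries.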

> Ultimately I would like to track the processing of each new triple from the 
> dataset, in case it matches a query.

That is the way the RETE engine works. So long as you are only adding
and not removing triples (and so long as you don't have any nasty
non-monotonic operators in your rules) then each triple added to the
model is filtered through the RETE network to see if it triggers more
rules.

> Any proposals on good documentation?

The primary documentation for the rules engine is:

http://incubator.apache.org/jena/documentation/inference/index.html#rules

Dave

