Hi,

On Fri, 2011-10-28 at 08:16 +0000, Rinne Mikko wrote:
> Hi!
>
> New to Jena and the list, please bear with me if this has been explained
> over and over again. I had so far no luck with the documentation, mailing
> list archives or googling, so here we go:
>
> Can ARQ be used to execute multiple parallel SPARQL queries?
>
> I would like to configure e.g. 100 or 1000 queries and then run them
> against a single file of triples. I wrote a piece of code to run the
> queries in sequence and got surprisingly good performance with brute
> force, but I would expect going through the dataset only once to perform
> much better.
>
> If ARQ doesn't support this, is it the Jena forward-chaining RETE
> engine <http://jena.sourceforge.net/inference/> I should be looking at,
> and translate the SPARQL queries manually?
Like Paolo I'm not quite sure what you are trying to do, but based on that
question let me take a guess ... It sounds like you have your data, maybe in
a memory model, and want to run a *lot* of queries over that single data
set. You suspect that instead of each query starting over again you could
stream the data once through some sort of query sieve and do all the
queries in one pass. Is that about right?

If so, then there is no specific parallel-SPARQL-query support in Jena,
but as you say it might be possible to use the RETE engine, depending on
the specifics of what you are doing.

As an aside, note that it is possible to issue SPARQL queries in parallel
(most of the Jena stores are Multiple Reader Single Writer), so on a
multi-core machine you might get extra speed from the brute-force query
approach by spreading the queries across a small number of threads.

The RETE engine works by keeping tables of partially matched triple
patterns so that each new triple is matched against the rules
incrementally, which does seem related to what you want. The problem is
that JenaRules is not SPARQL - there are no equivalents of SPARQL
constructs like UNION, ORDER BY or DISTINCT, and the set of built-in
predicates for filtering is different. Furthermore, all you can do when a
rule matches is assert a set of triples (or call some Java code such as
Print) - you don't have access to a stream of binding results in the way
you do with SPARQL.

However, if your queries are primarily just basic graph patterns, and if
the results of your queries can be expressed as new triples, then you
could indeed use the RETE engine. Whether it will gain you any benefit
depends on the specifics. If there are a lot of shared patterns between
your rules then it might; if not, the overheads of the rule machinery may
outweigh the gain from reuse of partial matches. I would suggest you try a
small experiment first to measure the costs/gains before committing to it.
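To make the thread-pool aside concrete, here is a minimal stdlib-only
sketch of fanning queries out over a few reader threads. The `Triple`
record and `countMatches` method are my own stand-ins, not Jena API: in a
real program `countMatches` would be an ARQ `QueryExecution` run against a
shared read-only model, which is safe under the MRSW model as long as no
thread is writing.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueries {
    // Stand-in for a Jena Triple in the shared, read-only data set.
    record Triple(String s, String p, String o) {}

    // Stand-in for running one SPARQL query: count triples with a given
    // predicate. A real version would call QueryExecution.execSelect().
    static long countMatches(List<Triple> data, String predicate) {
        return data.stream().filter(t -> t.p().equals(predicate)).count();
    }

    public static void main(String[] args) throws Exception {
        List<Triple> data = List.of(
            new Triple("a", "knows", "b"),
            new Triple("b", "knows", "c"),
            new Triple("a", "name", "Alice"));

        // Many "queries", spread across a small pool of reader threads;
        // the data is immutable, so concurrent reads need no locking.
        List<String> queries = List.of("knows", "name", "age");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> results = queries.stream()
            .map(q -> pool.submit(() -> countMatches(data, q)))
            .toList();

        for (int i = 0; i < queries.size(); i++) {
            System.out.println(queries.get(i) + " -> " + results.get(i).get());
        }
        pool.shutdown();
    }
}
```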
> Ultimately I would like to track the processing of each new triple from
> the dataset, in case it matches a query.

That is the way the RETE engine works. So long as you are only adding and
not removing triples (and so long as you don't have any nasty non-monotonic
operators in your rules), then each triple added to the model is filtered
through the RETE network to see if it triggers more rules.

> Any proposals on good documentation?

The primary documentation for the rules engine is:

http://incubator.apache.org/jena/documentation/inference/index.html#rules

Dave
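For concreteness, this is the sort of manual translation involved (a
sketch only; the `ex:` property names and the `LondonWorker` class are
invented for illustration, and a real rules file would need matching
@prefix declarations). A basic graph pattern such as
{ ?x ex:worksFor ?c . ?c ex:basedIn ex:London } can't return a stream of
bindings from a rule, but its matches can be materialised as new triples
with a forward rule in the JenaRules syntax:

```
[londonWorker:
    (?x ex:worksFor ?c) (?c ex:basedIn ex:London)
    ->
    (?x rdf:type ex:LondonWorker)
]
```

Each time a triple is added, the RETE network extends any partial matches
of the two body patterns, and whenever both are satisfied the head triple
is asserted into the inference model, where you can query for it.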
