Re: Parallel SPARQL queries with ARQ?

Rinne Mikko Fri, 28 Oct 2011 03:19:47 -0700

Hi Paolo!

Many thanks for the quick reply! Sorry if I was not clear in the first mail - 
it is always a compromise between compact and comprehensive. I'll try to 
elaborate a bit more.


First, on the use case: I'm ultimately aiming at an event processor (a.k.a.
subscription matcher), where subscriptions are defined as SPARQL queries and
events are input as N3-serialized RDF streams. I'm trying to evaluate the
suitability of Jena/ARQ for this kind of task. Processing one SPARQL query at a
time on an N3 file works well, but next I need to understand how to move beyond
that, whether this ultimate goal is reachable, and how much new code it would
take.

This is what I'm doing now:

1) Read the triples into a model (only once :-), although I should prepare for 
the case where the model is a stream instead of a small file)

2) Loop through the set of SPARQL filters (a.k.a. subscriptions), pre-creating
QueryExecution objects (via QueryExecutionFactory) into an array.

3) Run the execSelects in another loop (which I'm timing to observe how long it 
takes to execute the queries in sequence)
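
Step 3 runs the queries one after another. If the aim is simply to use several cores, the same loop could be handed to a thread pool, provided the queries only read data that is not being modified in the meantime. The sketch below illustrates that pattern only: plain Java predicates over strings stand in for the SPARQL queries, no Jena calls are made, and the names ParallelFilters and countMatches are made up for this example. Note that this is thread-level parallelism; every filter still scans the whole data set on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical stand-in: each Predicate plays the role of one subscription query.
final class ParallelFilters {

    /**
     * Runs every filter over the shared, read-only data on a fixed thread pool
     * and returns the number of matches per filter, in filter order.
     */
    static List<Integer> countMatches(List<String> data,
                                      List<Predicate<String>> filters,
                                      int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per filter; each task scans the whole data set.
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Predicate<String> f : filters) {
                tasks.add(() -> (int) data.stream().filter(f).count());
            }
            // invokeAll blocks until all tasks finish and keeps task order.
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                counts.add(result.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> data = Arrays.asList(
                "alice temp 21", "bob temp 19", "alice humidity 40");
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(line -> line.startsWith("alice"));
        filters.add(line -> line.contains(" temp "));

        // Time the whole batch, as in the sequential loop of step 3.
        long start = System.nanoTime();
        List<Integer> counts = countMatches(data, filters, 4);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println(counts + " in " + micros + " us");
    }
}
```

With real queries, each task would wrap one query execution instead of a predicate; whether the underlying store tolerates concurrent readers would need to be checked against its documentation.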

----------

The next step I'd like to take is to "load" all the queries first and evaluate
my model against the whole set of queries at once.

Ultimately, I'd like to load all the queries, stream in N3 triple by triple and 
observe the performance (notification speed with a large number of 
subscriptions) in that scenario.
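
To make the target scenario concrete, here is a minimal sketch of a single-pass matcher: all subscriptions are registered up front, and each incoming triple is tested against every one of them as it arrives. It is an illustration under strong assumptions, not ARQ code: StreamMatcher, Triple, subscribe and onTriple are hypothetical names, and each subscription is reduced to a filter on one triple. Real SPARQL queries that join several triple patterns would also need partial-match state kept between triples, which is the kind of bookkeeping a RETE network does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical single-pass matcher; no Jena classes are used.
final class StreamMatcher {

    /** Hypothetical triple holder: subject, predicate, object as plain strings. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // Subscription id -> filter, kept in registration order.
    private final Map<String, Predicate<Triple>> subscriptions = new LinkedHashMap<>();

    void subscribe(String id, Predicate<Triple> filter) {
        subscriptions.put(id, filter);
    }

    /** Returns the ids of all subscriptions that match this one triple. */
    List<String> onTriple(Triple t) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Predicate<Triple>> e : subscriptions.entrySet()) {
            if (e.getValue().test(t)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        StreamMatcher matcher = new StreamMatcher();
        matcher.subscribe("temperature", t -> t.p.equals("ex:temperature"));
        matcher.subscribe("sensor1", t -> t.s.equals("ex:sensor1"));

        // Stand-in for a parsed N3 stream: triples arrive one at a time.
        Triple[] stream = {
            new Triple("ex:sensor1", "ex:temperature", "21"),
            new Triple("ex:sensor2", "ex:humidity", "40"),
        };
        for (Triple t : stream) {
            System.out.println(t + " -> " + matcher.onTriple(t));
        }
    }
}
```

This version still does work proportional to the number of subscriptions for every triple; indexing the subscriptions (for example by predicate) would be the obvious next step when their number is large.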

> Mikko, maybe you can "payback" with a small example on how you could
> use SPARQL CONSTRUCT queries to do small "inference" over RDF data.

I'm afraid that the simple SPARQL-queries I'm using for testing at the moment 
would not help at all. :-( Right now I'm just using some simple filtering to 
give each imaginary "subscriber" their slice of the test data. Once we proceed 
to test with something more elaborate, I'll let you know. :-)

Thank you for the links - I haven't checked them all yet, but I will!

Best Wishes,

Mikko


On 28. Oct 2011, at 11:41 AM, Paolo Castagna wrote:

> Hi Mikko,
> I am not giving you answers to all of your questions, just a few.
> 
> Rinne Mikko wrote:
>> Hi!
>> New to Jena and the list, please bear with me if this has been explained 
>> over and over again. I had so far no luck with the documentation, mailing 
>> list archives or googling, so here we go:
>> Can ARQ be used to execute multiple parallel SPARQL queries?
> 
> I am not sure I follow what you are doing exactly.
> But, are you reading a file with your data each time you execute a query?
> 
> A better approach would be to use something like TDB to load (and index)
> your data once and then run your queries against the data stored in TDB
> (using ARQ as you are doing now). See: http://openjena.org/wiki/TDB
> 
> Another option for you might be to use a SPARQL endpoint (and issue queries
> in parallel via HTTP clients), see: http://openjena.org/wiki/Fuseki
> 
>> I would like to configure e.g. 100 or 1000 queries and then run them against 
>> a single file of triples. I wrote a piece of code to run the queries in 
>> sequence and got surprisingly good performance with brute force, but I would 
>> expect going through the dataset only once to perform much better.
> 
> This is what made me think that you are parsing and reading in your data
> from a file and doing this for each of your queries.
> 
> If so, bad.
> 
> Read the data once and keep it in memory if it's small.
> Then run all your queries against that Jena Model.
> 
>> If ARQ doesn't support this, is it the Jena forward-chaining RETE 
>> engine<http://jena.sourceforge.net/inference/> I should be looking at, and 
>> translate the SPARQL queries manually?
> 
> Now you got me confused.
> 
> Because I do not understand what you are trying to achieve, I don't know
> your use case.
> 
> (However, it seems like something interesting... it seems to me you are
> doing some "inference" via SPARQL and then you would like to keep data
> up-to-date as you add/remove RDF data to/from your system).
> 
>> Ultimately I would like to track the processing of each new triple from the 
>> dataset, in case it matches a query. Any proposals on good documentation? 
>> Which level should I try to interface on?
> 
> Interesting question... I'd love to have a clear and simple answer to
> your question and a good example to point you at. But, I don't.
> 
> A related (and recent) thread from jena-users:
> http://markmail.org/thread/l4ymug3ujoqifnty
> 
> You might find this example interesting and useful:
> https://github.com/castagna/Apache-Jena-Examples/blob/master/src/main/java/org/apache/jena/examples/ExampleTDB_04.java
> This is from GitHub, therefore it's not "official".
> Some people learn best from examples.
> 
> When I need to learn how to use new software and its APIs I prefer
> Java code and many examples (with a few comments in it).
> 
> (If others want to contribute more small examples, go ahead: fork it
> and send pull requests! ;-))
> 
> Mikko, maybe you can "payback" with a small example on how you could
> use SPARQL CONSTRUCT queries to do small "inference" over RDF data.
> 
> Paolo
> 
>> Thanks!
>> Mikko
>
