Re: Aggregators and concurrent use of Query object

Holger Knublauch Tue, 27 Sep 2011 01:58:20 -0700

FYI we have locally applied the patch mentioned below and this has fixed the 
problem.


Many thanks
Holger


On Sep 21, 2011, at 1:54 AM, Stephen Allen wrote:

> Hi Holger,
> 
> I believe you are correct that Query objects with aggregators cannot be
> reused by different threads.  They *can* be reused by the same thread or by
> different threads that synchronize the compile step, but even then the
> Query object holds on to a reference to a new aggregator for every query
> execution.
> 
> The cause appears to be in AlgebraGenerator.java line 562, where the
> aggregators added to a Query object are referenced directly by the
> compiled query plan.  Instead, we should make a copy of the aggregators so
> that the original Query object remains immutable.
> 
> I've created a JIRA issue and submitted a patch, JENA-120:
> https://issues.apache.org/jira/browse/JENA-120
> 
> As a work-around until the patch is applied, I think you can synchronize
> around the QueryExecutionFactory.create() method.  Or, you can decide not to
> cache Group By queries (test for this with Query.hasGroupBy()).
> 
> I don't know if there are other issues that may prevent reusing Query
> objects; maybe Andy can chime in here.
> 
> -Stephen
> 
> P.S.  Your strategy of caching Query objects does avoid having to reparse
> the query string, which can be quite beneficial.  Along these same lines, a
> better enhancement to ARQ would be a mechanism to cache the query plans
> after the optimizer step.  Query optimization itself can get quite expensive
> (n! for left-deep trees, and even worse for bushy trees).
> 
> 
> 
>> -----Original Message-----
>> From: Holger Knublauch [mailto:[email protected]]
>> Sent: Tuesday, September 20, 2011 1:14 AM
>> To: [email protected]
>> Subject: Aggregators and concurrent use of Query object
>> 
>> Hi Andy,
>> 
>> we have (unreliably) run into exceptions like the one below, and my
>> suspicion is that the ARQ Query class is not meant to be re-used by
>> multiple threads. Although each step in the Query is converted into
>> corresponding Algebra objects for execution, the Aggregators seem to be
>> shared between multiple objects. Is this correct and do I need to
>> create a new Query each time I want a QueryExecution? This would slow
>> down things quite a lot, as we currently cache all Queries that were
>> created from string representation. If this is the case, are there any
>> ways to tell which particular queries are not thread-safe, e.g. all
>> queries involving aggregations?
>> 
>> If I am totally off the mark, do you know what else could cause the
>> exception below, which occurs only sometimes under multi-threaded
>> conditions?
>> 
>> Thank you,
>> Holger
>> 
>> 
>> com.hp.hpl.jena.sparql.ARQInternalErrorException: Null for accumulator
>>      at com.hp.hpl.jena.sparql.expr.aggregate.AggregatorBase.getValue(AggregatorBase.java:61)
>>      at com.hp.hpl.jena.sparql.engine.iterator.QueryIterGroup.calc(QueryIterGroup.java:121)
>>      at com.hp.hpl.jena.sparql.engine.iterator.QueryIterGroup.<init>(QueryIterGroup.java:32)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:413)
>>      at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:255)
>>      at com.hp.hpl.jena.sparql.algebra.op.OpGroup.visit(OpGroup.java:37)
>>      at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:33)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.executeOp(OpExecutor.java:107)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:441)
>>      at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:241)
>>      at com.hp.hpl.jena.sparql.algebra.op.OpExtend.visit(OpExtend.java:107)
>>      at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:33)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.executeOp(OpExecutor.java:107)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:393)
>>      at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:213)
>>      at com.hp.hpl.jena.sparql.algebra.op.OpProject.visit(OpProject.java:34)
>>      at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:33)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.executeOp(OpExecutor.java:107)
>>      at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:80)
>>      at com.hp.hpl.jena.sparql.engine.main.QC.execute(QC.java:40)
>>      at com.hp.hpl.jena.sparql.engine.main.QueryEngineMain.eval(QueryEngineMain.java:52)
>>      at com.hp.hpl.jena.sparql.engine.QueryEngineBase.evaluate(QueryEngineBase.java:138)
>>      at com.hp.hpl.jena.sparql.engine.QueryEngineBase.createPlan(QueryEngineBase.java:109)
>>      at com.hp.hpl.jena.sparql.engine.QueryEngineBase.getPlan(QueryEngineBase.java:97)
>>      at com.hp.hpl.jena.sparql.engine.main.QueryEngineMain$1.create(QueryEngineMain.java:91)
>>      at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.getPlan(QueryExecutionBase.java:266)
>>      at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.startQueryIterator(QueryExecutionBase.java:243)
>>      at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execResultSet(QueryExecutionBase.java:248)
>>      at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execSelect(QueryExecutionBase.java:94)
>>      at org.topbraid.spin.arq.SPINARQFunction.executeBody(SPINARQFunction.java:121)
> 
>
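
For readers landing on this thread: the second work-around Stephen
describes above (keep caching parsed queries, but never cache GROUP BY
queries) could be sketched roughly as below. This is only an illustration,
not code from the thread or from ARQ. The class name QueryCache and its
constructor parameters are hypothetical, and the cache is written
generically so that it has no Jena dependency. With ARQ one would
presumably pass QueryFactory::create as the parser and
q -> !q.hasGroupBy() as the sharing test.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of ARQ): caches parsed queries by their
 * string form, but only those that a caller-supplied test marks as safe
 * to share between threads.
 */
final class QueryCache<Q> {
    private final ConcurrentMap<String, Q> cache = new ConcurrentHashMap<>();
    private final Function<String, Q> parser;
    private final Predicate<Q> safeToShare;

    QueryCache(Function<String, Q> parser, Predicate<Q> safeToShare) {
        this.parser = parser;
        this.safeToShare = safeToShare;
    }

    /** Returns a cached query if one exists, otherwise parses the string. */
    Q get(String queryString) {
        Q cached = cache.get(queryString);
        if (cached != null) {
            return cached;
        }
        Q parsed = parser.apply(queryString);
        if (!safeToShare.test(parsed)) {
            // Assumed unsafe to share (e.g. a GROUP BY query): hand out a
            // fresh object on every call and never store it.
            return parsed;
        }
        // Two threads may parse the same string concurrently; keep
        // whichever object was stored first so callers share one instance.
        Q prior = cache.putIfAbsent(queryString, parsed);
        return prior != null ? prior : parsed;
    }

    /** Number of queries currently held in the cache. */
    int size() {
        return cache.size();
    }
}
```

The trade-off is that GROUP BY queries are reparsed on every call, so they
lose the benefit of caching; whether that cost matters depends on how
often such queries are executed.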
