[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271403#comment-13271403
 ] 

Stephen Owens commented on JENA-244:
------------------------------------

Versions involved:
==============
jena-arq-2.9.0-incubating.jar
jena-core-2.7.0-incubating.jar


We discovered a Jena deadlock during some long running stability tests. Jena is 
being used in a service accessed by a web application so there were multiple 
web threads accessing the same underlying model. The model was protected by the 
standard enterCriticalSection calls, set for read because we were just 
querying. This worked perfectly until we switched to an RDFS model. The RDFS 
model exhibited occasional deadlocks. 

The deadlock appears to be an issue with out of order lock acquisition. The 
first thread acquired a lock on LPTopGoalIterator and then tried to get a lock 
LPBRuleEngine, the second thread had a lock on LPBRuleEngine and tried to get a 
lock on LPTopGoalIterator with the expected result that nobody is going 
anywhere from that point forward. I traced through the relevant Jena code and 
the issue seems to be that one of the two queries is the first query against 
that model and so is triggering the inferencing model to add rules to the 
model. In that first query the following sequence happens:

- Because it is the first query against an inference model the 
FBRuleInfGraph.prepare method triggers the addition of inferences to the model 
which turns the read only execution of the query into an update of the model. 
- The update uses the LPBRuleEngine.addRule method which is synchronized thus 
acquiring a lock on LPBRuleEngine. 
- Before trying to update it calls LPBRuleEngine.checkSafeToUpdate to see if 
there are any outstanding queries.
- Surprisingly, at least to me in reviewing the code, checkSafeToUpdate tries 
to close any statement iterators it finds open. I don't see how this could be a 
safe thing to do in a multi-threaded environment but I don't understand the 
code well enough to be sure. 
- That close operation calls LPTopGoalIterator.close which is a synchronized 
method and waits at that point until it can acquire a lock on LPTopGoalIterator.


Meanwhile a separate thread executing a query is doing this:

- Using LPTopGoalIterator.moveForward to move through its goals.
- Since this is a synchronized method it acquires a lock on LPTopGoalIterator
- That method synchronizes on its LPBRuleEngine and waits until it can acquire 
the lock. 


And at this point we're in a deadlock. 

In terms of a work around I can probably call prepare before querying the model 
the first time. That will likely work as long as I'm not planning to write to 
the model after the initial prepare since any write would invalidate the 
preparation and leave it prone to deadlock again. In my particular case the 
information is likely static so that might work. 

The longer term fix will require a change to the lock acquisition strategy. the 
LPTopGoalIterator.hasNext method could synchronize on the engine before calling 
moveForward, moveForward could become not synchronized and internally 
synchronize on LPBRuleEngine first and only after it acquires that synchronize 
on itself. Alternately the checkSafeToUpdate could stop trying to close 
external iterators and either throw an exception or wait and retry if it finds 
any open iterators. In the case I'm seeing I doubt the retry strategy would be 
viable since neither thread yet have a valid model. 

I suspect that the right solution may be much further up the stack. Maybe the 
model preparation should be done higher before it gets so deep into the SPARQL 
execution phase. That might allow for a cleaner locking strategy that doesn't 
allow multiple SPARQL evaluations to start until the model is stable. Maybe all 
the way up at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct()? 
At this point I'm just guessing because I don't know that section of the code 
well enough. I'm not submitting a patch because I'd like feedback from someone 
that knows this code well on their suggested approach. 


Here's the trace of a deadlock:

        at 
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(LPTopGoalIterator.java:83)
        - waiting to lock <0x0000000779070940> (a 
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine)
        - locked <0x0000000776153ff0> (a 
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator)
        at 
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:196)
        at 
com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
        at 
com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
        at 
com.hp.hpl.jena.util.iterator.UniqueExtendedIterator.hasNext(UniqueExtendedIterator.java:78)
        at 
com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
        at 
com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:151)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:79)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:64)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:60)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:254)
        at 
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:223)


"http-8080-9":
        at 
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.close(LPTopGoalIterator.java:169)
        - waiting to lock <0x0000000776153ff0> (a 
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator)
        at 
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine.checkSafeToUpdate(LPBRuleEngine.java:235)
        at 
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine.addRule(LPBRuleEngine.java:129)
        - locked <0x0000000779070940> (a 
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine)
        at 
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.addBRules(FBRuleInfGraph.java:290)
        at 
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.preloadDeductions(FBRuleInfGraph.java:903)
        at 
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.prepare(FBRuleInfGraph.java:483)
        - locked <0x0000000778e81270> (a 
com.hp.hpl.jena.reasoner.rulesys.RDFSRuleInfGraph)
        at 
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.findWithContinuation(FBRuleInfGraph.java:574)
        at 
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.graphBaseFind(FBRuleInfGraph.java:606)
        at com.hp.hpl.jena.graph.impl.GraphBase.find(GraphBase.java:285)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.<init>(QueryIterTriplePattern.java:80)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern.nextStage(QueryIterTriplePattern.java:53)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:113)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:106)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:64)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:106)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:106)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at 
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
        at 
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:254)
        at 
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:223)

                
> Deadlock during SPARQL execution on an inference model
> ------------------------------------------------------
>
>                 Key: JENA-244
>                 URL: https://issues.apache.org/jira/browse/JENA-244
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Jena
>            Reporter: Stephen Owens
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to