[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-12-07 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526363#comment-13526363
 ] 

Andy Seaborne commented on JENA-244:


Good to hear.

I didn't get failures, but my experience of other similar incidents is that 
machine, OS, and Java version can all play into the probability of a test 
failing. Usually, once a test is seen to fail on a system, it fails often or 
always there, but it will run clean on another system with the same code.  
Hence, such bugs sometimes only get solved by code analysis of possible causes.

With that positive confirmation, I'll close this JIRA.  Thanks.

> Deadlock during SPARQL execution on an inference model
> --
>
> Key: JENA-244
> URL: https://issues.apache.org/jira/browse/JENA-244
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Jena
>Reporter: Stephen Owens
> Attachments: JenaDeadLockTest.java, JenaDeadLockTest.java
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-12-07 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526344#comment-13526344
 ] 

Stephen Owens commented on JENA-244:


I re-ran my test with 2.7.5-SNAPSHOT and it ran clean. When I fall back to 
2.7.4 I can get it to fail reliably. With the snapshot, I tried increasing the 
thread count and the number of threads to add more stress and was not able to 
get it to fail. I also did a quick review of your change and it looks good; it 
directly addresses the issue I was seeing. Thanks!

I'm surprised you weren't getting the failure before your patch. Were you able 
to see the failure against 2.7.4? 



[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-12-07 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526343#comment-13526343
 ] 

Stephen Owens commented on JENA-244:


Definitely, I'll give it a try and let you know the results. 



[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-12-07 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526274#comment-13526274
 ] 

Andy Seaborne commented on JENA-244:


Stephen - I recently (last weekend) came across this situation in the Jena Test 
Suite itself.  I believe I have a fix and would be grateful if you could test 
it.

There is a separate branch [1] of jena-core where I have been stripping out 
several unused features.  While these features don't directly affect the 
reasoner system, they do seem to have caused a timing change and a concurrency 
test started failing with a deadlock very like the one you have reported here.

There is a fix in the branch.  For now, I have retrofitted the fix to 
jena-core trunk.

Would you be able to try the development build? [2]  You can copy the jar from 
there, use the latest apache-jena build, or add a dependency on 
jena-core-2.7.5-SNAPSHOT.

I ran your test example but wasn't getting failures with jena-core trunk before 
retrofitting the possible fix.  However, precise timing does affect the 
situation so different machines may act differently.

I plan to propose to the team that this branch becomes jena-core trunk but 
integration is not a simple replacement because it changes some internal APIs 
(in fairly trivial ways) so ARQ, TDB and SDB need updates at the same time.

[1] https://svn.apache.org/repos/asf/jena/branches/jena-core-simplified/

[2] 
https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-core/2.7.5-SNAPSHOT/
(use at least increment 46)




[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-12-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526264#comment-13526264
 ] 

Hudson commented on JENA-244:

Integrated in Jena__Development_Test #319 (See 
[https://builds.apache.org/job/Jena__Development_Test/319/])
Fix for JENA-244 integrated into trunk for testing. (Revision 1418235)

 Result = SUCCESS
andy : 
Files : 
* 
/jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/reasoner/rulesys/impl/LPTopGoalIterator.java




[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-12-06 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525992#comment-13525992
 ] 

Stephen Owens commented on JENA-244:


I found a case where the work around is not safe. I had ended up doing a 
pre-emptive prepare and all was well for a while. That holds only as long as 
nothing triggers a new dynamic prepare in a multi-threaded use case. It turns 
out, however, that removing anything from the model triggers the need for a new 
prepare, and my 'static' model wasn't perfectly static. Back to occasional 
deadlocks. 

I've tried this in 2.7.4 and it is still a current issue. I'm reviewing the 
code further and I'll post back if I find something. 



[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-09-12 Thread Dave Reynolds (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453997#comment-13453997
 ] 

Dave Reynolds commented on JENA-244:


Stephen - yes, no work has been done on this bug. Too much "day job" in the way.



[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-09-12 Thread Simon Helsen (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453983#comment-13453983
 ] 

Simon Helsen commented on JENA-244:

Have you tested this with a snapshot 2.7.4 build?



[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-09-11 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453644#comment-13453644
 ] 

Stephen Owens commented on JENA-244:


Just retested this with 2.9.3, the deadlock is still reproducible with the test 
case. 



[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-05-09 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272049#comment-13272049
 ] 

Stephen Owens commented on JENA-244:


Thanks for the confirmation. 

I'm working on a reproducible case and I think I have it failing reliably. I'll 
polish that up a bit and should be able to submit tomorrow.





[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-05-09 Thread Dave Reynolds (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271778#comment-13271778
 ] 

Dave Reynolds commented on JENA-244:


Yes, once the forward chaining is done, and so long as you are not changing the 
underlying data, the models might as well then be static.

You shouldn't *need* to call the explicit pre-prepare (but it's a long time 
since I looked at that code, so there may be a bug there, but it will at least 
be a different bug).
The locking for all that is simpler than for backward chaining (doesn't mean 
there isn't a bug!). If it isn't then the pre-prepare should definitely work 
whereas with the hybrid/backward chaining it may not.

If you can get a minimal reproducible test case that you could share that would 
be fantastic!

Dave






[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-05-09 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271599#comment-13271599
 ] 

Stephen Owens commented on JENA-244:


Dave,

Good point about API calls triggering the same behaviour. You're right that 
there shouldn't be special action from ARQ, the graph should be able to protect 
itself. 

Once I've tried the work around I'll let you know. I'm just trying to create a 
reproducible test case that shows the behaviour so I can verify. 

Thanks for the pointer on the forward chaining, I don't see why that wouldn't 
work in terms of the rules I need. I think that this is the relevant statement 
that makes this a preferable alternative for consistency in execution?

"Once the preparation phase is complete the inference graph will act as if it 
were the union of all the statements in the original model together with all 
the statements in the internal deductions graph generated by the rule firings. 
All queries will see all of these statements and will be of similar speed to 
normal model accesses. It is possible to separately access the original raw 
data and the set of deduced statements if required, see above."


Given that it is the prepare call that is responsible for both forward and 
backward chaining additions (I think) then would I still need to ensure that I 
call prepare before any queries are submitted? Or because forward chaining adds 
rules to the internal deductions graph is the lock strategy different enough 
that this issue wouldn't be a problem?





[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-05-09 Thread Dave Reynolds (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271560#comment-13271560
 ] 

Dave Reynolds commented on JENA-244:


If the pre-emptive prepare() work around does not work then another possible 
workaround might be to switch to a pure forward-chaining RDFS config e.g. 
http://incubator.apache.org/jena/documentation/inference/#RDFSPlusRules 





[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-05-09 Thread Dave Reynolds (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271551#comment-13271551
 ] 

Dave Reynolds commented on JENA-244:


Thanks for the clear and detailed analysis.

I agree that the lock acquisition strategy probably needs to move up, at least 
to hasNext and possibly up to prepare itself. I also agree that the iterator 
closing needs to be reviewed; it doesn't sound right.

I'm less convinced about moving prepare calls up to the query level. Any 
deadlock you can provoke via ARQ you could also provoke via straight API calls, 
so it would still need fixing at the engine level anyway. It seems preferable 
if ARQ doesn't need to take any special actions when querying an InfGraph. 

This is probably one I should take on but I can't do that in the immediate 
future. Is your work around of pre-emptively calling prepare() sufficient for 
your application?

Dave






[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model

2012-05-09 Thread Stephen Owens (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271403#comment-13271403
 ] 

Stephen Owens commented on JENA-244:


Versions involved:
==
jena-arq-2.9.0-incubating.jar
jena-core-2.7.0-incubating.jar


We discovered a Jena deadlock during some long running stability tests. Jena is 
being used in a service accessed by a web application so there were multiple 
web threads accessing the same underlying model. The model was protected by the 
standard enterCriticalSection calls, set for read because we were just 
querying. This worked perfectly until we switched to an RDFS model. The RDFS 
model exhibited occasional deadlocks. 

The deadlock appears to be an issue with out-of-order lock acquisition. The 
first thread acquired a lock on LPTopGoalIterator and then tried to get a lock 
on LPBRuleEngine; the second thread had a lock on LPBRuleEngine and tried to 
get a lock on LPTopGoalIterator, with the expected result that nobody is going 
anywhere from that point forward. I traced through the relevant Jena code and 
the issue seems to be that one of the two queries is the first query against 
that model and so is triggering the inferencing model to add rules to the 
model. In that first query the following sequence happens:

- Because it is the first query against an inference model the 
FBRuleInfGraph.prepare method triggers the addition of inferences to the model 
which turns the read only execution of the query into an update of the model. 
- The update uses the LPBRuleEngine.addRule method which is synchronized thus 
acquiring a lock on LPBRuleEngine. 
- Before trying to update it calls LPBRuleEngine.checkSafeToUpdate to see if 
there are any outstanding queries.
- Surprisingly, at least to me in reviewing the code, checkSafeToUpdate tries 
to close any statement iterators it finds open. I don't see how this could be a 
safe thing to do in a multi-threaded environment but I don't understand the 
code well enough to be sure. 
- That close operation calls LPTopGoalIterator.close which is a synchronized 
method and waits at that point until it can acquire a lock on LPTopGoalIterator.


Meanwhile a separate thread executing a query is doing this:

- Using LPTopGoalIterator.moveForward to move through its goals.
- Since this is a synchronized method it acquires a lock on LPTopGoalIterator
- That method synchronizes on its LPBRuleEngine and waits until it can acquire 
the lock. 


And at this point we're in a deadlock. 
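
To illustrate the two acquisition orders described above, here is a minimal sketch. It is not Jena code: the class and lock names are hypothetical stand-ins, and it swaps the synchronized monitors for ReentrantLock with a timed tryLock so that the program reports the conflict and exits rather than hanging the way the real monitors do.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the two monitors in the report; not Jena code.
class LockOrderConflictSketch {

    /** Returns how many of the two threads failed to get their second lock. */
    static int run() {
        final ReentrantLock iteratorLock = new ReentrantLock(); // plays LPTopGoalIterator
        final ReentrantLock engineLock = new ReentrantLock();   // plays LPBRuleEngine
        final CyclicBarrier barrier = new CyclicBarrier(2);
        final AtomicInteger blocked = new AtomicInteger();

        // Query thread: iterator first, then engine (the moveForward order).
        Thread query = new Thread(() -> attempt(iteratorLock, engineLock, barrier, blocked));
        // Preparing thread: engine first, then iterator (the addRule -> close order).
        Thread prepare = new Thread(() -> attempt(engineLock, iteratorLock, barrier, blocked));

        query.start();
        prepare.start();
        try {
            query.join();
            prepare.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CyclicBarrier barrier, AtomicInteger blocked) {
        first.lock();
        try {
            barrier.await(); // both threads now hold their first lock
            // With synchronized this would block forever; the timeout lets the sketch end.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            barrier.await(); // neither thread releases until both have finished trying
        } catch (InterruptedException | BrokenBarrierException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + run()); // prints 2
    }
}
```

Both threads time out on their second lock because each one holds the lock the other needs, which is the same cycle the thread dump shows.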

In terms of a work around I can probably call prepare before querying the model 
the first time. That will likely work as long as I'm not planning to write to 
the model after the initial prepare since any write would invalidate the 
preparation and leave it prone to deadlock again. In my particular case the 
information is likely static so that might work. 
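
The limits of that work around can be sketched with a toy model. This is a hypothetical class, not the Jena API; it only models the behaviour described here, where a query prepares on demand and any write invalidates the earlier preparation.

```java
// Hypothetical model of "prepare on first query, invalidate on write"; not the Jena API.
class LazyPrepareSketch {
    private boolean prepared = false;
    private int prepareCount = 0;

    /** Stands in for the expensive, lock-taking preparation step. */
    void prepare() {
        if (!prepared) {
            prepared = true;
            prepareCount++;
        }
    }

    /** A query prepares on demand, which is where a read turns into an update. */
    void query() {
        prepare();
    }

    /** Any write (add or remove) invalidates the earlier preparation. */
    void write() {
        prepared = false;
    }

    int prepareCount() {
        return prepareCount;
    }

    public static void main(String[] args) {
        LazyPrepareSketch model = new LazyPrepareSketch();
        model.prepare();   // pre-emptive prepare, before any threads start querying
        model.query();     // no new preparation needed
        model.query();
        System.out.println(model.prepareCount()); // prints 1
        model.write();     // the "static" model changes after all
        model.query();     // this query triggers preparation again
        System.out.println(model.prepareCount()); // prints 2
    }
}
```

The second preparation happens inside a query, so it is exposed to the same race as the very first query was.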

The longer term fix will require a change to the lock acquisition strategy. The 
LPTopGoalIterator.hasNext method could synchronize on the engine before calling 
moveForward; moveForward could become unsynchronized and internally synchronize 
on LPBRuleEngine first, and only after acquiring that lock synchronize on 
itself. Alternately the checkSafeToUpdate could stop trying to close external 
iterators and either throw an exception or wait and retry if it finds any open 
iterators. In the case I'm seeing I doubt the retry strategy would be viable 
since neither thread yet has a valid model. 
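
As a rough sketch of the first option (engine lock first, iterator lock second, on every path), assuming hypothetical Engine and GoalIterator classes that only mimic the shape of the real ones and are not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the engine and iterator; not the Jena classes or the actual patch.
class EngineFirstOrderingSketch {

    static final class Engine {
        int rules = 0;
    }

    static final class GoalIterator {
        private final Engine engine;
        int steps = 0;

        GoalIterator(Engine engine) {
            this.engine = engine;
        }

        // Query path: engine monitor first, then the iterator's own monitor.
        void moveForward() {
            synchronized (engine) {
                synchronized (this) {
                    steps++;
                }
            }
        }

        // Called from the update path, which already holds the engine monitor.
        synchronized void close() { }
    }

    // Update path: engine monitor first, then the iterator monitor via close().
    static void addRule(Engine engine, GoalIterator it) {
        synchronized (engine) {
            it.close();
            engine.rules++;
        }
    }

    /** Runs both paths concurrently; returns steps + rules once all threads finish. */
    static int run(int threads, int iterations) {
        final Engine engine = new Engine();
        final GoalIterator it = new GoalIterator(engine);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final boolean query = (t % 2 == 0);
            Thread w = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    if (query) {
                        it.moveForward();
                    } else {
                        addRule(engine, it);
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        synchronized (engine) { // read under the same lock the writers used
            return it.steps + engine.rules;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

Because every path takes the engine monitor before the iterator monitor, no thread can hold the iterator while waiting for the engine, so the cycle cannot form. The cost is that iteration is serialized on the engine lock.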

I suspect that the right solution may be much further up the stack. Maybe the 
model preparation should be done higher before it gets so deep into the SPARQL 
execution phase. That might allow for a cleaner locking strategy that doesn't 
allow multiple SPARQL evaluations to start until the model is stable. Maybe all 
the way up at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct()? 
At this point I'm just guessing because I don't know that section of the code 
well enough. I'm not submitting a patch because I'd like feedback from someone 
that knows this code well on their suggested approach. 


Here's the trace of a deadlock:

at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(LPTopGoalIterator.java:83)
- waiting to lock <0x000779070940> (a com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine)
- locked <0x000776153ff0> (a com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator)
at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:196)
at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
at com.hp.hpl.jena.util.iterator.UniqueExtendedIterator.hasNext(UniqueExtendedIterator.java:78)
at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)