[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526363#comment-13526363 ]

Andy Seaborne commented on JENA-244:

Good to hear.

I didn't get failures, but my experience of other similar incidents is that machine, OS, and Java version can all affect the probability of a test failing. Usually, once a test is seen to fail on a system, it fails often or always there, yet it will run clean on another system with the same code. Hence, such bugs sometimes only get solved by code analysis of possible causes.

With that positive confirmation, I'll close this JIRA. Thanks.

> Deadlock during SPARQL execution on an inference model
> --
>
> Key: JENA-244
> URL: https://issues.apache.org/jira/browse/JENA-244
> Project: Apache Jena
> Issue Type: Bug
> Components: Jena
> Reporter: Stephen Owens
> Attachments: JenaDeadLockTest.java, JenaDeadLockTest.java

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526344#comment-13526344 ]

Stephen Owens commented on JENA-244:

I re-ran my test with 2.7.5-SNAPSHOT and it ran clean. When I fall back to 2.7.4 I can get it to fail reliably. I tried increasing the thread count to add more stress and was not able to get it to fail with 2.7.5-SNAPSHOT. I also did a quick review of your change and it looks good; it directly addresses the issue I was seeing. Thanks!

I'm surprised you weren't getting the failure before your patch. Were you able to see the failure against 2.7.4?
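A stress run of the kind described in this comment could be sketched roughly as follows. This is not the attached JenaDeadLockTest.java, whose contents are not shown in this thread; `StressHarness` and `runStress` are hypothetical names, and the task passed in stands in for whatever query the real test executes. The sketch only uses JDK classes: it runs the task from several threads, waits with a timeout, and asks the JVM whether any threads are deadlocked on object monitors.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StressHarness {

    /**
     * Runs the task `iterations` times on each of `threads` workers.
     * Returns true if every worker finished before the timeout and the JVM
     * reports no threads deadlocked on object monitors.
     */
    public static boolean runStress(int threads, int iterations, Runnable task, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                for (int j = 0; j < iterations; j++) {
                    task.run();
                }
            });
        }
        pool.shutdown(); // accept no new work; submitted workers still run
        boolean finished;
        try {
            finished = pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        // null means the JVM found no cycle of threads waiting on monitors
        long[] deadlocked = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return finished && deadlocked == null;
    }

    public static void main(String[] args) {
        // Placeholder task; a real run would execute a query against the shared model here.
        AtomicInteger calls = new AtomicInteger();
        boolean clean = runStress(8, 1000, calls::incrementAndGet, 10_000);
        System.out.println("clean=" + clean + " calls=" + calls.get());
    }
}
```

A clean result from a harness like this only says the deadlock did not occur on this machine in this run; as noted elsewhere in the thread, timing differences between systems can hide the problem.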
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526343#comment-13526343 ]

Stephen Owens commented on JENA-244:

Definitely, I'll give it a try and let you know the results.
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526274#comment-13526274 ]

Andy Seaborne commented on JENA-244:

Stephen - I recently (last weekend) came across this situation in the Jena Test Suite itself. I believe I have a fix and would be grateful if you could test it.

There is a separate branch [1] of jena-core where I have been stripping out several unused features. While these features don't directly affect the reasoner system, they do seem to have caused a timing change, and a concurrency test started failing with a deadlock very like the one you have reported here. There is a fix in the branch. For now, I have retrofitted the fix to jena-core trunk.

Would you be able to try the development build? [2] Either copy the jar there, or the latest apache-jena build, or a dependency on jena-core-2.7.5-SNAPSHOT.

I ran your test example but wasn't getting failures with jena-core trunk before retrofitting the possible fix. However, precise timing does affect the situation, so different machines may act differently.

I plan to propose to the team that this branch becomes jena-core trunk, but integration is not a simple replacement because it changes some internal APIs (in fairly trivial ways), so ARQ, TDB and SDB need updates at the same time.

[1] https://svn.apache.org/repos/asf/jena/branches/jena-core-simplified/
[2] https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-core/2.7.5-SNAPSHOT/ at least increment 46
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526264#comment-13526264 ]

Hudson commented on JENA-244:

Integrated in Jena__Development_Test #319 (See [https://builds.apache.org/job/Jena__Development_Test/319/])
Fix for JENA-244 integrated into trunk for testing. (Revision 1418235)

Result = SUCCESS
andy :
Files :
* /jena/trunk/jena-core/src/main/java/com/hp/hpl/jena/reasoner/rulesys/impl/LPTopGoalIterator.java
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525992#comment-13525992 ]

Stephen Owens commented on JENA-244:

Found a way in which the workaround is not safe. I had ended up doing a pre-emptive prepare and all was well for a while, as long as you don't need to trigger a new dynamic prepare in a multi-threaded use case. It turns out, however, that removing anything from the model triggers the need for the prepare, and my 'static' model wasn't perfectly static. Back to occasional deadlocks.

I've tried this in 2.7.4 and it is still a current issue. I'm reviewing the code further and I'll post back if I find something.
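The failure mode in this comment can be pictured with a toy model. The class below is purely illustrative: `ToyInfGraph` and its methods are hypothetical stand-ins, not Jena classes, and the only behaviour it borrows from the thread is that any write invalidates an earlier prepare, so the next read has to prepare again.

```java
import java.util.HashSet;
import java.util.Set;

public class ToyInfGraph {
    private final Set<String> statements = new HashSet<>();
    private boolean prepared = false;
    private int lazyPrepares = 0;

    /** Explicit, pre-emptive prepare: the workaround described in the thread. */
    public synchronized void prepare() {
        prepared = true;
    }

    /** Any write invalidates the earlier preparation. */
    public synchronized void add(String s) {
        statements.add(s);
        prepared = false;
    }

    public synchronized void remove(String s) {
        statements.remove(s);
        prepared = false;
    }

    /** A read. If the graph is not prepared, the read itself has to prepare it first. */
    public synchronized boolean contains(String s) {
        if (!prepared) {
            lazyPrepares++; // in the real engine, this is where a query turns into an update
            prepared = true;
        }
        return statements.contains(s);
    }

    /** Number of reads that had to prepare the graph themselves. */
    public synchronized int lazyPrepares() {
        return lazyPrepares;
    }

    public static void main(String[] args) {
        ToyInfGraph g = new ToyInfGraph();
        g.add("a");
        g.add("b");
        g.prepare();        // pre-emptive prepare before any reader runs
        g.contains("a");    // no lazy prepare needed
        g.remove("b");      // the "static" model turns out not to be static
        g.contains("a");    // this read now has to prepare again
        System.out.println("lazy prepares: " + g.lazyPrepares());
    }
}
```

In the toy, everything is serialised by one monitor, so nothing can deadlock; the point is only that the pre-emptive prepare protects readers until the next write and no longer.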
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453997#comment-13453997 ]

Dave Reynolds commented on JENA-244:

Stephen - yes, no work has been done on this bug. Too much "day job" in the way.
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453983#comment-13453983 ]

Simon Helsen commented on JENA-244:

Have you tested this with a snapshot 2.7.4 build?
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453644#comment-13453644 ]

Stephen Owens commented on JENA-244:

Just retested this with 2.9.3; the deadlock is still reproducible with the test case.
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272049#comment-13272049 ]

Stephen Owens commented on JENA-244:

Thanks for the confirmation. I'm working on a reproducible case and I think I have it failing reliably. I'll polish that up a bit and should be able to submit tomorrow.
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271778#comment-13271778 ]

Dave Reynolds commented on JENA-244:

Yes, once the forward chaining is done, and so long as you are not changing the underlying data, the models might as well then be static.

You shouldn't *need* to call the explicit pre-prepare (but it's a long time since I looked at that code, so there may be a bug there, but it will at least be a different bug). The locking for all that is simpler than for backward chaining (doesn't mean there isn't a bug!). If it isn't then the pre-prepare should definitely work, whereas with the hybrid/backward chaining it may not.

If you can get a minimal reproducible test case that you could share, that would be fantastic!

Dave
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271599#comment-13271599 ]

Stephen Owens commented on JENA-244:

Dave,

Good point about API calls triggering the same behaviour. You're right that there shouldn't be special action from ARQ; the graph should be able to protect itself.

Once I've tried the workaround I'll let you know. I'm just trying to create a reproducible test case that shows the behaviour so I can verify.

Thanks for the pointer on the forward chaining; I don't see why that wouldn't work in terms of the rules I need. I think that this is the relevant statement that makes this a preferable alternative for consistency in execution?

"Once the preparation phase is complete the inference graph will act as if it were the union of all the statements in the original model together with all the statements in the internal deductions graph generated by the rule firings. All queries will see all of these statements and will be of similar speed to normal model accesses. It is possible to separately access the original raw data and the set of deduced statements if required, see above."

Given that it is the prepare call that is responsible for both forward and backward chaining additions (I think), would I still need to ensure that I call prepare before any queries are submitted? Or, because forward chaining adds rules to the internal deductions graph, is the lock strategy different enough that this issue wouldn't be a problem?
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271560#comment-13271560 ]

Dave Reynolds commented on JENA-244:

If the pre-emptive prepare() workaround does not work, then another possible workaround might be to switch to a pure forward-chaining RDFS config, e.g. http://incubator.apache.org/jena/documentation/inference/#RDFSPlusRules
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271551#comment-13271551 ]

Dave Reynolds commented on JENA-244:

Thanks for the clear and detailed analysis.

I agree that the lock acquisition strategy probably needs to move up, at least to hasNext and possibly up to prepare itself. I also agree that the iterator closing needs to be reviewed; it doesn't sound right.

I'm less convinced about moving prepare calls up to the query level. Any deadlock you can provoke via ARQ you could also provoke via straight API calls, so it would still need fixing at the engine level anyway. It seems preferable if ARQ doesn't need to take any special actions when querying an InfGraph.

This is probably one I should take on, but I can't do that in the immediate future. Is your workaround of pre-emptively calling prepare() sufficient for your application?

Dave
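The idea of moving lock acquisition up so that both paths take the engine lock first can be sketched with stand-in classes. Nothing below is Jena code and it is not the change that was eventually committed: `Engine`, `GoalIterator`, `addRule` and `run` are hypothetical names that only mirror the roles of the rule engine and the top-level goal iterator discussed in this thread. The point of the sketch is the ordering rule: every path takes the engine monitor before an iterator monitor, so the two threads can no longer wait on each other in a cycle.

```java
public class LockOrderSketch {

    /** Stand-in for the rule engine; not the real LPBRuleEngine. */
    static class Engine {
        int rules = 0;
    }

    /** Stand-in for a top-level goal iterator; not the real LPTopGoalIterator. */
    static class GoalIterator {
        private final Engine engine;
        private int position = 0;
        private boolean closed = false;

        GoalIterator(Engine engine) {
            this.engine = engine;
        }

        /** Reader path: engine monitor first, then this iterator's monitor. */
        boolean hasNext() {
            synchronized (engine) {
                synchronized (this) {
                    if (closed) {
                        return false;
                    }
                    position++; // stands in for moving through the goals
                    return true;
                }
            }
        }

        /** Called by the updater while it already holds the engine monitor. */
        void close() {
            synchronized (this) {
                closed = true;
            }
        }
    }

    /** Updater path: engine monitor first, then the iterator's monitor inside close(). */
    static void addRule(Engine engine, GoalIterator open) {
        synchronized (engine) {
            open.close();
            engine.rules++;
        }
    }

    /** Runs a reader and an updater over the same iterators; returns the number of rules added. */
    static int run(int rounds) {
        Engine engine = new Engine();
        GoalIterator[] iterators = new GoalIterator[rounds];
        for (int i = 0; i < rounds; i++) {
            iterators[i] = new GoalIterator(engine);
        }
        Thread reader = new Thread(() -> {
            for (GoalIterator it : iterators) {
                it.hasNext();
            }
        });
        Thread updater = new Thread(() -> {
            for (GoalIterator it : iterators) {
                addRule(engine, it);
            }
        });
        reader.start();
        updater.start();
        try {
            reader.join();
            updater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (engine) {
            return engine.rules;
        }
    }

    public static void main(String[] args) {
        System.out.println("rules added: " + run(10_000));
    }
}
```

In this toy the inner iterator monitor is redundant because the engine monitor already serialises everything; it is kept only to show the acquisition order. Whether a single outer lock is acceptable for the real engine is a separate performance question that the sketch does not address.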
[jira] [Commented] (JENA-244) Deadlock during SPARQL execution on an inference model
[ https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271403#comment-13271403 ]

Stephen Owens commented on JENA-244:

Versions involved:
jena-arq-2.9.0-incubating.jar
jena-core-2.7.0-incubating.jar

We discovered a Jena deadlock during some long-running stability tests. Jena is being used in a service accessed by a web application, so there were multiple web threads accessing the same underlying model. The model was protected by the standard enterCriticalSection calls, set for read because we were just querying. This worked perfectly until we switched to an RDFS model. The RDFS model exhibited occasional deadlocks.

The deadlock appears to be an issue with out-of-order lock acquisition. The first thread acquired a lock on LPTopGoalIterator and then tried to get a lock on LPBRuleEngine; the second thread had a lock on LPBRuleEngine and tried to get a lock on LPTopGoalIterator, with the expected result that nobody is going anywhere from that point forward.

I traced through the relevant Jena code and the issue seems to be that one of the two queries is the first query against that model and so is triggering the inferencing model to add rules to the model. In that first query the following sequence happens:
- Because it is the first query against an inference model, the FBRuleInfGraph.prepare method triggers the addition of inferences to the model, which turns the read-only execution of the query into an update of the model.
- The update uses the LPBRuleEngine.addRule method, which is synchronized, thus acquiring a lock on LPBRuleEngine.
- Before trying to update, it calls LPBRuleEngine.checkSafeToUpdate to see if there are any outstanding queries.
- Surprisingly, at least to me in reviewing the code, checkSafeToUpdate tries to close any statement iterators it finds open. I don't see how this could be a safe thing to do in a multi-threaded environment, but I don't understand the code well enough to be sure.
- That close operation calls LPTopGoalIterator.close, which is a synchronized method, and waits at that point until it can acquire a lock on LPTopGoalIterator.

Meanwhile a separate thread executing a query is doing this:
- Using LPTopGoalIterator.moveForward to move through its goals.
- Since this is a synchronized method, it acquires a lock on LPTopGoalIterator.
- That method synchronizes on its LPBRuleEngine and waits until it can acquire the lock.

And at this point we're in a deadlock.

In terms of a workaround, I can probably call prepare before querying the model the first time. That will likely work as long as I'm not planning to write to the model after the initial prepare, since any write would invalidate the preparation and leave it prone to deadlock again. In my particular case the information is likely static, so that might work.

The longer-term fix will require a change to the lock acquisition strategy. The LPTopGoalIterator.hasNext method could synchronize on the engine before calling moveForward; moveForward could become not synchronized and internally synchronize on LPBRuleEngine first, and only after it acquires that, synchronize on itself. Alternatively, checkSafeToUpdate could stop trying to close external iterators and either throw an exception or wait and retry if it finds any open iterators. In the case I'm seeing, I doubt the retry strategy would be viable since neither thread yet has a valid model.

I suspect that the right solution may be much further up the stack. Maybe the model preparation should be done higher, before it gets so deep into the SPARQL execution phase. That might allow for a cleaner locking strategy that doesn't allow multiple SPARQL evaluations to start until the model is stable. Maybe all the way up at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct()? At this point I'm just guessing because I don't know that section of the code well enough.

I'm not submitting a patch because I'd like feedback from someone who knows this code well on their suggested approach.

Here's the trace of a deadlock:
	at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(LPTopGoalIterator.java:83)
	- waiting to lock <0x000779070940> (a com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine)
	- locked <0x000776153ff0> (a com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator)
	at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:196)
	at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
	at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
	at com.hp.hpl.jena.util.iterator.UniqueExtendedIterator.hasNext(UniqueExtendedIterator.java:78)
	at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)