Ok, np, its not serious, Postgres is the important one for me anyhow and it is behaving. I'll investigate how to tell Postgres to cancel the query. Just stopping the traversal is not quite good enough as every now and again we have queries on Postgres that persist even if the java thread dies. Thanks, Pieter
On 21/07/2016 22:16, Stephen Mallette wrote: >> For every traversal that starts it notifies the caller via the reference > object about the traversal. > > that's the tricky bit. you'd have to have some global tracking of spawned > traversals to know that and it would have to be bound to the Thread that > started it I guess. That information isn't going to be available out of a > standard JSR-223 ScriptEngine.eval() call. We are making some changes to > ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's > opportunity in those changes to make a change like this somewhere in that > work (though how that would happen is still murky to me). > > If we need changes to the ScriptEngine to even think about doing this, it > may be a bit of a way off before we can make much progress here. I don't > expect to see all the ScriptEngine work I had in mind done until 3.3.x as > it must include some breaking changes to some public APIs to happen. > > > > > > On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail <pieter.mar...@gmail.com> > wrote: > >> Well no, the problem is Thread.interrupted() is not reliable. Does not >> really matter who the caller is, GremlinServer or other. >> Just about every 3rd party library I can see might reset the flag >> meaning that the check will randomly return false or true. Something as >> trivial as a logger might even reset the flag. It seems to me interrupt >> is more for code that actually calls wait/join/sleep and they handle the >> any subsequent InterruptException as they please. >> >> All I can think of for GremlinServer is a way more complex multi >> threaded solution. >> The ScriptEngine.eval passes in a reference object and returns >> immediately. For every traversal that starts it notifies the caller via >> the reference object about the traversal. The caller then uses that >> traversal to interrupt it. Plus some more logic to know when the script >> is done. >> >> Ok had another idea but kinda want to try it first as it might be >> nonsense. Basically keep retrying the Thread.interrupt() till the thread >> via exceptions bubbles to the top of the stack and gets handled >> appropriately. >> >> On 21/07/2016 18:47, Stephen Mallette wrote: >>> thanks for all that pieter. the primary reason for traversal interruption >>> in the first place was so that gremlin server would have a chance to kill >>> traversals that were running too long. Without a solution to that >> problem, >>> I'm not sure what to do here. just tossing ideas around - could we still >>> check for thread interruption as an additional way to interrupt a >>> Traversal. maybe instead of: >>> >>> if (Thread.interrupted()) throw new TraversalInterruptedException(); >>> >>> we need: >>> >>> if (Thread.interrupted()) this.traversal.interrupt() >>> >>> that would then trigger whatever interrupt logic the traversal had? >>> >>> If we need to do a better job with AbstractStep, please create a JIRA >>> (and/or submit a PR) so we don't forget to make some improvements there. >>> >>> On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail <pieter.mar...@gmail.com> >>> wrote: >>> >>>> I just did a global Intellij search in the Sqlg project. >>>> >>>> HSQLDB has 13 catch (InterruptedException e) clauses. All of them >>>> swallows the exception and none resets the interrupt flag. >>>> >>>> Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 >>>> swallows the exception without resetting the interrupt flag and one >>>> throws an exception. >>>> >>>> The rest, >>>> >>>> logback, 7 catch (InterruptedException e) 1 resets the flag while the >>>> rest swallow the exception without resetting the interrupt flag >>>> >>>> google guava about 25 catch (InterruptedException e) all resets the >>>> interrupt flag >>>> >>>> hazelcast 85 catch (InterruptedException e) too many to count but some >>>> resets the interrupt flag and some don't >>>> >>>> mchange c3po pool 7 catch (InterruptedException e), 4 throws exception >>>> without resetting the interrupt flag and 3 swallow the exception without >>>> resetting the interrupt flag. >>>> >>>> mchange common 8 catch (InterruptedException e), 2 throws an exception >>>> without resetting the interrult flag and 6 complete swallow without >>>> resetting. >>>> >>>> commons-io 8 catch (InterruptedException e) 1 reset of the interrupt >>>> flag, 7 swallow the exception without resetting the interrupt flag >>>> >>>> jline 3 catch (InterruptedException e) all swallow the exception without >>>> resetting the flag. >>>> >>>> >>>> All and all I don't think using interrupt will be a reliable strategy to >>>> use. >>>> >>>> >> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag >>>> says that it is good practise to always reset the flag. It might be good >>>> but it is not common. >>>> From the above rather quick search only google guava respected that good >>>> practice. >>>> >>>> AbstractStep code >>>> if (Thread.interrupted()) throw new TraversalInterruptedException(); >>>> >>>> will also reset the interrupt flag potentially making someone else's >>>> Thread.interrupted() check fail. >>>> >>>> >>>> All that said I do not have a solution for GremlinServer not having >>>> access to the traversal. >>>> >>>> Thanks >>>> Pieter >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 21/07/2016 17:09, Stephen Mallette wrote: >>>>> I don't recall all the issues with doing traversal interruption with a >>>>> flag. I suppose it could work in the same way that thread interruption >>>>> works now. I will say that I'm hesitant to say that we should change >> this >>>>> on the basis of this being a problem general to databases as we've only >>>>> seen in so far in HSQLDB. If it was shown to be a problem in other >> graphs >>>>> i'd be more amplified to see a change. Not sure if any other graph >>>>> providers out there can attest to a problem with the thread >> interruption >>>>> approach but it would be nice to hear so if there did. >>>>> >>>>> Of course, I think you alluded to the bigger problem, which is that >>>> Gremlin >>>>> Server uses thread interruption to kill script executions and >> iterations >>>>> that exceed timeouts. So, the problem there is that, if someone >> submits a >>>>> script like this: >>>>> >>>>> t = g.V() >>>>> x = t.toList() >>>>> >>>>> that script gets pushed into a ScriptEngine.eval() method. That method >>>>> blocks until it is complete. Under that situation, Gremlin Server >> doesn't >>>>> have access to the Traversal to call interrupt on it. "t" is iterating >>>> via >>>>> toList() and there is no way to stop it. Not sure what we could do >> about >>>>> situations like that. >>>>> >>>>> On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail <pieter.mar...@gmail.com >>>>> wrote: >>>>> >>>>>> The current interrupt implementation is failing on Sqlg's HSQLDB >>>>>> implementation. >>>>>> The reason for this is that HSQLDB itself relies on Thread.interrupt() >>>>>> for its own internal logic. When TinkerPop interrupts the thread it >>>>>> thinks it has to do with its own logic and as a result the interrupt >>>>>> flag is reset and no exception is thrown. >>>>>> >>>>>> Reading the Thread.interrupt javadocs it says that wait(), join() and >>>>>> sleep() will all reset the interrupt flag throw an >> InterruptedException. >>>>>> This makes TinkerPop's reliance on the flag being set somewhat >> fragile. >>>>>> All of those methods I suspect are common with database io code and >>>>>> TinkerPop being a high level database layer makes it susceptible to >> 3rd >>>>>> party interpretations of interrupt semantics. >>>>>> >>>>>> In some ways the TraversalInterruptionTest itself had to carefully >> reset >>>>>> the flag with its usage of Thread.sleep(). >>>>>> >>>>>> My proposal is to mark the traversal itself as interrupted rather than >>>>>> the thread and keep the logic contained to TinkerPop's space. >>>>>> >>>>>> Another benefit is that the traversal.interrupt() can raise an event >>>>>> that implementations can listen to. On receipt of the event they would >>>>>> then be able to send a separate request to the database to cancel a >>>>>> particular query. In my case would be a nice way for Sqlg to tell >>>>>> Postgresql or HSQLDB to cancel a particular query (the latest one the >>>>>> traversal executed). >>>>>> >>>>>> In many ways the semantics are the same. Currently for client code >>>>>> wanting to interrupt a particular traversal it needs to have a >> reference >>>>>> to the thread the traversal is executing in. Now instead it needs to >>>>>> keep a reference to executing traversals and interrupt them directly. >>>>>> >>>>>> Add Traversal.interrupt() and Traversal.isInterrupted(boolean >>>>>> ClearInterrupted) >>>>>> >>>>>> Caveat, I am not familiar with GremlinServer nor the complications >>>>>> around interrupt there so perhaps I am missing something. >>>>>> >>>>>> Thanks >>>>>> Pieter >>>>>> >>