thanks for all that pieter. the primary reason for traversal interruption in the first place was so that gremlin server would have a chance to kill traversals that were running too long. Without a solution to that problem, I'm not sure what to do here. just tossing ideas around - could we still check for thread interruption as an additional way to interrupt a Traversal. maybe instead of:
if (Thread.interrupted()) throw new TraversalInterruptedException(); we need: if (Thread.interrupted()) this.traversal.interrupt() that would then trigger whatever interrupt logic the traversal had? If we need to do a better job with AbstractStep, please create a JIRA (and/or submit a PR) so we don't forget to make some improvements there. On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail <pieter.mar...@gmail.com> wrote: > I just did a global Intellij search in the Sqlg project. > > HSQLDB has 13 catch (InterruptedException e) clauses. All of them > swallows the exception and none resets the interrupt flag. > > Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 > swallows the exception without resetting the interrupt flag and one > throws an exception. > > The rest, > > logback, 7 catch (InterruptedException e) 1 resets the flag while the > rest swallow the exception without resetting the interrupt flag > > google guava about 25 catch (InterruptedException e) all resets the > interrupt flag > > hazelcast 85 catch (InterruptedException e) too many to count but some > resets the interrupt flag and some don't > > mchange c3po pool 7 catch (InterruptedException e), 4 throws exception > without resetting the interrupt flag and 3 swallow the exception without > resetting the interrupt flag. > > mchange common 8 catch (InterruptedException e), 2 throws an exception > without resetting the interrult flag and 6 complete swallow without > resetting. > > commons-io 8 catch (InterruptedException e) 1 reset of the interrupt > flag, 7 swallow the exception without resetting the interrupt flag > > jline 3 catch (InterruptedException e) all swallow the exception without > resetting the flag. > > > All and all I don't think using interrupt will be a reliable strategy to > use. > > http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag > says that it is good practise to always reset the flag. It might be good > but it is not common. > From the above rather quick search only google guava respected that good > practice. > > AbstractStep code > if (Thread.interrupted()) throw new TraversalInterruptedException(); > > will also reset the interrupt flag potentially making someone else's > Thread.interrupted() check fail. > > > All that said I do not have a solution for GremlinServer not having > access to the traversal. > > Thanks > Pieter > > > > > > > On 21/07/2016 17:09, Stephen Mallette wrote: > > I don't recall all the issues with doing traversal interruption with a > > flag. I suppose it could work in the same way that thread interruption > > works now. I will say that I'm hesitant to say that we should change this > > on the basis of this being a problem general to databases as we've only > > seen in so far in HSQLDB. If it was shown to be a problem in other graphs > > i'd be more amplified to see a change. Not sure if any other graph > > providers out there can attest to a problem with the thread interruption > > approach but it would be nice to hear so if there did. > > > > Of course, I think you alluded to the bigger problem, which is that > Gremlin > > Server uses thread interruption to kill script executions and iterations > > that exceed timeouts. So, the problem there is that, if someone submits a > > script like this: > > > > t = g.V() > > x = t.toList() > > > > that script gets pushed into a ScriptEngine.eval() method. That method > > blocks until it is complete. Under that situation, Gremlin Server doesn't > > have access to the Traversal to call interrupt on it. "t" is iterating > via > > toList() and there is no way to stop it. Not sure what we could do about > > situations like that. > > > > On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail <pieter.mar...@gmail.com> > > wrote: > > > >> The current interrupt implementation is failing on Sqlg's HSQLDB > >> implementation. > >> The reason for this is that HSQLDB itself relies on Thread.interrupt() > >> for its own internal logic. When TinkerPop interrupts the thread it > >> thinks it has to do with its own logic and as a result the interrupt > >> flag is reset and no exception is thrown. > >> > >> Reading the Thread.interrupt javadocs it says that wait(), join() and > >> sleep() will all reset the interrupt flag throw an InterruptedException. > >> This makes TinkerPop's reliance on the flag being set somewhat fragile. > >> All of those methods I suspect are common with database io code and > >> TinkerPop being a high level database layer makes it susceptible to 3rd > >> party interpretations of interrupt semantics. > >> > >> In some ways the TraversalInterruptionTest itself had to carefully reset > >> the flag with its usage of Thread.sleep(). > >> > >> My proposal is to mark the traversal itself as interrupted rather than > >> the thread and keep the logic contained to TinkerPop's space. > >> > >> Another benefit is that the traversal.interrupt() can raise an event > >> that implementations can listen to. On receipt of the event they would > >> then be able to send a separate request to the database to cancel a > >> particular query. In my case would be a nice way for Sqlg to tell > >> Postgresql or HSQLDB to cancel a particular query (the latest one the > >> traversal executed). > >> > >> In many ways the semantics are the same. Currently for client code > >> wanting to interrupt a particular traversal it needs to have a reference > >> to the thread the traversal is executing in. Now instead it needs to > >> keep a reference to executing traversals and interrupt them directly. > >> > >> Add Traversal.interrupt() and Traversal.isInterrupted(boolean > >> ClearInterrupted) > >> > >> Caveat, I am not familiar with GremlinServer nor the complications > >> around interrupt there so perhaps I am missing something. > >> > >> Thanks > >> Pieter > >> > >