Re: [DISCUSS] interrupt
"It's not clear to me if the problem exists in HSQLDB, the test, or tail step" This had nothing to do with the TailStep bug. That one is resolved for the most part. For the rest, where the problem is, is itself part of the problem. Thread.interrupt() has rather weak semantics having many different behaviors. Some reset the flag, some throw an exception some swallow and some do a combination of all. I don't think engaging 3rd parties with regards to this is an option. Firstly there are way to many 3rd parties where the InterruptException is being caught to even start. Secondly I reckon as the semantics are weak every 3rd party engagement will turn into a discussion itself. From what I gather many 3rd parties call Thread.wait/join/sleep and handle the InterruptException for the interrupt that they are expecting. I imagine they are swallowing the exception and not resetting the flag with good cause. Regarding delegating the query to a separate thread, even if Sqlg executes the sql in a different thread there are still many other 3rd party libraries that might interfere with the expected interrupt logic outside of just the sql query. This makes me of the opinion that Thread.interrupt is an unreliable mechanism for interrupting a traversal. Regarding asynchronous or synchronous I'd say the interrupt request should be asynchronous with a Future that returns on a successful cancellation. That way you can wait for it or not. >From what I understand the complexity is more in GremlinServer that executes scripts and has no real concept of a traversal. It does not even really want to interrupt a traversal as such but rather a script which may itself contain many traversals. I reckon it will have to pass in a object when executing the script which the graph will store in a threadvar. The graph can then register all traversals executing in the thread on that object. And when the time comes to interrupt a script GremlinServer will call interrupt on that object which in turn will interrupt the current executing traversal. Something like that is what I am thinking of. Cheers Pieter On 22/07/2016 14:24, Robert Dale wrote: > Trying to summarize the concerns I think I'm hearing: > 1. cancelling the gremlin job > 2. cancelling the task in the backend database, this implies handling > at minimum: > a. commit state: interruptable > b: rollback state: probably not interruptable > 3. responding to the client, returning the thread > > Should these things done synchronously or asynchronously or some > combination? The answer may depend on how decoupled they are. > > Separately, are tests doing the right thing? It's not clear to me if > the problem exists in HSQLDB, the test, or tail step. > > I think if Thread.interrupt() is the right way, then that's the way it > should be done regardless of bad citizen libraries. > > Handle 3rd party bad citizens by: > - filing a bug with them. Maybe they will fix or justify the behavior. > - tracking them in a Known Issues list > - workaround them as close as possible to the problem: > I'm not familiar with how providers work so I don't know how generally > applicable this would be, but in the case of Sqlg, the sql query > itself could be delegated to a separate thread in which special > interrupt strategies could be implemented such as the while loop. > > Side question: are there management tools in gremlin server to see > currently running tasks and kill them? Or is that something that would > be delegated to the backend database? >
Re: [DISCUSS] interrupt
Trying to summarize the concerns I think I'm hearing: 1. cancelling the gremlin job 2. cancelling the task in the backend database, this implies handling at minimum: a. commit state: interruptable b: rollback state: probably not interruptable 3. responding to the client, returning the thread Should these things done synchronously or asynchronously or some combination? The answer may depend on how decoupled they are. Separately, are tests doing the right thing? It's not clear to me if the problem exists in HSQLDB, the test, or tail step. I think if Thread.interrupt() is the right way, then that's the way it should be done regardless of bad citizen libraries. Handle 3rd party bad citizens by: - filing a bug with them. Maybe they will fix or justify the behavior. - tracking them in a Known Issues list - workaround them as close as possible to the problem: I'm not familiar with how providers work so I don't know how generally applicable this would be, but in the case of Sqlg, the sql query itself could be delegated to a separate thread in which special interrupt strategies could be implemented such as the while loop. Side question: are there management tools in gremlin server to see currently running tasks and kill them? Or is that something that would be delegated to the backend database? -- Robert Dale On Thu, Jul 21, 2016 at 4:21 PM, pieter-gmail wrote: > Ok, np, its not serious, Postgres is the important one for me anyhow and > it is behaving. I'll investigate how to tell Postgres to cancel the > query. Just stopping the traversal is not quite good enough as every now > and again we have queries on Postgres that persist even if the java > thread dies. > Thanks, > Pieter > > On 21/07/2016 22:16, Stephen Mallette wrote: >>> For every traversal that starts it notifies the caller via the reference >> object about the traversal. >> >> that's the tricky bit. you'd have to have some global tracking of spawned >> traversals to know that and it would have to be bound to the Thread that >> started it I guess. That information isn't going to be available out of a >> standard JSR-223 ScriptEngine.eval() call. We are making some changes to >> ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's >> opportunity in those changes to make a change like this somewhere in that >> work (though how that would happen is still murky to me). >> >> If we need changes to the ScriptEngine to even think about doing this, it >> may be a bit of a way off before we can make much progress here. I don't >> expect to see all the ScriptEngine work I had in mind done until 3.3.x as >> it must include some breaking changes to some public APIs to happen. >> >> >> >> >> >> On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail >> wrote: >> >>> Well no, the problem is Thread.interrupted() is not reliable. Does not >>> really matter who the caller is, GremlinServer or other. >>> Just about every 3rd party library I can see might reset the flag >>> meaning that the check will randomly return false or true. Something as >>> trivial as a logger might even reset the flag. It seems to me interrupt >>> is more for code that actually calls wait/join/sleep and they handle the >>> any subsequent InterruptException as they please. >>> >>> All I can think of for GremlinServer is a way more complex multi >>> threaded solution. >>> The ScriptEngine.eval passes in a reference object and returns >>> immediately. For every traversal that starts it notifies the caller via >>> the reference object about the traversal. The caller then uses that >>> traversal to interrupt it. Plus some more logic to know when the script >>> is done. >>> >>> Ok had another idea but kinda want to try it first as it might be >>> nonsense. Basically keep retrying the Thread.interrupt() till the thread >>> via exceptions bubbles to the top of the stack and gets handled >>> appropriately. >>> >>> On 21/07/2016 18:47, Stephen Mallette wrote: thanks for all that pieter. the primary reason for traversal interruption in the first place was so that gremlin server would have a chance to kill traversals that were running too long. Without a solution to that >>> problem, I'm not sure what to do here. just tossing ideas around - could we still check for thread interruption as an additional way to interrupt a Traversal. maybe instead of: if (Thread.interrupted()) throw new TraversalInterruptedException(); we need: if (Thread.interrupted()) this.traversal.interrupt() that would then trigger whatever interrupt logic the traversal had? If we need to do a better job with AbstractStep, please create a JIRA (and/or submit a PR) so we don't forget to make some improvements there. On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail wrote: > I just did a global Intellij search in the Sqlg project. > > HSQLDB has 13 catch (InterruptedException e) clauses. All of them >>>
Re: [DISCUSS] interrupt
Ok, np, its not serious, Postgres is the important one for me anyhow and it is behaving. I'll investigate how to tell Postgres to cancel the query. Just stopping the traversal is not quite good enough as every now and again we have queries on Postgres that persist even if the java thread dies. Thanks, Pieter On 21/07/2016 22:16, Stephen Mallette wrote: >> For every traversal that starts it notifies the caller via the reference > object about the traversal. > > that's the tricky bit. you'd have to have some global tracking of spawned > traversals to know that and it would have to be bound to the Thread that > started it I guess. That information isn't going to be available out of a > standard JSR-223 ScriptEngine.eval() call. We are making some changes to > ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's > opportunity in those changes to make a change like this somewhere in that > work (though how that would happen is still murky to me). > > If we need changes to the ScriptEngine to even think about doing this, it > may be a bit of a way off before we can make much progress here. I don't > expect to see all the ScriptEngine work I had in mind done until 3.3.x as > it must include some breaking changes to some public APIs to happen. > > > > > > On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail > wrote: > >> Well no, the problem is Thread.interrupted() is not reliable. Does not >> really matter who the caller is, GremlinServer or other. >> Just about every 3rd party library I can see might reset the flag >> meaning that the check will randomly return false or true. Something as >> trivial as a logger might even reset the flag. It seems to me interrupt >> is more for code that actually calls wait/join/sleep and they handle the >> any subsequent InterruptException as they please. >> >> All I can think of for GremlinServer is a way more complex multi >> threaded solution. >> The ScriptEngine.eval passes in a reference object and returns >> immediately. For every traversal that starts it notifies the caller via >> the reference object about the traversal. The caller then uses that >> traversal to interrupt it. Plus some more logic to know when the script >> is done. >> >> Ok had another idea but kinda want to try it first as it might be >> nonsense. Basically keep retrying the Thread.interrupt() till the thread >> via exceptions bubbles to the top of the stack and gets handled >> appropriately. >> >> On 21/07/2016 18:47, Stephen Mallette wrote: >>> thanks for all that pieter. the primary reason for traversal interruption >>> in the first place was so that gremlin server would have a chance to kill >>> traversals that were running too long. Without a solution to that >> problem, >>> I'm not sure what to do here. just tossing ideas around - could we still >>> check for thread interruption as an additional way to interrupt a >>> Traversal. maybe instead of: >>> >>> if (Thread.interrupted()) throw new TraversalInterruptedException(); >>> >>> we need: >>> >>> if (Thread.interrupted()) this.traversal.interrupt() >>> >>> that would then trigger whatever interrupt logic the traversal had? >>> >>> If we need to do a better job with AbstractStep, please create a JIRA >>> (and/or submit a PR) so we don't forget to make some improvements there. >>> >>> On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail >>> wrote: >>> I just did a global Intellij search in the Sqlg project. HSQLDB has 13 catch (InterruptedException e) clauses. All of them swallows the exception and none resets the interrupt flag. Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 swallows the exception without resetting the interrupt flag and one throws an exception. The rest, logback, 7 catch (InterruptedException e) 1 resets the flag while the rest swallow the exception without resetting the interrupt flag google guava about 25 catch (InterruptedException e) all resets the interrupt flag hazelcast 85 catch (InterruptedException e) too many to count but some resets the interrupt flag and some don't mchange c3po pool 7 catch (InterruptedException e), 4 throws exception without resetting the interrupt flag and 3 swallow the exception without resetting the interrupt flag. mchange common 8 catch (InterruptedException e), 2 throws an exception without resetting the interrult flag and 6 complete swallow without resetting. commons-io 8 catch (InterruptedException e) 1 reset of the interrupt flag, 7 swallow the exception without resetting the interrupt flag jline 3 catch (InterruptedException e) all swallow the exception without resetting the flag. All and all I don't think using interrupt will be a reliable strategy to use. >> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag says that it
Re: [DISCUSS] interrupt
> For every traversal that starts it notifies the caller via the reference object about the traversal. that's the tricky bit. you'd have to have some global tracking of spawned traversals to know that and it would have to be bound to the Thread that started it I guess. That information isn't going to be available out of a standard JSR-223 ScriptEngine.eval() call. We are making some changes to ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's opportunity in those changes to make a change like this somewhere in that work (though how that would happen is still murky to me). If we need changes to the ScriptEngine to even think about doing this, it may be a bit of a way off before we can make much progress here. I don't expect to see all the ScriptEngine work I had in mind done until 3.3.x as it must include some breaking changes to some public APIs to happen. On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail wrote: > Well no, the problem is Thread.interrupted() is not reliable. Does not > really matter who the caller is, GremlinServer or other. > Just about every 3rd party library I can see might reset the flag > meaning that the check will randomly return false or true. Something as > trivial as a logger might even reset the flag. It seems to me interrupt > is more for code that actually calls wait/join/sleep and they handle the > any subsequent InterruptException as they please. > > All I can think of for GremlinServer is a way more complex multi > threaded solution. > The ScriptEngine.eval passes in a reference object and returns > immediately. For every traversal that starts it notifies the caller via > the reference object about the traversal. The caller then uses that > traversal to interrupt it. Plus some more logic to know when the script > is done. > > Ok had another idea but kinda want to try it first as it might be > nonsense. Basically keep retrying the Thread.interrupt() till the thread > via exceptions bubbles to the top of the stack and gets handled > appropriately. > > On 21/07/2016 18:47, Stephen Mallette wrote: > > thanks for all that pieter. the primary reason for traversal interruption > > in the first place was so that gremlin server would have a chance to kill > > traversals that were running too long. Without a solution to that > problem, > > I'm not sure what to do here. just tossing ideas around - could we still > > check for thread interruption as an additional way to interrupt a > > Traversal. maybe instead of: > > > > if (Thread.interrupted()) throw new TraversalInterruptedException(); > > > > we need: > > > > if (Thread.interrupted()) this.traversal.interrupt() > > > > that would then trigger whatever interrupt logic the traversal had? > > > > If we need to do a better job with AbstractStep, please create a JIRA > > (and/or submit a PR) so we don't forget to make some improvements there. > > > > On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail > > wrote: > > > >> I just did a global Intellij search in the Sqlg project. > >> > >> HSQLDB has 13 catch (InterruptedException e) clauses. All of them > >> swallows the exception and none resets the interrupt flag. > >> > >> Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 > >> swallows the exception without resetting the interrupt flag and one > >> throws an exception. > >> > >> The rest, > >> > >> logback, 7 catch (InterruptedException e) 1 resets the flag while the > >> rest swallow the exception without resetting the interrupt flag > >> > >> google guava about 25 catch (InterruptedException e) all resets the > >> interrupt flag > >> > >> hazelcast 85 catch (InterruptedException e) too many to count but some > >> resets the interrupt flag and some don't > >> > >> mchange c3po pool 7 catch (InterruptedException e), 4 throws exception > >> without resetting the interrupt flag and 3 swallow the exception without > >> resetting the interrupt flag. > >> > >> mchange common 8 catch (InterruptedException e), 2 throws an exception > >> without resetting the interrult flag and 6 complete swallow without > >> resetting. > >> > >> commons-io 8 catch (InterruptedException e) 1 reset of the interrupt > >> flag, 7 swallow the exception without resetting the interrupt flag > >> > >> jline 3 catch (InterruptedException e) all swallow the exception without > >> resetting the flag. > >> > >> > >> All and all I don't think using interrupt will be a reliable strategy to > >> use. > >> > >> > http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag > >> says that it is good practise to always reset the flag. It might be good > >> but it is not common. > >> From the above rather quick search only google guava respected that good > >> practice. > >> > >> AbstractStep code > >> if (Thread.interrupted()) throw new TraversalInterruptedException(); > >> > >> will also reset the interrupt flag potentially making someone else's > >> Thread.interrupted() check fail. > >> > >> > >
Re: [DISCUSS] interrupt
Well no, the problem is Thread.interrupted() is not reliable. Does not really matter who the caller is, GremlinServer or other. Just about every 3rd party library I can see might reset the flag meaning that the check will randomly return false or true. Something as trivial as a logger might even reset the flag. It seems to me interrupt is more for code that actually calls wait/join/sleep and they handle the any subsequent InterruptException as they please. All I can think of for GremlinServer is a way more complex multi threaded solution. The ScriptEngine.eval passes in a reference object and returns immediately. For every traversal that starts it notifies the caller via the reference object about the traversal. The caller then uses that traversal to interrupt it. Plus some more logic to know when the script is done. Ok had another idea but kinda want to try it first as it might be nonsense. Basically keep retrying the Thread.interrupt() till the thread via exceptions bubbles to the top of the stack and gets handled appropriately. On 21/07/2016 18:47, Stephen Mallette wrote: > thanks for all that pieter. the primary reason for traversal interruption > in the first place was so that gremlin server would have a chance to kill > traversals that were running too long. Without a solution to that problem, > I'm not sure what to do here. just tossing ideas around - could we still > check for thread interruption as an additional way to interrupt a > Traversal. maybe instead of: > > if (Thread.interrupted()) throw new TraversalInterruptedException(); > > we need: > > if (Thread.interrupted()) this.traversal.interrupt() > > that would then trigger whatever interrupt logic the traversal had? > > If we need to do a better job with AbstractStep, please create a JIRA > (and/or submit a PR) so we don't forget to make some improvements there. > > On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail > wrote: > >> I just did a global Intellij search in the Sqlg project. >> >> HSQLDB has 13 catch (InterruptedException e) clauses. All of them >> swallows the exception and none resets the interrupt flag. >> >> Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 >> swallows the exception without resetting the interrupt flag and one >> throws an exception. >> >> The rest, >> >> logback, 7 catch (InterruptedException e) 1 resets the flag while the >> rest swallow the exception without resetting the interrupt flag >> >> google guava about 25 catch (InterruptedException e) all resets the >> interrupt flag >> >> hazelcast 85 catch (InterruptedException e) too many to count but some >> resets the interrupt flag and some don't >> >> mchange c3po pool 7 catch (InterruptedException e), 4 throws exception >> without resetting the interrupt flag and 3 swallow the exception without >> resetting the interrupt flag. >> >> mchange common 8 catch (InterruptedException e), 2 throws an exception >> without resetting the interrult flag and 6 complete swallow without >> resetting. >> >> commons-io 8 catch (InterruptedException e) 1 reset of the interrupt >> flag, 7 swallow the exception without resetting the interrupt flag >> >> jline 3 catch (InterruptedException e) all swallow the exception without >> resetting the flag. >> >> >> All and all I don't think using interrupt will be a reliable strategy to >> use. >> >> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag >> says that it is good practise to always reset the flag. It might be good >> but it is not common. >> From the above rather quick search only google guava respected that good >> practice. >> >> AbstractStep code >> if (Thread.interrupted()) throw new TraversalInterruptedException(); >> >> will also reset the interrupt flag potentially making someone else's >> Thread.interrupted() check fail. >> >> >> All that said I do not have a solution for GremlinServer not having >> access to the traversal. >> >> Thanks >> Pieter >> >> >> >> >> >> >> On 21/07/2016 17:09, Stephen Mallette wrote: >>> I don't recall all the issues with doing traversal interruption with a >>> flag. I suppose it could work in the same way that thread interruption >>> works now. I will say that I'm hesitant to say that we should change this >>> on the basis of this being a problem general to databases as we've only >>> seen in so far in HSQLDB. If it was shown to be a problem in other graphs >>> i'd be more amplified to see a change. Not sure if any other graph >>> providers out there can attest to a problem with the thread interruption >>> approach but it would be nice to hear so if there did. >>> >>> Of course, I think you alluded to the bigger problem, which is that >> Gremlin >>> Server uses thread interruption to kill script executions and iterations >>> that exceed timeouts. So, the problem there is that, if someone submits a >>> script like this: >>> >>> t = g.V() >>> x = t.toList() >>> >>> that script gets pushed into a ScriptEngine.eval() method. That
Re: [DISCUSS] interrupt
thanks for all that pieter. the primary reason for traversal interruption in the first place was so that gremlin server would have a chance to kill traversals that were running too long. Without a solution to that problem, I'm not sure what to do here. just tossing ideas around - could we still check for thread interruption as an additional way to interrupt a Traversal. maybe instead of: if (Thread.interrupted()) throw new TraversalInterruptedException(); we need: if (Thread.interrupted()) this.traversal.interrupt() that would then trigger whatever interrupt logic the traversal had? If we need to do a better job with AbstractStep, please create a JIRA (and/or submit a PR) so we don't forget to make some improvements there. On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail wrote: > I just did a global Intellij search in the Sqlg project. > > HSQLDB has 13 catch (InterruptedException e) clauses. All of them > swallows the exception and none resets the interrupt flag. > > Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 > swallows the exception without resetting the interrupt flag and one > throws an exception. > > The rest, > > logback, 7 catch (InterruptedException e) 1 resets the flag while the > rest swallow the exception without resetting the interrupt flag > > google guava about 25 catch (InterruptedException e) all resets the > interrupt flag > > hazelcast 85 catch (InterruptedException e) too many to count but some > resets the interrupt flag and some don't > > mchange c3po pool 7 catch (InterruptedException e), 4 throws exception > without resetting the interrupt flag and 3 swallow the exception without > resetting the interrupt flag. > > mchange common 8 catch (InterruptedException e), 2 throws an exception > without resetting the interrult flag and 6 complete swallow without > resetting. > > commons-io 8 catch (InterruptedException e) 1 reset of the interrupt > flag, 7 swallow the exception without resetting the interrupt flag > > jline 3 catch (InterruptedException e) all swallow the exception without > resetting the flag. > > > All and all I don't think using interrupt will be a reliable strategy to > use. > > http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag > says that it is good practise to always reset the flag. It might be good > but it is not common. > From the above rather quick search only google guava respected that good > practice. > > AbstractStep code > if (Thread.interrupted()) throw new TraversalInterruptedException(); > > will also reset the interrupt flag potentially making someone else's > Thread.interrupted() check fail. > > > All that said I do not have a solution for GremlinServer not having > access to the traversal. > > Thanks > Pieter > > > > > > > On 21/07/2016 17:09, Stephen Mallette wrote: > > I don't recall all the issues with doing traversal interruption with a > > flag. I suppose it could work in the same way that thread interruption > > works now. I will say that I'm hesitant to say that we should change this > > on the basis of this being a problem general to databases as we've only > > seen in so far in HSQLDB. If it was shown to be a problem in other graphs > > i'd be more amplified to see a change. Not sure if any other graph > > providers out there can attest to a problem with the thread interruption > > approach but it would be nice to hear so if there did. > > > > Of course, I think you alluded to the bigger problem, which is that > Gremlin > > Server uses thread interruption to kill script executions and iterations > > that exceed timeouts. So, the problem there is that, if someone submits a > > script like this: > > > > t = g.V() > > x = t.toList() > > > > that script gets pushed into a ScriptEngine.eval() method. That method > > blocks until it is complete. Under that situation, Gremlin Server doesn't > > have access to the Traversal to call interrupt on it. "t" is iterating > via > > toList() and there is no way to stop it. Not sure what we could do about > > situations like that. > > > > On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail > > wrote: > > > >> The current interrupt implementation is failing on Sqlg's HSQLDB > >> implementation. > >> The reason for this is that HSQLDB itself relies on Thread.interrupt() > >> for its own internal logic. When TinkerPop interrupts the thread it > >> thinks it has to do with its own logic and as a result the interrupt > >> flag is reset and no exception is thrown. > >> > >> Reading the Thread.interrupt javadocs it says that wait(), join() and > >> sleep() will all reset the interrupt flag throw an InterruptedException. > >> This makes TinkerPop's reliance on the flag being set somewhat fragile. > >> All of those methods I suspect are common with database io code and > >> TinkerPop being a high level database layer makes it susceptible to 3rd > >> party interpretations of interrupt semantics. > >> > >> In some ways the TraversalInterruptionTest itse
Re: [DISCUSS] interrupt
I just did a global Intellij search in the Sqlg project. HSQLDB has 13 catch (InterruptedException e) clauses. All of them swallows the exception and none resets the interrupt flag. Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2 swallows the exception without resetting the interrupt flag and one throws an exception. The rest, logback, 7 catch (InterruptedException e) 1 resets the flag while the rest swallow the exception without resetting the interrupt flag google guava about 25 catch (InterruptedException e) all resets the interrupt flag hazelcast 85 catch (InterruptedException e) too many to count but some resets the interrupt flag and some don't mchange c3po pool 7 catch (InterruptedException e), 4 throws exception without resetting the interrupt flag and 3 swallow the exception without resetting the interrupt flag. mchange common 8 catch (InterruptedException e), 2 throws an exception without resetting the interrult flag and 6 complete swallow without resetting. commons-io 8 catch (InterruptedException e) 1 reset of the interrupt flag, 7 swallow the exception without resetting the interrupt flag jline 3 catch (InterruptedException e) all swallow the exception without resetting the flag. All and all I don't think using interrupt will be a reliable strategy to use. http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag says that it is good practise to always reset the flag. It might be good but it is not common. >From the above rather quick search only google guava respected that good practice. AbstractStep code if (Thread.interrupted()) throw new TraversalInterruptedException(); will also reset the interrupt flag potentially making someone else's Thread.interrupted() check fail. All that said I do not have a solution for GremlinServer not having access to the traversal. Thanks Pieter On 21/07/2016 17:09, Stephen Mallette wrote: > I don't recall all the issues with doing traversal interruption with a > flag. I suppose it could work in the same way that thread interruption > works now. I will say that I'm hesitant to say that we should change this > on the basis of this being a problem general to databases as we've only > seen in so far in HSQLDB. If it was shown to be a problem in other graphs > i'd be more amplified to see a change. Not sure if any other graph > providers out there can attest to a problem with the thread interruption > approach but it would be nice to hear so if there did. > > Of course, I think you alluded to the bigger problem, which is that Gremlin > Server uses thread interruption to kill script executions and iterations > that exceed timeouts. So, the problem there is that, if someone submits a > script like this: > > t = g.V() > x = t.toList() > > that script gets pushed into a ScriptEngine.eval() method. That method > blocks until it is complete. Under that situation, Gremlin Server doesn't > have access to the Traversal to call interrupt on it. "t" is iterating via > toList() and there is no way to stop it. Not sure what we could do about > situations like that. > > On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail > wrote: > >> The current interrupt implementation is failing on Sqlg's HSQLDB >> implementation. >> The reason for this is that HSQLDB itself relies on Thread.interrupt() >> for its own internal logic. When TinkerPop interrupts the thread it >> thinks it has to do with its own logic and as a result the interrupt >> flag is reset and no exception is thrown. >> >> Reading the Thread.interrupt javadocs it says that wait(), join() and >> sleep() will all reset the interrupt flag throw an InterruptedException. >> This makes TinkerPop's reliance on the flag being set somewhat fragile. >> All of those methods I suspect are common with database io code and >> TinkerPop being a high level database layer makes it susceptible to 3rd >> party interpretations of interrupt semantics. >> >> In some ways the TraversalInterruptionTest itself had to carefully reset >> the flag with its usage of Thread.sleep(). >> >> My proposal is to mark the traversal itself as interrupted rather than >> the thread and keep the logic contained to TinkerPop's space. >> >> Another benefit is that the traversal.interrupt() can raise an event >> that implementations can listen to. On receipt of the event they would >> then be able to send a separate request to the database to cancel a >> particular query. In my case would be a nice way for Sqlg to tell >> Postgresql or HSQLDB to cancel a particular query (the latest one the >> traversal executed). >> >> In many ways the semantics are the same. Currently for client code >> wanting to interrupt a particular traversal it needs to have a reference >> to the thread the traversal is executing in. Now instead it needs to >> keep a reference to executing traversals and interrupt them directly. >> >> Add Traversal.interrupt() and Traversal.isInterrupted(boolean >> ClearInterrupted)
Re: [DISCUSS] interrupt
I don't recall all the issues with doing traversal interruption with a flag. I suppose it could work in the same way that thread interruption works now. I will say that I'm hesitant to say that we should change this on the basis of this being a problem general to databases as we've only seen in so far in HSQLDB. If it was shown to be a problem in other graphs i'd be more amplified to see a change. Not sure if any other graph providers out there can attest to a problem with the thread interruption approach but it would be nice to hear so if there did. Of course, I think you alluded to the bigger problem, which is that Gremlin Server uses thread interruption to kill script executions and iterations that exceed timeouts. So, the problem there is that, if someone submits a script like this: t = g.V() x = t.toList() that script gets pushed into a ScriptEngine.eval() method. That method blocks until it is complete. Under that situation, Gremlin Server doesn't have access to the Traversal to call interrupt on it. "t" is iterating via toList() and there is no way to stop it. Not sure what we could do about situations like that. On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail wrote: > The current interrupt implementation is failing on Sqlg's HSQLDB > implementation. > The reason for this is that HSQLDB itself relies on Thread.interrupt() > for its own internal logic. When TinkerPop interrupts the thread it > thinks it has to do with its own logic and as a result the interrupt > flag is reset and no exception is thrown. > > Reading the Thread.interrupt javadocs it says that wait(), join() and > sleep() will all reset the interrupt flag throw an InterruptedException. > This makes TinkerPop's reliance on the flag being set somewhat fragile. > All of those methods I suspect are common with database io code and > TinkerPop being a high level database layer makes it susceptible to 3rd > party interpretations of interrupt semantics. > > In some ways the TraversalInterruptionTest itself had to carefully reset > the flag with its usage of Thread.sleep(). > > My proposal is to mark the traversal itself as interrupted rather than > the thread and keep the logic contained to TinkerPop's space. > > Another benefit is that the traversal.interrupt() can raise an event > that implementations can listen to. On receipt of the event they would > then be able to send a separate request to the database to cancel a > particular query. In my case would be a nice way for Sqlg to tell > Postgresql or HSQLDB to cancel a particular query (the latest one the > traversal executed). > > In many ways the semantics are the same. Currently for client code > wanting to interrupt a particular traversal it needs to have a reference > to the thread the traversal is executing in. Now instead it needs to > keep a reference to executing traversals and interrupt them directly. > > Add Traversal.interrupt() and Traversal.isInterrupted(boolean > ClearInterrupted) > > Caveat, I am not familiar with GremlinServer nor the complications > around interrupt there so perhaps I am missing something. > > Thanks > Pieter >