Re: [DISCUSS] interrupt

2016-07-22 Thread pieter-gmail
"It's not clear to me if the problem exists in HSQLDB, the test, or tail
step"

This had nothing to do with the TailStep bug. That one is resolved for
the most part.

For the rest, where the problem is, is itself part of the problem.
Thread.interrupt() has rather weak semantics having many different
behaviors. Some reset the flag, some throw an exception some swallow and
some do a combination of all. I don't think engaging 3rd parties with
regards to this is an option. Firstly there are way to many 3rd parties
where the InterruptException is being caught to even start. Secondly I
reckon as the semantics are weak every 3rd party engagement will turn
into a discussion itself. From what I gather many 3rd parties call
Thread.wait/join/sleep and handle the InterruptException for the
interrupt that they are expecting. I imagine they are swallowing the
exception and not resetting the flag with good cause.

Regarding delegating the query to a separate thread, even if Sqlg
executes the sql in a different thread there are still many other 3rd
party libraries that might interfere with the expected interrupt logic
outside of just the sql query.

This makes me of the opinion that Thread.interrupt is an unreliable
mechanism for interrupting a traversal.

Regarding asynchronous or synchronous I'd say the interrupt request
should be asynchronous with a Future that returns on a successful
cancellation. That way you can wait for it or not.

>From what I understand the complexity is more in GremlinServer that
executes scripts and has no real concept of a traversal. It does not
even really want to interrupt a traversal as such but rather a script
which may itself contain many traversals. I reckon it will have to pass
in a object when executing the script which the graph will store in a
threadvar. The graph can then register all traversals executing in the
thread on that object. And when the time comes to interrupt a script
GremlinServer will call interrupt on that object which in turn will
interrupt the current executing traversal. Something like that is what I
am thinking of.

Cheers
Pieter


On 22/07/2016 14:24, Robert Dale wrote:
> Trying to summarize the concerns I think I'm hearing:
> 1. cancelling the gremlin job
> 2. cancelling the task in the backend database, this implies handling
> at minimum:
>   a. commit state: interruptable
>   b: rollback state: probably not interruptable
> 3. responding to the client, returning the thread
>
> Should these things done synchronously or asynchronously or some
> combination? The answer may depend on how decoupled they are.
>
> Separately, are tests doing the right thing? It's not clear to me if
> the problem exists in HSQLDB, the test, or tail step.
>
> I think if Thread.interrupt() is the right way, then that's the way it
> should be done regardless of bad citizen libraries.
>
> Handle 3rd party bad citizens by:
> - filing a bug with them. Maybe they will fix or justify the behavior.
> - tracking them in a Known Issues list
> - workaround them as close as possible to the problem:
> I'm not familiar with how providers work so I don't know how generally
> applicable this would be, but in the case of Sqlg, the sql query
> itself could be delegated to a separate thread in which special
> interrupt strategies could be implemented such as the while loop.
>
> Side question: are there management tools in gremlin server to see
> currently running tasks and kill them? Or is that something that would
> be delegated to the backend database?
>



Re: [DISCUSS] interrupt

2016-07-22 Thread Robert Dale
Trying to summarize the concerns I think I'm hearing:
1. cancelling the gremlin job
2. cancelling the task in the backend database, this implies handling
at minimum:
  a. commit state: interruptable
  b: rollback state: probably not interruptable
3. responding to the client, returning the thread

Should these things done synchronously or asynchronously or some
combination? The answer may depend on how decoupled they are.

Separately, are tests doing the right thing? It's not clear to me if
the problem exists in HSQLDB, the test, or tail step.

I think if Thread.interrupt() is the right way, then that's the way it
should be done regardless of bad citizen libraries.

Handle 3rd party bad citizens by:
- filing a bug with them. Maybe they will fix or justify the behavior.
- tracking them in a Known Issues list
- workaround them as close as possible to the problem:
I'm not familiar with how providers work so I don't know how generally
applicable this would be, but in the case of Sqlg, the sql query
itself could be delegated to a separate thread in which special
interrupt strategies could be implemented such as the while loop.

Side question: are there management tools in gremlin server to see
currently running tasks and kill them? Or is that something that would
be delegated to the backend database?

-- 
Robert Dale

On Thu, Jul 21, 2016 at 4:21 PM, pieter-gmail  wrote:
> Ok, np, its not serious, Postgres is the important one for me anyhow and
> it is behaving. I'll investigate how to tell Postgres to cancel the
> query. Just stopping the traversal is not quite good enough as every now
> and again we have queries on Postgres that persist even if the java
> thread dies.
> Thanks,
> Pieter
>
> On 21/07/2016 22:16, Stephen Mallette wrote:
>>> For every traversal that starts it notifies the caller via the reference
>> object about the traversal.
>>
>> that's the tricky bit. you'd have to have some global tracking of spawned
>> traversals to know that and it would have to be bound to the Thread that
>> started it I guess. That information isn't going to be available out of a
>> standard JSR-223 ScriptEngine.eval() call. We are making some changes to
>> ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's
>> opportunity in those changes to make a change like this somewhere in that
>> work (though how that would happen is still murky to me).
>>
>> If we need changes to the ScriptEngine to even think about doing this, it
>> may be a bit of a way off before we can make much progress here. I don't
>> expect to see all the ScriptEngine work I had in mind done until 3.3.x as
>> it must include some breaking changes to some public APIs to happen.
>>
>>
>>
>>
>>
>> On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail 
>> wrote:
>>
>>> Well no, the problem is Thread.interrupted() is not reliable. Does not
>>> really matter who the caller is, GremlinServer or other.
>>> Just about every 3rd party library I can see might reset the flag
>>> meaning that the check will randomly return false or true. Something as
>>> trivial as a logger might even reset the flag. It seems to me interrupt
>>> is more for code that actually calls wait/join/sleep and they handle the
>>> any subsequent InterruptException as they please.
>>>
>>> All I can think of for GremlinServer is a way more complex multi
>>> threaded solution.
>>> The ScriptEngine.eval passes in a reference object and returns
>>> immediately. For every traversal that starts it notifies the caller via
>>> the reference object about the traversal. The caller then uses that
>>> traversal to interrupt it. Plus some more logic to know when the script
>>> is done.
>>>
>>> Ok had another idea but kinda want to try it first as it might be
>>> nonsense. Basically keep retrying the Thread.interrupt() till the thread
>>> via exceptions bubbles to the top of the stack and gets handled
>>> appropriately.
>>>
>>> On 21/07/2016 18:47, Stephen Mallette wrote:
 thanks for all that pieter. the primary reason for traversal interruption
 in the first place was so that gremlin server would have a chance to kill
 traversals that were running too long. Without a solution to that
>>> problem,
 I'm not sure what to do here. just tossing ideas around - could we still
 check for thread interruption as an additional way to interrupt a
 Traversal. maybe instead of:

 if (Thread.interrupted()) throw new TraversalInterruptedException();

 we need:

 if (Thread.interrupted()) this.traversal.interrupt()

 that would then trigger whatever interrupt logic the traversal had?

 If we need to do a better job with AbstractStep, please create a JIRA
 (and/or submit a PR) so we don't forget to make some improvements there.

 On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
 wrote:

> I just did a global Intellij search in the Sqlg project.
>
> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
>>>

Re: [DISCUSS] interrupt

2016-07-21 Thread pieter-gmail
Ok, np, its not serious, Postgres is the important one for me anyhow and
it is behaving. I'll investigate how to tell Postgres to cancel the
query. Just stopping the traversal is not quite good enough as every now
and again we have queries on Postgres that persist even if the java
thread dies.
Thanks,
Pieter

On 21/07/2016 22:16, Stephen Mallette wrote:
>> For every traversal that starts it notifies the caller via the reference
> object about the traversal.
>
> that's the tricky bit. you'd have to have some global tracking of spawned
> traversals to know that and it would have to be bound to the Thread that
> started it I guess. That information isn't going to be available out of a
> standard JSR-223 ScriptEngine.eval() call. We are making some changes to
> ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's
> opportunity in those changes to make a change like this somewhere in that
> work (though how that would happen is still murky to me).
>
> If we need changes to the ScriptEngine to even think about doing this, it
> may be a bit of a way off before we can make much progress here. I don't
> expect to see all the ScriptEngine work I had in mind done until 3.3.x as
> it must include some breaking changes to some public APIs to happen.
>
>
>
>
>
> On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail 
> wrote:
>
>> Well no, the problem is Thread.interrupted() is not reliable. Does not
>> really matter who the caller is, GremlinServer or other.
>> Just about every 3rd party library I can see might reset the flag
>> meaning that the check will randomly return false or true. Something as
>> trivial as a logger might even reset the flag. It seems to me interrupt
>> is more for code that actually calls wait/join/sleep and they handle the
>> any subsequent InterruptException as they please.
>>
>> All I can think of for GremlinServer is a way more complex multi
>> threaded solution.
>> The ScriptEngine.eval passes in a reference object and returns
>> immediately. For every traversal that starts it notifies the caller via
>> the reference object about the traversal. The caller then uses that
>> traversal to interrupt it. Plus some more logic to know when the script
>> is done.
>>
>> Ok had another idea but kinda want to try it first as it might be
>> nonsense. Basically keep retrying the Thread.interrupt() till the thread
>> via exceptions bubbles to the top of the stack and gets handled
>> appropriately.
>>
>> On 21/07/2016 18:47, Stephen Mallette wrote:
>>> thanks for all that pieter. the primary reason for traversal interruption
>>> in the first place was so that gremlin server would have a chance to kill
>>> traversals that were running too long. Without a solution to that
>> problem,
>>> I'm not sure what to do here. just tossing ideas around - could we still
>>> check for thread interruption as an additional way to interrupt a
>>> Traversal. maybe instead of:
>>>
>>> if (Thread.interrupted()) throw new TraversalInterruptedException();
>>>
>>> we need:
>>>
>>> if (Thread.interrupted()) this.traversal.interrupt()
>>>
>>> that would then trigger whatever interrupt logic the traversal had?
>>>
>>> If we need to do a better job with AbstractStep, please create a JIRA
>>> (and/or submit a PR) so we don't forget to make some improvements there.
>>>
>>> On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
>>> wrote:
>>>
 I just did a global Intellij search in the Sqlg project.

 HSQLDB has 13 catch (InterruptedException e) clauses. All of them
 swallows the exception and none resets the interrupt flag.

 Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2
 swallows the exception without resetting the interrupt flag and one
 throws an exception.

 The rest,

 logback, 7 catch (InterruptedException e) 1 resets the flag while the
 rest swallow the exception without resetting the interrupt flag

 google guava about 25 catch (InterruptedException e) all resets the
 interrupt flag

 hazelcast 85 catch (InterruptedException e) too many to count but some
 resets the interrupt flag and some don't

 mchange c3po pool 7 catch (InterruptedException e), 4 throws exception
 without resetting the interrupt flag and 3 swallow the exception without
 resetting the interrupt flag.

 mchange common 8 catch (InterruptedException e), 2 throws an exception
 without resetting the interrult flag and 6 complete swallow without
 resetting.

 commons-io 8 catch (InterruptedException e) 1 reset of the interrupt
 flag, 7 swallow the exception without resetting the interrupt flag

 jline 3 catch (InterruptedException e) all swallow the exception without
 resetting the flag.


 All and all I don't think using interrupt will be a reliable strategy to
 use.


>> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
 says that it

Re: [DISCUSS] interrupt

2016-07-21 Thread Stephen Mallette
> For every traversal that starts it notifies the caller via the reference
object about the traversal.

that's the tricky bit. you'd have to have some global tracking of spawned
traversals to know that and it would have to be bound to the Thread that
started it I guess. That information isn't going to be available out of a
standard JSR-223 ScriptEngine.eval() call. We are making some changes to
ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's
opportunity in those changes to make a change like this somewhere in that
work (though how that would happen is still murky to me).

If we need changes to the ScriptEngine to even think about doing this, it
may be a bit of a way off before we can make much progress here. I don't
expect to see all the ScriptEngine work I had in mind done until 3.3.x as
it must include some breaking changes to some public APIs to happen.





On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail 
wrote:

> Well no, the problem is Thread.interrupted() is not reliable. Does not
> really matter who the caller is, GremlinServer or other.
> Just about every 3rd party library I can see might reset the flag
> meaning that the check will randomly return false or true. Something as
> trivial as a logger might even reset the flag. It seems to me interrupt
> is more for code that actually calls wait/join/sleep and they handle the
> any subsequent InterruptException as they please.
>
> All I can think of for GremlinServer is a way more complex multi
> threaded solution.
> The ScriptEngine.eval passes in a reference object and returns
> immediately. For every traversal that starts it notifies the caller via
> the reference object about the traversal. The caller then uses that
> traversal to interrupt it. Plus some more logic to know when the script
> is done.
>
> Ok had another idea but kinda want to try it first as it might be
> nonsense. Basically keep retrying the Thread.interrupt() till the thread
> via exceptions bubbles to the top of the stack and gets handled
> appropriately.
>
> On 21/07/2016 18:47, Stephen Mallette wrote:
> > thanks for all that pieter. the primary reason for traversal interruption
> > in the first place was so that gremlin server would have a chance to kill
> > traversals that were running too long. Without a solution to that
> problem,
> > I'm not sure what to do here. just tossing ideas around - could we still
> > check for thread interruption as an additional way to interrupt a
> > Traversal. maybe instead of:
> >
> > if (Thread.interrupted()) throw new TraversalInterruptedException();
> >
> > we need:
> >
> > if (Thread.interrupted()) this.traversal.interrupt()
> >
> > that would then trigger whatever interrupt logic the traversal had?
> >
> > If we need to do a better job with AbstractStep, please create a JIRA
> > (and/or submit a PR) so we don't forget to make some improvements there.
> >
> > On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
> > wrote:
> >
> >> I just did a global Intellij search in the Sqlg project.
> >>
> >> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
> >> swallows the exception and none resets the interrupt flag.
> >>
> >> Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2
> >> swallows the exception without resetting the interrupt flag and one
> >> throws an exception.
> >>
> >> The rest,
> >>
> >> logback, 7 catch (InterruptedException e) 1 resets the flag while the
> >> rest swallow the exception without resetting the interrupt flag
> >>
> >> google guava about 25 catch (InterruptedException e) all resets the
> >> interrupt flag
> >>
> >> hazelcast 85 catch (InterruptedException e) too many to count but some
> >> resets the interrupt flag and some don't
> >>
> >> mchange c3po pool 7 catch (InterruptedException e), 4 throws exception
> >> without resetting the interrupt flag and 3 swallow the exception without
> >> resetting the interrupt flag.
> >>
> >> mchange common 8 catch (InterruptedException e), 2 throws an exception
> >> without resetting the interrult flag and 6 complete swallow without
> >> resetting.
> >>
> >> commons-io 8 catch (InterruptedException e) 1 reset of the interrupt
> >> flag, 7 swallow the exception without resetting the interrupt flag
> >>
> >> jline 3 catch (InterruptedException e) all swallow the exception without
> >> resetting the flag.
> >>
> >>
> >> All and all I don't think using interrupt will be a reliable strategy to
> >> use.
> >>
> >>
> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
> >> says that it is good practise to always reset the flag. It might be good
> >> but it is not common.
> >> From the above rather quick search only google guava respected that good
> >> practice.
> >>
> >> AbstractStep code
> >> if (Thread.interrupted()) throw new TraversalInterruptedException();
> >>
> >> will also reset the interrupt flag potentially making someone else's
> >> Thread.interrupted() check fail.
> >>
> >>
> >

Re: [DISCUSS] interrupt

2016-07-21 Thread pieter-gmail
Well no, the problem is Thread.interrupted() is not reliable. Does not
really matter who the caller is, GremlinServer or other.
Just about every 3rd party library I can see might reset the flag
meaning that the check will randomly return false or true. Something as
trivial as a logger might even reset the flag. It seems to me interrupt
is more for code that actually calls wait/join/sleep and they handle the
any subsequent InterruptException as they please.

All I can think of for GremlinServer is a way more complex multi
threaded solution.
The ScriptEngine.eval passes in a reference object and returns
immediately. For every traversal that starts it notifies the caller via
the reference object about the traversal. The caller then uses that
traversal to interrupt it. Plus some more logic to know when the script
is done.

Ok had another idea but kinda want to try it first as it might be
nonsense. Basically keep retrying the Thread.interrupt() till the thread
via exceptions bubbles to the top of the stack and gets handled
appropriately.

On 21/07/2016 18:47, Stephen Mallette wrote:
> thanks for all that pieter. the primary reason for traversal interruption
> in the first place was so that gremlin server would have a chance to kill
> traversals that were running too long. Without a solution to that problem,
> I'm not sure what to do here. just tossing ideas around - could we still
> check for thread interruption as an additional way to interrupt a
> Traversal. maybe instead of:
>
> if (Thread.interrupted()) throw new TraversalInterruptedException();
>
> we need:
>
> if (Thread.interrupted()) this.traversal.interrupt()
>
> that would then trigger whatever interrupt logic the traversal had?
>
> If we need to do a better job with AbstractStep, please create a JIRA
> (and/or submit a PR) so we don't forget to make some improvements there.
>
> On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
> wrote:
>
>> I just did a global Intellij search in the Sqlg project.
>>
>> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
>> swallows the exception and none resets the interrupt flag.
>>
>> Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2
>> swallows the exception without resetting the interrupt flag and one
>> throws an exception.
>>
>> The rest,
>>
>> logback, 7 catch (InterruptedException e) 1 resets the flag while the
>> rest swallow the exception without resetting the interrupt flag
>>
>> google guava about 25 catch (InterruptedException e) all resets the
>> interrupt flag
>>
>> hazelcast 85 catch (InterruptedException e) too many to count but some
>> resets the interrupt flag and some don't
>>
>> mchange c3po pool 7 catch (InterruptedException e), 4 throws exception
>> without resetting the interrupt flag and 3 swallow the exception without
>> resetting the interrupt flag.
>>
>> mchange common 8 catch (InterruptedException e), 2 throws an exception
>> without resetting the interrult flag and 6 complete swallow without
>> resetting.
>>
>> commons-io 8 catch (InterruptedException e) 1 reset of the interrupt
>> flag, 7 swallow the exception without resetting the interrupt flag
>>
>> jline 3 catch (InterruptedException e) all swallow the exception without
>> resetting the flag.
>>
>>
>> All and all I don't think using interrupt will be a reliable strategy to
>> use.
>>
>> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
>> says that it is good practise to always reset the flag. It might be good
>> but it is not common.
>> From the above rather quick search only google guava respected that good
>> practice.
>>
>> AbstractStep code
>> if (Thread.interrupted()) throw new TraversalInterruptedException();
>>
>> will also reset the interrupt flag potentially making someone else's
>> Thread.interrupted() check fail.
>>
>>
>> All that said I do not have a solution for GremlinServer not having
>> access to the traversal.
>>
>> Thanks
>> Pieter
>>
>>
>>
>>
>>
>>
>> On 21/07/2016 17:09, Stephen Mallette wrote:
>>> I don't recall all the issues with doing traversal interruption with a
>>> flag. I suppose it could work in the same way that thread interruption
>>> works now. I will say that I'm hesitant to say that we should change this
>>> on the basis of this being a problem general to databases as we've only
>>> seen in so far in HSQLDB. If it was shown to be a problem in other graphs
>>> i'd be more amplified to see a change. Not sure if any other graph
>>> providers out there can attest to a problem with the thread interruption
>>> approach but it would be nice to hear so if there did.
>>>
>>> Of course, I think you alluded to the bigger problem, which is that
>> Gremlin
>>> Server uses thread interruption to kill script executions and iterations
>>> that exceed timeouts. So, the problem there is that, if someone submits a
>>> script like this:
>>>
>>> t = g.V()
>>> x = t.toList()
>>>
>>> that script gets pushed into a ScriptEngine.eval() method. That

Re: [DISCUSS] interrupt

2016-07-21 Thread Stephen Mallette
thanks for all that pieter. the primary reason for traversal interruption
in the first place was so that gremlin server would have a chance to kill
traversals that were running too long. Without a solution to that problem,
I'm not sure what to do here. just tossing ideas around - could we still
check for thread interruption as an additional way to interrupt a
Traversal. maybe instead of:

if (Thread.interrupted()) throw new TraversalInterruptedException();

we need:

if (Thread.interrupted()) this.traversal.interrupt()

that would then trigger whatever interrupt logic the traversal had?

If we need to do a better job with AbstractStep, please create a JIRA
(and/or submit a PR) so we don't forget to make some improvements there.

On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
wrote:

> I just did a global Intellij search in the Sqlg project.
>
> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
> swallows the exception and none resets the interrupt flag.
>
> Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2
> swallows the exception without resetting the interrupt flag and one
> throws an exception.
>
> The rest,
>
> logback, 7 catch (InterruptedException e) 1 resets the flag while the
> rest swallow the exception without resetting the interrupt flag
>
> google guava about 25 catch (InterruptedException e) all resets the
> interrupt flag
>
> hazelcast 85 catch (InterruptedException e) too many to count but some
> resets the interrupt flag and some don't
>
> mchange c3po pool 7 catch (InterruptedException e), 4 throws exception
> without resetting the interrupt flag and 3 swallow the exception without
> resetting the interrupt flag.
>
> mchange common 8 catch (InterruptedException e), 2 throws an exception
> without resetting the interrult flag and 6 complete swallow without
> resetting.
>
> commons-io 8 catch (InterruptedException e) 1 reset of the interrupt
> flag, 7 swallow the exception without resetting the interrupt flag
>
> jline 3 catch (InterruptedException e) all swallow the exception without
> resetting the flag.
>
>
> All and all I don't think using interrupt will be a reliable strategy to
> use.
>
> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
> says that it is good practise to always reset the flag. It might be good
> but it is not common.
> From the above rather quick search only google guava respected that good
> practice.
>
> AbstractStep code
> if (Thread.interrupted()) throw new TraversalInterruptedException();
>
> will also reset the interrupt flag potentially making someone else's
> Thread.interrupted() check fail.
>
>
> All that said I do not have a solution for GremlinServer not having
> access to the traversal.
>
> Thanks
> Pieter
>
>
>
>
>
>
> On 21/07/2016 17:09, Stephen Mallette wrote:
> > I don't recall all the issues with doing traversal interruption with a
> > flag. I suppose it could work in the same way that thread interruption
> > works now. I will say that I'm hesitant to say that we should change this
> > on the basis of this being a problem general to databases as we've only
> > seen in so far in HSQLDB. If it was shown to be a problem in other graphs
> > i'd be more amplified to see a change. Not sure if any other graph
> > providers out there can attest to a problem with the thread interruption
> > approach but it would be nice to hear so if there did.
> >
> > Of course, I think you alluded to the bigger problem, which is that
> Gremlin
> > Server uses thread interruption to kill script executions and iterations
> > that exceed timeouts. So, the problem there is that, if someone submits a
> > script like this:
> >
> > t = g.V()
> > x = t.toList()
> >
> > that script gets pushed into a ScriptEngine.eval() method. That method
> > blocks until it is complete. Under that situation, Gremlin Server doesn't
> > have access to the Traversal to call interrupt on it. "t" is iterating
> via
> > toList() and there is no way to stop it. Not sure what we could do about
> > situations like that.
> >
> > On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail 
> > wrote:
> >
> >> The current interrupt implementation is failing on Sqlg's HSQLDB
> >> implementation.
> >> The reason for this is that HSQLDB itself relies on Thread.interrupt()
> >> for its own internal logic. When TinkerPop interrupts the thread it
> >> thinks it has to do with its own logic and as a result the interrupt
> >> flag is reset and no exception is thrown.
> >>
> >> Reading the Thread.interrupt javadocs it says that wait(), join() and
> >> sleep() will all reset the interrupt flag throw an InterruptedException.
> >> This makes TinkerPop's reliance on the flag being set somewhat fragile.
> >> All of those methods I suspect are common with database io code and
> >> TinkerPop being a high level database layer makes it susceptible to 3rd
> >> party interpretations of interrupt semantics.
> >>
> >> In some ways the TraversalInterruptionTest itse

Re: [DISCUSS] interrupt

2016-07-21 Thread pieter-gmail
I just did a global Intellij search in the Sqlg project.

HSQLDB has 13 catch (InterruptedException e) clauses. All of them
swallows the exception and none resets the interrupt flag.

Postgresql jdbc driver has 3 catch (InterruptedException e) clauses. 2
swallows the exception without resetting the interrupt flag and one
throws an exception.

The rest,

logback, 7 catch (InterruptedException e) 1 resets the flag while the
rest swallow the exception without resetting the interrupt flag

google guava about 25 catch (InterruptedException e) all resets the
interrupt flag

hazelcast 85 catch (InterruptedException e) too many to count but some
resets the interrupt flag and some don't

mchange c3po pool 7 catch (InterruptedException e), 4 throws exception
without resetting the interrupt flag and 3 swallow the exception without
resetting the interrupt flag.

mchange common 8 catch (InterruptedException e), 2 throws an exception
without resetting the interrult flag and 6 complete swallow without
resetting.

commons-io 8 catch (InterruptedException e) 1 reset of the interrupt
flag, 7 swallow the exception without resetting the interrupt flag

jline 3 catch (InterruptedException e) all swallow the exception without
resetting the flag.


All and all I don't think using interrupt will be a reliable strategy to
use.
http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
says that it is good practise to always reset the flag. It might be good
but it is not common.
>From the above rather quick search only google guava respected that good
practice.

AbstractStep code
if (Thread.interrupted()) throw new TraversalInterruptedException();

will also reset the interrupt flag potentially making someone else's
Thread.interrupted() check fail.


All that said I do not have a solution for GremlinServer not having
access to the traversal.

Thanks
Pieter






On 21/07/2016 17:09, Stephen Mallette wrote:
> I don't recall all the issues with doing traversal interruption with a
> flag. I suppose it could work in the same way that thread interruption
> works now. I will say that I'm hesitant to say that we should change this
> on the basis of this being a problem general to databases as we've only
> seen in so far in HSQLDB. If it was shown to be a problem in other graphs
> i'd be more amplified to see a change. Not sure if any other graph
> providers out there can attest to a problem with the thread interruption
> approach but it would be nice to hear so if there did.
>
> Of course, I think you alluded to the bigger problem, which is that Gremlin
> Server uses thread interruption to kill script executions and iterations
> that exceed timeouts. So, the problem there is that, if someone submits a
> script like this:
>
> t = g.V()
> x = t.toList()
>
> that script gets pushed into a ScriptEngine.eval() method. That method
> blocks until it is complete. Under that situation, Gremlin Server doesn't
> have access to the Traversal to call interrupt on it. "t" is iterating via
> toList() and there is no way to stop it. Not sure what we could do about
> situations like that.
>
> On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail 
> wrote:
>
>> The current interrupt implementation is failing on Sqlg's HSQLDB
>> implementation.
>> The reason for this is that HSQLDB itself relies on Thread.interrupt()
>> for its own internal logic. When TinkerPop interrupts the thread it
>> thinks it has to do with its own logic and as a result the interrupt
>> flag is reset and no exception is thrown.
>>
>> Reading the Thread.interrupt javadocs it says that wait(), join() and
>> sleep() will all reset the interrupt flag throw an InterruptedException.
>> This makes TinkerPop's reliance on the flag being set somewhat fragile.
>> All of those methods I suspect are common with database io code and
>> TinkerPop being a high level database layer makes it susceptible to 3rd
>> party interpretations of interrupt semantics.
>>
>> In some ways the TraversalInterruptionTest itself had to carefully reset
>> the flag with its usage of Thread.sleep().
>>
>> My proposal is to mark the traversal itself as interrupted rather than
>> the thread and keep the logic contained to TinkerPop's space.
>>
>> Another benefit is that the traversal.interrupt() can raise an event
>> that implementations can listen to. On receipt of the event they would
>> then be able to send a separate request to the database to cancel a
>> particular query. In my case would be a nice way for Sqlg to tell
>> Postgresql or HSQLDB to cancel a particular query (the latest one the
>> traversal executed).
>>
>> In many ways the semantics are the same. Currently for client code
>> wanting to interrupt a particular traversal it needs to have a reference
>> to the thread the traversal is executing in. Now instead it needs to
>> keep a reference to executing traversals and interrupt them directly.
>>
>> Add Traversal.interrupt() and Traversal.isInterrupted(boolean
>> ClearInterrupted)

Re: [DISCUSS] interrupt

2016-07-21 Thread Stephen Mallette
I don't recall all the issues with doing traversal interruption with a
flag. I suppose it could work in the same way that thread interruption
works now. I will say that I'm hesitant to say that we should change this
on the basis of this being a problem general to databases as we've only
seen in so far in HSQLDB. If it was shown to be a problem in other graphs
i'd be more amplified to see a change. Not sure if any other graph
providers out there can attest to a problem with the thread interruption
approach but it would be nice to hear so if there did.

Of course, I think you alluded to the bigger problem, which is that Gremlin
Server uses thread interruption to kill script executions and iterations
that exceed timeouts. So, the problem there is that, if someone submits a
script like this:

t = g.V()
x = t.toList()

that script gets pushed into a ScriptEngine.eval() method. That method
blocks until it is complete. Under that situation, Gremlin Server doesn't
have access to the Traversal to call interrupt on it. "t" is iterating via
toList() and there is no way to stop it. Not sure what we could do about
situations like that.

On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail 
wrote:

> The current interrupt implementation is failing on Sqlg's HSQLDB
> implementation.
> The reason for this is that HSQLDB itself relies on Thread.interrupt()
> for its own internal logic. When TinkerPop interrupts the thread it
> thinks it has to do with its own logic and as a result the interrupt
> flag is reset and no exception is thrown.
>
> Reading the Thread.interrupt javadocs it says that wait(), join() and
> sleep() will all reset the interrupt flag throw an InterruptedException.
> This makes TinkerPop's reliance on the flag being set somewhat fragile.
> All of those methods I suspect are common with database io code and
> TinkerPop being a high level database layer makes it susceptible to 3rd
> party interpretations of interrupt semantics.
>
> In some ways the TraversalInterruptionTest itself had to carefully reset
> the flag with its usage of Thread.sleep().
>
> My proposal is to mark the traversal itself as interrupted rather than
> the thread and keep the logic contained to TinkerPop's space.
>
> Another benefit is that the traversal.interrupt() can raise an event
> that implementations can listen to. On receipt of the event they would
> then be able to send a separate request to the database to cancel a
> particular query. In my case would be a nice way for Sqlg to tell
> Postgresql or HSQLDB to cancel a particular query (the latest one the
> traversal executed).
>
> In many ways the semantics are the same. Currently for client code
> wanting to interrupt a particular traversal it needs to have a reference
> to the thread the traversal is executing in. Now instead it needs to
> keep a reference to executing traversals and interrupt them directly.
>
> Add Traversal.interrupt() and Traversal.isInterrupted(boolean
> ClearInterrupted)
>
> Caveat, I am not familiar with GremlinServer nor the complications
> around interrupt there so perhaps I am missing something.
>
> Thanks
> Pieter
>