Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Divij Vaidya Thu, 18 Mar 2021 09:37:17 -0700

>
> I suppose the
> question is how would we ensure that each request for a session ends up
> being executed on the same thread from the previous request if jobs are
> randomly submitted to a worker pool?



I haven't thought through the details, but on top of my head, we would have
to maintain some request<->thread mapping on the server. This mapping is
also a building block for a request cancellation feature in future where a
client would be able to send a cancellation request to the server, the
server will map the request to a thread executing that request and then set
an interrupt on that thread to signify cancellation.

Divij Vaidya



On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <spmalle...@gmail.com>
wrote:

> I started a fresh thread for this topic Divij brought up, with more
> context:
>
> > In a scenario where we have both
> > session-less and sesion-ful requests being made to the server, the
> resource
> > allocation will not be fair and may lead to starvation for some requests.
> > This problem exists even today, hence not totally correlated to what you
> > are proposing but I am afraid a wider adoption of explicit
> > transactions will bring this problem to the spotlight. Hence, we should
> fix
> > this problem first. A solution would entail converging the worker pool
> for
> > both session vs session-less requests.
>
> I'm not sure we can get that done in 3.5.0, but maybe? I suppose the
> question is how would we ensure that each request for a session ends up
> being executed on the same thread from the previous request if jobs are
> randomly submitted to a worker pool?
>
>
> On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > Great idea Stephen. I agree with standardizing the explicit transaction
> > semantics in cases of remote server (implicit aka sessionless is already
> > self explanatory) and unifying the syntax with embedded graph usage.
> >
> > Couple of notes:
> >
> > 1. I would favor the begin() instead of create() as it's closer to
> > universally prominent SQL transaction syntax. This would reduce the
> > learning curve for users of TP.
> >
> > 2. From an implementation standpoint, sessionless requests are queued on
> > the server side and are serviced by the worker thread pool. But a
> > session-ful aka explicit transaction aka managed transaction starts a new
> > single worked thread pool every time. In a scenario where we have both
> > session-less and sesion-ful requests being made to the server, the
> resource
> > allocation will not be fair and may lead to starvation for some requests.
> > This problem exists even today, hence not totally correlated to what you
> > are proposing but I am afraid a wider adoption of explicit
> > transactions will bring this problem to the spotlight. Hence, we should
> fix
> > this problem first. A solution would entail converging the worker pool
> for
> > both session vs session-less requests.
> >
> > 3. You are proposing the idea of having a transaction scoped traversal
> > object. I agree with it but we need more clarification in behavior wrt
> the
> > following scenarios:
> >
> > Q. What happens when g.tx().commit() is called on a transaction scoped
> > traversal object? Does the traversal get closed?
> > Q. Currently, the same traversal object could be used to execute multiple
> > session-less requests simultaneously in a thread safe manner. Does the
> same
> > behaviour apply to transaction scoped traversal? If not, then how do I
> send
> > multiple requests in parallel from the client all scoped to the same
> > transaction on the server? Note that this is different from case of multi
> > threaded transactions because on the server all requests are scoped to
> > single transaction i.e. single thread but on the client they may be
> > submitted via multiple threads.
> >
> > Regards,
> > Divij Vaidya
> >
> >
> >
> > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <spmalle...@gmail.com>
> > wrote:
> >
> > > The Transaction object is a bit important with remote cases because it
> > > holds the reference to the session. With embedded use cases we
> generally
> > > adhere to ThreadLocal transactions so the Transaction implementation in
> > > that case is more of a controller for the current thread but for remote
> > > cases the Transaction implementation holds a bit of state that can
> cross
> > > over threads. In some ways, that makes remote cases feel like a
> "threaded
> > > transaction" so that may be familiar to users?? Here's some example
> > > syntax I currently have working in a test case:
> > >
> > > g = traversal().withRemote(conn)
> > > gtx = g.tx().create()
> > > assert gtx.isOpen() == true
> > > gtx.addV('person').iterate()
> > > gtx.addV('software').iterate()
> > > gtx.commit() // alternatively you could explicitly rollback()
> > > assert gtx.isOpen() == false
> > >
> > > I hope that documentation changes are enough to unify the syntax and
> > remove
> > > confusion despite there still being ThreadLocal transactions as a
> default
> > > for embedded cases and something else for remote. At least they will
> look
> > > the same, even if you technically don't need to do a g.tx().create()
> for
> > > embedded transactions and can just have the control you always had with
> > > g.tx().commit() directly.
> > >
> > > Note that the remote Transaction object can support configuration via
> > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the default
> > of
> > > "commit on close" to rollback or manual (or i suppose something
> custom).
> > > It's nice that this piece works. I don't see a point in supporting
> > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
> remote
> > > contexts.
> > >
> > > Another point is that I've added what I termed a "Graph Operation"
> > bytecode
> > > instances which is bytecode that isn't related to traversal
> > construction. I
> > > hope this isn't one of those things where we end up wanting to
> deprecate
> > an
> > > idea as fast as we added it but we needed some way to pass
> > commit/rollback
> > > commands and they aren't really part of traversal construction. They
> are
> > > operations that execute on the graph itself and I couldn't think of how
> > to
> > > treat them as traversals nicely, so they sorta just exist as a one off.
> > > Perhaps they will grow in number?? Folks have always asked if bytecode
> > > requests could get "GraphFeature" information - I suppose this could
> be a
> > > way to do that??
> > >
> > > Anyway, I will keep going down this general path as it's appearing
> > > relatively fruitful.
> > >
> > >
> > >
> > >
> > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <spmalle...@gmail.com
> >
> > > wrote:
> > >
> > > > > We should just communicate clearly that a simple remote traversal
> > > > already uses a transaction by default,
> > > >
> > > > That's a good point because even with this change we still have a
> > > > situation where a single iterated remote traversal will generally
> mean
> > a
> > > > server managed commit/rollback, but in embedded mode, we have an open
> > > > transaction (for transaction enabled graphs - TinkerGraph will behave
> > > more
> > > > like a remote traversal, so more confusion there i guess). I'm not
> sure
> > > how
> > > > to rectify that at this time except by way of documentation.
> > > >
> > > > The only thing I can think of is that the default for embedded would
> > have
> > > > to be auto-commit per traversal. In that way it would work like
> remote
> > > > traversals and for graphs that don't support transactions like
> > > TinkerGraph.
> > > > Of course, that's the kind of change that will break a lot of code.
> > Maybe
> > > > we just keep that idea for another version.
> > > >
> > > > On Thu, Mar 18, 2021 at 4:21 AM <f...@florian-hockmann.de> wrote:
> > > >
> > > >> I like the idea of adding transactions for remote traversals as they
> > > >> close a gap in functionality that we currently have for remote
> > > traversals.
> > > >> We should just communicate clearly that a simple remote traversal
> > > already
> > > >> uses a transaction by default, meaning that the server will roll
> back
> > > any
> > > >> changes if any exception occurs. Users often ask about how to do
> > > >> transactions with a remote traversal because they don't know about
> > this
> > > and
> > > >> I'm afraid that we might add even more confusion if we now add
> > > transactions
> > > >> for remote traversals. Hence why I think that we should communicate
> > this
> > > >> clearly when we add remote transactions.
> > > >>
> > > >> Getting this to work in the GLVs should be possible but will require
> > > some
> > > >> effort. I think we would have to introduce some
> > > >> "DriverRemoteTransactionTraversal" that doesn't submit the traversal
> > on
> > > a
> > > >> terminal step but saves it and then submits all saved traversals
> > > together
> > > >> on close().
> > > >> This also means that we should directly add an async version of
> > close(),
> > > >> maybe closeAsync()? (Same for commit() and rollback())
> > > >>
> > > >> I also like create() better than traversal() because it would
> confuse
> > me
> > > >> to first start a traversal with traversal() and then also start a
> > > >> transaction with the same method.
> > > >>
> > > >> -----Ursprüngliche Nachricht-----
> > > >> Von: Stephen Mallette <spmalle...@gmail.com>
> > > >> Gesendet: Donnerstag, 18. März 2021 00:01
> > > >> An: dev@tinkerpop.apache.org
> > > >> Betreff: Re: [DSISCUSS] Remote Transactions
> > > >>
> > > >> One thing I should have noted about tx().create(). The create() very
> > > well
> > > >> could have been named traversal(), thus the previous example reading
> > as:
> > > >>
> > > >> g = traversal().withEmbedded(graph)
> > > >> // or
> > > >> g = traversal().withRemote(conn)
> > > >> gtx = g.tx().traversal()
> > > >> gtx.addV('person').iterate()
> > > >> gtx.addV('software').iterate()
> > > >> gtx.close() // alternatively you could explicitly commit() or
> > rollback()
> > > >>
> > > >> You're basically spawning a new GraphTraversalSource from the
> > > Transaction
> > > >> object rather than from a Graph or AnonymousTraversal. I chose
> > create()
> > > >> because it felt like this looked weird:
> > > >>
> > > >> g = traversal().withRemote(conn).tx().traversal()
> > > >>
> > > >> which would be a weird thing to do I guess, but just seeing
> > > "traversal()"
> > > >> over and over seemed odd looking and I wanted to differentiate with
> > the
> > > >> Transaction object even though the methods are identical in what
> they
> > > do. I
> > > >> suppose I also drew inspiration from:
> > > >>
> > > >> Transaction.createdThreadedTx()
> > > >>
> > > >> which I think we might consider deprecating. JanusGraph would simply
> > > >> expose their own Transaction object in the future that has that
> method
> > > as I
> > > >> imagine folks still use that feature. As far as I know, no other
> graph
> > > >> implements that functionality and my guess is that no one will
> likely
> > > do so
> > > >> in the future.
> > > >>
> > > >> anyway, if you like traversal() better than create() or the other
> way
> > > >> around, please let me know
> > > >>
> > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
> d...@bechberger.com
> > >
> > > >> wrote:
> > > >>
> > > >> > I am in favor of this change as I any idea that unifies the
> multiple
> > > >> > different ways things work in TP will only make it easier to learn
> > and
> > > >> > adopt.
> > > >> >
> > > >> > I don't know that I have enough knowledge of the inner workings of
> > > >> > transactions to know what/if this will cause problems.
> > > >> >
> > > >> > Dave
> > > >> >
> > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
> > > >> > <spmalle...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > We haven't touched "transactions" since TP3 was originally
> > designed.
> > > >> > > They have remained a feature for embedded use cases even in the
> > face
> > > >> > > of the
> > > >> > rise
> > > >> > > of remote graph use cases and result in major inconsistencies
> that
> > > >> > > really bother users.
> > > >> > >
> > > >> > > As we close on 3.5.0, I figured that perhaps there was a chance
> to
> > > >> > > do something radical about transactions. Basically, I'd like
> > > >> > > transactions to work irrespective of remote or embedded usage
> and
> > > >> > > for them to have the
> > > >> > same
> > > >> > > API when doing so. In mulling it over for the last day or so I
> > had a
> > > >> > > realization that makes me believe that the following is
> possible:
> > > >> > >
> > > >> > > g = traversal().withEmbedded(graph)
> > > >> > > // or
> > > >> > > g = traversal().withRemote(conn)
> > > >> > > gtx = g.tx().create()
> > > >> > > gtx.addV('person').iterate()
> > > >> > > gtx.addV('software').iterate()
> > > >> > > gtx.close() // alternatively you could explicitly commit() or
> > > >> > > rollback()
> > > >> > >
> > > >> > > // you could still use g for sessionless, but gtx is "done"
> after
> > //
> > > >> > > you close so you will need to create() a new gtx instance for a
> //
> > > >> > > fresh transaction assert 2 == g.V().count().next()
> > > >> > >
> > > >> > > Note that the create() method on tx() is the new bit of
> necessary
> > > >> syntax.
> > > >> > > Technically, it is a do-nothing for embedded mode (and you could
> > > >> > > skip it for thread-local auto-transactions) but all the
> > > >> > > documentation can be shifted so that the API is identical to
> > remote.
> > > >> > > The change would be non-breaking as the embedded transaction
> > > >> > > approach would remain as it is, but would no longer be
> documented
> > as
> > > >> > > the preferred approach. Perhaps we could one day disallow it but
> > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we want
> to
> > > >> touch in TP3.
> > > >> > >
> > > >> > > What happens behind the scenes of g.tx().create() is that in
> > remote
> > > >> > > cases the RemoteConnection constructs a remote Transaction
> > > >> > > implementation which basically is used to spawn new traversal
> > source
> > > >> > > instances with a session based connections. The remote
> Transaction
> > > >> > > object can't support
> > > >> > transaction
> > > >> > > listeners or special read-write/close configurations, but I
> don't
> > > >> > > think that's a problem as those things don't make a lot of sense
> > in
> > > >> > > remote use cases.
> > > >> > >
> > > >> > > From the server perspective, this change would also mean that
> > > >> > > sessions would have to accept bytecode and not just scripts,
> which
> > > >> > > technically shouldn't be a problem.
> > > >> > >
> > > >> > > One downside at the moment is that i'm thinking mostly in Java
> at
> > > >> > > the moment. I'm not sure how this all works in other variants
> just
> > > >> > > yet, but
> > > >> > I'd
> > > >> > > hope to keep similar syntax.
> > > >> > >
> > > >> > > I will keep experimenting tomorrow. If you have any thoughts on
> > the
> > > >> > matter,
> > > >> > > I'd be happy to hear them. Thanks!
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > >
> >
>

Re: [DISCUSS] Converging the worker pool was: [DISCUSS] Remote Transactions

Reply via email to