Do frameworks like Reactor not resolve this issue with backpressure and other sexy tricks?
Cheers
Pieter

On Fri, 2022-07-29 at 13:51 +0100, Oleksandr Porunov wrote:
> I'm also not sure, but for some reason I feel that we may need some event loop to be implemented if we want to re-write it with async capabilities.
> The reason I'm saying this is that I feel in some situations we may trigger a very long call stack, i.e.:
> Promise -> Promise -> Promise -> ...
>
> I guess that could happen in a situation where the next part of the execution depends on the previous part of the execution. For example:
>
> g.V().has("hello", "world").barrier(50).limit(5)
>
> So, let's assume the following about the execution of the above query:
> - It is executed in JanusGraph with batch query enabled (graph provider specific, but it's easier for me to focus on a concrete implementation)
> - There is no suitable index for that property (again, graph provider specific)
> - There are only 4 vertices with such a property
> - We have 50 million vertices in total
>
> If the above facts are true, then the query will be executed as follows:
> 1) get the first (or next) 50 vertices
> 2) filter out unmatched vertices
> 3) if the limit of 5 is not reached, process the next vertices by starting from step "1" again; otherwise, return data.
>
> So, in fact, with the above scenario we will traverse all 50 million vertices in the graph, which will result in about 1 million chained calls for a promise-based implementation. That will probably result in a StackOverflowError.
> With synchronous code we don't have these problems, because we don't call a new function recursively each time we need to retrieve part of the data.
> We can overcome the above issue by implementing some kind of event loop where we put all the results, and then a single thread running that loop calls the necessary async functions.
> If so, we will never have a long chain of calls.
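The batch-scan scenario quoted above can be sketched with plain JDK types. This is a minimal, hypothetical illustration (none of these names are TinkerPop API): each finished batch enqueues the next step on a single-threaded task queue instead of invoking it recursively, so the call-stack depth stays constant no matter how many batches are scanned.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EventLoopSketch {
    // Scans totalVertices in batches; pretends every matchEvery-th vertex
    // matches the predicate. Returns {verticesScanned, verticesMatched}.
    static long[] scan(long totalVertices, int batchSize, int limit, long matchEvery) {
        final Deque<Runnable> loop = new ArrayDeque<>();
        final long[] offset = {0};
        final long[] matched = {0};

        Runnable step = new Runnable() {
            public void run() {
                // "fetch" and filter one batch
                for (int i = 0; i < batchSize && offset[0] < totalVertices; i++, offset[0]++) {
                    if (offset[0] % matchEvery == 0) matched[0]++;
                }
                if (matched[0] < limit && offset[0] < totalVertices) {
                    loop.add(this); // schedule the next batch; no recursion
                }
            }
        };
        loop.add(step);
        while (!loop.isEmpty()) loop.poll().run(); // the "event loop"

        return new long[]{offset[0], matched[0]};
    }

    public static void main(String[] args) {
        // 1M vertices (stand-in for 50M), batches of 50, limit 5, but only
        // 4 matches exist: the limit is never reached, so the whole graph
        // is scanned - yet the stack never grows.
        long[] r = scan(1_000_000, 50, 5, 250_000);
        System.out.println(r[0] + " scanned, " + r[1] + " matched");
    }
}
```

With promise chaining, each of the ~20,000 batch completions here would add a frame to the callback chain; the queue-and-drain loop keeps it flat.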
> The only thing about such an architecture is that it will require all those functions to be non-blocking, and thus all providers will probably need to implement their async functionality (it doesn't matter if it is real async or "fake" async behind a thread pool).
>
> That's just something off the top of my mind, but I could be wrong and could be missing some points. So maybe we don't need any event loop; I just haven't investigated it enough yet.
>
> Oleksandr
>
> On Fri, Jul 29, 2022 at 1:09 PM Oleksandr Porunov <alexandr.poru...@gmail.com> wrote:
> > Hi Divij,
> >
> > Thanks for joining the conversation!
> >
> > Basically the `promise` step is for remote query execution only, which can be seen from the `promise` method: "throw new IllegalStateException("Only traversals created using withRemote() can be used in an async way");".
> > Under the hood, Gremlin Server will still be doing sync execution even if the graph provider can support async query execution.
> > Gremlin Server will need to allocate 200 threads if we are trying to execute 200 parallel queries.
> > g.V().has("name", "hello").out("world") - this query will usually first try to find the vertex id for the `hello` name and block the thread until it is processed. After that it will fire another query to the storage backend to retrieve all adjacent vertices following the "world" edge and block the executing thread until we receive a response from the database. After that it will return data back to the client.
> > Instead of that, I wanted to propose a different execution strategy. Let's fire the first query and, instead of waiting for the response, say: when the response is available, please execute another step - which is to get all adjacent vertices following the "world" edge.
> > At that step we do the same: we fire a database query and, instead of waiting for the response, we say: when the response is available, please execute another step (which is kind of a final step that simply returns data to the client by executing a necessary function).
> > I guess something like this would help us not allocate unnecessary threads which simply wait for the response from the underlying storage backends.
> >
> > Oleksandr
> >
> > On Fri, Jul 29, 2022 at 8:34 AM Divij Vaidya <divijvaidy...@gmail.com> wrote:
> > > Hey folks
> > >
> > > Interesting discussion!
> > >
> > > I am a bit confused though, since I believe we already have async execution implemented in the TinkerPop Java client. Let me try to clarify, and please let me know if I missed something.
> > >
> > > The Java client uses a small number of WebSocket connections to multiplex multiple queries to the server. You can think of it as a pipe established to the server on which we can send messages belonging to different queries. On the server, these messages are queued until one of the execution threads can pick them up. Once a request is picked for execution, the results are returned in a pipelined/streaming manner, i.e. the server calculates a batch of results (the batch size is configurable per query) and sends the results as messages on the same WebSocket channel. On the client side, these results are stored in a queue until the application thread consumes them using an iterator. This model of execution *does not block the application thread* and hence provides async capabilities.
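Oleksandr's step-chaining proposal above can be sketched with plain JDK futures. This is a hypothetical illustration (`lookupVertexId` and `adjacentVertices` are made-up stand-ins for backend calls, not TinkerPop API): each backend call returns a future, and the next step is attached as a continuation, so no thread sits blocked between the two lookups.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ChainedStepsSketch {
    // stand-in for the index lookup behind g.V().has("name", "hello")
    static CompletableFuture<Long> lookupVertexId(String name) {
        return CompletableFuture.supplyAsync(() -> 42L);
    }

    // stand-in for following the "world" edges of a vertex
    static CompletableFuture<List<Long>> adjacentVertices(long id, String label) {
        return CompletableFuture.supplyAsync(() -> List.of(id + 1, id + 2));
    }

    static CompletableFuture<List<Long>> query(String name, String label) {
        // "when the response is available, please execute another step"
        return lookupVertexId(name).thenCompose(id -> adjacentVertices(id, label));
    }

    public static void main(String[] args) {
        System.out.println(query("hello", "world").join()); // [43, 44]
    }
}
```

The contrast with the current model is that `thenCompose` registers the second query as a callback on the first, whereas the synchronous execution path would hold an executor thread across both round trips.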
> > > A sample code to achieve this would be as follows:
> > >
> > > ```
> > > final Cluster cluster = Cluster.build("localhost")
> > >     .port(8182)
> > >     .maxInProcessPerConnection(32)
> > >     .maxSimultaneousUsagePerConnection(32)
> > >     .serializer(Serializers.GRAPHBINARY_V1D0)
> > >     .create();
> > >
> > > try {
> > >     final GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
> > >     CompletableFuture<List<Object>> result = g.V().has("name", "pumba").out("friendOf").id().promise(Traversal::toList);
> > >
> > >     // do some application layer stuff
> > >     // ...
> > >
> > >     List<Object> verticesWithNamePumba = result.join();
> > >     System.out.println(verticesWithNamePumba);
> > > } finally {
> > >     cluster.close();
> > > }
> > > ```
> > >
> > > Note that, in the above example, the thread executing the code is not blocked until we call "result.join()".
> > >
> > > Does this address the use case that Oleksandr brought up at the beginning of this thread?
> > >
> > > --
> > > Divij Vaidya
> > >
> > > On Fri, Jul 29, 2022 at 4:05 AM Oleksandr Porunov <alexandr.poru...@gmail.com> wrote:
> > > > Hmm, that's interesting! Thank you Joshua for the idea!
> > > > So, I guess the general idea here could be: we can start small and begin implementing async functionality for some parts instead of implementing async functionality for everything straight away.
> > > > Oleksandr
> > > >
> > > > On Fri, Jul 29, 2022, 00:38 Joshua Shinavier <j...@fortytwo.net> wrote:
> > > > > Well, the wrapper I mentioned before did not require a full rewrite of TinkerPop :-) Rather, it provided async interfaces for vertices and edges, on which operations like subgraph and shortest-path queries were evaluated in an asynchronous fashion (using a special language, as it happened, but limited Gremlin queries would have been an option). So I think a basic async API might be a useful starting point even if it doesn't go very deep.
> > > > >
> > > > > Josh
> > > > >
> > > > > On Thu, Jul 28, 2022 at 4:21 PM Oleksandr Porunov <alexandr.poru...@gmail.com> wrote:
> > > > >> Hi Joshua and Pieter,
> > > > >>
> > > > >> Thank you for joining the conversation!
> > > > >>
> > > > >> I didn't actually look into the implementation details yet, but quickly checking the Traversal.java code, I think Pieter is right here.
> > > > >> For some reason I thought we could simply wrap a synchronous method in an asynchronous one, basically something like:
> > > > >>
> > > > >> // the method which should be implemented by a graph provider
> > > > >> Future<E> executeAsync(Callable<E> func);
> > > > >>
> > > > >> public default Future<E> asyncNext() {
> > > > >>     return executeAsync(this::next);
> > > > >> }
> > > > >>
> > > > >> but checking that code, I think I was wrong about it. Different steps may execute different logic (i.e. different underlying storage queries) for different graph providers.
> > > > >> Thus, wrapping only terminal steps into async functions most likely won't solve the problem.
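Stated as compilable code, the wrapper idea quoted above would look roughly like the following. This is a hypothetical sketch using only JDK types (the `AsyncTraversal` interface and `demo()` factory are invented for illustration, not real TinkerPop API); as the email notes, it only wraps the terminal step, while the intermediate steps still run synchronously.

```java
import java.util.Iterator;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;

public class AsyncWrapperSketch {
    interface AsyncTraversal<E> extends Iterator<E> {
        // the method which would be implemented by a graph provider
        <T> CompletableFuture<T> executeAsync(Callable<T> func);

        // default terminal wrapper: delegate the sync next() to the provider
        default CompletableFuture<E> asyncNext() {
            return executeAsync(this::next);
        }
    }

    // toy provider whose executeAsync just runs the callable on a worker thread
    static AsyncTraversal<String> demo() {
        return new AsyncTraversal<String>() {
            public boolean hasNext() { return true; }
            public String next() { return "v[1]"; }
            public <T> CompletableFuture<T> executeAsync(Callable<T> func) {
                return CompletableFuture.supplyAsync(() -> {
                    try { return func.call(); }
                    catch (Exception e) { throw new RuntimeException(e); }
                });
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(demo().asyncNext().join()); // v[1]
    }
}
```

The limitation is visible in the sketch itself: `next()` here is opaque, so if it internally blocks on several backend round trips, wrapping it changes which thread waits, not whether a thread waits.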
> > > > >> I guess it will require re-writing or extending all steps to be able to pass an async state instead of a sync state.
> > > > >>
> > > > >> I'm not familiar enough with the TinkerPop code yet to claim that, so I could be wrong.
> > > > >> I will need to research it a bit more to find out, but I think Pieter is most likely right about a massive re-write.
> > > > >>
> > > > >> Nevertheless, even if it requires a massive re-write, I would be eager to start the ball rolling.
> > > > >> I think we either need to try to implement async execution in TinkerPop 3 or start making some concrete decisions regarding TinkerPop 4.
> > > > >>
> > > > >> I see Marko A. Rodriguez started to work on RxJava back in 2019 here:
> > > > >> https://github.com/apache/tinkerpop/tree/4.0-dev/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava
> > > > >>
> > > > >> but, as I understand, the process didn't go far. I guess it would be good to know whether we want to completely rewrite TinkerPop in version 4 or not.
> > > > >>
> > > > >> If we want to completely rewrite TinkerPop in version 4, then I assume it may take quite some time to do so. In this case I would be more inclined to say that it's better to implement async functionality in TinkerPop 3, even if it requires rewriting all steps.
> > > > >>
> > > > >> In case TinkerPop 4 is a redevelopment with breaking changes but without rewriting the whole functionality, then I guess we could try to work on TinkerPop 4 by introducing async functionality and maybe applying more breaking changes in places where it's better to re-work some parts.
> > > > >> Best regards,
> > > > >> Oleksandr
> > > > >>
> > > > >> On Thu, Jul 28, 2022 at 7:47 PM pieter gmail <pieter.mar...@gmail.com> wrote:
> > > > >>> Hi,
> > > > >>>
> > > > >>> Does this not imply a massive rewrite of TinkerPop? In particular, the iterator chaining pattern of steps would have to follow a reactive style of coding?
> > > > >>>
> > > > >>> Cheers
> > > > >>> Pieter
> > > > >>>
> > > > >>> On Thu, 2022-07-28 at 15:18 +0100, Oleksandr Porunov wrote:
> > > > >>> > I'm interested in adding async capabilities to TinkerPop.
> > > > >>> >
> > > > >>> > There were many discussions about async capabilities for TinkerPop, but there was no clear consensus on how and when they should be developed.
> > > > >>> >
> > > > >>> > The benefit of async capabilities is that a user calling a query shouldn't need their thread to be blocked simply to wait for the result of the query execution. Instead, a graph provider should take care of the implementation of async query execution.
> > > > >>> > If that's the case, then many graph providers will be able to optimize their execution of async queries by using fewer resources for query execution.
> > > > >>> > As a real example of the potential benefit, I would like to point to how JanusGraph executes CQL queries to process Gremlin queries.
> > > > >>> > CQL result retrieval:
> > > > >>> >
> > > > >>> > https://github.com/JanusGraph/janusgraph/blob/15a00b7938052274fe15cf26025168299a311224/janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/function/slice/CQLSimpleSliceFunction.java#L45
> > > > >>> >
> > > > >>> > As seen from the code above, JanusGraph already leverages async functionality for CQL queries under the hood, but JanusGraph is required to process those queries in a synchronous manner. So what JanusGraph does is simply block the whole executing thread until the result is returned, instead of using async execution.
> > > > >>> >
> > > > >>> > Of course, that's just a case where we can benefit from async execution because the underlying storage backend can process async queries. If a storage backend can't process async queries, then we won't get any benefit from implementing a fake async executor.
> > > > >>> >
> > > > >>> > That said, I believe quite a few graph providers may benefit from having the possibility to execute queries in an async fashion, because they can optimize their resource utilization.
> > > > >>> > I believe we could have a feature flag for storage providers which want to implement async execution. Those who can't implement it, or don't want to, may simply disable async capabilities, which will result in throwing an exception any time an async function is called. I think it should be fine because we already have some feature flags like that for graph providers.
> > > > >>> > For example, "Null Semantics" was added in TinkerPop 3.5.0, but `null` is not supported by all graph providers. Thus, a feature flag for Null Semantics exists, like "g.getGraph().features().vertex().supportsNullPropertyValues()".
> > > > >>> > I believe we can enable async in TinkerPop 3 by providing async as a feature flag and letting graph providers implement it at will.
> > > > >>> > Moreover, if a graph provider wants to have async capabilities but their storage backend doesn't support async capabilities, then it should be easy to hide async execution behind an ExecutorService which mimics async execution.
> > > > >>> > I believe we could do that for TinkerGraph, so that users could at least experiment with the async API. I believe we could simply have a default "async" function implementation for TinkerGraph which wraps all sync executions in a function and sends it to that ExecutorService (we can discuss which one). In such a case, TinkerGraph will support async execution even without real async functionality. We could also potentially provide some configuration options to TinkerGraph to configure the thread pool size, the executor service implementation, etc.
> > > > >>> >
> > > > >>> > I didn't think yet about how best to implement those async capabilities for TinkerPop, but I think reusing an approach similar to Node.js, which returns a Promise when calling terminal steps, could be good.
For > > > > >>> > example, we could have a method called `async` which > > > > accepts a > > > > >>> > termination > > > > >>> > step and returns a necessary Future object. > > > > >>> > I.e.: > > > > >>> > g.V(123).async(Traversal.next()) > > > > >>> > g.V().async(Traversal.toList()) > > > > >>> > g.E().async(Traversal.toSet()) > > > > >>> > g.E().async(Traversal.iterate()) > > > > >>> > > > > > >>> > I know that there were discussions about adding async > > > > functionality > > > > >>> > to > > > > >>> > TinkerPop 4 eventually, but I don't see strong reasons > > > > why we > > > > >>> > couldn't add > > > > >>> > async functionality to TinkerPop 3 with a feature flag. > > > > >>> > It would be really great to hear some thoughts and > > > > concerns about it. > > > > >>> > > > > > >>> > If there are no concerns, I'd like to develop a proposal > > > > for further > > > > >>> > discussion. > > > > >>> > > > > > >>> > Best regards, > > > > >>> > Oleksandr Porunov > > > > >>> > > > > >>>