Hi,

This is how I understand it,

TinkerPop itself does not store any data, so all providers make the
persisted data available to TinkerPop's via custom implementations of
steps.

In Sqlg's case the data resides in a RDBMs. As things stands Sqlg uses
standard JDBC drivers which are standard blocking IO calls. However
even if Sqlg were to start using some new non blocking JDBC driver,
TinkerPop's own implementation of its `steps` is not reactive and
requires a blocking call to the underlying data store.

TinkerPop's design has since version 1 been a chaining of iterators. I
suspect this core design would have to change to make it non
blocking/reactive.

Regards
Pieter

On Fri, 2022-07-29 at 09:34 +0200, Divij Vaidya wrote:
> Hey folks
> 
> Interesting discussion! 
> 
> I am a bit confused though since I believe we already have async
> execution implemented in TinkerPop Java client. Let me try to clarify
> and please let me know if I missed something.
> 
> Java client uses a small number of websocket connections to multiplex
> multiple queries to the server. You can think of it as a pipe
> established to the server on which we could send messages belonging
> to different queries. On the server, these messages are queued until
> one of the execution threads can pick it up. Once a request is picked
> for execution, the results are returned in a pipelines/streaming
> manner i.e. the server calculates a batch of results (size of batch
> is configurable per query), and sends the results as messages on the
> same WebSocket channel. On the client size, these results are stored
> in a queue until the application thread consumes them uses an
> iterator. This model of execution *does not block the application
> thread* and hence, provides async capabilities. 
> 
> A sample code to achieve this would be as follows:
> 
> ```
> final Cluster cluster = Cluster.build("localhost")
>                                               .port(8182)
>                                              
> .maxInProcessPerConnection(32)
>                                              
> .maxSimultaneousUsagePerConnection(32)
>                                              
> .serializer(Serializers.GRAPHBINARY_V1D0)
>                                               .create();
> 
> try {
>           final GraphTraversalSource g =
> traversal().withRemote(DriverRemoteConnection.using(cluster));
>           CompletableFuture<List<Object>> result = g.V().has("name",
> "pumba").out("friendOf").id().promise(Traversal::toList);
> 
>           // do some application layer stuff
>           // ...
>           // ...
>           // ...
> 
>           List<Object> verticesWithNamePumba = result.join();
>           System.out.println(verticesWithNamePumba);
> } finally {
>           cluster.close();
> }
> ```
> 
> Note that, in the above example, the thread executing the above code
> is not blocked until we call "result.join()".
> 
> Does this address the use that Oleksandr brought up at the beginning
> of this thread?
> 
> --
> Divij Vaidya
> 
> 
> 
> On Fri, Jul 29, 2022 at 4:05 AM Oleksandr Porunov
> <alexandr.poru...@gmail.com> wrote:
> > Hmm, that's interesting! Thank you Joshua for the idea!
> > So, I guess the general idea here could be:
> > we can start small and start implementing async functionality for
> > some
> > parts instead of implement async functionality for everything
> > straightaway.
> > 
> > Oleksandr
> > 
> > On Fri, Jul 29, 2022, 00:38 Joshua Shinavier <j...@fortytwo.net>
> > wrote:
> > 
> > > Well, the wrapper I mentioned before did not require a full
> > rewrite of
> > > TinkerPop :-) Rather, it provided async interfaces for vertices
> > and edges,
> > > on which operations like subgraph and shortest paths queries were
> > evaluated
> > > in an asynchronous fashion (using a special language, as it
> > happened, but
> > > limited Gremlin queries would have been an option). So I think a
> > basic
> > > async API might be a useful starting point even if it doesn't go
> > very deep.
> > >
> > > Josh
> > >
> > >
> > > On Thu, Jul 28, 2022 at 4:21 PM Oleksandr Porunov <
> > > alexandr.poru...@gmail.com> wrote:
> > >
> > >> Hi Joshua and Pieter,
> > >>
> > >> Thank you for joining the conversation!
> > >>
> > >> I didn't actually look into the implementation details yet but
> > quickly
> > >> checking Traversal.java code I think Pieter is right here.
> > >> For some reason I thought we could simply wrap synchronous
> > method in
> > >> asynchronous, basically something like:
> > >>
> > >> // the method which should be implemented by a graph provider
> > >>
> > >> Future<E> executeAsync(Callable<E> func);
> > >>
> > >> public default Future<E> asyncNext(){
> > >>     return executeAsync(this::next);
> > >> }
> > >>
> > >> but checking that code I think I was wrong about it. Different
> > steps may
> > >> execute different logic (i.e. different underlying storage
> > queries) for
> > >> different graph providers.
> > >> Thus, wrapping only terminal steps into async functions won't
> > solve the
> > >> problem most likely.
> > >>
> > >> I guess it will require re-writing or extending all steps to be
> > able to
> > >> pass an async state instead of a sync state.
> > >>
> > >> I'm not familiar enough with the TinkerPop code yet to claim
> > that, so
> > >> probably I could be wrong.
> > >> I will need to research it a bit more to find out but I think
> > that Pieter
> > >> is most likely right about a massive re-write.
> > >>
> > >> Nevertheless, even if that requires massive re-write, I would be
> > eager to
> > >> start the ball rolling.
> > >> I think we either need to try to implement async execution in
> > TinkerPop 3
> > >> or start making some concrete decisions regarding TinkerPop 4.
> > >>
> > >> I see Marko A. Rodriguez started to work on RxJava back in 2019
> > here
> > >>
> > https://github.com/apache/tinkerpop/tree/4.0-dev/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava
> > >>
> > >> but the process didn't go as far as I understand. I guess it
> > would be
> > >> good to know if we want to completely rewrite TinkerPop in
> > version 4 or not.
> > >>
> > >> If we want to completely rewrite TinkerPop in version 4 then I
> > assume it
> > >> may take quite some time to do so. In this case I would be more
> > likely to
> > >> say that it's better to implement async functionality in
> > TinkerPop 3 even
> > >> if it requires rewriting all steps.
> > >>
> > >> In case TinkerPop 4 is a redevelopment with breaking changes but
> > without
> > >> starting to rewrite the whole functionality then I guess we
> > could try to
> > >> work on TinkerPop 4 by introducing async functionality and maybe
> > applying
> > >> more breaking changes in places where it's better to re-work
> > some parts.
> > >>
> > >> Best regards,
> > >> Oleksandr
> > >>
> > >>
> > >> On Thu, Jul 28, 2022 at 7:47 PM pieter gmail
> > <pieter.mar...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Does this not imply a massive rewrite of TinkerPop? In
> > particular the
> > >>> iterator chaining pattern of steps should follow a reactive
> > style of
> > >>> coding?
> > >>>
> > >>> Cheers
> > >>> Pieter
> > >>>
> > >>>
> > >>> On Thu, 2022-07-28 at 15:18 +0100, Oleksandr Porunov wrote:
> > >>> > I'm interested in adding async capabilities to TinkerPop.
> > >>> >
> > >>> > There were many discussions about async capabilities for
> > TinkerPop
> > >>> > but
> > >>> > there was no clear consensus on how and when it should be
> > developed.
> > >>> >
> > >>> > The benefit for async capabilities is that the user calling a
> > query
> > >>> > shouldn't need its thread to be blocked to simply wait for
> > the result
> > >>> > of
> > >>> > the query execution. Instead of that a graph provider should
> > take
> > >>> > care
> > >>> > about implementation of async queries execution.
> > >>> > If that's the case then many graph providers will be able to
> > optimize
> > >>> > their
> > >>> > execution of async queries by handling less resources for the
> > query
> > >>> > execution.
> > >>> > As a real example of potential benefit we could get I would
> > like to
> > >>> > point
> > >>> > on how JanusGraph executes CQL queries to process Gremlin
> > queries.
> > >>> > CQL result retrieval:
> > >>> >
> > >>>
> > https://github.com/JanusGraph/janusgraph/blob/15a00b7938052274fe15cf26025168299a311224/janusgraph-cql/src/main/java/org/janusgraph/diskstorage/cql/function/slice/CQLSimpleSliceFunction.java#L45
> > >>> >
> > >>> > As seen from the code above, JanusGraph already leverages
> > async
> > >>> > functionality for CQL queries under the hood but JanusGraph
> > is
> > >>> > required to
> > >>> > process those queries in synced manner, so what JanusGraph
> > does - it
> > >>> > simply
> > >>> > blocks the whole executing thread until result is returned
> > instead of
> > >>> > using
> > >>> > async execution.
> > >>> >
> > >>> > Of course, that's just a case when we can benefit from async
> > >>> > execution
> > >>> > because the underneath storage backend can process async
> > queries. If
> > >>> > a
> > >>> > storage backend can't process async queries then we won't get
> > any
> > >>> > benefit
> > >>> > from implementing a fake async executor.
> > >>> >
> > >>> > That said, I believe quite a few graph providers may benefit
> > from
> > >>> > having a
> > >>> > possibility to execute queries in async fashion because they
> > can
> > >>> > optimize
> > >>> > their resource utilization.
> > >>> > I believe that we could have a feature flag for storage
> > providers
> > >>> > which
> > >>> > want to implement async execution. Those who can't implement
> > it or
> > >>> > don't
> > >>> > want to implement it may simply disable async capabilities
> > which will
> > >>> > result in throwing an exception anytime an async function is
> > called.
> > >>> > I
> > >>> > think it should be fine because we already have some feature
> > flags
> > >>> > like
> > >>> > that for graph providers. For example "Null Semantics" was
> > added in
> > >>> > TinkerPop 3.5.0 but `null` is not supported for all graph
> > providers.
> > >>> > Thus,
> > >>> > a feature flag for Null Semantics exists like
> > >>> >
> > "g.getGraph().features().vertex().supportsNullPropertyValues()".
> > >>> > I believe we can enable async in TinkerPop 3 by providing
> > async as a
> > >>> > feature flag and letting graph providers implement it at
> > their will.
> > >>> > Moreover if a graph provider wants to have async capabilities
> > but
> > >>> > their
> > >>> > storage backends don't support async capabilities then it
> > should be
> > >>> > easy to
> > >>> > hide async execution under an ExecutorService which mimics
> > async
> > >>> > execution.
> > >>> > I believe we could do that for TinkerGraph so that users
> > could
> > >>> > experiment
> > >>> > with async API at least. I believe we could simply have a
> > default
> > >>> > "async"
> > >>> > function implementation for TinkerGraph which wraps all sync
> > >>> > executions in
> > >>> > a function and sends it to that ExecutorService (we can
> > discuss which
> > >>> > one).
> > >>> > In such a case TinkerGraph will support async execution even
> > without
> > >>> > real
> > >>> > async functionality. We could also potentially provide some
> > >>> > configuration
> > >>> > options to TinkerGraph to configure thread pool size,
> > executor
> > >>> > service
> > >>> > implementation, etc.
> > >>> >
> > >>> > I didn't think about how it is better to implement those
> > async
> > >>> > capabilities
> > >>> > for TinkerPop yet but I think reusing a similar approach like
> > in
> > >>> > Node.js
> > >>> > which returns Promise when calling Terminal steps could be
> > good. For
> > >>> > example, we could have a method called `async` which accepts
> > a
> > >>> > termination
> > >>> > step and returns a necessary Future object.
> > >>> > I.e.:
> > >>> > g.V(123).async(Traversal.next())
> > >>> > g.V().async(Traversal.toList())
> > >>> > g.E().async(Traversal.toSet())
> > >>> > g.E().async(Traversal.iterate())
> > >>> >
> > >>> > I know that there were discussions about adding async
> > functionality
> > >>> > to
> > >>> > TinkerPop 4 eventually, but I don't see strong reasons why we
> > >>> > couldn't add
> > >>> > async functionality to TinkerPop 3 with a feature flag.
> > >>> > It would be really great to hear some thoughts and concerns
> > about it.
> > >>> >
> > >>> > If there are no concerns, I'd like to develop a proposal for
> > further
> > >>> > discussion.
> > >>> >
> > >>> > Best regards,
> > >>> > Oleksandr Porunov
> > >>>
> > >>>

Reply via email to