Tom Kolanko created TINKERPOP-2679:
--------------------------------------

             Summary: Update JavaScript driver to support processing messages 
as a stream
                 Key: TINKERPOP-2679
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2679
             Project: TinkerPop
          Issue Type: Improvement
          Components: javascript
    Affects Versions: 3.5.1
            Reporter: Tom Kolanko
             Fix For: 3.5.2


The JavaScript driver's 
[_handleMessage|https://github.com/apache/tinkerpop/blob/d4bd5cc5a228fc22442101ccb6a9751653900d32/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js#L249]
 receives messages from Gremlin Server and stores each one in an object 
associated with the handler for the specific request. Currently, the driver 
waits until all of the data has arrived from the server before allowing any 
further processing of it.

The following examples assume that you have 100 vertices in your graph.

{code:javascript}
const result = await client.submit("g.V()")
console.log(result.toArray().length) // 100 - all the vertices in your graph
{code}

However, this can lead to cases where a lot of memory is required to hold the 
entire result set before any processing can take place. If we had the ability 
to process results as they arrive from Gremlin Server, we could reduce memory 
usage in some cases.

If you are open to it I would like to submit a PR where {{submit}} can take an 
optional callback which is run on each set of data returned from the gremlin 
server, rather than waiting for the entire result set:

{code:javascript}
await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {
  console.log(data.toArray().length) // 25 - this callback will be called 4 times (100 / 25 = 4)
})
{code}


I have the changes running locally and overall performance is unchanged: 
queries run about as fast as they used to. However, for some specific queries 
memory usage has dropped considerably.

With the process-on-message strategy, memory usage is proportional to the 
{{batchSize}} rather than to the size of the final result set. Using the 
default batch size of 64 and testing some specific cases we have, I can get 
memory usage to go from 1.2 GB to 10 MB.
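
To illustrate why peak memory tracks the batch size rather than the result set, here is a minimal, self-contained sketch in plain Node. The {{sendBatches}} generator is purely hypothetical and stands in for Gremlin Server emitting results in batches; it is not the driver's actual API.

```javascript
// Hypothetical stand-in for the server: yields results in fixed-size batches.
function* sendBatches(total, batchSize) {
  for (let i = 0; i < total; i += batchSize) {
    const batch = [];
    for (let j = i; j < Math.min(i + batchSize, total); j++) batch.push(j);
    yield batch;
  }
}

// Current behaviour: every batch is buffered, so peak memory is
// proportional to the size of the final result set.
function collectAll(total, batchSize) {
  const all = [];
  for (const batch of sendBatches(total, batchSize)) all.push(...batch);
  return all; // all `total` results held in memory at once
}

// Proposed behaviour: each batch is handed to a callback and can then be
// released, so peak memory is proportional to batchSize.
function streamBatches(total, batchSize, onData) {
  let calls = 0;
  for (const batch of sendBatches(total, batchSize)) {
    onData(batch); // at most batchSize results live here
    calls++;
  }
  return calls;
}

const calls = streamBatches(100, 25, (batch) => console.log(batch.length)); // 25, printed 4 times
console.log(calls); // 4
```

The shape mirrors the examples above: with 100 results and a batch size of 25, the callback runs four times over 25 items each, instead of one pass over a 100-item buffer.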




--
This message was sent by Atlassian Jira
(v8.20.1#820001)
