jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787809380



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per-request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts, the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set.
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the Gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
   I think from the user POV, having a thing (`ResultSet`, `Stream`, ...) that represents the result of the whole execution is more familiar than having one per "batch". The batch is a low-level detail.
   
   There are several db client libraries that use this approach (blocking iterators in other languages, async iterators in Node.js, streams, or callbacks). I [worked on the paging API for Cassandra](https://github.com/datastax/nodejs-driver/tree/master/doc/features/paging) that uses several of these approaches (see [here for an example](https://github.com/datastax/nodejs-driver#row-streaming-and-pipes)), but these are popular patterns across Node.js db client libraries (see [mysql](https://github.com/mysqljs/mysql#streaming-query-rows)).
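
   For illustration, a rough consumer-side sketch of the async-iterator pattern (the `client.stream` method and its signature here are made up, not an existing driver API):

   ```javascript
   // Hypothetical sketch: assumes a stream() method that returns an async
   // iterable, yielding result items as each batch arrives from the server.
   let count = 0;
   for await (const vertex of client.stream("g.V()", {}, { batchSize: 25 })) {
     count++; // each item is handled as soon as its batch is received
   }
   console.log(`received ${count} vertices`); // 100, without buffering the whole result set
   ```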
   
   I think we should return an object that fires events, or a stream of result items, and we should clearly signal when all the data has been received or when there was an error. If we use callbacks, I don't think we should mix them with promises to surface the error.
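
   Purely as a sketch of the shape I mean (the `submitStream` method and the event names are made up):

   ```javascript
   // Hypothetical sketch of a result object that fires events per item/batch
   // and signals completion and errors explicitly.
   const results = client.submitStream("g.V()", {}, { batchSize: 25 });

   results.on("data", (item) => {
     console.log(item); // fired as each item (or batch) is received
   });

   results.on("error", (err) => {
     console.error(err); // single, explicit error path, not mixed with a promise rejection
   });

   results.on("end", () => {
     console.log("all results received"); // clearly signals the end of the result set
   });
   ```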



