Hello all,

The Gremlin Arrow Flight proof of concept demo will take place in the TinkerPop Discord channel https://discord.gg/renSpn8K?event=1006205553749545070 on Friday, Aug 12, at 10:30am PT / 1:30pm ET. Feel free to join us if you are interested! Discord registration/login may be required.

On Fri, Aug 5, 2022 at 3:31 PM Valentyn Kahamlyk <valent...@bitquilltech.com> wrote:

> Hello all,
>
> I'm hosting a short demo in Discord of a proof of concept using Arrow
> Flight with Gremlin, using string queries and GraphSON for
> serialization. Any questions and comments are welcome. The next step
> will be to create the full designs based on the proof of concept.
>
> The planned date is Aug 12; I will follow up with the exact time later.
>
> On Thu, Jun 30, 2022 at 4:35 PM Valentyn Kahamlyk <valent...@bitquilltech.com> wrote:
>
>> Hello Everyone,
>>
>> I would like to propose exploring options to use Arrow Flight as a
>> transport for Gremlin Server. Currently, Gremlin Server and its clients
>> are based on WebSockets with a custom sub-protocol and serialization to
>> GraphSON and GraphBinary. Developers of each driver must implement those
>> protocols from scratch, and only a limited amount of code is reused
>> (only 3rd party WebSocket libraries are currently reused in the client
>> variants). Implementing the protocol is a complicated and error-prone
>> process, so most drivers support only a subset of Gremlin Server
>> features. The maintenance cost also keeps increasing with the number of
>> new client variants being added to TinkerPop.
>>
>> ** Motivation **
>> We would like to propose a solution to reduce maintenance and simplify
>> the development of the client drivers by using a standard protocol based
>> on Apache Arrow Flight. As Arrow Flight is implemented in the most
>> common languages, like C++, C#, Java and Python, we anticipate that a
>> larger amount of existing code can be reused, which would help to reduce
>> maintenance costs in the future. Also, we can reuse some other Arrow
>> Flight features, like authentication and error handling.
>>
>> ** Assumptions **
>> Proof of Concept development will be done with Java 8.
>> Need to reuse existing code as much as possible.
>> It is desirable, but not necessary, to maintain compatibility with
>> existing drivers.
>> To simplify development at the initial stage, we will reuse existing
>> serialization mechanisms.
>>
>> ** Requirements **
>> Gremlin Server and drivers should replace the network layer with Arrow
>> Flight.
>> No significant drop in performance.
>> Gremlin Arrow must pass the Gherkin test suite.
>>
>> ** Prototype Design Overview **
>> We would like to explore the solution below and create a prototype to
>> prove that the approach is feasible.
>> The main idea is to replace the transport layer with FlightServer and
>> FlightClient. They support asynchronous data transfer, splitting data
>> into chunks, and authorization. While Arrow Flight typically requires a
>> schema, in the short term we can proceed with an implementation using
>> the existing serializers and the GraphBinary format. By using
>> GraphBinary we will not have all the capabilities that Arrow Flight
>> provides out of the box, like efficient compression. However, in the
>> future, we see the value of adding the capability to generate a schema
>> on the server side, which can enable additional use cases.
>>
>> First stage: replace transport layer, but keep serializers
>> Pros:
>> Reduction of the code base to be developed and maintained
>> A relatively low number of modifications
>>
>> Cons:
>> We may observe reduced performance due to schema transfer and other
>> overhead. As part of the PoC we will assess the performance overhead
>> for small and large responses and identify options to mitigate it.
>> Still need to support GraphBinary serialization.
>>
>> Second stage: replace transport layer, make dynamic schema generation
>> and use native Arrow structures for data transmission
>> Pros:
>> Greater reduction of the codebase to be developed and maintained
>> In addition, need to rework the serialization and add schema generation
>> Performance can be improved for large data sets due to Arrow Flight
>> optimizations and the ability to transfer data in parallel
>> No need to support GraphBinary and GraphSON serialization protocols
>>
>> Cons:
>> Reduced performance for small result sets
>> Can be complicated and expensive to generate a schema for each request
>>
>> Please find a few more diagrams in the attached PDF file, and please
>> share your thoughts.
>>
>> Regards, Valentyn
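
For anyone who wants a concrete picture of the first stage described in the proposal above, here is a rough sketch in Python using only the standard library. It is not Arrow Flight code and not part of the PoC. It only illustrates the idea that the transport carries the output of an existing serializer as opaque bytes, split into chunks, and the client reassembles the chunks before deserializing. Plain JSON stands in for GraphSON, and the names `serialize_result`, `to_chunks`, `from_chunks` and the chunk size of 16 bytes are hypothetical, picked just for this illustration.

```python
import json
from typing import Iterable, Iterator, List


def serialize_result(result: List[dict]) -> bytes:
    """Stand-in for an existing serializer (plain JSON here, not real GraphSON)."""
    return json.dumps(result).encode("utf-8")


def to_chunks(payload: bytes, chunk_size: int) -> Iterator[bytes]:
    """Split an opaque payload into chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def from_chunks(chunks: Iterable[bytes]) -> List[dict]:
    """Reassemble the chunks and hand the bytes back to the deserializer."""
    return json.loads(b"".join(chunks).decode("utf-8"))


# Hypothetical result set, roughly what g.V().valueMap('name') might return.
result = [{"name": ["marko"]}, {"name": ["vadas"]}]

# "Server" side: serialize once, then stream the bytes in small chunks.
chunks = list(to_chunks(serialize_result(result), chunk_size=16))

# "Client" side: join the chunks and deserialize with the same format.
roundtrip = from_chunks(chunks)
```

Because the transport never looks inside the bytes, the serializers stay untouched in this stage; the trade-off, as the proposal notes, is that the payload cannot benefit from Arrow's native columnar features until a schema is generated.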