Hello all,

The Gremlin Arrow Flight proof-of-concept demo will take place in the
TinkerPop Discord channel
https://discord.gg/renSpn8K?event=1006205553749545070 on Friday, Aug
12, at 10:30am PT / 1:30pm ET. Feel free to join us if you are interested!
Discord registration/login may be required.


On Fri, Aug 5, 2022 at 3:31 PM Valentyn Kahamlyk <valent...@bitquilltech.com>
wrote:

> Hello all,
>
> I'm hosting a short demo in Discord of a proof of concept that uses Arrow
> Flight with Gremlin, with string queries and GraphSON for serialization.
> Any questions and comments are welcome. The next step will be to create the
> full designs based on the proof of concept.
>
> The planned date is Aug 12; I will follow up with the exact time later.
>
> On Thu, Jun 30, 2022 at 4:35 PM Valentyn Kahamlyk <
> valent...@bitquilltech.com> wrote:
>
>> Hello Everyone,
>>
>> I would like to propose exploring options for using Arrow Flight as a
>> transport for Gremlin Server. Currently, Gremlin Server and its clients are
>> based on WebSockets with a custom sub-protocol and serialization to GraphSON
>> and GraphBinary. The developers of each driver must implement those
>> protocols from scratch, and very little code is reused (only third-party
>> WebSocket libraries are currently reused in the client variants).
>> Implementing the protocol is complicated and error-prone, so most drivers
>> support only a subset of Gremlin Server features. The maintenance cost also
>> grows with each new client variant added to TinkerPop.
>>
>> ** Motivation **
>> We would like to propose a solution that reduces maintenance and simplifies
>> the development of client drivers by using a standard protocol based on
>> Apache Arrow Flight. Because Arrow Flight is implemented in the most common
>> languages, such as C++, C#, Java and Python, we anticipate that more
>> existing code can be reused, which would help to reduce maintenance costs
>> in the future. We can also reuse other Arrow Flight features, such as
>> authentication and error handling.
>>
>> ** Assumptions **
>> Proof of Concept Development will be done with Java 8.
>> Need to reuse existing code as much as possible.
>> It is desirable, but not necessary, to maintain compatibility with existing 
>> drivers.
>> To simplify development at the initial stage, we will reuse existing 
>> serialization mechanisms.
>>
>> ** Requirements **
>> Gremlin Server and drivers should replace the network layer with Arrow 
>> Flight.
>> No significant drop in performance.
>> Gremlin Arrow must pass the Gherkin test suite.
>>
>> ** Prototype Design Overview **
>> We would like to explore the solution below and create a prototype to prove
>> that the approach is feasible.
>> The main idea is to replace the transport layer with FlightServer and
>> FlightClient. They support asynchronous data transfer, splitting data into
>> chunks, and authorization. While Arrow Flight typically requires a schema,
>> in the short term we can proceed with an implementation that uses the
>> existing serializers and the GraphBinary format. By using GraphBinary we
>> will not get all the capabilities that Arrow Flight provides out of the
>> box, such as efficient compression. However, in the future, we see value in
>> adding the ability to generate a schema on the server side, which can
>> enable additional use cases.
>>
>> First stage: replace transport layer, but keep serializers
>> Pros:
>> Reduction of the code base to be developed and maintained
>> A relatively low number of modifications
>>
>> Cons:
>> We may observe reduced performance due to schema transfer and other 
>> overhead. As part of the PoC we will assess performance overhead for small 
>> and large responses and identify options to mitigate it.
>> Still need to support GraphBinary serialization.
>>
>> Second stage: replace transport layer, add dynamic schema generation and
>> use native Arrow structures for data transmission
>> Pros:
>> Greater reduction of the codebase to be developed and maintained
>> Performance can be improved for large data sets due to Arrow Flight
>> optimizations and the ability to transfer data in parallel
>> No need to support GraphBinary and GraphSON serialization protocols
>>
>> Cons:
>> Need to rework the serialization and add schema generation
>> Reduced performance for small result sets
>> Can be complicated and expensive to generate a schema for each request
>> Please find a few more diagrams in the attached PDF file, and please share
>> your thoughts.
>>
>> Regards, Valentyn
>>
>>
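
For anyone following the quoted proposal, here is a rough sketch of the
first-stage idea: keep the existing serializer output as opaque bytes and let
the transport split it into chunks. It uses only the Python standard library
and does not call Arrow Flight at all. The names chunk_payload, reassemble and
CHUNK_SIZE are hypothetical, and the JSON document is only a stand-in for a
serialized response, not real GraphSON or GraphBinary.

```python
import json
from typing import Iterator, List

# Hypothetical chunk size; kept tiny so the example yields several chunks.
CHUNK_SIZE = 16


def chunk_payload(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of an already-serialized response."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def reassemble(chunks: List[bytes]) -> bytes:
    """Concatenate chunks in arrival order to recover the original bytes."""
    return b"".join(chunks)


# Stand-in for what an existing serializer would produce (assumption:
# the transport treats it as opaque bytes and never inspects it).
result = {"result": [{"id": 1, "label": "person"}, {"id": 2, "label": "software"}]}
serialized = json.dumps(result).encode("utf-8")

# Sender side: split into chunks. Receiver side: join and deserialize.
chunks = list(chunk_payload(serialized))
restored = json.loads(reassemble(chunks).decode("utf-8"))
```

In a Flight-based transport the chunks would presumably travel inside record
batches rather than as bare byte strings. That mapping is an assumption here
and is exactly what the proof of concept is meant to evaluate.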
