hi Julian,

Thanks for chiming in.

On Thu, Aug 16, 2018 at 1:16 PM, Julian Hyde <jh...@apache.org> wrote:
> If your use case is SQL RPC, then you are getting close to Avatica's
> territory. Avatica[1] is a protocol for implementing
> language-independent JDBC and ODBC stacks.

I'm not proposing to develop a SQL RPC system inside Apache Arrow. But
Arrow Flight could be used to build one

>
> Now, I agree that many ODBC implementations are inefficient. Some ODBC
> stacks make more round trips than necessary, and do more copying than
> necessary. In Avatica we are trying to squeeze out those
> inefficiencies, for example minimizing the number of RPCs. We would
> also love to use Arrow as the data format and reduce copying on the
> server side and client side.

Indeed -- what I would like to see instead is for Avatica to _use_
Arrow Flight to provide an alternative platform to offer Arrow-native
connectivity in addition to the slower JDBC and ODBC standards.

>
> But conversely, people who start with a simple RPC use case - send
> SQL, get the results - may soon find themselves needing a more complex
> protocol - authentication, sessions, prepared statements, bind
> variables, getting metadata before executing, cursors, skipping over
> rows. In other words, find themselves wanting substantial portions of
> an ODBC or JDBC driver.
>
> You could find yourselves building Avatica all over again. We saw all
> of this happen in XML-RPC, and it was sad.

Agreed. I don't think this is in the cards, and what's being proposed
now is orthogonal.

>
> I suggest to keep flight for the truly simple use case, and for the
> more complex use case, invest effort putting Arrow into Avatica. We
> are always happy to welcome new contributors.

+1

>
> Julian
>
> [1] https://calcite.apache.org/avatica/docs/
> On Thu, Aug 16, 2018 at 7:56 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> To give some extra color on my personal motivation for interest in Arrow 
>> Flight:
>>
>> Systems that expose databases on a network frequently send data very
>> slowly. For example, ODBC is in general extremely slow. What I would
>> like to see is servers that can expose a "sql" action type.
>>
>> So, in consideration of the protocol as it stands now [1], example
>> session goes like this:
>>
>> * Client issues ListActions -> returns one or more ActionType, suppose
>> one is "sql"
>> * Client issues DoAction with type sql and body "select * from $TABLE"
>> * Server returns stream URI for query result set and Ticket in the Result 
>> proto
>> * Client issues GetFlightInfo using URI to obtain schema of result set
>> * Client issues DoGet with ticket returned by sql DoAction
>>
>> There's some possible refinements to this workflow; for example, if we
>> wanted to enable DoAction to return more structured results (e.g. to
>> avoid the extra GetFlightInfo RPC to get the schema of the query
>> result set)
>>
>> - Wes
>>
>> [1]: 
>> https://github.com/apache/arrow/blob/c52897274035f8b5192d7647b9711c68d9c54ccc/java/flight/src/main/protobuf/flight.proto
>>
>> On Thu, Aug 16, 2018 at 10:29 AM, Jacques Nadeau <jacq...@apache.org> wrote:
>> > I'm out of town this week (vacation) and will be reviewing your feedback
>> > next week. Thanks for the feedback!
>> >
>> > On Thu, Aug 9, 2018, 8:45 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> >> hi folks,
>> >>
>> >> I left some feedback on this PR. If others could take a look
>> >> (particularly at the .proto service definition) that would be useful.
>> >>
>> >> We should decide on an approach to getting multiple production-worthy
>> >> Flight/RPC implementations ready to go. It would be a good goal to
>> >> deliver (end-to-end send/receive data between Python and Java, or
>> >> Python and other Python processes) in the next couple releases.
>> >>
>> >> - Wes
>> >>
>> >> On Wed, May 30, 2018 at 12:44 PM, Jacques Nadeau <jacq...@apache.org>
>> >> wrote:
>> >> > Correct, I'm maintaining standard protobuf encoding so a consumer that
>> >> > doesn't go byte by byte can still consumer/produce the messages.
>> >> >
>> >> > More impls: for sure.
>> >> >
>> >> > On Wed, May 30, 2018 at 9:01 AM, Wes McKinney <wesmck...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> I see; looking more closely I see you've sidestepped the standard
>> >> >> Protobuf serialization to write the stream as tagged components:
>> >> >>
>> >> >> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-
>> >> >> 02cfc9235e22653fce8a7636c9f95507R241
>> >> >>
>> >> >> and then reading the fields of the message tag by tag
>> >> >>
>> >> >> https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-
>> >> >> 02cfc9235e22653fce8a7636c9f95507R159
>> >> >>
>> >> >> Would it be correct that if a GRPC implementation doesn't provide
>> >> >> sufficient access to the byte stream (or if it doesn't care enough
>> >> >> about zero copy) that you could allow GRPC to return an instance of
>> >> >> the FlightData structure?
>> >> >>
>> >> >> I expect we'd want to see a few interoperable implementations (I
>> >> >> suggest Java, C++, Go) to harden the fine details.
>> >> >>
>> >> >> - Wes
>> >> >>
>> >> >> On Mon, May 28, 2018 at 3:32 PM, Jacques Nadeau <jacq...@apache.org>
>> >> >> wrote:
>> >> >> > Cutting through the layers of GRPC will be a per language approach
>> >> thing.
>> >> >> > Assuming that each GRPC language implementation does a good job of
>> >> >> > separating message encapsulation from the base library, this should 
>> >> >> > be
>> >> >> > straight-forward-ish. Hope improves around this as I see creation of
>> >> >> > non-protobuf protocols built on top of the base GRPC [1]. How to do
>> >> this
>> >> >> in
>> >> >> > each language will probably take time looking at the GRPC internals
>> >> for
>> >> >> > that language but can be a secondary step once you get the protocol
>> >> >> working
>> >> >> > (you can just pay for extra copies until then).
>> >> >> >
>> >> >> > In my Java approach I believe I do one read copy and zero write 
>> >> >> > copies
>> >> >> > (needs more testing) which was my target. (Getting to zero-copy on
>> >> read
>> >> >> > means a lot more complexity because your socket-reading has to be
>> >> >> protocol
>> >> >> > aware: even our bespoke layer in Dremio doesn't try to do that. I'd
>> >> guess
>> >> >> > KRPC does the same but haven't reviewed the code to confirm.)
>> >> >> >
>> >> >> > Will try to get some more slides/readme and a proper proposed patch 
>> >> >> > up
>> >> >> soon.
>> >> >> >
>> >> >> > [1] https://grpc.io/blog/flatbuffers
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, May 28, 2018 at 1:05 AM, Wes McKinney <wesmck...@gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >> hey Jacques,
>> >> >> >>
>> >> >> >> This is great news, I look forward to digging into this. My biggest
>> >> >> >> initial question is the Protobuf encapsulation, specifically:
>> >> >> >>
>> >> >> >> https://github.com/jacques-n/arrow/blob/flight/java/flight/
>> >> >> >> src/main/protobuf/flight.proto#L99
>> >> >> >>
>> >> >> >> My understanding of Protocol Buffers is that on read, the 
>> >> >> >> "data_body"
>> >> >> >> memory would be copied out of the serialized protobuf that came
>> >> across
>> >> >> >> the wire. Your comment in the .proto says this "comes last in the
>> >> >> >> definition to help with sidecar patterns" -- my read is that it 
>> >> >> >> would
>> >> >> >> be up to us to do our own sidecar implementation, similar to how
>> >> >> >> Apache Kudu has zero-copy sidecars in their KRPC system [1] (the
>> >> >> >> comment there describes pretty much exactly the problem we have). I
>> >> >> >> saw that you also replied on a GRPC thread about this issue [2].
>> >> Could
>> >> >> >> you summarize what (if anything) stands in the way to get zero-copy
>> >> on
>> >> >> >> write and read?
>> >> >> >>
>> >> >> >> - Wes
>> >> >> >>
>> >> >> >> [1]: https://github.com/apache/kudu/blob/master/src/kudu/rpc/
>> >> >> >> rpc_sidecar.h#L34
>> >> >> >> [2]: https://github.com/grpc/grpc-java/issues/1054#issuecomment-
>> >> >> 391692087
>> >> >> >>
>> >> >> >> On Thu, May 24, 2018 at 6:57 AM, Jacques Nadeau <jacq...@apache.org>
>> >> >> >> wrote:
>> >> >> >> > FYI, if you want to see an example server you can run with a GRPC
>> >> >> >> generated
>> >> >> >> > client, you can run the ExampleFlightServer located at [1]. Very
>> >> basic
>> >> >> >> > 'test' with that class and client is located at [2].
>> >> >> >> >
>> >> >> >> > [1]
>> >> >> >> > https://github.com/jacques-n/arrow/tree/flight/java/flight/
>> >> >> >> src/main/java/org/apache/arrow/flight/example
>> >> >> >> > [2]
>> >> >> >> > https://github.com/jacques-n/arrow/blob/flight/java/flight/
>> >> >> >> src/test/java/org/apache/arrow/flight/example/TestExampleServer.java
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Thu, May 24, 2018 at 11:51 AM, Jacques Nadeau <
>> >> jacq...@apache.org>
>> >> >> >> wrote:
>> >> >> >> >
>> >> >> >> >> Hey All,
>> >> >> >> >>
>> >> >> >> >> I used my Strata talk today as a forcing function to make
>> >> additional
>> >> >> >> >> progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it
>> >> >> “Apache
>> >> >> >> >> Arrow Flight”. You can take a look at the work here [2]. I’ll
>> >> work to
>> >> >> >> clean
>> >> >> >> >> up my work and explain my thoughts about the protocol in the
>> >> coming
>> >> >> >> days.
>> >> >> >> >> High-level: use protobuf as a encapsulation format so that any
>> >> client
>> >> >> >> that
>> >> >> >> >> is supported in GRPC will work. However, we can optimize the
>> >> >> read/write
>> >> >> >> >> path for targeted languages and hand control the
>> >> >> >> >> serialization/deserialization and memory handling. (I did that in
>> >> >> this
>> >> >> >> Java
>> >> >> >> >> patch [3][4][5].) I also looked at starting to use GRPC generated
>> >> >> >> bindings
>> >> >> >> >> within Python but it looks like some glue code may be needed in
>> >> the
>> >> >> C++
>> >> >> >> >> layer since Python delegates down frequently. I also am still
>> >> trying
>> >> >> to
>> >> >> >> >> understand GRPC back-pressure patterns and whether the protocol
>> >> >> >> >> realistically needs to change to cover real-world high 
>> >> >> >> >> performance
>> >> >> use
>> >> >> >> >> cases.
>> >> >> >> >>
>> >> >> >> >> I’ll send out some slides about the ideas and update README, etc.
>> >> >> soon.
>> >> >> >> >>
>> >> >> >> >> Thanks,
>> >> >> >> >> Jacques
>> >> >> >> >>
>> >> >> >> >> [1] https://github.com/jacques-n/arrow/blob/flight/java/flight/
>> >> >> >> >> src/main/protobuf/flight.proto
>> >> >> >> >> [2] http://github.com/jacques-n/arrow/
>> >> >> >> >> [3] https://github.com/jacques-n/arrow/tree/flight/
>> >> >> >> >> java/flight/src/main/java/org/apache/arrow/flight/grpc
>> >> >> >> >> [4] https://github.com/jacques-n/arrow/blob/flight/
>> >> >> >> >> java/flight/src/main/java/org/apache/arrow/flight/
>> >> >> >> ArrowMessage.java#L253
>> >> >> >> >> <https://github.com/jacques-n/arrow/blob/flight/java/flight/
>> >> >> >> src/main/java/org/apache/arrow/flight/ArrowMessage.java#L253>
>> >> >> >> >> [5] https://github.com/jacques-n/arrow/blob/flight/
>> >> >> >> >> java/flight/src/main/java/org/apache/arrow/flight/
>> >> >> >> ArrowMessage.java#L185
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>

Reply via email to