Thanks Kou!

For those interested, Spark recently landed a proposal with some similar goals 
in Spark Connect, calling it "reattachable execution": 
https://github.com/apache/spark/pull/42228

>From my understanding, Flight is more flexible here, decoupling the query 
>execution from fetching the result set, and allowing the result set to be 
>distributed/partitioned. (That said, Spark Connect's implementation lets you 
>resume from where you left off when fetching data, while currently Flight 
>doesn't standardize anything other than retrying the entire DoGet.) Both use 
>or plan to use timeouts to deal with cleaning up resources (while allowing 
>clients to explicitly free resources too).

On Tue, Aug 1, 2023, at 16:45, Sutou Kouhei wrote:
> Hi,
>
> I would like to propose adding support for long-running
> queries to Apache Arrow Flight. If anyone has comments for
> this proposal, please share them at here or the issue for
> this proposal:
>   https://github.com/apache/arrow/issues/36155
>
> Implementation in C++/Go/Java:
>   https://github.com/apache/arrow/pull/36946
> Documentations:
>   
> http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query
>
>
> This is one of proposals in "[DISCUSS] Flight RPC/Flight
> SQL/ADBC enhancements":
>
>   https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
>
> See also the "Flight RPC: Long-Running Queries" section in
> the design document for the proposals:
>
>   
> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
>
>
> Changes since the original proposal:
>
> * Removed RetryInfo.cancel_descriptor because we can use the
>   existing CancelFlightInfo action.
> * Renamed RetryInfo.retry_descriptor to
>   RetryInfo.flight_descriptor because we have only one
>   descriptor by removing RetryInfo.cancel_descriptor.
>
>
> Background:
>
> Queries generally don't complete instantly (as much as we
> would like them to). So where can we put the 'query
> evaluation time'?
>
> * In GetFlightInfo: block and wait for the query to complete.
>   * Con: this is a long-running blocking call, which may
>     fail or time out. Then when the client retries, the
>     server has to redo all the work.
>   * Con: parts of the result may be ready before others, but
>     the client can't do anything until everything is ready.
>
> * In DoGet: return a fixed number of partitions
>   * Con: this makes handling worker failures hard. Systems
>     like Trino support fault-tolerant execution by replacing
>     workers at runtime. But GetFlightInfo has already
>     passed, so we can't notify the client of new workers2.
>   * Con: we have to know or fix the partitioning up front.
>
> Neither solution is optimal.
>
>
> Proposal:
>
> Add PollFlightInfo as a retryable version of GetFlightInfo.
> Clients can poll the current query status and start reading
> the currently available results so far before the query is
> completed.
>
> See documentation for details:
>   
> http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query
>
>
> Implementation:
>
> https://github.com/apache/arrow/pull/36946 is an
> implementation of this proposal. The pull request has the
> followings:
>
> 1. Format changes:
>    * format/Flight.proto
>      
> https://github.com/apache/arrow/pull/36946/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
>
> 2. Documentation changes:
>    docs/source/format/Flight.rst
>    
> https://github.com/apache/arrow/pull/36946/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
>
> 3. The C++ implementation and an integration test:
>    * cpp/src/arrow/flight/
>
> 4. The Go implementation and an integration test:
>    * go/arrow/flight/server.go
>    * go/arrow/internal/flight_integration/scenario.go
>
> 5. The Java implementation and an integration test:
>    * java/flight/flight-core/
>    * java/flight/flight-integration-tests/
>
>
> Next:
>
> I'll start a vote for this proposal after we reach a consensus
> on this proposal.
>
> It's the standard process for format change.
> See also:
> https://arrow.apache.org/docs/dev/format/Changing.html
>
>
> Thanks,
> -- 
> kou

Reply via email to