Hi,

Update: RetryInfo is renamed to PollInfo based on feedback
from Andrew. Thanks!!!

If there are no more comments, I'll start a vote for this
proposal a few days later.


Thanks,
-- 
kou

In <20230802.054524.268757111696908235....@clear-code.com>
  "[DISCUSS][Format][Flight] Long-running queries support" on Wed, 02 Aug 2023 
05:45:24 +0900 (JST),
  Sutou Kouhei <k...@clear-code.com> wrote:

> Hi,
> 
> I would like to propose adding support for long-running
> queries to Apache Arrow Flight. If anyone has comments for
> this proposal, please share them at here or the issue for
> this proposal:
>   https://github.com/apache/arrow/issues/36155
> 
> Implementation in C++/Go/Java:
>   https://github.com/apache/arrow/pull/36946
> Documentations:
>   
> http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query
> 
> 
> This is one of proposals in "[DISCUSS] Flight RPC/Flight
> SQL/ADBC enhancements":
> 
>   https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp
> 
> See also the "Flight RPC: Long-Running Queries" section in
> the design document for the proposals:
> 
>   
> https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit#
> 
> 
> Changes since the original proposal:
> 
> * Removed RetryInfo.cancel_descriptor because we can use the
>   existing CancelFlightInfo action.
> * Renamed RetryInfo.retry_descriptor to
>   RetryInfo.flight_descriptor because we have only one
>   descriptor by removing RetryInfo.cancel_descriptor.
> 
> 
> Background:
> 
> Queries generally don't complete instantly (as much as we
> would like them to). So where can we put the 'query
> evaluation time'?
> 
> * In GetFlightInfo: block and wait for the query to complete.
>   * Con: this is a long-running blocking call, which may
>     fail or time out. Then when the client retries, the
>     server has to redo all the work.
>   * Con: parts of the result may be ready before others, but
>     the client can't do anything until everything is ready.
> 
> * In DoGet: return a fixed number of partitions
>   * Con: this makes handling worker failures hard. Systems
>     like Trino support fault-tolerant execution by replacing
>     workers at runtime. But GetFlightInfo has already
>     passed, so we can't notify the client of new workers2.
>   * Con: we have to know or fix the partitioning up front.
> 
> Neither solution is optimal.
> 
> 
> Proposal:
> 
> Add PollFlightInfo as a retryable version of GetFlightInfo.
> Clients can poll the current query status and start reading
> the currently available results so far before the query is
> completed.
> 
> See documentation for details:
>   
> http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query
> 
> 
> Implementation:
> 
> https://github.com/apache/arrow/pull/36946 is an
> implementation of this proposal. The pull request has the
> followings:
> 
> 1. Format changes:
>    * format/Flight.proto
>      
> https://github.com/apache/arrow/pull/36946/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba
> 
> 2. Documentation changes:
>    docs/source/format/Flight.rst
>    
> https://github.com/apache/arrow/pull/36946/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89
> 
> 3. The C++ implementation and an integration test:
>    * cpp/src/arrow/flight/
> 
> 4. The Go implementation and an integration test:
>    * go/arrow/flight/server.go
>    * go/arrow/internal/flight_integration/scenario.go
> 
> 5. The Java implementation and an integration test:
>    * java/flight/flight-core/
>    * java/flight/flight-integration-tests/
> 
> 
> Next:
> 
> I'll start a vote for this proposal after we reach a consensus
> on this proposal.
> 
> It's the standard process for format change.
> See also:
> https://arrow.apache.org/docs/dev/format/Changing.html
> 
> 
> Thanks,
> -- 
> kou

Reply via email to