Thanks Kou! For those interested, Spark recently landed a proposal with some similar goals in Spark Connect, calling it "reattachable execution": https://github.com/apache/spark/pull/42228
>From my understanding, Flight is more flexible here, decoupling the query >execution from fetching the result set, and allowing the result set to be >distributed/partitioned. (That said, Spark Connect's implementation lets you >resume from where you left off when fetching data, while currently Flight >doesn't standardize anything other than retrying the entire DoGet.) Both use >or plan to use timeouts to deal with cleaning up resources (while allowing >clients to explicitly free resources too). On Tue, Aug 1, 2023, at 16:45, Sutou Kouhei wrote: > Hi, > > I would like to propose adding support for long-running > queries to Apache Arrow Flight. If anyone has comments for > this proposal, please share them at here or the issue for > this proposal: > https://github.com/apache/arrow/issues/36155 > > Implementation in C++/Go/Java: > https://github.com/apache/arrow/pull/36946 > Documentations: > > http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query > > > This is one of proposals in "[DISCUSS] Flight RPC/Flight > SQL/ADBC enhancements": > > https://lists.apache.org/thread/247z3t06mf132nocngc1jkp3oqglz7jp > > See also the "Flight RPC: Long-Running Queries" section in > the design document for the proposals: > > > https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit# > > > Changes since the original proposal: > > * Removed RetryInfo.cancel_descriptor because we can use the > existing CancelFlightInfo action. > * Renamed RetryInfo.retry_descriptor to > RetryInfo.flight_descriptor because we have only one > descriptor by removing RetryInfo.cancel_descriptor. > > > Background: > > Queries generally don't complete instantly (as much as we > would like them to). So where can we put the 'query > evaluation time'? > > * In GetFlightInfo: block and wait for the query to complete. > * Con: this is a long-running blocking call, which may > fail or time out. Then when the client retries, the > server has to redo all the work. > * Con: parts of the result may be ready before others, but > the client can't do anything until everything is ready. > > * In DoGet: return a fixed number of partitions > * Con: this makes handling worker failures hard. Systems > like Trino support fault-tolerant execution by replacing > workers at runtime. But GetFlightInfo has already > passed, so we can't notify the client of new workers2. > * Con: we have to know or fix the partitioning up front. > > Neither solution is optimal. > > > Proposal: > > Add PollFlightInfo as a retryable version of GetFlightInfo. > Clients can poll the current query status and start reading > the currently available results so far before the query is > completed. > > See documentation for details: > > http://crossbow.voltrondata.com/pr_docs/36946/format/Flight.html#downloading-data-by-running-a-heavy-query > > > Implementation: > > https://github.com/apache/arrow/pull/36946 is an > implementation of this proposal. The pull request has the > followings: > > 1. Format changes: > * format/Flight.proto > > https://github.com/apache/arrow/pull/36946/files#diff-53b6c132dcc789483c879f667a1c675792b77aae9a056b257d6b20287bb09dba > > 2. Documentation changes: > docs/source/format/Flight.rst > > https://github.com/apache/arrow/pull/36946/files#diff-839518fb41e923de682e8587f0b6fdb00eb8f3361d360c2f7249284a136a7d89 > > 3. The C++ implementation and an integration test: > * cpp/src/arrow/flight/ > > 4. The Go implementation and an integration test: > * go/arrow/flight/server.go > * go/arrow/internal/flight_integration/scenario.go > > 5. The Java implementation and an integration test: > * java/flight/flight-core/ > * java/flight/flight-integration-tests/ > > > Next: > > I'll start a vote for this proposal after we reach a consensus > on this proposal. > > It's the standard process for format change. > See also: > https://arrow.apache.org/docs/dev/format/Changing.html > > > Thanks, > -- > kou