hvanhovell commented on code in PR #46570: URL: https://github.com/apache/spark/pull/46570#discussion_r1605509972
########## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ########## @@ -199,6 +200,17 @@ message AnalyzePlanRequest { // (Required) The logical plan to get the storage level. Relation relation = 1; } + + message Checkpoint { Review Comment: Caching is lazy, so it cannot trigger execution of the main query. Checkpoint can actually do this, which is weird for an analyze request. Moreover analyse does not work well with long running operations (i.e. an eager checkpoint) for the following reasons:: - Analyze needs a thread in the server's thread pool which in extreme cases can lead to exhaustion, or in a less extreme scenario can lead to reduced availability. - Long running request can be killed by intermediate proxies. Analyze does not support reattach which is an issue when the checkpoint takes too long. In that case you are basically dead in the water. The argument that ExecutePlanResponse does not support this is IMO a bit flimsy. Look for example at the `SqlCommandResult` this can be returned as part of the `ExecutePlanResponse` and it returns a relation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org