cloud-fan commented on code in PR #45150:
URL: https://github.com/apache/spark/pull/45150#discussion_r1541316116
##########
connector/connect/common/src/main/protobuf/spark/connect/base.proto:
##########
@@ -435,6 +438,16 @@ message ExecutePlanResponse {
     // the execution is complete. If the server sends onComplete without sending a ResultComplete,
     // it means that there is more, and the client should use ReattachExecute RPC to continue.
   }
+
+  // This message is used to communicate progress about the query progress during the execution.
+  message ExecutionProgress {
+    int64 num_tasks = 1;
+    int64 num_completed_tasks = 2;

Review Comment:
   I think it's inevitable that, when we send progress messages periodically, the number of stages and the total number of tasks will grow over the lifetime of the query. The percentage calculated by the client may then change dramatically.

   For example, suppose the query has two stages and the AQE framework has submitted only the first stage as a Spark job. The new listener will be notified and will report progress to the client. As tasks complete, the percentage on the client side will climb to 100%. However, once the first stage is done and AQE submits the second stage, the total task count increases and the percentage on the client side will suddenly drop to 50% (assuming the second stage has about as many tasks as the first).

   I think we need more work to estimate the total percentage. At the very least, we need to integrate with the AQE framework. Classic Spark does not have a total query execution percentage either.
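
To make the arithmetic in the review comment's example concrete, here is a minimal sketch, assuming the client simply divides `num_completed_tasks` by `num_tasks` from the latest `ExecutionProgress` message. The helper name `progress_percent`, the figure of 100 tasks per stage, and the sequence of reports are illustrative assumptions, not code from this PR or from the Spark Connect client.

```python
def progress_percent(num_completed_tasks: int, num_tasks: int) -> float:
    """Naive client-side percentage from one progress report (hypothetical helper)."""
    if num_tasks <= 0:
        return 0.0
    return 100.0 * num_completed_tasks / num_tasks


# Hypothetical reports as (num_completed_tasks, num_tasks), assuming each
# stage has 100 tasks and AQE submits stage 2 only after stage 1 finishes.
reports = [
    (50, 100),   # stage 1 half done; stage 2 is not yet known to the listener
    (100, 100),  # stage 1 done
    (100, 200),  # AQE submits stage 2; the total task count doubles
    (200, 200),  # stage 2 done
]

percentages = [progress_percent(done, total) for done, total in reports]
print(percentages)  # [50.0, 100.0, 50.0, 100.0]
```

The third report is where the displayed value falls from 100% back to 50%, even though no work was lost. A client could hide the drop by never showing a value lower than the previous one, but that only masks the problem; a meaningful percentage would need an estimate of the tasks in stages that have not been submitted yet.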