cloud-fan commented on code in PR #45150:
URL: https://github.com/apache/spark/pull/45150#discussion_r1541316116


##########
connector/connect/common/src/main/protobuf/spark/connect/base.proto:
##########
@@ -435,6 +438,16 @@ message ExecutePlanResponse {
     // the execution is complete. If the server sends onComplete without sending a ResultComplete,
     // it means that there is more, and the client should use ReattachExecute RPC to continue.
   }
+
+  // This message is used to communicate progress about the query progress during the execution.
+  message ExecutionProgress {
+    int64 num_tasks = 1;
+    int64 num_completed_tasks = 2;

Review Comment:
   I think it's inevitable that, as we periodically send new progress messages, the number of stages and the total number of tasks will increase. As a result, the percentage calculated by the client may change dramatically.
   
   For example, suppose the query has two stages and the AQE framework has submitted the first stage as a Spark job. The new listener will be notified and will report progress to the client. As tasks complete, the percentage on the client side will climb to 100%. However, once the first stage is done and AQE submits the second stage, the percentage on the client side will suddenly drop to 50%.
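
   To make the arithmetic concrete, here is a minimal sketch of a client that naively computes a percentage from `num_tasks` and `num_completed_tasks`. The helper `percent_complete`, the sequence of reports, and the assumption that both stages have 10 tasks each are hypothetical and only illustrate the scenario above; this is not code from the PR.

```python
def percent_complete(num_tasks: int, num_completed_tasks: int) -> float:
    """Naive client-side percentage from the two ExecutionProgress fields."""
    if num_tasks == 0:
        return 0.0
    return 100.0 * num_completed_tasks / num_tasks


# Hypothetical progress reports as (num_tasks, num_completed_tasks).
# Assumption: two stages of 10 tasks each; the second stage only becomes
# visible to the listener once AQE submits it.
reports = [
    (10, 0),   # stage 1 submitted
    (10, 5),   # stage 1 half done
    (10, 10),  # stage 1 finished
    (20, 10),  # AQE submits stage 2, so the known total doubles
    (20, 20),  # stage 2 finished
]

percentages = [percent_complete(total, done) for total, done in reports]

for (total, done), pct in zip(reports, percentages):
    print(f"{done}/{total} tasks -> {pct:.0f}%")
```

   Under these assumptions the reported percentage goes 0, 50, 100, then falls back to 50 before reaching 100 again, because the denominator only counts the tasks that have been submitted so far.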
   
   I think we need more work to estimate the total percentage; at a minimum, we need to integrate with the AQE framework. Classic Spark does not have a total query execution percentage either.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


