Xi Lyu created SPARK-52397:
------------------------------

             Summary: Idempotent ExecutePlan: second ExecutePlan with same operationId and plan should reattach
                 Key: SPARK-52397
                 URL: https://issues.apache.org/jira/browse/SPARK-52397
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 4.0.0, 4.1.0
            Reporter: Xi Lyu


In Spark Connect, a query can fail with INVALID_HANDLE.OPERATION_ALREADY_EXISTS 
when a client retries an ExecutePlan RPC, typically after a transient network 
issue, so that the server receives the same request more than once. Because 
each ExecutePlan request includes an operation_id, the server treats the 
duplicate as an attempt to create an operation that already exists and raises 
the OPERATION_ALREADY_EXISTS exception. The query then fails even though the 
underlying condition was recoverable.
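
The failure mode can be illustrated with a small toy model. This is only a 
sketch: ToyServer, OperationAlreadyExists and run_with_lost_response are 
hypothetical names invented for illustration and are not part of Spark Connect. 
It assumes the first request reaches the server but its response is lost, so 
the client retries with the same operation_id.

```python
import uuid


class OperationAlreadyExists(Exception):
    """Hypothetical stand-in for INVALID_HANDLE.OPERATION_ALREADY_EXISTS."""


class ToyServer:
    """Hypothetical in-memory model of the server; not Spark Connect code."""

    def __init__(self):
        self.operations = {}  # operation_id -> plan

    def execute_plan(self, operation_id, plan):
        # Non-idempotent handling: any repeated operation_id is rejected,
        # even when the plan is identical to the one already registered.
        if operation_id in self.operations:
            raise OperationAlreadyExists(operation_id)
        self.operations[operation_id] = plan
        return f"stream for {operation_id}"


def run_with_lost_response(server, plan):
    """Simulate a client whose first response is lost, forcing a retry."""
    operation_id = str(uuid.uuid4())
    # First attempt: the server registers the operation, but we pretend the
    # response never reaches the client (assumed transient network issue).
    server.execute_plan(operation_id, plan)
    # Retry with the same operation_id: the toy server raises here.
    return server.execute_plan(operation_id, plan)
```

Calling run_with_lost_response(ToyServer(), "SELECT 1") raises 
OperationAlreadyExists even though the operation was created successfully on 
the first attempt.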

To resolve this, we should make ExecutePlan handling idempotent on the server 
side. When the server receives a request whose operation_id and plan match an 
operation it has already seen, it should reattach the response stream to the 
execution already running for that operation instead of returning an error. 
Retries caused by network flakiness would then no longer fail the query, which 
makes query execution more resilient.
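
A minimal sketch of the proposed handling, under stated assumptions: 
IdempotentServer, Execution and attach are hypothetical names and do not 
reflect the actual server implementation. The sketch also assumes that a 
request reusing an operation_id with a different plan is still rejected, which 
the summary implies but the description does not spell out.

```python
import threading


class OperationAlreadyExists(Exception):
    """Hypothetical stand-in for INVALID_HANDLE.OPERATION_ALREADY_EXISTS."""


class Execution:
    """Hypothetical handle for a running execution."""

    def __init__(self, operation_id, plan):
        self.operation_id = operation_id
        self.plan = plan
        self.attach_count = 0

    def attach(self):
        # A real server would hand back a response stream; here we only
        # count how many times a client has attached.
        self.attach_count += 1
        return self


class IdempotentServer:
    """Hypothetical in-memory model; not the Spark Connect server."""

    def __init__(self):
        self._executions = {}  # operation_id -> Execution
        self._lock = threading.Lock()

    def execute_plan(self, operation_id, plan):
        # The lock makes lookup and creation atomic, so concurrent retries
        # cannot create two executions for the same operation_id.
        with self._lock:
            execution = self._executions.get(operation_id)
            if execution is None:
                execution = Execution(operation_id, plan)
                self._executions[operation_id] = execution
            elif execution.plan != plan:
                # Assumption: same id with a different plan remains an error.
                raise OperationAlreadyExists(operation_id)
        # Same operation_id and plan: reattach instead of failing.
        return execution.attach()
```

With this model, two calls to execute_plan("op-1", "SELECT 1") return the same 
Execution object, while a later call with "op-1" and a different plan still 
raises OperationAlreadyExists.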




