Xi Lyu created SPARK-52397:
------------------------------
Summary: Idempotent ExecutePlan: second ExecutePlan with same operationId and plan should reattach
Key: SPARK-52397
URL: https://issues.apache.org/jira/browse/SPARK-52397
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0, 4.1.0
Reporter: Xi Lyu
In Spark Connect, queries can fail with the error INVALID_HANDLE.OPERATION_ALREADY_EXISTS when a client retries an ExecutePlan RPC (often because of transient network issues) and the server therefore receives the same request more than once. Because each ExecutePlan request carries an operation_id, the server interprets the duplicate as an attempt to create an operation that already exists and throws the OPERATION_ALREADY_EXISTS exception. This interrupts query execution and breaks the user experience under conditions that should be recoverable.

To resolve this, we should make ExecutePlan idempotent on the server side. When the server receives a request whose operation_id and plan match an operation it has already seen, it should reattach the response stream to the already running execution associated with that operation instead of returning an error. Retries caused by network flakiness would then no longer fail queries, improving the resilience and robustness of query execution.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
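The reattach-on-retry behavior proposed in this issue can be illustrated with a minimal Python sketch. All names here (OperationRegistry, execute_plan, Execution, OperationAlreadyExists) are hypothetical and do not correspond to Spark Connect's actual server classes, which are written in Scala. The sketch assumes that operations are keyed by (session_id, operation_id), that plans are compared as serialized bytes, and that a request reusing an operation_id with a *different* plan is still rejected; the issue does not spell out that last case.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


class OperationAlreadyExists(Exception):
    """Hypothetical stand-in for INVALID_HANDLE.OPERATION_ALREADY_EXISTS."""


@dataclass
class Execution:
    """Hypothetical record of one running operation."""
    operation_id: str
    plan: bytes  # assumption: plans are compared as serialized bytes
    responses: List[str] = field(default_factory=list)

    def stream(self, start: int = 0) -> Iterator[str]:
        # Replay buffered responses from `start`, as a reattached client would see them.
        yield from self.responses[start:]


class OperationRegistry:
    """Hypothetical server-side registry; not Spark Connect's real API."""

    def __init__(self) -> None:
        # Assumption: operations are scoped per session.
        self._ops: Dict[Tuple[str, str], Execution] = {}

    def execute_plan(
        self, session_id: str, operation_id: str, plan: bytes
    ) -> Tuple[Execution, bool]:
        """Return (execution, reattached)."""
        key = (session_id, operation_id)
        existing = self._ops.get(key)
        if existing is None:
            # First time this operation_id is seen: start a new execution.
            execution = Execution(operation_id, plan)
            self._ops[key] = execution
            return execution, False
        if existing.plan == plan:
            # Same operation_id and same plan: treat as a retry and reattach.
            return existing, True
        # Same operation_id, different plan: keep rejecting (assumed behavior).
        raise OperationAlreadyExists(
            f"operation {operation_id!r} already exists with a different plan"
        )
```

For example, a first call with b"plan-A" returns reattached=False; a retry with identical arguments returns the same Execution with reattached=True, and its stream() replays the responses buffered so far. Reusing the same operation_id with b"plan-B" raises OperationAlreadyExists, so only genuine retries are treated as idempotent.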