Juliusz Sompolski created SPARK-44835:
-----------------------------------------

             Summary: SparkConnect ReattachExecute could raise before 
ExecutePlan even attaches.
                 Key: SPARK-44835
                 URL: https://issues.apache.org/jira/browse/SPARK-44835
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Juliusz Sompolski


If a ReattachExecute is sent very quickly after ExecutePlan, the following 
could happen:
 * ExecutePlan hasn't yet reached *executeHolder.runGrpcResponseSender(responseSender)* in SparkConnectExecutePlanHandler.
 * ReattachExecute races ahead and reaches *executeHolder.runGrpcResponseSender(responseSender)* in SparkConnectReattachExecuteHandler first.
 * When ExecutePlan then reaches *executeHolder.runGrpcResponseSender(responseSender)* and executionObserver.attachConsumer(this) is called in the ExecuteGrpcResponseSender of ExecutePlan, it kicks out the ExecuteGrpcResponseSender of ReattachExecute (see the sketch after this list).
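
For illustration, a minimal, self-contained sketch of why the last RPC to attach wins regardless of arrival order. This is not the actual Spark Connect server code; the classes below are simplified stand-ins for ExecuteGrpcResponseSender and the execution observer, assuming attachConsumer simply replaces and interrupts whatever sender is currently attached:

{code:scala}
// Hedged sketch of the race; names are illustrative stand-ins, not the real implementation.
object ReattachRaceSketch {

  // Stand-in for an ExecuteGrpcResponseSender: one per ExecutePlan/ReattachExecute RPC.
  final class ResponseSender(val rpcName: String) {
    @volatile var interrupted = false
    def interrupt(): Unit = interrupted = true
  }

  // Stand-in for the execution observer that attachConsumer(this) is called on.
  final class ResponseObserver {
    private var current: Option[ResponseSender] = None
    def attachConsumer(sender: ResponseSender): Unit = synchronized {
      current.foreach(_.interrupt()) // kicks out whoever is attached, even a newer RPC
      current = Some(sender)
    }
  }

  def main(args: Array[String]): Unit = {
    val observer = new ResponseObserver
    val reattach = new ResponseSender("ReattachExecute")
    val executePlan = new ResponseSender("ExecutePlan")

    // ReattachExecute races ahead and attaches first...
    observer.attachConsumer(reattach)
    // ...then the slower ExecutePlan reaches runGrpcResponseSender and attaches,
    // interrupting the newer ReattachExecute sender.
    observer.attachConsumer(executePlan)

    println(s"ReattachExecute interrupted: ${reattach.interrupted}") // prints: true
  }
}
{code}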

So even though ReattachExecute came later, it gets interrupted by the earlier ExecutePlan and finishes with a *SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED", Map.empty)* (an error that was assumed to mean a stale hanging RPC is being replaced by a reconnection).
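
For concreteness, a hedged sketch of what the kicked-out ReattachExecute then surfaces to its client. Only the error class and the empty parameter map come from this description; the exception class below is a simplified stand-in, since the real SparkSQLException is internal to Spark:

{code:scala}
// Simplified stand-in for SparkSQLException, mirroring only the fields named above;
// the real class lives inside Spark and is not meant to be constructed by callers.
class SparkSQLExceptionSketch(
    val errorClass: String,
    val messageParameters: Map[String, String]) extends Exception(errorClass)

object ReattachFailureSketch {
  // What the interrupted ReattachExecute ends up reporting, even though
  // INVALID_CURSOR.DISCONNECTED was meant to signal a stale RPC being replaced.
  def failInterruptedReattach(): Nothing =
    throw new SparkSQLExceptionSketch("INVALID_CURSOR.DISCONNECTED", Map.empty)
}
{code}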

 

That would be very unlikely to happen in practice, because an ExecutePlan shouldn't be abandoned so fast, but because of https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely (though there is also a 50 ms sleep before retry there, which again makes it unlikely).


