Juliusz Sompolski created SPARK-44835:
-----------------------------------------
             Summary: SparkConnect ReattachExecute could raise before ExecutePlan even attaches
                 Key: SPARK-44835
                 URL: https://issues.apache.org/jira/browse/SPARK-44835
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 3.5.0
            Reporter: Juliusz Sompolski

If a ReattachExecute is sent very quickly after ExecutePlan, the following could happen:
* ExecutePlan hasn't reached *executeHolder.runGrpcResponseSender(responseSender)* in SparkConnectExecutePlanHandler yet.
* ReattachExecute races around it and reaches *executeHolder.runGrpcResponseSender(responseSender)* in SparkConnectReattachExecuteHandler first.
* When ExecutePlan then reaches *executeHolder.runGrpcResponseSender(responseSender)* and executionObserver.attachConsumer(this) is called in the ExecuteGrpcResponseSender of ExecutePlan, it kicks out the ExecuteGrpcResponseSender of ReattachExecute.

So even though ReattachExecute came later, it gets interrupted by the earlier ExecutePlan and finishes with a *SparkSQLException(errorClass = "INVALID_CURSOR.DISCONNECTED", Map.empty)* (an error that was assumed to signal a stale hanging RPC being replaced by a reconnection).

That would be very unlikely to happen in practice, because an ExecutePlan shouldn't be abandoned so fast, but because of https://issues.apache.org/jira/browse/SPARK-44833 it is slightly more likely (though there is also a 50ms sleep before retry, which again makes it unlikely).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
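The heart of the issue is that attachConsumer follows a "last attacher wins" rule, which is only correct if RPCs attach in the order they were issued. A minimal sketch of that behavior (the real Spark Connect code is Scala; this is a simplified hypothetical stand-in, not the actual ExecuteGrpcResponseSender implementation):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a consumer slot like executionObserver.attachConsumer:
// whichever sender attaches last replaces (kicks out) whoever is attached now,
// with no notion of which RPC is logically newer.
class ResponseObserverSketch {
    private final AtomicReference<String> consumer = new AtomicReference<>();

    // Returns the sender that was kicked out, or null if the slot was empty.
    String attachConsumer(String sender) {
        return consumer.getAndSet(sender);
    }

    String current() {
        return consumer.get();
    }
}

public class ReattachRaceSketch {
    public static void main(String[] args) {
        ResponseObserverSketch observer = new ResponseObserverSketch();

        // ReattachExecute races ahead and attaches first...
        observer.attachConsumer("ReattachExecute");

        // ...then the slower ExecutePlan attaches and kicks it out, even
        // though ReattachExecute is the newer RPC. The kicked-out sender
        // would then raise INVALID_CURSOR.DISCONNECTED.
        String kickedOut = observer.attachConsumer("ExecutePlan");
        System.out.println(kickedOut + " kicked out by " + observer.current());
    }
}
```

Under this model, the INVALID_CURSOR.DISCONNECTED path cannot distinguish "a stale hanging RPC was replaced by a reconnection" from "the original ExecutePlan attached late", which is exactly the misclassification described above.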