Alex Khakhlyuk created SPARK-52673:
--------------------------------------

             Summary: Add grpc RetryInfo handling to Spark Connect retry policies
                 Key: SPARK-52673
                 URL: https://issues.apache.org/jira/browse/SPARK-52673
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 4.1
            Reporter: Alex Khakhlyuk


The Spark Connect Client has a set of retry policies that specify which errors
returned by the Server can be retried.
This change adds the capability for the Spark Connect Client to use
server-provided retry information. The Server can include a `RetryInfo` gRPC
message containing a `retry_delay` field in its error response. The Client will
now use the `RetryInfo` message to classify the error as retriable and will use
`retry_delay` to calculate how long to wait before the next retry. This behavior
is in line with the standard gRPC error model, in which `google.rpc.RetryInfo`
is one of the predefined error details.
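A minimal Python sketch of the retry decision described above, for illustration only. It does not use the real Spark Connect client API: `ServerError`, `RetryPolicy`, its fields, and `call_with_retries` are hypothetical names, and `retry_delay_s` stands in for the `retry_delay` Duration that a real client would unpack from the `RetryInfo` detail of the gRPC status. Two behaviors are assumptions rather than anything this issue specifies: the Client waits for the larger of its own exponential backoff and the server-suggested delay, and the server-suggested delay is capped by `max_server_delay_s`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical stand-in for a gRPC error returned by the Server.

    retry_delay_s models RetryInfo.retry_delay in seconds; it is None when
    the Server did not attach a RetryInfo detail to the error.
    """

    def __init__(self, code: str, retry_delay_s: Optional[float] = None):
        super().__init__(code)
        self.code = code
        self.retry_delay_s = retry_delay_s


@dataclass
class RetryPolicy:
    """Hypothetical client-side policy; field names are illustrative only."""

    retriable_codes: Sequence[str] = ("UNAVAILABLE",)
    max_retries: int = 5
    initial_backoff_s: float = 0.05
    backoff_multiplier: float = 4.0
    max_backoff_s: float = 60.0
    max_server_delay_s: float = 600.0  # assumed cap on server-suggested delays

    def can_retry(self, err: ServerError) -> bool:
        # RetryInfo alone makes an error retriable, even for unlisted codes.
        return err.retry_delay_s is not None or err.code in self.retriable_codes

    def wait_time(self, attempt: int, err: ServerError) -> float:
        # Client-side exponential backoff, capped at max_backoff_s.
        backoff = min(
            self.initial_backoff_s * self.backoff_multiplier ** attempt,
            self.max_backoff_s,
        )
        if err.retry_delay_s is None:
            return backoff
        # Wait at least as long as the Server asked (assumed policy).
        return max(backoff, min(err.retry_delay_s, self.max_server_delay_s))


def call_with_retries(
    fn: Callable[[], T],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying ServerErrors that the policy classifies as retriable."""
    attempt = 0
    while True:
        try:
            return fn()
        except ServerError as err:
            if attempt >= policy.max_retries or not policy.can_retry(err):
                raise
            sleep(policy.wait_time(attempt, err))
            attempt += 1


if __name__ == "__main__":
    waits = []
    outcomes = iter([ServerError("RESOURCE_EXHAUSTED", retry_delay_s=2.0), "ok"])

    def flaky() -> str:
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(call_with_retries(flaky, RetryPolicy(), sleep=waits.append))  # ok
    print(waits)  # [2.0]
```

In the usage example the first call fails with `RESOURCE_EXHAUSTED`, a code the policy does not list, but because the error carries a server-suggested delay it is still retried, after 2.0 seconds, and the second call returns "ok". Injecting `sleep` keeps the sketch fast and easy to test.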

The change is needed for two reasons:
1) If the Server is under heavy load or a task takes longer than usual, it can
tell the Client to wait longer using the `retry_delay` field.
2) If the Server needs to introduce a new retriable error, it can simply
include `RetryInfo` in the error response, and the Client will retry the
request automatically. No changes to the client-side retry policies are
needed to retry the new error.

 

The changes should be made to both the Python and Scala clients.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)