Alex Khakhlyuk created SPARK-52673:
--------------------------------------
Summary: Add gRPC RetryInfo handling to Spark Connect retry policies
Key: SPARK-52673
URL: https://issues.apache.org/jira/browse/SPARK-52673
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.1
Reporter: Alex Khakhlyuk
The Spark Connect Client has a set of retry policies that specify which errors
coming from the Server can be retried.
This change adds the capability for the Spark Connect Client to use
server-provided retry information. The Server can include a `RetryInfo` gRPC
message containing a `retry_delay` field in its error response. The Client will
now use the `RetryInfo` message to classify the error as retryable and will use
`retry_delay` to calculate how long to wait before the next attempt. This
behavior is in line with the gRPC standard for client-server communication.
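The classification and wait-time logic described above might look roughly like
the following Python sketch. `RetryInfo` and `ServerError` here are simplified,
hypothetical stand-ins (the delay is a plain number of seconds), and the rule
that `retry_delay` acts as a lower bound on the policy's own backoff is an
assumption, since the issue does not spell out the exact formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryInfo:
    """Hypothetical stand-in for the gRPC RetryInfo message."""
    retry_delay: float  # seconds; simplified for this sketch


class ServerError(Exception):
    """Hypothetical error type carrying optional server retry information."""

    def __init__(self, message: str, retry_info: Optional[RetryInfo] = None):
        super().__init__(message)
        self.retry_info = retry_info


def is_retryable(error: ServerError, matches_policy: bool) -> bool:
    # Retry if an existing client policy accepts the error, or if the
    # server attached RetryInfo to it.
    return matches_policy or error.retry_info is not None


def next_wait(error: ServerError, policy_backoff: float) -> float:
    # Assumption: the server-provided delay is a lower bound on the
    # client's own backoff; the issue does not fix the exact formula.
    if error.retry_info is None:
        return policy_backoff
    return max(policy_backoff, error.retry_info.retry_delay)


if __name__ == "__main__":
    overloaded = ServerError("server busy", RetryInfo(retry_delay=5.0))
    print(is_retryable(overloaded, matches_policy=False))
    print(next_wait(overloaded, policy_backoff=0.5))
```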
The change is needed for two reasons:
1) If the Server is under heavy load or a task takes longer than expected, it
can tell the client to wait longer using the `retry_delay` field.
2) If the Server needs to introduce a new retryable error, it can simply
include `RetryInfo` in the error response. The client will then retry the
failed request automatically, with no changes to the client-side retry
policies.
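To illustrate the second point, here is a minimal Python sketch of a retry loop
in which an error code unknown to the client is still retried because it carries
a server-provided delay. The names `ServerError`, `KNOWN_RETRYABLE`,
`call_with_retries`, and the `NEW_SERVER_ERROR` code are all hypothetical and do
not reflect the actual Spark Connect client API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical error with a code and an optional server-provided delay."""

    def __init__(self, code: str, retry_delay: Optional[float] = None):
        super().__init__(code)
        self.code = code
        self.retry_delay = retry_delay  # seconds, set when RetryInfo is present


# Hypothetical client-side policy: the only code the client knows to retry.
KNOWN_RETRYABLE = {"UNAVAILABLE"}
DEFAULT_BACKOFF = 0.1  # seconds, used when the server gives no delay


def call_with_retries(
    rpc: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except ServerError as error:
            has_retry_info = error.retry_delay is not None
            retryable = has_retry_info or error.code in KNOWN_RETRYABLE
            # Give up on non-retryable errors or once attempts are exhausted.
            if not retryable or attempt == max_attempts:
                raise
            sleep(error.retry_delay if has_retry_info else DEFAULT_BACKOFF)
    raise RuntimeError("max_attempts must be at least 1")
```

In this sketch, a `ServerError("NEW_SERVER_ERROR", retry_delay=0.5)` is retried
even though `NEW_SERVER_ERROR` is not in `KNOWN_RETRYABLE`, while the same code
without a delay is raised immediately.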
The change should be made in both the Python and Scala clients.