[
https://issues.apache.org/jira/browse/SPARK-52673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Khakhlyuk updated SPARK-52673:
-----------------------------------
Target Version/s: (was: 4.1.0)
Affects Version/s: 4.1.0
(was: 4.1)
> Add grpc RetryInfo handling to Spark Connect retry policies
> -----------------------------------------------------------
>
> Key: SPARK-52673
> URL: https://issues.apache.org/jira/browse/SPARK-52673
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.1.0
> Reporter: Alex Khakhlyuk
> Priority: Major
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> The Spark Connect client has a set of retry policies that specify which
> errors coming from the server can be retried.
> This change adds the capability for the Spark Connect client to use
> server-provided retry information. The server can include a `RetryInfo` gRPC
> message containing a `retry_delay` field in its error response. The client
> will then use the `RetryInfo` message to classify the error as retryable and
> will use `retry_delay` to calculate how long to wait before the next attempt.
> This behavior is in line with the gRPC standard for client-server
> communication.
> The change is needed for two reasons:
> 1) If the server is under heavy load or a task takes more time, it can tell
> the client to wait longer using the `retry_delay` field.
> 2) If the server needs to introduce a new retryable error, it can simply
> include `RetryInfo` in the error response. The error will then be retried
> automatically by the client. No changes to the client-side retry policies are
> needed to retry the new error.
>
> The changes should be introduced in both the Python and Scala clients.
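
The behavior described above can be illustrated with a minimal Python sketch.
It is not the Spark Connect implementation: `FakeRpcError`, `BackoffPolicy`,
and all default values are hypothetical names chosen for illustration, and
taking the larger of the client backoff and the server's `retry_delay` is an
assumption about how the two might be combined.

```python
from dataclasses import dataclass
from typing import Optional


class FakeRpcError(Exception):
    """Hypothetical stand-in for a server error (not a grpc or pyspark class)."""

    def __init__(self, code: str, retry_delay: Optional[float] = None):
        super().__init__(code)
        self.code = code
        # Seconds taken from a RetryInfo message; None if the server sent none.
        self.retry_delay = retry_delay


@dataclass
class BackoffPolicy:
    """Hypothetical client-side policy; the defaults are illustrative only."""

    initial_backoff: float = 1.0
    multiplier: float = 2.0
    max_backoff: float = 60.0
    retryable_codes: frozenset = frozenset({"UNAVAILABLE"})

    def can_retry(self, error: FakeRpcError) -> bool:
        # The presence of RetryInfo alone marks the error as retryable, so a
        # new server-side error needs no change to retryable_codes.
        return error.retry_delay is not None or error.code in self.retryable_codes

    def next_wait(self, error: FakeRpcError, attempt: int) -> float:
        # Plain exponential backoff, capped at max_backoff.
        backoff = min(self.initial_backoff * self.multiplier ** attempt,
                      self.max_backoff)
        # Assumption: wait at least as long as the server asked for.
        if error.retry_delay is not None:
            return max(backoff, error.retry_delay)
        return backoff
```

With such a policy, an error the client has never seen before is still
retried as long as the server attaches `RetryInfo`, and a loaded server can
stretch the wait by sending a larger `retry_delay`.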