[
https://issues.apache.org/jira/browse/HDDS-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-9323:
---------------------------------
Labels: pull-request-available (was: )
> Better datanode exclude list handling for long-lived clients
> ------------------------------------------------------------
>
> Key: HDDS-9323
> URL: https://issues.apache.org/jira/browse/HDDS-9323
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Ethan Rose
> Assignee: Dave Teng
> Priority: Major
> Labels: pull-request-available
>
> Currently, a long-lived client can add most or all nodes of a small cluster
> to its exclude list, after which further writes using that client instance
> will fail. There are two ways this can be improved:
> # A timeout that removes nodes from the exclude list after a set period so
> that they can be retried. For EC, this exists and is set to 10 minutes by
> default. Ratis does not currently have this, but it should be added.
> # Allow the write to fall back to nodes in the exclude list if that is all
> that is available. This could be implemented on the server side, or as a
> retry from the client based on the server's initial response.
> These issues are especially relevant for the S3 gateway, which uses a
> persistent Ozone client to connect to the cluster for as long as it is up.
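
The two improvements proposed above can be illustrated with a minimal Java sketch. It is not the Ozone client's actual exclude list implementation: the class name ExpiringExcludeList, its methods, the use of plain string datanode IDs, and the injected clock are all hypothetical, chosen only to show the idea. Entries expire after a configurable timeout (option 1), and candidate selection falls back to excluded nodes when nothing else is available (option 2, shown here on the client side).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch; not the actual Ozone exclude list class. */
final class ExpiringExcludeList {
  // Datanode ID -> time (in milliseconds) at which it was excluded.
  private final Map<String, Long> excludedAt = new ConcurrentHashMap<>();
  private final long expiryMillis;
  private final LongSupplier clock;

  /** The clock is injected so that expiry can be tested without sleeping. */
  ExpiringExcludeList(long expiryMillis, LongSupplier clock) {
    this.expiryMillis = expiryMillis;
    this.clock = clock;
  }

  /** Record a datanode as excluded as of the current clock time. */
  void exclude(String datanodeId) {
    excludedAt.put(datanodeId, clock.getAsLong());
  }

  /** Option 1: drop expired entries, then report whether the node is still excluded. */
  boolean isExcluded(String datanodeId) {
    long now = clock.getAsLong();
    excludedAt.entrySet().removeIf(e -> now - e.getValue() >= expiryMillis);
    return excludedAt.containsKey(datanodeId);
  }

  /**
   * Option 2: prefer nodes that are not excluded, but fall back to the
   * full candidate list when every candidate is currently excluded.
   */
  List<String> selectCandidates(List<String> allNodes) {
    List<String> usable = new ArrayList<>();
    for (String node : allNodes) {
      if (!isExcluded(node)) {
        usable.add(node);
      }
    }
    return usable.isEmpty() ? new ArrayList<>(allNodes) : usable;
  }
}
```

A long-lived caller such as the S3 gateway could construct this with `new ExpiringExcludeList(600_000L, System::currentTimeMillis)` to mirror the 10-minute EC default mentioned in the description. Whether the fallback belongs in the client or on the server is left open by the issue; the sketch only shows the client-side variant.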