[ https://issues.apache.org/jira/browse/HDDS-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-9651.
------------------------------------
    Fix Version/s: 1.4.0
       Resolution: Implemented

> Restore random selection of datanode when reading RATIS objects
> ---------------------------------------------------------------
>
>                 Key: HDDS-9651
>                 URL: https://issues.apache.org/jira/browse/HDDS-9651
>             Project: Apache Ozone
>          Issue Type: Task
>            Reporter: Kirill Sizov
>            Assignee: Ivan Brusentsev
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> *Observations*
> During performance testing, an issue was identified where data written as 
> RATIS/THREE was primarily read from a single datanode, typically the pipeline 
> leader. The experiment involved the following steps:
> * Upload a set of files to a cluster using RATIS/THREE pipelines.
> * Attempt to read all those files.
> * Monitor datanode metrics.
> Notably, all reads were performed from the leader datanodes.
> *Motivation*
> Reading all the data from a single datanode creates an uneven load on the 
> cluster and reduces the potential total read throughput.
> *Reasons*
> Upon researching this issue, two root causes were identified:
> # RATIS objects are currently read through a gRPC client 
> (specifically, the {{BlockInputStream.getChunkInfos()}} method switches the 
> pipeline type to {{STANDALONE}}, and the {{XceiverClientManager}} selects a 
> gRPC client).
> Unfortunately, this configuration does not allow the Ozone Manager to 
> determine the client's host address since the 
> {{OmMetadataReader.getClientAddress()}} method only works for Hadoop RPC 
> interactions. This failure to detect the client's address results in an empty 
> {{nodesInOrder}} field in the response received by the client, causing the 
> datanodes to be read in the default order in the pipeline, with the leader 
> node being read first.
> # Because the SCM cannot locate the client relative to the cluster topology 
> (see {{ScmBlockProtocolServer.sortDatanodes}}), the client receives a 
> Pipeline with a null {{nodesInOrder}} field, so reads go primarily to the 
> leader datanode.
> Once these issues are resolved, it is expected that read throughput will 
> improve, and the cluster's load when reading RATIS/THREE objects will become 
> more balanced.
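
The fallback described in the quoted report can be illustrated with a small, self-contained Java sketch. It is not the Ozone client code: the class ReadOrderSketch, the method chooseReadOrder and the datanode names are hypothetical, and putting the leader first in the default pipeline order is an assumption taken from the description. The sketch only shows one way "random selection" could work when nodesInOrder is null or empty: shuffle a copy of the pipeline nodes instead of always starting at the first one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; names are hypothetical, not Ozone APIs. */
public final class ReadOrderSketch {

  private ReadOrderSketch() { }

  /**
   * Returns the order in which replicas would be tried for a read.
   *
   * @param pipelineNodes datanodes in default pipeline order
   *                      (assumed here to list the leader first)
   * @param nodesInOrder  topology-sorted datanodes; may be null or empty
   * @param rng           randomness source, injected so results can be tested
   */
  public static List<String> chooseReadOrder(List<String> pipelineNodes,
      List<String> nodesInOrder, Random rng) {
    if (nodesInOrder != null && !nodesInOrder.isEmpty()) {
      // The server could place the client in the topology: keep its ordering.
      return new ArrayList<>(nodesInOrder);
    }
    // No topology information: shuffle a copy so that reads spread over all
    // replicas instead of always starting at the first (leader) node.
    List<String> shuffled = new ArrayList<>(pipelineNodes);
    Collections.shuffle(shuffled, rng);
    return shuffled;
  }

  public static void main(String[] args) {
    List<String> pipeline = List.of("dn1-leader", "dn2", "dn3");
    Random rng = new Random();
    int[] firstPicks = new int[pipeline.size()];
    // Count how often each datanode ends up first in the read order.
    for (int i = 0; i < 3000; i++) {
      String first = chooseReadOrder(pipeline, null, rng).get(0);
      firstPicks[pipeline.indexOf(first)]++;
    }
    for (int i = 0; i < firstPicks.length; i++) {
      System.out.println(pipeline.get(i) + " chosen first "
          + firstPicks[i] + " times");
    }
  }
}
```

Running main counts how often each of the three hypothetical datanodes is tried first. With the shuffle in place the counts should come out roughly even, whereas returning pipelineNodes unchanged would send every first read to dn1-leader.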



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

