[ 
https://issues.apache.org/jira/browse/HDDS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-15643:
------------------------------------
    Fix Version/s: 2.3.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> ECFileChecksumHelper: redundant OM lookupKey RPC and per-file gRPC connection 
> creation for EC checksum collection
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-15643
>                 URL: https://issues.apache.org/jira/browse/HDDS-15643
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Andrey Yarovoy
>            Assignee: Andrey Yarovoy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.3.0
>
>
> *Description:*
> Checksum collection for EC files has three structural inefficiencies that 
> make each file's cost far higher than necessary. All three are present in the 
> current code and compound under any non-trivial OM latency.
> *Bug 1: Double {{lookupKey}} RPC per file ({{BaseFileChecksumHelper}})*
> The 7-arg constructor (which accepts a pre-fetched {{OmKeyInfo}}) 
> delegates to {{this(6-arg)}}. The 6-arg constructor calls 
> {{fetchBlocks()}} before returning, and {{fetchBlocks()}} checks {{if 
> (keyInfo == null)}} to decide whether to issue a {{lookupKey}} RPC. Because 
> {{this.keyInfo = keyInfo}} executes only after the delegation returns, 
> {{keyInfo}} is always null at the time of that check, so a redundant 
> {{lookupKey}} is issued for every file regardless of whether the caller 
> already supplied one.
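
The ordering problem in Bug 1 can be reproduced outside Ozone. The sketch below is a simplified illustration, not the real code: ConstructorOrderSketch, BuggyHelper, FixedHelper and the LOOKUPS counter are hypothetical names, the constructors take one and two arguments instead of six and seven, and a String stands in for OmKeyInfo. The "fixed" variant shows one possible ordering (assign the field before fetchBlocks() runs) and is not necessarily what the actual patch does.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-ins for illustration; these are not the real Ozone classes. */
final class ConstructorOrderSketch {

  /** Counts simulated lookupKey RPCs. */
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  /** Mirrors the described ordering: the field is assigned after fetchBlocks() ran. */
  static class BuggyHelper {
    private String keyInfo;

    BuggyHelper(String keyName) {
      fetchBlocks(keyName);
    }

    BuggyHelper(String keyName, String prefetchedKeyInfo) {
      this(keyName);                     // fetchBlocks() runs here; keyInfo is still null
      this.keyInfo = prefetchedKeyInfo;  // assigned too late to prevent the lookup
    }

    private void fetchBlocks(String keyName) {
      if (keyInfo == null) {
        LOOKUPS.incrementAndGet();       // simulated lookupKey RPC
        keyInfo = "looked-up:" + keyName;
      }
    }

    String keyInfo() {
      return keyInfo;
    }
  }

  /** Assumed alternative: the shorter constructor delegates to the longer one. */
  static class FixedHelper {
    private String keyInfo;

    FixedHelper(String keyName) {
      this(keyName, null);
    }

    FixedHelper(String keyName, String prefetchedKeyInfo) {
      this.keyInfo = prefetchedKeyInfo;  // set before the null check in fetchBlocks()
      fetchBlocks(keyName);
    }

    private void fetchBlocks(String keyName) {
      if (keyInfo == null) {
        LOOKUPS.incrementAndGet();       // only reached when nothing was supplied
        keyInfo = "looked-up:" + keyName;
      }
    }

    String keyInfo() {
      return keyInfo;
    }
  }
}
```

With a pre-fetched value supplied, constructing BuggyHelper increments LOOKUPS once, while FixedHelper leaves it unchanged. In both cases the helper ends up holding the caller's value, which is why the extra RPC is easy to miss: the result is correct and only the cost differs.
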
> *Bug 2: New gRPC connection opened for every file ({{ECFileChecksumHelper}})*
> {{getChunkInfos()}} builds a 3-node STANDALONE pipeline to read the stripe 
> checksum (replica index 1 plus the two parity nodes). It calls 
> {{pipeline.toBuilder().setNodes(nodes).build()}}. 
> {{Pipeline.Builder.setNodes()}} detects that the 3-node set differs from the 
> 5-node EC {{nodeStatus}} and unconditionally calls 
> {{PipelineID.randomId()}}, generating a fresh random UUID per file. Since 
> {{XceiverClientManager}} keys its gRPC connection cache on pipeline ID, the 
> cache never hits and a new connection is opened for every file.
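
The cache behaviour in Bug 2 can be illustrated with a toy model. Everything below is hypothetical and does not use the Ozone API: ConnectionCache is a plain map standing in for the connection cache, node names are made-up strings, and stableId() (a name-based UUID derived from the node list) is only one assumed way to obtain a repeatable key, not necessarily the approach taken in the fix.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a connection cache keyed on pipeline ID. */
final class PipelineCacheSketch {

  /** Toy cache that counts how many "connections" it had to open. */
  static final class ConnectionCache {
    private final Map<String, Object> connections = new ConcurrentHashMap<>();
    private final AtomicInteger opened = new AtomicInteger();

    Object acquire(String pipelineId) {
      return connections.computeIfAbsent(pipelineId, id -> {
        opened.incrementAndGet();  // cache miss: open a new connection
        return new Object();
      });
    }

    int opened() {
      return opened.get();
    }
  }

  /** Models the described behaviour: a fresh random ID on every build. */
  static String randomId() {
    return UUID.randomUUID().toString();
  }

  /** Assumed alternative: the same node list always maps to the same ID. */
  static String stableId(List<String> nodes) {
    byte[] name = String.join(",", nodes).getBytes(StandardCharsets.UTF_8);
    return UUID.nameUUIDFromBytes(name).toString();
  }

  /** Connections opened when every file gets a random pipeline ID. */
  static int openedWithRandomIds(int files, List<String> nodes) {
    ConnectionCache cache = new ConnectionCache();
    for (int i = 0; i < files; i++) {
      cache.acquire(randomId());
    }
    return cache.opened();
  }

  /** Connections opened when the ID is derived from the node set. */
  static int openedWithStableIds(int files, List<String> nodes) {
    ConnectionCache cache = new ConnectionCache();
    for (int i = 0; i < files; i++) {
      cache.acquire(stableId(nodes));
    }
    return cache.opened();
  }
}
```

For the same three nodes and N files, openedWithRandomIds() opens N connections, whereas openedWithStableIds() opens one. The point of the sketch is only that a cache keyed on an identifier regenerated per request cannot hit; any key that is stable for a given node set would restore reuse.
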



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
