[ 
https://issues.apache.org/jira/browse/HDDS-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ren Koike updated HDDS-15581:
-----------------------------
    Description: 
{{ozone debug replicas chunk-info}} returns incorrect {{blockData.size}} (and 
chunk metadata) for EC-replicated keys. For example, for a partial-stripe key, 
every internal block/replica is reported with the size (typically 1 MiB) of 
one specific block on a single datanode, instead of the expected per-replica sizes.

*Steps to reproduce*
 # Create an EC key with {{rs-3-2-1024k}} (or similar), e.g. size 1,148,576 
bytes (between 1 MiB and 2 MiB).
 # Run: {{ozone debug replicas chunk-info o3://<volume>/<bucket>/<key>}}
 # Inspect {{blockData.size}} for each entry in {{keyLocations}}.

Expected (EC 3+2, 1,148,576 bytes):
||Replica||Expected size||
|Data 1|1,048,576|
|Data 2|100,000|
|Data 3|0|
|Parity 4, 5|1,048,576 each|

Actual: all replicas show 1,048,576.
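
The expected sizes in the table can be reproduced with a little arithmetic. The following is a minimal sketch, not Ozone code: the class and method names are hypothetical, it assumes the key fits in a single block group, and it assumes (as the table implies) that each parity block is as long as the largest data block.

```java
import java.util.Arrays;

/** Sketch only: expected internal block lengths for an EC key in one block group. */
final class EcReplicaSizes {

  /**
   * Returns the expected length of each internal block, data blocks first,
   * then parity blocks. All names here are hypothetical, not Ozone APIs.
   */
  static long[] expectedSizes(long keyLen, int data, int parity, long chunkSize) {
    long stripeSize = (long) data * chunkSize;
    long fullStripes = keyLen / stripeSize;
    long remainder = keyLen % stripeSize;
    long[] sizes = new long[data + parity];
    long largest = 0;
    for (int i = 0; i < data; i++) {
      // Bytes of the trailing partial stripe that land in data block i.
      long partial = Math.min(chunkSize, Math.max(0, remainder - i * chunkSize));
      sizes[i] = fullStripes * chunkSize + partial;
      largest = Math.max(largest, sizes[i]);
    }
    // Assumption: parity blocks match the largest data block in length.
    for (int i = data; i < data + parity; i++) {
      sizes[i] = largest;
    }
    return sizes;
  }

  public static void main(String[] args) {
    // rs-3-2-1024k with a 1,148,576-byte key, as in the steps to reproduce.
    System.out.println(Arrays.toString(expectedSizes(1_148_576L, 3, 2, 1024 * 1024)));
  }
}
```

For the key in the reproduction steps this yields 1,048,576, 100,000 and 0 bytes for the three data blocks and 1,048,576 bytes for each parity block, matching the table.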

*Root cause*

This is a regression introduced when HDDS-13445 replaced 
{{getBlockFromAllNodes()}} with a per-datanode loop that calls 
{{ContainerProtocolCalls.getBlock()}}. {{getBlock()}} uses 
{{tryEachDatanode()}}, which always queries the same datanode (the 
pipeline’s “closest” / first node), not the datanode from the loop variable. 
Each iteration then writes that datanode’s block metadata under a different 
hostname/file path, duplicating the same replica’s data 5× (for EC 3+2).
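
The failure pattern described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual Ozone code: the pipeline is modelled as a map from a datanode name to the block size stored there, and {{getBlockFromFirstAvailable}} is a hypothetical stand-in for a lookup that always starts at the first node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the bug: the loop variable labels the result but never drives the lookup. */
final class LoopVariableBug {

  /** Hypothetical stand-in for a "try each node, return the first success" lookup. */
  static long getBlockFromFirstAvailable(Map<String, Long> pipeline) {
    // Always starts at the first node, whichever node the caller had in mind.
    return pipeline.values().iterator().next();
  }

  /** Builds a per-datanode report the way the regressed loop effectively does. */
  static Map<String, Long> report(Map<String, Long> pipeline) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (String datanode : pipeline.keySet()) {
      // Bug: 'datanode' is used only as the output key, not as the query target.
      out.put(datanode, getBlockFromFirstAvailable(pipeline));
    }
    return out;
  }
}
```

In this model every entry of the report carries the first node's block size, which mirrors the "all replicas show 1,048,576" symptom.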

*Proposed fix*
 * Add {{ContainerProtocolCalls.getBlockFromDatanode(..., datanode, 
replicaIndexes)}} that uses the existing private {{getBlock(..., datanode, 
...)}} without {{tryEachDatanode}}.
 * Use it in {{ChunkKeyHandler}} with the loop’s {{datanodeDetails}}.

Avoid restoring {{getBlockFromAllNodes()}} to prevent holding all block 
metadata in memory for large keys.
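
The shape of the proposed fix can be sketched with a toy model. The signatures below are hypothetical and much simpler than the real {{ContainerProtocolCalls}} API: the pipeline is again a map from datanode name to stored block size, the lookup targets exactly the node named by the loop variable, and each result is handed to a callback immediately so that nothing accumulates in memory.

```java
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy model of the fix: query the node named by the loop variable, one at a time. */
final class PerDatanodeLookup {

  /** Hypothetical targeted lookup: asks exactly the given datanode. */
  static long getBlockFromDatanode(Map<String, Long> pipeline, String datanode) {
    Long size = pipeline.get(datanode);
    if (size == null) {
      throw new IllegalArgumentException("unknown datanode: " + datanode);
    }
    return size;
  }

  /** Emits each replica's size as soon as it is fetched instead of collecting all of them. */
  static void forEachReplica(Map<String, Long> pipeline, BiConsumer<String, Long> sink) {
    for (String datanode : pipeline.keySet()) {
      sink.accept(datanode, getBlockFromDatanode(pipeline, datanode));
    }
  }
}
```

A caller could pass a sink that writes one output entry per replica, which keeps the per-datanode loop while still reporting each node's own block metadata.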

  was:
{{ozone debug replicas chunk-info}} returns incorrect {{blockData.size}} (and 
chunk metadata) for EC-replicated keys. For example, for a partial-stripe key, 
every internal block/replica is reported with the size (typically 1 MiB) of 
one specific block on a single datanode, instead of the expected per-replica sizes.

*Steps to reproduce*
 # Create an EC key with {{rs-3-2-1024k}} (or similar), e.g. size 1,148,576 
bytes (between 1 MiB and 2 MiB).
 # Run: {{ozone debug replicas chunk-info o3://<volume>/<bucket>/<key>}}
 # Inspect {{blockData.size}} for each entry in {{keyLocations}}.

Expected (EC 3+2, 1,148,576 bytes):
||Replica||Expected size||
|Data 1|1,048,576|
|Data 2|100,000|
|Data 3|0|
|Parity 4, 5|1,048,576 each|

Actual: all replicas show 1,048,576.

*Root cause*

This is a regression introduced when 
[HDDS-13445|https://issues.apache.org/jira/browse/HDDS-13445] replaced 
{{getBlockFromAllNodes()}} with a per-datanode loop that calls 
{{ContainerProtocolCalls.getBlock()}}. {{getBlock()}} uses 
{{tryEachDatanode()}}, which always queries the same datanode (the 
pipeline’s “closest” / first node), not the datanode from the loop variable. 
Each iteration then writes that datanode’s block metadata under a different 
hostname/file path, duplicating the same replica’s data 5× (for EC 3+2).

*Proposed fix*
 * Add {{ContainerProtocolCalls.getBlockFromDatanode(..., datanode, 
replicaIndexes)}} that uses the existing private {{getBlock(..., datanode, 
...)}} without {{tryEachDatanode}}.
 * Use it in {{ChunkKeyHandler}} with the loop’s {{datanodeDetails}}.

Avoid restoring {{getBlockFromAllNodes()}} to prevent holding all block 
metadata in memory for large keys.


> ozone debug replicas chunk-info reports wrong block size for EC keys
> --------------------------------------------------------------------
>
>                 Key: HDDS-15581
>                 URL: https://issues.apache.org/jira/browse/HDDS-15581
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ren Koike
>            Assignee: Ren Koike
>            Priority: Major


