[
https://issues.apache.org/jira/browse/HDDS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-15301:
----------------------------------
Labels: pull-request-available (was: )
> Malformed PutBlock request can mark container UNHEALTHY
> -------------------------------------------------------
>
> Key: HDDS-15301
> URL: https://issues.apache.org/jira/browse/HDDS-15301
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Client, Ozone Datanode
> Reporter: Chu Cheng Li
> Assignee: Chu Cheng Li
> Priority: Major
> Labels: pull-request-available
>
> h2. Summary
> A malformed client {{PutBlock}} request can cause the datanode to mark the
> target container {{UNHEALTHY}}. The request should be rejected as a
> client-side malformed request, but it is currently mapped to
> {{IO_EXCEPTION}}, which {{HddsDispatcher}} treats as a container write
> failure.
> As a result, a buggy or misbehaving client can poison the active container
> and cause the pipeline to be closed.
> h2. Environment
> * Ozone version: {{2.2.0-SNAPSHOT}}
> * Cluster: {{MiniOzoneCluster}}
> * Pipeline: single-node Ratis pipeline
> * Client: custom Rust Ozone client that reproduces the Java client's
> incremental chunk list semantics
> h2. Repro
> Send an incremental {{PutBlock}} request where {{BlockData.size}} does not
> equal the sum of chunks included in the request.
> Example from the failing request:
>
> {code:java}
> putBlock {
>   blockData {
>     blockID {
>       containerID: 2
>       localID: 117883640217600002
>       blockCommitSequenceId: 0
>     }
>     metadata { key: "incremental" }
>     chunks {
>       chunkName: "117883640217600002_chunk_16"
>       offset: 16777216
>       len: 1048576
>       metadata { key: "full" }
>       checksumData { type: NONE bytesPerChecksum: 0 }
>     }
>     size: 17825792
>   }
>   eof: false
> }
> {code}
>
> The request includes only one {{1 MiB}} chunk, but {{size}} is {{17 MiB}}.
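
To make the mismatch concrete, here is a minimal sketch of the consistency check that this request fails, using the values from the request above. The class and method names ({{PutBlockSizeCheck}}, {{sumOfChunkLengths}}, {{sizeMatchesChunks}}) are hypothetical and only illustrate the arithmetic; this is not the datanode's actual validation code.

```java
public class PutBlockSizeCheck {

    // Sum of the lengths of the chunks carried in a single request.
    public static long sumOfChunkLengths(long[] chunkLengths) {
        long sum = 0;
        for (long len : chunkLengths) {
            sum = Math.addExact(sum, len); // throws ArithmeticException on overflow
        }
        return sum;
    }

    // True when the declared block size equals the sum of the chunk lengths.
    public static boolean sizeMatchesChunks(long declaredSize, long[] chunkLengths) {
        return declaredSize == sumOfChunkLengths(chunkLengths);
    }

    public static void main(String[] args) {
        long declaredSize = 17825792L;    // BlockData.size in the failing request (17 MiB)
        long[] chunkLengths = {1048576L}; // the only chunk in the request (1 MiB)

        System.out.println("sum of chunks = " + sumOfChunkLengths(chunkLengths));
        System.out.println("size matches  = " + sizeMatchesChunks(declaredSize, chunkLengths));
    }
}
```

With the values above, the sum of chunks is 1048576 while the declared size is 17825792, so the check reports a mismatch.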
> h2. Actual Behavior
> The datanode rejects the protobuf with a {{CodecException}}:
> {code:java}
> Caused by: org.apache.hadoop.hdds.utils.db.CodecException:
> Size mismatch: size (=17825792) != sum of chunks (=1048576){code}
> That exception is caught in {{KeyValueHandler.handlePutBlock}} as an
> {{IOException}} and returned as {{IO_EXCEPTION}}:
> {code:java}
> Operation: PutBlock, Message: Put Key failed, Result: IO_EXCEPTION{code}
> Then {{HddsDispatcher}} treats the failed write as a container write failure:
> {code:java}
> Marked container UNHEALTHY from OPEN: KeyValueContainerData #2{code}
> After that, subsequent writes fail with:
> {code:java}
> Container 2 in UNHEALTHY state{code}
> SCM closes the pipeline, and clients may later see retry/failover noise such
> as:
> {code:java}
> not leader; suggested_leader_present=false
> exhausted retry-window resend attempts {code}
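
The behavior described above suggests separating request-validation failures from genuine write failures before a result code is chosen. The following is a simplified sketch of that idea, not a proposed patch: the {{Result}} enum, {{MalformedRequestException}}, {{resultFor}} and {{shouldMarkUnhealthy}} are all hypothetical stand-ins and do not correspond to Ozone's actual classes or to the real {{HddsDispatcher}} logic.

```java
import java.io.IOException;

public class PutBlockResultMapping {

    // Illustrative result codes only; not the actual Ozone protobuf enum.
    public enum Result { SUCCESS, MALFORMED_REQUEST, IO_EXCEPTION }

    // Hypothetical exception type standing in for a request-validation failure
    // such as the size mismatch reported in this issue.
    public static class MalformedRequestException extends IOException {
        public MalformedRequestException(String message) {
            super(message);
        }
    }

    // Map an exception to a result code. The more specific type is checked
    // first, so a validation failure is not swallowed by the IOException case.
    public static Result resultFor(Exception e) {
        if (e instanceof MalformedRequestException) {
            return Result.MALFORMED_REQUEST;
        }
        return Result.IO_EXCEPTION;
    }

    // Assumption for this sketch: only a genuine write failure should change
    // the container state; a rejected client request should not.
    public static boolean shouldMarkUnhealthy(Result result) {
        return result == Result.IO_EXCEPTION;
    }

    public static void main(String[] args) {
        Exception bad = new MalformedRequestException(
            "Size mismatch: size (=17825792) != sum of chunks (=1048576)");
        Result result = resultFor(bad);
        System.out.println("result = " + result);
        System.out.println("mark unhealthy = " + shouldMarkUnhealthy(result));
    }
}
```

In this sketch the malformed request maps to {{MALFORMED_REQUEST}} and leaves the container state untouched, while any other {{IOException}} still maps to {{IO_EXCEPTION}}.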
--
This message was sent by Atlassian Jira
(v8.20.10#820010)