Chu Cheng Li created HDDS-15301:
-----------------------------------

             Summary: Malformed PutBlock request can mark container UNHEALTHY
                 Key: HDDS-15301
                 URL: https://issues.apache.org/jira/browse/HDDS-15301
             Project: Apache Ozone
          Issue Type: Bug
          Components: Ozone Client, Ozone Datanode
            Reporter: Chu Cheng Li
            Assignee: Chu Cheng Li


h2. Summary

A malformed client {{PutBlock}} request can cause the datanode to mark the 
target container {{UNHEALTHY}}. The request should be rejected as a 
client-side malformed request, but it is currently mapped to 
{{IO_EXCEPTION}}, which {{HddsDispatcher}} treats as a container write 
failure.

As a result, a single misbehaving client can push an active container into 
{{UNHEALTHY}} and cause its pipeline to be closed.

h2. Environment
 * Ozone version: {{2.2.0-SNAPSHOT}}
 * Cluster: {{MiniOzoneCluster}}
 * Pipeline: single-node Ratis pipeline
 * Client: custom Rust Ozone client that reproduces the Java client's 
incremental chunk list semantics

h2. Repro

Send an incremental {{PutBlock}} request where {{BlockData.size}} does not 
equal the sum of the lengths of the chunks included in the request.

Example from the failing request:
 
{code:java}
putBlock {
  blockData {
    blockID { 
      containerID: 2 
      localID: 117883640217600002 
      blockCommitSequenceId: 0 
    }
    metadata { key: "incremental" }
    chunks {
      chunkName: "117883640217600002_chunk_16"
      offset: 16777216
      len: 1048576
      metadata { key: "full" }
      checksumData { type: NONE bytesPerChecksum: 0 }
    }
    size: 17825792
  }
  eof: false
} {code}
 

The request includes only one {{1 MiB}} chunk, but {{size}} is {{17 MiB}}.
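
The mismatch can be reproduced by hand from the request values. The following is a minimal Python sketch, not Ozone code: {{Chunk}} and {{check_block_size}} are hypothetical names, and the check is assumed to compare the declared size with the sum of the chunk lengths carried in the request, as the error message in this report suggests.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 * 1024


@dataclass
class Chunk:
    # Hypothetical stand-in for the chunk fields used in this report.
    offset: int
    length: int


def check_block_size(declared_size: int, chunks: List[Chunk]) -> None:
    """Raise ValueError when the declared size differs from the sum of chunk lengths."""
    total = sum(c.length for c in chunks)
    if declared_size != total:
        raise ValueError(
            f"Size mismatch: size (={declared_size}) != sum of chunks (={total})"
        )


# Values copied from the failing request.
chunks = [Chunk(offset=16777216, length=1048576)]
declared_size = 17825792

try:
    check_block_size(declared_size, chunks)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

Note that the declared size equals {{offset + len}} of the single chunk (16 MiB + 1 MiB). This suggests the client sent the cumulative block length while the request carried only the newest chunk.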

h2. Actual Behavior

The datanode rejects the protobuf with a {{CodecException}}:
{code:java}
Caused by: org.apache.hadoop.hdds.utils.db.CodecException:
Size mismatch: size (=17825792) != sum of chunks (=1048576){code}

That exception is caught in {{KeyValueHandler.handlePutBlock}} as an 
{{IOException}} and returned as {{IO_EXCEPTION}}:
{code:java}
 Operation: PutBlock, Message: Put Key failed, Result: IO_EXCEPTION{code}
Then {{HddsDispatcher}} treats the failed write as a container write failure:
{code:java}
 Marked container UNHEALTHY from OPEN: KeyValueContainerData #2{code}
After that, subsequent writes fail with: 
{code:java}
 Container 2 in UNHEALTHY state{code}
SCM closes the pipeline, and clients may later see retry/failover noise such as:
{code:java}
not leader; suggested_leader_present=false
exhausted retry-window resend attempts {code}
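
The chain described in this section can be modeled with a small Python sketch. It is not Ozone code: {{Result}}, {{handle_put_block}}, and {{dispatch}} are hypothetical stand-ins, the {{MALFORMED_REQUEST}} name is illustrative, and the set of result codes that trigger the {{UNHEALTHY}} transition is an assumption. It only illustrates why mapping the size mismatch to {{IO_EXCEPTION}} poisons the container, and how a distinct client-error result would leave it {{OPEN}}.

```python
from enum import Enum


class Result(Enum):
    SUCCESS = "SUCCESS"
    IO_EXCEPTION = "IO_EXCEPTION"
    # Illustrative name for a client-side rejection; an actual fix may use another code.
    MALFORMED_REQUEST = "MALFORMED_REQUEST"


# Assumption: only results that indicate a datanode-side write failure
# should move an OPEN container to UNHEALTHY.
CONTAINER_FAILURE_RESULTS = {Result.IO_EXCEPTION}


def handle_put_block(declared_size, chunk_lengths, treat_mismatch_as_client_error):
    """Return the result code for a PutBlock whose size may not match its chunks."""
    if declared_size != sum(chunk_lengths):
        if treat_mismatch_as_client_error:
            return Result.MALFORMED_REQUEST
        return Result.IO_EXCEPTION  # mapping described in this report
    return Result.SUCCESS


def dispatch(container_state, result):
    """Return the container state after the dispatcher sees the result."""
    if container_state == "OPEN" and result in CONTAINER_FAILURE_RESULTS:
        return "UNHEALTHY"
    return container_state


# Same values as the failing request: size is 17 MiB, one 1 MiB chunk.
current = dispatch("OPEN", handle_put_block(17825792, [1048576], False))
expected = dispatch("OPEN", handle_put_block(17825792, [1048576], True))
print(current, expected)
```

In this model the current mapping ends with the container {{UNHEALTHY}}, while the client-error mapping rejects the request and the container stays {{OPEN}}.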


