Kaushal Khator created HDFS-17862:
-------------------------------------
Summary: Race condition between DirectoryScanner and append
operations causes block corruption on single-replica blocks
Key: HDFS-17862
URL: https://issues.apache.org/jira/browse/HDFS-17862
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 3.4.1
Reporter: Kaushal Khator
A race condition between the DirectoryScanner reconciliation thread and HDFS
append operations can cause a block to be incorrectly marked as corrupt when it
has only a single replica. The issue occurs when the DirectoryScanner runs
while an append operation is in progress on a {{FINALIZED}} replica.
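For context, the window can be provoked with the public {{FileSystem}} append
API while the DataNode's scan interval is kept very low. The following is only
a rough reproduction sketch, not a test from this report: the
{{MiniDFSCluster}} setup, file path, payloads, and sleep duration are
illustrative assumptions.

{code:java}
// Hypothetical reproduction sketch: hold an append open on a single-replica
// file while the DirectoryScanner runs frequently. Paths, payloads, timings,
// and the MiniDFSCluster setup are illustrative assumptions only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class AppendScanRaceRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Run the scanner as often as possible so it is likely to overlap the append.
    conf.setInt("dfs.datanode.directoryscan.interval", 1); // seconds

    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      FileSystem fs = cluster.getFileSystem();
      Path file = new Path("/append-race-test");

      // Create the file with a single replica, then close it so the block is FINALIZED.
      try (FSDataOutputStream out = fs.create(file, (short) 1)) {
        out.writeBytes("initial data");
      }

      // Re-open for append and keep the stream open across several scan cycles.
      try (FSDataOutputStream out = fs.append(file)) {
        out.writeBytes("appended data");
        out.hflush();          // data pushed to the DataNode, append still in progress
        Thread.sleep(30_000);  // give the DirectoryScanner time to reconcile the block
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}

Whether the scanner actually overlaps the append is timing-dependent, so
several scan cycles may be needed before the mismatch is observed.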
*Root Cause*
The race condition occurs due to the following sequence:
1. HDFS append operations are performed directly on replicas in the
{{FINALIZED}} state without first transitioning them to a non-finalized state.
The DataNode layers a {{ReplicaInPipeline}} on top of the {{FINALIZED}} replica
for the duration of the append, but the underlying replica state remains
{{FINALIZED}}.
2. The DirectoryScanner's {{checkAndUpdate}} logic is designed to skip only
non-finalized replicas during reconciliation.
3. The system does not expose an append-in-progress state that the
DirectoryScanner could use to skip such blocks.
4. When the DirectoryScanner runs during an active append, it detects a length
mismatch between the in-memory replica metadata and the on-disk block size.
This can happen because the new {{.meta}} file has not yet been fully written
at the moment the scanner inspects the block.
5. The scanner incorrectly interprets this transient state as corruption and
marks the block as corrupt (see the simplified sketch below).
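To make the gap concrete, here is a heavily simplified sketch of the
reconciliation decision. It is not the actual {{checkAndUpdate}}
implementation; {{onDiskLength}}, {{isAppendInProgress}}, and {{markCorrupt}}
are hypothetical placeholders used only to mark where an append-in-progress
guard would fit.

{code:java}
// Simplified illustration of the reconciliation gap; NOT the real Hadoop code.
// memBlock is the in-memory replica; onDiskLength is what was found on disk.
// isAppendInProgress and markCorrupt are hypothetical placeholders.
void checkAndUpdateSketch(ReplicaInfo memBlock, long onDiskLength) {
  // Step 2: today only non-finalized replicas are skipped.
  if (memBlock.getState() != HdfsServerConstants.ReplicaState.FINALIZED) {
    return; // replica is still being written; leave it to the writer thread
  }

  // Step 3 (the missing piece): no append-in-progress signal exists, so a
  // FINALIZED replica with an active append falls through to the length check.
  // if (isAppendInProgress(memBlock)) { return; }   // hypothetical guard

  // Step 4: while an append is rewriting the block and .meta files, the
  // in-memory length and on-disk length can legitimately disagree.
  if (memBlock.getNumBytes() != onDiskLength) {
    // Step 5: the transient mismatch is treated as corruption.
    markCorrupt(memBlock);   // hypothetical stand-in for the corruption path
  }
}
{code}

The point is that the state check in step 2 is the only filter applied, so an
in-progress append on a {{FINALIZED}} replica is indistinguishable from genuine
corruption at the moment the lengths disagree.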
--
This message was sent by Atlassian Jira
(v8.20.10#820010)