[ https://issues.apache.org/jira/browse/HADOOP-19847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
yue.wang updated HADOOP-19847:
------------------------------
Description:
The {{logAllocatedBlock}} method in {{FSNamesystem.getAdditionalBlock}} is
currently called while holding the global lock. Flame graph analysis shows that
this logging path (via SLF4J/Log4j appenders) adds non-trivial latency while the
lock is held, blocking other NameNode operations.
Since {{logAllocatedBlock}} is only for audit/diagnostic logging and does not
modify shared state, we can safely move the call to after the global lock is
released, reducing lock hold time and improving write throughput.
This change preserves all existing logging behavior while eliminating
unnecessary lock contention from I/O-bound logging operations.
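The reordering described above can be illustrated with a small, self-contained sketch. It is not the actual patch and does not use Hadoop classes: {{BlockAllocator}}, {{allocateBlock}} and the {{Consumer<String>}} logger are hypothetical stand-ins for {{FSNamesystem}}, {{getAdditionalBlock}} and an SLF4J logger, and a fair {{ReentrantReadWriteLock}} stands in for the global lock. Only the shared-state mutation runs under the write lock; the log message is built from local values and emitted after {{unlock()}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical sketch of "mutate under the lock, log after releasing it".
// BlockAllocator, allocateBlock and the Consumer<String> logger are
// illustrative stand-ins, NOT Hadoop's FSNamesystem or SLF4J APIs.
public class LogOutsideLockSketch {

    static final class BlockAllocator {
        // Fair read/write lock standing in for the NameNode global lock (assumption).
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        // Stands in for a (possibly slow, I/O-bound) log appender.
        private final Consumer<String> logger;
        // Shared state guarded by lock.
        private long nextBlockId = 1;

        BlockAllocator(Consumer<String> logger) {
            this.logger = logger;
        }

        long allocateBlock(String src) {
            long blockId;
            lock.writeLock().lock();
            try {
                // Only the shared-state mutation happens while the lock is held.
                blockId = nextBlockId++;
            } finally {
                lock.writeLock().unlock();
            }
            // blockId and src are local values, so building and emitting the
            // message here cannot race with later lock holders, and other
            // threads are not blocked behind the logging I/O.
            logger.accept("BLOCK* allocate blk_" + blockId + " for " + src);
            return blockId;
        }

        boolean isWriteLockHeldByCurrentThread() {
            return lock.isWriteLockedByCurrentThread();
        }
    }

    public static void main(String[] args) {
        List<String> logged = new ArrayList<>();
        List<Boolean> heldWhileLogging = new ArrayList<>();
        BlockAllocator[] ref = new BlockAllocator[1];
        ref[0] = new BlockAllocator(msg -> {
            // Record whether the calling thread still held the lock when logging.
            heldWhileLogging.add(ref[0].isWriteLockHeldByCurrentThread());
            logged.add(msg);
        });
        ref[0].allocateBlock("/user/a/file1");
        ref[0].allocateBlock("/user/b/file2");
        System.out.println(logged);
        System.out.println("lock held while logging: " + heldWhileLogging);
    }
}
```

Running {{main}} prints both log lines and {{lock held while logging: [false, false]}}. One design point carries over to the real method: if the log message needs fields that other threads may mutate once the lock is released, those values should be copied into locals (or the message string built) while the lock is still held, and only the appender call moved outside it.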
Flame graph:
!logAllocatedBlock takes lots of time.png|width=1083,height=378!
was:
The {{logAllocatedBlock}} method in {{FSNamesystem.getAdditionalBlock}} is
currently called while holding {{fsLock}}. Flame graph analysis shows this
logging path (via SLF4J/Log4j appenders) contributes non-trivial latency,
blocking other NameNode operations.
Since {{logAllocatedBlock}} is only for audit/diagnostic logging and does not
modify shared state, we can safely move it after releasing {{fsLock}} to reduce
lock hold time and improve write throughput.
This change preserves all existing logging behavior while eliminating
unnecessary lock contention from I/O-bound logging operations.
Flame graph:
!logAllocatedBlock takes lots of time.png|width=1083,height=378!
> Move logAllocatedBlock out of lock in FSNamesystem.getAdditionalBlock to
> reduce latency
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-19847
> URL: https://issues.apache.org/jira/browse/HADOOP-19847
> Project: Hadoop Common
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.4.3
> Reporter: yue.wang
> Priority: Major
> Labels: HDFS
> Attachments: logAllocatedBlock takes lots of time.png
--
This message was sent by Atlassian Jira
(v8.20.10#820010)