[ https://issues.apache.org/jira/browse/HDDS-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788141#comment-17788141 ]
Wei-Chiu Chuang commented on HDDS-8769:
---------------------------------------
To fix this one, we need
* HDDS-8047 to send only incremental chunk lists,
* HDDS-8040 to increase the Ratis log file size, and
* HDDS-9130 to reduce the number of transactions required.
We probably also need an extra review to reduce the serialized size of
PutBlock and WriteChunk transactions.
> [hsync] disk usage thread aborts if ratis log rolls very quickly
> ----------------------------------------------------------------
>
> Key: HDDS-8769
> URL: https://issues.apache.org/jira/browse/HDDS-8769
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Wei-Chiu Chuang
> Priority: Major
>
> The Ratis log file corresponding to an HBase WAL block rolls very quickly.
> The disk usage thread aborts because the in-progress log file is renamed
> while du is scanning the directory, and then the DN is unable to get the
> correct disk usage.
> {noformat}
> 2023-06-05 08:44:55,462
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> created new log segment
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383
> 2023-06-05 08:44:55,514
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread16] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> Rolling segment log-186383_186396 to index:186396
> 2023-06-05 08:44:55,516
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> Rolled log segment from
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383
> to
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186383-186396
> 2023-06-05 08:44:55,517
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> created new log segment
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186397
> 2023-06-05 08:44:55,570
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread18] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> Rolling segment log-186397_186411 to index:186411
> 2023-06-05 08:44:55,572
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> Rolled log segment from
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186397
> to
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186397-186411
> 2023-06-05 08:44:55,573
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> created new log segment
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186412
> 2023-06-05 08:44:55,644
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315-server-thread18] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> Rolling segment log-186412_186434 to index:186434
> 2023-06-05 08:44:55,646
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> Rolled log segment from
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186412
> to
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_186412-186434
> 2023-06-05 08:44:55,647
> [37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker]
> INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 37d8fb56-9f29-4cd6-b9e1-dcdbef05b315@group-133D49B637D1-SegmentedRaftLogWorker:
> created new log segment
> /var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186435
> 2023-06-05 08:44:55,673 [DiskUsage-/var/lib/hadoop-ozone/datanode/ratis/data-
> ] WARN org.apache.hadoop.hdds.fs.CachingSpaceUsageSource: Error refreshing
> space usage for /var/lib/hadoop-ozone/datanode/ratis/data
> java.io.UncheckedIOException: ExitCodeException exitCode=1: du: cannot access
> ‘/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383’:
> No such file or directory
> at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:94)
> at
> org.apache.hadoop.hdds.fs.AbstractSpaceUsageSource.time(AbstractSpaceUsageSource.java:56)
> at org.apache.hadoop.hdds.fs.DU.getUsedSpace(DU.java:63)
> at
> org.apache.hadoop.hdds.fs.CachingSpaceUsageSource.refresh(CachingSpaceUsageSource.java:140)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: ExitCodeException exitCode=1: du: cannot access
> ‘/var/lib/hadoop-ozone/datanode/ratis/data/39885220-c182-47d3-ade0-133d49b637d1/current/log_inprogress_186383’:
> No such file or directory
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
> at org.apache.hadoop.util.Shell.run(Shell.java:901)
> at org.apache.hadoop.hdds.fs.DU$DUShell.getUsed(DU.java:91)
> ... 10 more
> {noformat}
> The workaround is to use DF instead of DU to calculate disk usage
> (hdds.datanode.du.factory=org.apache.hadoop.hdds.fs.DedicatedDiskSpaceUsageFactory).
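
To illustrate the failure mode in the trace above: du lists a directory, then
stats each entry, and exits non-zero if an entry was renamed in between. The
following is a minimal sketch, not Ozone's implementation (per the stack trace,
DU shells out to the du command). It shows one way a directory walk could
tolerate files that vanish mid-scan. The class name TolerantDiskUsage and the
method usedBytes are hypothetical, and the sketch sums apparent file sizes
rather than allocated blocks, so its result can differ from du's.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, for illustration only; not part of Ozone or Ratis.
public class TolerantDiskUsage {

    /**
     * Sums the apparent sizes of regular files under root.
     * Files that disappear between listing and stat are skipped
     * instead of aborting the whole scan.
     */
    public static long usedBytes(Path root) throws IOException {
        AtomicLong total = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total.addAndGet(attrs.size());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e)
                    throws IOException {
                // A log segment renamed during the walk (for example
                // log_inprogress_N rolled to log_N-M) is reported here.
                if (e instanceof NoSuchFileException) {
                    return FileVisitResult.CONTINUE;
                }
                throw e; // any other I/O error still fails the scan
            }
        });
        return total.get();
    }
}
```

A caller would invoke TolerantDiskUsage.usedBytes(Paths.get("/path/to/ratis/data")).
A renamed segment that is skipped is simply not counted in that pass, so the
total may briefly undercount until the next refresh.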