[jira] [Created] (HDFS-16113) Improve CallQueueManager#swapQueue() execution performance
JiangHua Zhu created HDFS-16113: --- Summary: Improve CallQueueManager#swapQueue() execution performance Key: HDFS-16113 URL: https://issues.apache.org/jira/browse/HDFS-16113 Project: Hadoop HDFS Issue Type: Improvement Reporter: JiangHua Zhu In CallQueueManager#swapQueue(), there is some code:
CallQueueManager#swapQueue() {
  ..
  while (!queueIsReallyEmpty(oldQ)) {}
  ..
}
In queueIsReallyEmpty():
..
for (int i = 0; i
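The spin described above can be sketched as a simplified model. The class name, the swap shape, and the retry count below are illustrative assumptions, not the actual CallQueueManager implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SwapSketch {
    // Simplified model of the swap: install a new queue, then spin until the
    // old one is observed empty so no in-flight calls are lost.
    static <T> BlockingQueue<T> swapQueue(BlockingQueue<T> oldQ, BlockingQueue<T> newQ) {
        while (!queueIsReallyEmpty(oldQ)) {
            // Busy-wait: this tight loop is the execution cost the JIRA
            // proposes to improve, e.g. by pausing briefly between checks.
        }
        return newQ;
    }

    // Re-checks emptiness several times to avoid racing with in-flight
    // producers; the count of 10 is an illustrative stand-in.
    static <T> boolean queueIsReallyEmpty(BlockingQueue<T> q) {
        for (int i = 0; i < 10; i++) {
            if (!q.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BlockingQueue<String> oldQ = new LinkedBlockingQueue<>();
        BlockingQueue<String> newQ = new LinkedBlockingQueue<>();
        BlockingQueue<String> active = swapQueue(oldQ, newQ);
        assert active == newQ;
        System.out.println("swap complete");
    }
}
```

In this shape, each pass of the outer while loop pays for a full inner re-check loop, which is why a drain that takes any time at all turns into a hot spin.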
[jira] [Created] (HDFS-16112) Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor
tomscut created HDFS-16112: -- Summary: Fix flaky unit test TestDecommissioningStatusWithBackoffMonitor Key: HDFS-16112 URL: https://issues.apache.org/jira/browse/HDFS-16112 Project: Hadoop HDFS Issue Type: Wish Reporter: tomscut The unit tests TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatus and TestDecommissioningStatus#testDecommissionStatus have recently seemed a little flaky; we should fix them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16108) Incorrect log placeholders used in JournalNodeSyncer
[ https://issues.apache.org/jira/browse/HDFS-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei resolved HDFS-16108. Fix Version/s: 3.4.0 Resolution: Fixed > Incorrect log placeholders used in JournalNodeSyncer > > > Key: HDFS-16108 > URL: https://issues.apache.org/jira/browse/HDFS-16108 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Journal sync thread uses incorrect log placeholders in two places: > # When it fails to create a dir for downloading log segments > # When it fails to move a tmp editFile to the current dir > Since these failure logs are important for debugging JN sync issues, we should fix > these incorrect placeholders.
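The bug class here (a log statement whose placeholders don't match its arguments) can be illustrated with a tiny model of SLF4J-style "{}" substitution. The class, the formatter, and the messages below are illustrative, not the actual JournalNodeSyncer code:

```java
public class PlaceholderSketch {
    // Minimal model of SLF4J-style "{}" placeholder substitution,
    // for illustration only.
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            out.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        out.append(template.substring(from));
        return out.toString();
    }

    public static void main(String[] args) {
        // Wrong placeholder style: the arguments are silently dropped, so the
        // failure log loses the very paths needed to debug a JN sync issue.
        System.out.println(format("Unable to move %s to %s", "tmp/edits_1", "current/"));
        // Correct "{}" placeholders keep both arguments in the message.
        System.out.println(format("Unable to move {} to {}", "tmp/edits_1", "current/"));
    }
}
```

Because SLF4J-style loggers don't throw on a placeholder mismatch, such bugs only surface when someone reads the incomplete log line, which is why the fix matters for debuggability.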
[jira] [Resolved] (HDFS-16109) Fix some flaky unit tests since they often time out
[ https://issues.apache.org/jira/browse/HDFS-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved HDFS-16109. -- Fix Version/s: 3.3.2 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Committed to trunk and branch-3.3. Thank you [~tomscut] for your contribution. > Fix some flaky unit tests since they often time out > -- > > Key: HDFS-16109 > URL: https://issues.apache.org/jira/browse/HDFS-16109 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 40m > Remaining Estimate: 0h > > Increase the timeout for TestBootstrapStandby, TestFsVolumeList and > TestDecommissionWithBackoffMonitor since they often time out. > > TestBootstrapStandby: > {code:java} > [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 159.474 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] Tests > run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 159.474 s <<< > FAILURE! 
- in > org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby[ERROR] > testRateThrottling(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby) > Time elapsed: 31.262 s <<< > ERROR!org.junit.runners.model.TestTimedOutException: test timed out after > 3 milliseconds at java.io.RandomAccessFile.writeBytes(Native Method) at > java.io.RandomAccessFile.write(RandomAccessFile.java:512) at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:947) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:910) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:699) > at > org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:642) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:387) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:243) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1224) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:795) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:673) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:760) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:1014) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:989) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2261) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2231) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby.testRateThrottling(TestBootstrapStandby.java:297) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > {code} > TestFsVolumeList: > {code:java} > [ERROR] Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 190.294 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] > Tests run: 12, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 190.294 s > <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList[ERROR] > testAddRplicaProcessorForAddingReplicaInMap(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList) > Time elapsed: 60.028 s <<< > ERROR!org.junit.runners.model.TestTimedOutException: test timed out after > 6 milliseconds at sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/

No changes

-1 overall

The following subsystems voted -1:
    blanks compile golang pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running: (runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests :
       hadoop.yarn.csi.client.TestCsiClient
       hadoop.tools.dynamometer.TestDynamometerInfra
       hadoop.tools.dynamometer.TestDynamometerInfra

   compile:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-compile-root.txt [1.2M]
   cc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-compile-root.txt [1.2M]
   golang:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-compile-root.txt [1.2M]
   javac:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-compile-root.txt [1.2M]
   blanks:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/blanks-eol.txt [13M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/blanks-tabs.txt [2.0M]
   checkstyle:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/results-checkstyle-root.txt [16M]
   pathlen:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/results-pathlen.txt [16K]
   pylint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/results-pylint.txt [20K]
   shellcheck:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/results-shellcheck.txt [28K]
   xml:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/xml.txt [24K]
   javadoc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/results-javadoc-javadoc-root.txt [408K]
   unit:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-csi.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer_hadoop-dynamometer-infra.txt [8.0K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/558/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt [24K]

Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org
[jira] [Created] (HDFS-16111) Add a configuration to RoundRobinVolumeChoosingPolicy to avoid picking an almost full volume to place a replica.
Zhihai Xu created HDFS-16111: Summary: Add a configuration to RoundRobinVolumeChoosingPolicy to avoid picking an almost full volume to place a replica. Key: HDFS-16111 URL: https://issues.apache.org/jira/browse/HDFS-16111 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Zhihai Xu Assignee: Zhihai Xu When we upgraded our Hadoop cluster from Hadoop 2.6.0 to Hadoop 3.2.2, we got failed volumes on a lot of datanodes, which caused some missing blocks at that time. Although we later recovered all the missing blocks by symlinking the path (dfs/dn/current) on the failed volume to a new directory and copying all the data to the new directory, we missed our SLA and it delayed our upgrade process on our production cluster by several hours. When this issue happened, we saw a lot of these exceptions before the volume failed on the datanode: [DataXceiver for client at /[XX.XX.XX.XX:XXX|http://10.104.103.159:33986/] [Receiving block BP-XX-XX.XX.XX.XX-XX:blk_X_XXX]] datanode.DataNode (BlockReceiver.java:(289)) - IOException in BlockReceiver constructor :Possible disk error: Failed to create /XXX/dfs/dn/current/BP-XX-XX.XX.XX.XX-X/tmp/blk_XX. 
Cause is java.io.IOException: No space left on device at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:1012) at org.apache.hadoop.hdfs.server.datanode.FileIoProvider.createFile(FileIoProvider.java:302) at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createFileWithExistsCheck(DatanodeUtil.java:69) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createTmpFile(BlockPoolSlice.java:292) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTmpFile(FsVolumeImpl.java:532) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createTemporary(FsVolumeImpl.java:1254) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1598) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:212) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1314) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:768) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:291) at java.lang.Thread.run(Thread.java:748) We found this issue happened for the following two reasons: First, the upgrade process added some extra disk usage on each disk volume of the datanode: BlockPoolSliceStorage.doUpgrade (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java#L445) is the main upgrade function in the datanode; it will add some extra storage. The extra storage added is all the new directories created in /current//current, although all block data files and block metadata files are hard-linked with /current//previous after the upgrade. 
Since there will be a lot of new directories created, this will use some disk space on each disk volume. Second, there is a potential bug when picking a disk volume to write a new block file (replica). By default, Hadoop uses RoundRobinVolumeChoosingPolicy. The code to select a disk checks whether the available space on the selected disk is more than the size in bytes of the block file to store (https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/RoundRobinVolumeChoosingPolicy.java#L86). But when creating a new block, two files are created: one is the block file blk_, the other is the block metadata file blk__.meta. This is the code when finalizing a block, where both the block file size and the metadata file size are updated: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L391 The current code only considers the size of the block file and doesn't consider the size of the block metadata file when choosing a disk in RoundRobinVolumeChoosingPolicy. There can be a lot of ongoing blocks being received at the same time, and the default maximum number of DataXceiver threads is 4096. This underestimates the total size needed to write a block, which can potentially cause the above disk full error (No space left on device). Since the size of the block metadata file is not fixed,
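The sizing gap described above can be sketched as follows. requiredBytes is a hypothetical helper and the 8-byte meta header is an illustrative assumption, not the actual RoundRobinVolumeChoosingPolicy or BlockPoolSlice code; the point is that finalizing a replica consumes the block bytes plus checksum metadata bytes, while the policy compares against the block size alone:

```java
public class VolumeSpaceSketch {
    // Hypothetical estimate of the bytes a new replica consumes on disk.
    // The policy effectively checks only blockSize; the JIRA argues the
    // metadata file (checksums) should be accounted for as well.
    static long requiredBytes(long blockSize, int bytesPerChecksum, int checksumSize) {
        // One checksum of checksumSize bytes per bytesPerChecksum bytes of
        // data, plus a small fixed header in the .meta file (illustrative).
        long numChunks = (blockSize + bytesPerChecksum - 1) / bytesPerChecksum;
        long metaSize = numChunks * checksumSize + 8;
        return blockSize + metaSize;
    }

    public static void main(String[] args) {
        // A 128 MB block with 512-byte chunks and 4-byte CRCs needs
        // 1048584 extra bytes beyond the block data itself.
        long need = requiredBytes(128L << 20, 512, 4);
        System.out.println(need - (128L << 20));
    }
}
```

Multiplied by up to 4096 concurrent DataXceiver threads, that per-replica underestimate is enough to push a nearly full volume over the edge.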
[jira] [Created] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient
tomscut created HDFS-16110: -- Summary: Remove unused method reportChecksumFailure in DFSClient Key: HDFS-16110 URL: https://issues.apache.org/jira/browse/HDFS-16110 Project: Hadoop HDFS Issue Type: Wish Reporter: tomscut Assignee: tomscut Remove the unused method reportChecksumFailure in DFSClient and fix some code style issues along the way.
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/

No changes

-1 overall

The following subsystems voted -1:
    asflicense hadolint mvnsite pathlen unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running: (runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Failed junit tests :
       hadoop.util.TestDiskCheckerWithDiskIo
       hadoop.fs.TestFileUtil
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
       hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
       hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
       hadoop.hdfs.server.federation.router.TestRouterQuota
       hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
       hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
       hadoop.yarn.server.resourcemanager.TestClientRMService
       hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
       hadoop.tools.TestDistCpSystem
       hadoop.yarn.sls.TestSLSRunner
       hadoop.resourceestimator.service.TestResourceEstimatorService
       hadoop.resourceestimator.solver.impl.TestLpSolver

   cc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-compile-cc-root.txt [4.0K]
   javac:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-compile-javac-root.txt [496K]
   checkstyle:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-checkstyle-root.txt [16M]
   hadolint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-patch-hadolint.txt [4.0K]
   mvnsite:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-mvnsite-root.txt [584K]
   pathlen:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/pathlen.txt [12K]
   pylint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-patch-pylint.txt [48K]
   shellcheck:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-patch-shellcheck.txt [56K]
   shelldocs:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/diff-patch-shelldocs.txt [48K]
   whitespace:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/whitespace-tabs.txt [1.3M]
   javadoc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-javadoc-root.txt [32K]
   unit:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [240K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [424K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [40K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [124K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [96K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt [28K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/349/artifact/out/patch-unit-hadoop