Re: [DISCUSS] lots of yetus failures in hdfs
There are two places where you can tune the memory: [1] and [2].

I haven't checked again, but I think it is the same old problem. I mentioned it in the last paragraph of [3], where there was a Windows failure report.

I did drop a comment [4] on that PR describing those issues. There were some comments on the Jira, but I didn't follow them...

So, technically it should be an HDFS-induced mess, else I would have reverted by now :-)

Good Luck!!!

-Ayush

[1] https://github.com/apache/hadoop/blob/2ee0bf953492b66765d3d2c902407fbf9bceddec/hadoop-project/pom.xml#L172
[2] https://github.com/apache/hadoop/blob/2ee0bf953492b66765d3d2c902407fbf9bceddec/dev-support/docker/Dockerfile#L77
[3] https://lists.apache.org/thread/hmzl61ow0sbs10p0hky17xxhsggbhc3g
[4] https://github.com/apache/hadoop/pull/6664#issuecomment-2082356393

On Fri, 7 Jun 2024 at 08:04, Xiaoqiao He wrote:
>
> Thanks Steve. Try to trigger CI manually and let's wait to see what it says.
> BTW, the flaky tests seem unrelated to the UT logic itself; most of them
> throw OOM. Not sure if @Ayush Saxena knows how to reconfigure or tune
> the memory of Yetus?
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, Jun 7, 2024 at 3:59 AM Steve Loughran wrote:
>>
>> PRs which trigger hdfs builds seem to hit a lot of hdfs test failures
>> https://github.com/apache/hadoop/pull/6675
>>
>> Are these regressions or are the tests flaky?
>>
>> I don't want to commit patches which break things, yet hdfs tests seem
>> unreliable and so I'm dangerously tempted to +1 anyway...

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Re: [DISCUSS] lots of yetus failures in hdfs
Thanks Steve. Try to trigger CI manually and let's wait to see what it says.
BTW, the flaky tests seem unrelated to the UT logic itself; most of them throw OOM. Not sure if @Ayush Saxena knows how to reconfigure or tune the memory of Yetus?

Best Regards,
- He Xiaoqiao

On Fri, Jun 7, 2024 at 3:59 AM Steve Loughran wrote:
> PRs which trigger hdfs builds seem to hit a lot of hdfs test failures
> https://github.com/apache/hadoop/pull/6675
>
> Are these regressions or are the tests flaky?
>
> I don't want to commit patches which break things, yet hdfs tests seem
> unreliable and so I'm dangerously tempted to +1 anyway...
[jira] [Created] (HDFS-17546) Implementing Timeout for HostFileReader when FS hangs
Simbarashe Dzinamarira created HDFS-17546:
---------------------------------------------

Summary: Implementing Timeout for HostFileReader when FS hangs
Key: HDFS-17546
URL: https://issues.apache.org/jira/browse/HDFS-17546
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Simbarashe Dzinamarira
Assignee: Simbarashe Dzinamarira

Certain deployments of Hadoop have the dfs.hosts file residing on NAS/NFS, potentially behind symlinks. If the FS hangs for any reason, the refreshNodes call hangs indefinitely in the HostsFileReader until the FS returns.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
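
A minimal sketch of one way to bound such a read, assuming a plain java.util.concurrent executor. The class TimedHostsFileRead and its methods are hypothetical names used for illustration only; they are not part of Hadoop's HostsFileReader and are not taken from any patch on HDFS-17546.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (assumption): not an existing Hadoop class. */
final class TimedHostsFileRead {
  private TimedHostsFileRead() {}

  /** Runs the task on a daemon thread and gives up after timeoutMs. */
  static <T> T callWithTimeout(Callable<T> task, long timeoutMs)
      throws IOException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "hosts-file-reader");
      // Daemon thread, so a read stuck in the FS cannot block JVM exit.
      t.setDaemon(true);
      return t;
    });
    Future<T> future = executor.submit(task);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Best effort only: a thread blocked in an NFS read may ignore the
      // interrupt, but the caller is unblocked either way.
      future.cancel(true);
      throw new IOException("Timed out after " + timeoutMs + " ms", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      future.cancel(true);
      throw new IOException("Interrupted while reading hosts file", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException(cause);
    } finally {
      executor.shutdownNow();
    }
  }

  /** Reads all lines of a hosts file, failing instead of hanging forever. */
  static List<String> readHosts(Path file, long timeoutMs) throws IOException {
    return callWithTimeout(
        () -> Files.readAllLines(file, StandardCharsets.UTF_8), timeoutMs);
  }
}
```

With something like this, a refreshNodes-style caller would get an IOException after the timeout and could keep the previously loaded host list. The stuck reader thread may still linger until the FS returns, so this bounds the caller's wait rather than the underlying I/O.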
[DISCUSS] lots of yetus failures in hdfs
PRs which trigger hdfs builds seem to hit a lot of hdfs test failures
https://github.com/apache/hadoop/pull/6675

Are these regressions or are the tests flaky?

I don't want to commit patches which break things, yet hdfs tests seem unreliable and so I'm dangerously tempted to +1 anyway...
[jira] [Created] (HDFS-17545) [ARR] router async rpc client.
Jian Zhang created HDFS-17545:
---------------------------------

Summary: [ARR] router async rpc client.
Key: HDFS-17545
URL: https://issues.apache.org/jira/browse/HDFS-17545
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Jian Zhang
[jira] [Created] (HDFS-17544) [ARR] The router client rpc protocol supports asynchrony.
Jian Zhang created HDFS-17544:
---------------------------------

Summary: [ARR] The router client rpc protocol supports asynchrony.
Key: HDFS-17544
URL: https://issues.apache.org/jira/browse/HDFS-17544
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Jian Zhang
[jira] [Created] (HDFS-17542) EC: Optimize the EC block reconstruction.
Chenyu Zheng created HDFS-17542:
-----------------------------------

Summary: EC: Optimize the EC block reconstruction.
Key: HDFS-17542
URL: https://issues.apache.org/jira/browse/HDFS-17542
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chenyu Zheng
Assignee: Chenyu Zheng

The current reconstruction process for EC blocks is based on the one used for the original contiguous blocks. It is mainly implemented through the work constructed by computeReconstructionWorkForBlocks, and can be roughly divided into three steps:
* scheduleReconstruction
* chooseTargets
* validateReconstructionWork

For ordinary contiguous blocks:
* (1) scheduleReconstruction: select srcNodes as the source of the block copy according to the status of each replica of the block.
* (2) chooseTargets: select the target of the copy.
* (3) validateReconstructionWork: add the copy command to srcNode; srcNode receives the command through heartbeat and executes the block copy from src to target.

For EC blocks, (1) and (2) are nearly the same. However, in (3), block copying or block reconstruction may occur, or no work may be generated at all, for example when some storages are busy.

If no work is generated, it leads to the problem described in HDFS-17516: even though no block copying or block reconstruction is generated, pendingReconstruction and neededReconstruction are still updated until the block times out, which wastes the scheduling opportunity.

In order to stay compatible with the original contiguous blocks and to decide the specific action in (3), the otherwise unnecessary liveBlockIndices, liveBusyBlockIndices, and excludeReconstructedIndices are introduced. Many known bugs are related to this code, and they can be avoided.

Improvements:
* Move the work of deciding whether to copy or reconstruct blocks from (3) to (1).

Such improvements are more conducive to implementing the explicit specification of the reconstruction block index mentioned in HDFS-16874, and do not need to pass liveBlockIndices and liveBusyBlockIndices.
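
To illustrate the proposed improvement, here is a toy sketch of deciding the action up front in step (1), so that the later steps never have to discover that no work can be generated. Everything in it is an assumption made for illustration: the class EcSchedulingSketch, the ReplicaState and Action enums, and the decision rule itself are hypothetical and are not taken from BlockManager or from the HDFS-17542 patch.

```java
import java.util.List;

/** Hypothetical illustration (assumption): not BlockManager code. */
final class EcSchedulingSketch {
  private EcSchedulingSketch() {}

  /** Simplified state of one internal block of an EC block group. */
  enum ReplicaState { LIVE, LIVE_BUSY, DECOMMISSIONING, MISSING }

  /** What the scheduler decides to do for the block group. */
  enum Action { COPY, RECONSTRUCT, NONE }

  /**
   * Decides the action at scheduling time.
   * Assumed rule: a missing internal block needs reconstruction, which
   * requires at least dataUnits readable, non-busy sources; otherwise an
   * internal block on a decommissioning node can simply be copied.
   */
  static Action decide(List<ReplicaState> internalBlocks, int dataUnits) {
    int missing = 0;
    int decommissioning = 0;
    int readable = 0;
    for (ReplicaState state : internalBlocks) {
      switch (state) {
        case MISSING:
          missing++;
          break;
        case DECOMMISSIONING:
          decommissioning++;
          readable++;
          break;
        case LIVE:
          readable++;
          break;
        case LIVE_BUSY:
          // Busy sources are not counted as usable for this round.
          break;
        default:
          break;
      }
    }
    if (missing > 0) {
      return readable >= dataUnits ? Action.RECONSTRUCT : Action.NONE;
    }
    if (decommissioning > 0) {
      return Action.COPY;
    }
    return Action.NONE;
  }
}
```

In this sketch a caller would update pendingReconstruction only when decide returns COPY or RECONSTRUCT. When it returns NONE, the block stays in neededReconstruction and the scheduling slot can go to another block.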
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/

No changes

-1 overall

The following subsystems voted -1:
    asflicense hadolint mvnsite pathlen unit

The following subsystems voted -1 but
were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    Failed junit tests :

       hadoop.fs.TestFileUtil
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
       hadoop.hdfs.server.datanode.TestDirectoryScanner
       hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
       hadoop.hdfs.TestDFSInotifyEventInputStream
       hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
       hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
       hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
       hadoop.hdfs.server.federation.router.TestRouterQuota
       hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
       hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
       hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.mapreduce.lib.input.TestLineRecordReader
       hadoop.mapred.TestLineRecordReader
       hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
       hadoop.resourceestimator.service.TestResourceEstimatorService
       hadoop.resourceestimator.solver.impl.TestLpSolver
       hadoop.yarn.sls.TestSLSRunner
       hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
       hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
       hadoop.yarn.server.resourcemanager.TestClientRMService
       hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

   cc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-compile-javac-root.txt [488K]

   checkstyle:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-mvnsite-root.txt [572K]

   pathlen:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/pathlen.txt [12K]

   pylint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-patch-shellcheck.txt [72K]

   whitespace:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-javadoc-root.txt [36K]

   unit:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [220K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [460K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt