Re: [DISCUSS] lots of yetus failures in hdfs

2024-06-06 Thread Ayush Saxena
There are two places where you can tune the memory: [1] and [2].

I haven't checked again, but I think it is the same old problem I
mentioned in the last paragraph of [3], where there was a Windows
failure report. I dropped a comment [4] on that PR about those issues;
there were some comments on the Jira, but I didn't follow that
discussion...

So technically it should be an HDFS-induced mess; otherwise I would have
reverted it by now :-)

Good Luck!!!

-Ayush


[1] https://github.com/apache/hadoop/blob/2ee0bf953492b66765d3d2c902407fbf9bceddec/hadoop-project/pom.xml#L172
[2] https://github.com/apache/hadoop/blob/2ee0bf953492b66765d3d2c902407fbf9bceddec/dev-support/docker/Dockerfile#L77
[3] https://lists.apache.org/thread/hmzl61ow0sbs10p0hky17xxhsggbhc3g
[4] https://github.com/apache/hadoop/pull/6664#issuecomment-2082356393

On Fri, 7 Jun 2024 at 08:04, Xiaoqiao He  wrote:
>
> Thanks Steve. I'll try to trigger CI manually; let's wait and see what it says.
> BTW, the flaky tests don't seem related to the UT logic itself; most of them
> throw OOM. Not sure if @Ayush Saxena knows how to reconfigure or tune
> the memory of Yetus?
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, Jun 7, 2024 at 3:59 AM Steve Loughran  
> wrote:
>>
>> PRs which trigger hdfs builds seem to hit a lot of hdfs test failures:
>> https://github.com/apache/hadoop/pull/6675
>>
>> Are these regressions or are the tests flaky?
>>
>> I don't want to commit patches which break things, yet hdfs tests seem
>> unreliable and so I'm dangerously tempted to +1 anyway...

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSS] lots of yetus failures in hdfs

2024-06-06 Thread Xiaoqiao He
Thanks Steve. I'll try to trigger CI manually; let's wait and see what it says.
BTW, the flaky tests don't seem related to the UT logic itself; most of them
throw OOM. Not sure if @Ayush Saxena knows how to reconfigure or tune
the memory of Yetus?

Best Regards,
- He Xiaoqiao

On Fri, Jun 7, 2024 at 3:59 AM Steve Loughran 
wrote:

> PRs which trigger hdfs builds seem to hit a lot of hdfs test failures:
> https://github.com/apache/hadoop/pull/6675
>
> Are these regressions or are the tests flaky?
>
> I don't want to commit patches which break things, yet hdfs tests seem
> unreliable and so I'm dangerously tempted to +1 anyway...
>


[jira] [Created] (HDFS-17546) Implementing Timeout for HostFileReader when FS hangs

2024-06-06 Thread Simbarashe Dzinamarira (Jira)
Simbarashe Dzinamarira created HDFS-17546:
-

 Summary: Implementing Timeout for HostFileReader when FS hangs
 Key: HDFS-17546
 URL: https://issues.apache.org/jira/browse/HDFS-17546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Simbarashe Dzinamarira
Assignee: Simbarashe Dzinamarira


In certain Hadoop deployments the dfs.hosts file resides on NAS/NFS,
potentially behind symlinks. If that file system hangs for any reason, the
refreshNodes call blocks indefinitely in HostsFileReader until the file
system responds again.
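A minimal sketch of the kind of timeout this issue asks for, assuming the read is moved onto a separate daemon thread and bounded with `Future.get(timeout, unit)`. The class `TimedHostsRead`, its methods and the simplified one-host-per-line parsing are hypothetical illustrations, not the real `HostsFileReader` API or the eventual patch. Note that a thread blocked on a hung NFS mount may ignore interruption, so the sketch only unblocks the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch (not HostsFileReader): read a hosts file on a
 * daemon thread and give up after a timeout instead of blocking forever.
 */
class TimedHostsRead {

  /** Simplified parser: one host per line, skipping blanks and '#' comments. */
  static List<String> parseHosts(Path file) throws IOException {
    List<String> hosts = new ArrayList<>();
    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
        hosts.add(trimmed);
      }
    }
    return hosts;
  }

  /** Runs the reader on a daemon thread; throws IOException on timeout. */
  static List<String> readWithTimeout(Callable<List<String>> reader,
                                      long timeoutMs) throws IOException {
    ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "hosts-file-reader");
      t.setDaemon(true); // a thread stuck on a hung mount must not block JVM exit
      return t;
    });
    Future<List<String>> future = pool.submit(reader);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // best effort: blocked file I/O may ignore interrupts
      throw new IOException("Timed out after " + timeoutMs
          + " ms reading hosts file", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while reading hosts file", e);
    } catch (ExecutionException e) {
      throw new IOException("Failed to read hosts file", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }
}
```

With this design the caller gets an `IOException` once the timeout expires rather than hanging; the stuck reader thread may linger until the file system returns, which is why it is created as a daemon thread.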



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[DISCUSS] lots of yetus failures in hdfs

2024-06-06 Thread Steve Loughran
PRs which trigger hdfs builds seem to hit a lot of hdfs test failures:
https://github.com/apache/hadoop/pull/6675

Are these regressions or are the tests flaky?

I don't want to commit patches which break things, yet hdfs tests seem
unreliable and so I'm dangerously tempted to +1 anyway...


[jira] [Created] (HDFS-17545) [ARR] router async rpc client.

2024-06-06 Thread Jian Zhang (Jira)
Jian Zhang created HDFS-17545:
-

 Summary: [ARR] router async rpc client.
 Key: HDFS-17545
 URL: https://issues.apache.org/jira/browse/HDFS-17545
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jian Zhang









[jira] [Created] (HDFS-17544) [ARR] The router client rpc protocol supports asynchrony.

2024-06-06 Thread Jian Zhang (Jira)
Jian Zhang created HDFS-17544:
-

 Summary: [ARR] The router client rpc protocol supports asynchrony.
 Key: HDFS-17544
 URL: https://issues.apache.org/jira/browse/HDFS-17544
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jian Zhang









[jira] [Created] (HDFS-17542) EC: Optimize the EC block reconstruction.

2024-06-06 Thread Chenyu Zheng (Jira)
Chenyu Zheng created HDFS-17542:
---

 Summary: EC: Optimize the EC block reconstruction.
 Key: HDFS-17542
 URL: https://issues.apache.org/jira/browse/HDFS-17542
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chenyu Zheng
Assignee: Chenyu Zheng


The current reconstruction process for EC blocks is based on the original
logic for contiguous blocks. It is mainly implemented through the work
constructed by computeReconstructionWorkForBlocks, and can be roughly divided
into three phases:
 * scheduleReconstruction
 * chooseTargets
 * validateReconstructionWork

For ordinary contiguous blocks:

* (1) scheduleReconstruction

Select srcNodes as the sources of the block copy according to the status of
each replica of the block.

* (2) chooseTargets

Select the target of the copy.

* (3) validateReconstructionWork

Add the copy command to srcNode; srcNode receives the command through its
heartbeat and copies the block from src to target.

For EC blocks:
(1) and (2) are nearly the same. However, in (3), block copying or block
reconstruction may occur, or no work may be generated at all, for example when
some storages are busy. If no work is generated, it leads to the problem
described in HDFS-17516. Even when no block copying or block reconstruction is
generated, pendingReconstruction and neededReconstruction are still updated
until the block times out, which wastes the scheduling opportunity.
To stay compatible with the original contiguous blocks while deciding the
specific action in (3), the otherwise unnecessary liveBlockIndices,
liveBusyBlockIndices and excludeReconstructedIndices were introduced. Many
known bugs are related to this code, and they could be avoided.

Improvements:
* Move the work of deciding whether to copy or reconstruct blocks from (3) to 
(1).

This change makes it easier to implement the explicit specification of the
reconstruction block index mentioned in HDFS-16874, and removes the need to
pass liveBlockIndices and liveBusyBlockIndices.
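To make the proposed change concrete, here is a toy model. It assumes a block group can be described only by sets of internal block indices. The copy-or-reconstruct decision is made during scheduling (phase (1)), and the group is added to the pending set only when real work results. `EcSchedulingSketch`, its parameters and its simplified rules are hypothetical and do not reflect BlockManager's actual code. Under those rules, decoding needs at least `dataBlkNum` non-busy sources, and a copy needs a non-busy source that holds the index.

```java
import java.util.Set;

/**
 * Hypothetical toy model of the HDFS-17542 idea: decide the action while
 * scheduling, so a group is only marked pending when work is really sent.
 */
class EcSchedulingSketch {

  enum Action { COPY, RECONSTRUCT, NONE }

  /**
   * @param dataBlkNum data units needed to decode (e.g. 6 for RS-6-3)
   * @param idleLive   indices with a live replica on a non-busy node
   * @param missing    indices with no live replica anywhere
   * @param toCopy     live indices that need an extra copy
   */
  static Action decide(int dataBlkNum, Set<Integer> idleLive,
                       Set<Integer> missing, Set<Integer> toCopy) {
    if (!missing.isEmpty()) {
      // Decoding needs at least dataBlkNum sources that can take work now.
      return idleLive.size() >= dataBlkNum ? Action.RECONSTRUCT : Action.NONE;
    }
    for (int idx : toCopy) {
      if (idleLive.contains(idx)) {
        return Action.COPY; // a non-busy node holds this index
      }
    }
    return Action.NONE;
  }

  /** Records the group as pending only when real work was scheduled. */
  static Action schedule(long blockGroupId, int dataBlkNum,
                         Set<Integer> idleLive, Set<Integer> missing,
                         Set<Integer> toCopy, Set<Long> pending) {
    Action action = decide(dataBlkNum, idleLive, missing, toCopy);
    if (action != Action.NONE) {
      pending.add(blockGroupId);
    }
    return action;
  }
}
```

Returning `NONE` before anything is recorded means a group whose sources are all busy stays in the queue and can be retried on the next round, instead of sitting in the pending set until it times out.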






Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2024-06-06 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   hadoop.hdfs.server.namenode.ha.TestPipelinesFailover 
   hadoop.hdfs.TestDFSInotifyEventInputStream 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes 
   hadoop.hdfs.server.federation.router.TestRouterQuota 
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat 
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver 
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.mapreduce.lib.input.TestLineRecordReader 
   hadoop.mapred.TestLineRecordReader 
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.yarn.sls.TestSLSRunner
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
   hadoop.yarn.server.resourcemanager.TestClientRMService
   hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

   cc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-compile-javac-root.txt [488K]

   checkstyle:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-mvnsite-root.txt [572K]

   pathlen:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/pathlen.txt [12K]

   pylint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/diff-patch-shellcheck.txt [72K]

   whitespace:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-javadoc-root.txt [36K]

   unit:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [220K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [460K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-tools_hadoop-azure.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-tools_hadoop-resourceestimator.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1415/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt