Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi Gabor Bota,

I committed the fix for YARN-10347 to branch-3.1. I think this should be a blocker for 3.1.4. Could you cherry-pick it to branch-3.1.4 and cut a new RC?

Thanks,
Masatake Iwasaki

On 2020/07/08 23:31, Masatake Iwasaki wrote:

Thanks Steve and Prabhu for the information.

The cause turned out to be locking in CapacityScheduler#reinitialize. I think the method is called after transitioning to the active state if RM-HA is enabled.

I filed YARN-10347 and created a PR.

Masatake Iwasaki

On 2020/07/08 16:33, Prabhu Joseph wrote:

Hi Masatake,

The thread is waiting for a ReadLock; we need to check what the other thread holding the WriteLock is blocked on. Can you get three consecutive complete jstack dumps of the ResourceManager during the issue?

> I got no issue if RM-HA is disabled.

It looks like the RM is not able to access the ZooKeeper State Store. Can you check whether there is any connectivity issue between the RM and ZooKeeper?

Thanks,
Prabhu Joseph

On Mon, Jul 6, 2020 at 2:44 AM Masatake Iwasaki wrote:

Thanks for putting this up, Gabor Bota.

I'm testing RC2 on a 3-node docker cluster with NN-HA and RM-HA enabled. ResourceManager reproducibly blocks on submitApplication while launching example MR jobs. Has anyone run into the same issue?

The same configuration worked for 3.1.3. I got no issue if RM-HA is disabled.

"IPC Server handler 1 on default port 8032" #167 daemon prio=5 os_prio=0 tid=0x7fe91821ec50 nid=0x3b9 waiting on condition [0x7fe901bac000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x85d37a40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)

Masatake Iwasaki

On 2020/06/26 22:51, Gabor Bota wrote:

Hi folks,

I have put together a release candidate (RC2) for Hadoop 3.1.4.

The RC is available at: http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
The RC tag in git is here: https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
The maven artifacts are staged at https://repository.apache.org/content/repositories/orgapachehadoop-1269/

You can find my public key at: https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C

Please try the release and vote. The vote will run for 5 weekdays, until July 6, 2020, 23:00 CET.

The release includes the revert of HDFS-14941, as it caused HDFS-15421 ("IBR leak causes standby NN to be stuck in safe mode"). (https://issues.apache.org/jira/browse/HDFS-15421)
The release includes HDFS-15323, as requested. (https://issues.apache.org/jira/browse/HDFS-15323)

Thanks,
Gabor

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/742/

No changes

-1 overall

The following subsystems voted -1:
    docker

Powered by Apache Yetus https://yetus.apache.org
[jira] [Created] (HADOOP-17120) Fix failure of docker image creation due to pip2 install error on branch-3.1
Masatake Iwasaki created HADOOP-17120:
-----------------------------------------

Summary: Fix failure of docker image creation due to pip2 install error on branch-3.1
Key: HADOOP-17120
URL: https://issues.apache.org/jira/browse/HADOOP-17120
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 3.1.4
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki

{noformat}
The command '/bin/sh -c pip2 install configparser==4.0.2 pylint==1.9.2' returned a non-zero code: 1
{noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HADOOP-17119) Netty upgrade to 9.4.x causes MR app fail with IOException
Bilwa S T created HADOOP-17119:
----------------------------------

Summary: Netty upgrade to 9.4.x causes MR app fail with IOException
Key: HADOOP-17119
URL: https://issues.apache.org/jira/browse/HADOOP-17119
Project: Hadoop Common
Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T

I think we should catch IOException here instead of BindException in HttpServer2#bindForPortRange

{code:java}
for(Integer port : portRanges) {
  if (port == startPort) {
    continue;
  }
  Thread.sleep(100);
  listener.setPort(port);
  try {
    bindListener(listener);
    return;
  } catch (BindException ex) {
    // Ignore exception. Move to next port.
    ioException = ex;
  }
}
{code}
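The proposal above can be illustrated outside Hadoop with a minimal sketch. It assumes the underlying server library may report a failed bind as a plain IOException rather than a BindException, in which case a narrow catch would abort the port scan early. The class and method names (PortRangeBindSketch, bindFirstAvailable, demoSkipsBusyPort) are hypothetical and are not part of HttpServer2; the sketch uses java.net.ServerSocket in place of the real listener.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Arrays;
import java.util.List;

public class PortRangeBindSketch {

  /**
   * Tries each candidate port in order and returns the first socket that binds.
   * Catching IOException (the parent of BindException) means a bind failure
   * that arrives wrapped in a generic IOException still moves on to the next port.
   */
  static ServerSocket bindFirstAvailable(List<Integer> ports) throws IOException {
    IOException lastFailure = null;
    for (int port : ports) {
      try {
        return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
      } catch (IOException ex) {
        // Remember the failure and move to the next port.
        lastFailure = ex;
      }
    }
    throw new IOException("Could not bind to any port in " + ports, lastFailure);
  }

  /** Occupies one port, then checks that the scan skips it. Port 0 lets the OS pick a free port. */
  static boolean demoSkipsBusyPort() throws IOException {
    try (ServerSocket busy = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      int busyPort = busy.getLocalPort();
      try (ServerSocket bound = bindFirstAvailable(Arrays.asList(busyPort, 0))) {
        return bound.getLocalPort() != busyPort;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("skipped busy port: " + demoSkipsBusyPort());
  }
}
```

The trade-off of the wider catch is that failures unrelated to the port being in use are also retried on the next port; keeping the last exception as the cause preserves that information if every port fails.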
[jira] [Created] (HADOOP-17118) TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently
Ahmed Hussein created HADOOP-17118:
--------------------------------------

Summary: TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently
Key: HADOOP-17118
URL: https://issues.apache.org/jira/browse/HADOOP-17118
Project: Hadoop Common
Issue Type: Bug
Reporter: Ahmed Hussein

{{TestFileCreation.testServerDefaultsWithMinimalCaching}} fails intermittently on trunk

{code:bash}
[ERROR] Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 103.413 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestFileCreation
[ERROR] testServerDefaultsWithMinimalCaching(org.apache.hadoop.hdfs.TestFileCreation)  Time elapsed: 2.435 s  <<< FAILURE!
java.lang.AssertionError: expected:<402653184> but was:<268435456>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.hadoop.hdfs.TestFileCreation.testServerDefaultsWithMinimalCaching(TestFileCreation.java:279)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
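The two numbers in the assertion message are easier to compare when converted from bytes. The small sketch below does only that arithmetic; the helper name describe is hypothetical. Reading the results as 3 x 128 MiB and 2 x 128 MiB (128 MiB being the HDFS default for dfs.blocksize) is an interpretation and is not stated in the report.

```java
public class BlockSizeCheck {
  static final long MIB = 1024L * 1024L;

  /** Formats a byte count together with its size in MiB. */
  static String describe(long bytes) {
    return bytes + " bytes = " + (bytes / MIB) + " MiB";
  }

  public static void main(String[] args) {
    System.out.println("expected: " + describe(402653184L)); // 384 MiB
    System.out.println("actual:   " + describe(268435456L)); // 256 MiB
  }
}
```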
[jira] [Resolved] (HADOOP-17117) Typos in hadoop-aws documentation
[ https://issues.apache.org/jira/browse/HADOOP-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka resolved HADOOP-17117.
------------------------------------
Fix Version/s: 3.4.0
               3.3.1
Hadoop Flags: Reviewed
Assignee: Sebastian Nagel
Resolution: Fixed

Merged the PR into trunk and branch-3.3. Thank you [~snagel].

> Typos in hadoop-aws documentation
> ---------------------------------
>
> Key: HADOOP-17117
> URL: https://issues.apache.org/jira/browse/HADOOP-17117
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation, fs/s3
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Trivial
> Fix For: 3.3.1, 3.4.0
>
> There are a couple of typos in the hadoop-aws documentation (markdown). I'll open a PR.
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Thanks Steve and Prabhu for the information.

The cause turned out to be locking in CapacityScheduler#reinitialize. I think the method is called after transitioning to the active state if RM-HA is enabled.

I filed YARN-10347 and created a PR.

Masatake Iwasaki
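The thread does not spell out the exact defect in CapacityScheduler#reinitialize, so the following is only a sketch of one way a reader can end up parked forever on a ReentrantReadWriteLock: the write lock is acquired twice but released once. That double-lock pattern is an assumption made for illustration, and the class and method names (UnbalancedWriteLockSketch, reinitializeWithBug, tryRead) are hypothetical stand-ins, not Hadoop code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UnbalancedWriteLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Hypothetical reinitialize-style method: locks twice, unlocks once. */
  void reinitializeWithBug() {
    lock.writeLock().lock();
    lock.writeLock().lock(); // reentrant acquire: hold count is now 2
    try {
      // ... reload scheduler configuration here ...
    } finally {
      lock.writeLock().unlock(); // hold count drops to 1, so the lock stays held
    }
  }

  /** Hypothetical submit path: needs the read lock, gives up after the timeout. */
  boolean tryRead(long timeoutMillis) throws InterruptedException {
    if (lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
      lock.readLock().unlock();
      return true;
    }
    return false;
  }

  /** Runs the scenario; returns whether a reader on another thread got the read lock. */
  static boolean readerSucceedsAfterBuggyReinit() throws Exception {
    UnbalancedWriteLockSketch sketch = new UnbalancedWriteLockSketch();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      sketch.reinitializeWithBug(); // runs on the calling thread
      Future<Boolean> reader = pool.submit(() -> sketch.tryRead(200));
      return reader.get();
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // prints "reader acquired read lock: false"
    System.out.println("reader acquired read lock: " + readerSucceedsAfterBuggyReinit());
  }
}
```

The sketch uses tryLock with a timeout so that it terminates; a plain readLock().lock(), as in the stack trace from this thread, would park indefinitely in the same situation.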
[jira] [Created] (HADOOP-17117) Typos in hadoop-aws documentation
Sebastian Nagel created HADOOP-17117:
----------------------------------------

Summary: Typos in hadoop-aws documentation
Key: HADOOP-17117
URL: https://issues.apache.org/jira/browse/HADOOP-17117
Project: Hadoop Common
Issue Type: Bug
Components: documentation, fs/s3
Reporter: Sebastian Nagel

There are a couple of typos in the hadoop-aws documentation (markdown). I'll open a PR.
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi Masatake,

The thread is waiting for a ReadLock; we need to check what the other thread holding the WriteLock is blocked on. Can you get three consecutive complete jstack dumps of the ResourceManager during the issue?

>> I got no issue if RM-HA is disabled.

It looks like the RM is not able to access the ZooKeeper State Store. Can you check whether there is any connectivity issue between the RM and ZooKeeper?

Thanks,
Prabhu Joseph
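Finding the thread that holds the write lock is usually done from a thread dump (jstack has a -l option that lists locked ownable synchronizers). As a rough in-process equivalent, the sketch below uses the standard java.lang.management API to list threads that own a synchronizer of a given class. It assumes a JVM that supports synchronizer usage monitoring, and the names LockHolderDump, holdersOf and demoFindsWriter are hypothetical.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockHolderDump {

  /** Names of live threads owning a synchronizer whose class name contains the fragment. */
  static List<String> holdersOf(String syncClassFragment) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    List<String> names = new ArrayList<>();
    // Second argument requests locked ownable synchronizers for every thread.
    for (ThreadInfo info : mx.dumpAllThreads(false, true)) {
      for (LockInfo sync : info.getLockedSynchronizers()) {
        if (sync.getClassName().contains(syncClassFragment)) {
          names.add(info.getThreadName());
        }
      }
    }
    return names;
  }

  /** Starts a thread that holds a write lock, then checks that the dump reports it. */
  static boolean demoFindsWriter() throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread writer = new Thread(() -> {
      lock.writeLock().lock();
      try {
        held.countDown();
        release.await(); // keep holding the lock until told to stop
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.writeLock().unlock();
      }
    }, "demo-writer");
    writer.start();
    held.await();
    try {
      return holdersOf("ReentrantReadWriteLock").contains("demo-writer");
    } finally {
      release.countDown();
      writer.join();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("found write-lock holder: " + demoFindsWriter());
  }
}
```

Taking several dumps a few seconds apart, as requested above, helps distinguish a thread that is permanently stuck from one that merely happened to hold the lock at the moment of a single dump.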
[jira] [Resolved] (HADOOP-16781) Backport HADOOP-16612 "Track Azure Blob File System client-perceived latency" to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeetesh Mangwani resolved HADOOP-16781.
---------------------------------------
Resolution: Won't Fix

Previously, we kept branch-2 updated with the changes we pushed to trunk. This way, the few customers on older versions continued to get support. Now branch-2 is no longer maintained.

> Backport HADOOP-16612 "Track Azure Blob File System client-perceived latency" to branch-2
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-16781
> URL: https://issues.apache.org/jira/browse/HADOOP-16781
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Bilahari T H
> Assignee: Jeetesh Mangwani
> Priority: Minor