Hi Masatake,

     The thread is waiting for a ReadLock, so we need to check what the
other thread, the one holding the WriteLock, is blocked on.
Can you capture three consecutive, complete jstack dumps of the
ResourceManager while the issue is occurring?
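
To make clear what I would look for in those dumps, here is a minimal Java
sketch of the same pattern. It is not RM code: the class name, thread names
and latches are made up for illustration. A "writer" thread holds the
WriteLock of a ReentrantReadWriteLock while a "reader" thread parks in
ReadLock.lock(), and the JVM is then asked which thread owns the lock the
reader is waiting on. In a `jstack -l` dump the holder is the thread that
lists the same lock address under "Locked ownable synchronizers".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOwnerDemo {

    // Ask the JVM which thread owns the lock that `waiter` is parked on.
    // Returns null if the thread is not waiting or the lock has no exclusive owner.
    static String findLockOwner(Thread waiter) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());
        return info == null ? null : info.getLockOwnerName();
    }

    // Reproduces the pattern: "writer" holds the write lock, "reader" parks on the read lock.
    static String demo() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                release.await(); // stands in for whatever the real holder is blocked on
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "writer");

        Thread reader = new Thread(() -> {
            lock.readLock().lock(); // parks here while the write lock is held
            lock.readLock().unlock();
        }, "reader");

        writer.start();
        writerHasLock.await();
        reader.start();
        // Wait until the reader is actually parked before inspecting it.
        while (reader.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        String owner = findLockOwner(reader);

        release.countDown(); // let the writer finish so both threads exit
        writer.join();
        reader.join();
        return owner;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader is waiting on a lock owned by: " + demo());
    }
}
```

In the real RM the interesting part is the stack of the owning thread, which
is why complete dumps are needed rather than only the blocked handler thread.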

>> I got no issue if RM-HA is disabled.

It looks like the RM is not able to access the ZooKeeper state store. Can
you check whether there is any connectivity issue between the RM and
ZooKeeper?
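
As a first, rough check, a plain TCP connect from the RM host to each
ZooKeeper server can be sketched as below. This is only an assumption-laden
illustration: the class name and the host:port arguments are hypothetical,
and a successful connect shows only that the port is reachable, not that the
RM's ZooKeeper session, ACLs or state store path are healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkReachability {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeout and unknown host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Each argument is expected as host:port, e.g. the entries of the
        // ZooKeeper quorum configured for the RM (values here are up to the caller).
        for (String hostPort : args) {
            int idx = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, idx);
            int port = Integer.parseInt(hostPort.substring(idx + 1));
            System.out.println(hostPort + " reachable: " + canConnect(host, port, 3000));
        }
    }
}
```

Run it from inside the RM container with the same host names the RM
configuration uses, so that container DNS and networking are exercised too.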

Thanks,
Prabhu Joseph


On Mon, Jul 6, 2020 at 2:44 AM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp>
wrote:

> Thanks for putting this up, Gabor Bota.
>
> I'm testing RC2 on a 3-node Docker cluster with NN-HA and RM-HA enabled.
> The ResourceManager reproducibly blocks on submitApplication while
> launching example MR jobs.
> Has anyone run into the same issue?
>
> The same configuration worked for 3.1.3.
> I got no issue if RM-HA is disabled.
>
>
> "IPC Server handler 1 on default port 8032" #167 daemon prio=5 os_prio=0 tid=0x00007fe91821ec50 nid=0x3b9 waiting on condition [0x00007fe901bac000]
>     java.lang.Thread.State: WAITING (parking)
>          at sun.misc.Unsafe.park(Native Method)
>          - parking to wait for  <0x0000000085d37a40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>          at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
>          at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
>          at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
>          at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
>          at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
>          at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
>          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
>          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
>          at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
>          at java.security.AccessController.doPrivileged(Native Method)
>          at javax.security.auth.Subject.doAs(Subject.java:422)
>          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
>
>
> Masatake Iwasaki
>
> On 2020/06/26 22:51, Gabor Bota wrote:
> > Hi folks,
> >
> > I have put together a release candidate (RC2) for Hadoop 3.1.4.
> >
> > The RC is available at:
> > http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
> > The RC tag in git is here:
> > https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1269/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
> >
> > Please try the release and vote. The vote will run for 5 weekdays,
> > until July 6, 2020, 23:00 CET.
> >
> > The release includes the revert of HDFS-14941, as it caused
> > HDFS-15421. IBR leak causes standby NN to be stuck in safe mode.
> > (https://issues.apache.org/jira/browse/HDFS-15421)
> > The release includes HDFS-15323, as requested.
> > (https://issues.apache.org/jira/browse/HDFS-15323)
> >
> > Thanks,
> > Gabor
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>
