[ https://issues.apache.org/jira/browse/MYRIAD-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DarinJ resolved MYRIAD-155.
---------------------------
       Resolution: Duplicate
    Fix Version/s: Myriad 0.1.0

[MYRIAD-160]

> Relaunched NM on same node caused NullPointerException while yarn containers were running previously.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MYRIAD-155
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-155
>             Project: Myriad
>          Issue Type: Bug
>            Reporter: Sarjeet Singh
>             Fix For: Myriad 0.1.0
>
>
> This seems to be a YARN issue (YARN-2441) that occurs when the NM is
> relaunched on the same node where containers were previously active/running.
>
> 15/10/15 10:43:18 INFO ipc.Server: Socket Reader #1 for port 31000: readAndProcess from client 10.10.101.113 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>     at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>     at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.getPassword(DigestAuthMethod.java:212)
>     at org.apache.hadoop.security.rpcauth.DigestAuthMethod$SaslDigestCallbackHandler.handle(DigestAuthMethod.java:238)
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>     at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>     at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1393)
>     at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1370)
>     at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1283)
>     at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1246)
>     at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1896)
>     at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1764)
>     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1528)
>     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:774)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:640)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:611)
> 15/10/15 10:43:22 INFO security.NMContainerTokenSecretManager: Updating node address : qa101-116.qa.lab:31000
>
> The issue is that the "AM tries to connect to NM before NM finished
> registering with RM".
>
> Myriad can solve this by picking the NM ports randomly from the list of
> ports it receives from Mesos, so that a relaunched NM is distinguishable
> from the previous NM from the RM's point of view.
>
> We can select the NM ports randomly, instead of selecting the first few
> ports as implemented here:
> https://github.com/apache/incubator-myriad/blob/master/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMPorts.java#L46
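
The random selection proposed above could look roughly like the following Java sketch. It is an illustration only, not the code in NMPorts.java: the class RandomPortPicker, the method pickPorts, and the representation of the Mesos offer as a flat List<Long> of port numbers are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of Myriad. */
final class RandomPortPicker {

    private RandomPortPicker() {
    }

    /**
     * Picks {@code count} distinct ports at random from the offered ports,
     * rather than always taking the first {@code count} entries.
     *
     * @param offeredPorts ports offered by Mesos (assumed here to be a flat list)
     * @param count        number of ports the NM needs
     * @param rng          source of randomness; pass a seeded Random in tests
     */
    static List<Long> pickPorts(List<Long> offeredPorts, int count, Random rng) {
        // Drop duplicate port numbers so the result never repeats a port.
        List<Long> candidates = new ArrayList<>(new LinkedHashSet<>(offeredPorts));
        if (count < 0 || count > candidates.size()) {
            throw new IllegalArgumentException(
                "Need " + count + " ports but only " + candidates.size() + " were offered");
        }
        // Shuffle a copy and keep the first `count` entries.
        Collections.shuffle(candidates, rng);
        return new ArrayList<>(candidates.subList(0, count));
    }
}
```

A caller would pass the ports from the offer together with however many ports the NM needs, for example RandomPortPicker.pickPorts(offered, n, new Random()). Random selection only makes it unlikely, not impossible, that a relaunched NM ends up on the same ports as its predecessor.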