Sidharta Seethana created SLIDER-1121:
-----------------------------------------

             Summary: Slider AM has a race condition in port allocation
                 Key: SLIDER-1121
                 URL: https://issues.apache.org/jira/browse/SLIDER-1121
             Project: Slider
          Issue Type: Bug
            Reporter: Sidharta Seethana
            Priority: Critical


/cc [~vinodkv], [~gsaha]

When two (or more) Slider AMs are launched on the same node, it looks like the 
AMs can attempt to bind to the same port, causing one or more of them to crash. 
The log excerpts below show an example.

{code}
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Cluster provider type is agent
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: RM is at y016.boo.hoo.com/172.26.32.116:8030
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM for ID 26
16/05/13 02:34:29 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
16/05/13 02:34:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
16/05/13 02:34:29 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
16/05/13 02:34:29 INFO ipc.Server: Starting Socket Reader #1 for port 1025
16/05/13 02:34:29 INFO ipc.Server: IPC Server Responder: starting
16/05/13 02:34:29 INFO ipc.Server: IPC Server listener on 1025: starting
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM Server is listening at y053.boo.hoo.com:1025
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Starting Yarn registry
16/05/13 02:34:29 INFO imps.CuratorFrameworkImpl: Starting
16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:host.name=y053.boo.hoo.com
16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_60
16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_60/jre
{code}

{code}
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
        ... 10 more
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Triggering shutdown of the AM: stop:  exit code = 56, FAILED: Port in use: 0.0.0.0:1025;
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Process has exited with exit code 0 mapped to 0 -ignoring
16/05/13 02:34:29 INFO workflow.WorkflowCompositeService: Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Setting stopInitiated flag to true
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Container release timeout in millis = 0
16/05/13 02:34:29 INFO state.AppState: Releasing 1 containers
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Application completed. Signalling finish to RM
16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Unregistering AM status=FAILED message=Port in use: 0.0.0.0:1025
16/05/13 02:34:29 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception: Port in use: 0.0.0.0:1025
16/05/13 02:34:29 ERROR main.ServiceLauncher: Exception: Port in use: 0.0.0.0:1025
java.net.BindException: Port in use: 0.0.0.0:1025
        at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:920)
        at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:274)
        at org.apache.slider.server.appmaster.SliderAppMaster.deployWebApplication(SliderAppMaster.java:1106)
        at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:992)
        at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:580)
        at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
        at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
        at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
{code} 
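
For illustration only, here is a minimal sketch of one way to avoid this kind of 
race in general. It is not Slider's code, and it is not confirmed that Slider's 
port allocation works the way this sketch assumes. A common source of such races 
is probing for a free port, releasing it, and binding to it later, which leaves a 
window for another process to take the port. The sketch instead binds directly 
and moves on to the next candidate when the bind fails. The class name 
PortBindSketch, the method bindFirstFree and the port range are hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch, not Slider code: the bind call itself acts as the
// availability check, so there is no gap between "check" and "bind".
class PortBindSketch {

    /**
     * Tries each port in [start, end] and returns the first socket that
     * binds successfully. The caller owns (and must close) the socket.
     */
    static ServerSocket bindFirstFree(int start, int end) throws IOException {
        for (int port = start; port <= end; port++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                // Port is held by another socket or process: try the next one.
                socket.close();
            }
        }
        throw new BindException("No free port in range " + start + "-" + end);
    }

    public static void main(String[] args) throws IOException {
        // Occupy an OS-assigned port to stand in for a first AM on the node.
        try (ServerSocket first = new ServerSocket(0)) {
            int taken = first.getLocalPort();
            int end = Math.min(taken + 100, 65535);
            // A second caller starting from the same port skips past it.
            try (ServerSocket second = bindFirstFree(taken, end)) {
                System.out.println("first holds port " + taken);
                System.out.println("second bound port " + second.getLocalPort());
            }
        }
    }
}
```

The design choice is that a failed bind is treated as an expected outcome to 
retry on rather than a fatal error. Whether this fits the AM's IPC and web 
servers depends on how Slider selects ports, which this report does not show.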


