[ https://issues.apache.org/jira/browse/SLIDER-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Billie Rinaldi reassigned SLIDER-1121:
--------------------------------------

    Assignee: Billie Rinaldi

> Slider AM has a race condition in port allocation
> -------------------------------------------------
>
> Key: SLIDER-1121
> URL: https://issues.apache.org/jira/browse/SLIDER-1121
> Project: Slider
> Issue Type: Bug
> Affects Versions: Slider 0.91
> Reporter: Sidharta Seethana
> Assignee: Billie Rinaldi
> Priority: Critical
>
> /cc [~vinodkv], [~gsaha]
> When two (or more) Slider AMs are launched on a given node, it looks like both AMs could attempt to bind to the same port, resulting in AM crash(es).
> See below for an example.
> {code}
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Cluster provider type is agent
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: RM is at y016.boo.hoo.com/172.26.32.116:8030
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM for ID 26
> 16/05/13 02:34:29 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
> 16/05/13 02:34:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
> 16/05/13 02:34:29 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 16/05/13 02:34:29 INFO ipc.Server: Starting Socket Reader #1 for port 1025
> 16/05/13 02:34:29 INFO ipc.Server: IPC Server Responder: starting
> 16/05/13 02:34:29 INFO ipc.Server: IPC Server listener on 1025: starting
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM Server is listening at y053.boo.hoo.com:1025
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Starting Yarn registry
> 16/05/13 02:34:29 INFO imps.CuratorFrameworkImpl: Starting
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:host.name=y053.boo.hoo.com
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_60
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_60/jre
> {code}
> {code}
> 	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
> 	... 10 more
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Triggering shutdown of the AM: stop: exit code = 56, FAILED: Port in use: 0.0.0.0:1025;
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Process has exited with exit code 0 mapped to 0 -ignoring
> 16/05/13 02:34:29 INFO workflow.WorkflowCompositeService: Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Setting stopInitiated flag to true
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Container release timeout in millis = 0
> 16/05/13 02:34:29 INFO state.AppState: Releasing 1 containers
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Application completed. Signalling finish to RM
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Unregistering AM status=FAILED message=Port in use: 0.0.0.0:1025
> 16/05/13 02:34:29 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
> Exception: Port in use: 0.0.0.0:1025
> 16/05/13 02:34:29 ERROR main.ServiceLauncher: Exception: Port in use: 0.0.0.0:1025
> java.net.BindException: Port in use: 0.0.0.0:1025
> 	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:920)
> 	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
> 	at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:274)
> 	at org.apache.slider.server.appmaster.SliderAppMaster.deployWebApplication(SliderAppMaster.java:1106)
> 	at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:992)
> 	at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:580)
> 	at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
> 	at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
> 	at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
> {code}
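One mechanism that would produce this failure is a check-then-bind sequence. This is an assumption: neither the report nor the stack trace confirms it. In that sequence, an AM probes a port range, records the first port that accepts a throwaway bind, releases it, and only later binds its real listener. In the trace above, that later bind happens in `HttpServer2.openListeners`. A second AM on the same node that probes in the gap sees the same port as free, and whichever binds second gets `BindException`. The Java sketch below is hypothetical: `PortRaceSketch`, `isPortAvailable`, `pickPortRacy` and `bindFirstFree` are illustrative names, not Slider code. It contrasts the racy pattern with one that treats the real bind as the availability check and keeps the bound socket.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch, not Slider code: a check-then-bind port allocator
// versus one that keeps the socket it actually bound.
public class PortRaceSketch {

    // Probe: the port counts as free if a throwaway socket can bind it.
    // The socket is closed again, so nothing is reserved afterwards.
    static boolean isPortAvailable(int port) {
        try (ServerSocket probe = new ServerSocket()) {
            probe.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Racy allocator: returns a port number that was free at probe time.
    // Another process can bind it before the caller does.
    static int pickPortRacy(int low, int high) {
        for (int port = low; port <= high; port++) {
            if (isPortAvailable(port)) {
                return port;
            }
        }
        throw new IllegalStateException("no free port in " + low + "-" + high);
    }

    // Safer allocator: the bind itself is the availability check, and the
    // bound socket is returned, so the port stays held by the caller.
    static ServerSocket bindFirstFree(int low, int high) throws IOException {
        for (int port = low; port <= high; port++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (BindException e) {
                socket.close(); // port already taken, try the next one
            }
        }
        throw new BindException("no free port in " + low + "-" + high);
    }

    public static void main(String[] args) throws IOException {
        // Start the range at an OS-chosen port so the demo does not depend
        // on any fixed port (such as 1025) being free on this machine.
        ServerSocket tmp = new ServerSocket(0);
        int low = tmp.getLocalPort();
        tmp.close();
        int high = Math.min(low + 50, 65535);

        // Two "AMs" probe before either binds, so both pick the same port.
        int am1Port = pickPortRacy(low, high);
        int am2Port = pickPortRacy(low, high);
        System.out.println("racy picks: " + am1Port + ", " + am2Port);

        try (ServerSocket am1 = new ServerSocket()) {
            am1.bind(new InetSocketAddress(am1Port));
            try (ServerSocket am2 = new ServerSocket()) {
                am2.bind(new InetSocketAddress(am2Port));
                System.out.println("second bind succeeded");
            } catch (BindException e) {
                System.out.println("second bind failed: " + e.getMessage());
            }
        }

        // Holding the bound socket: the second caller moves to another port.
        try (ServerSocket s1 = bindFirstFree(low, high);
             ServerSocket s2 = bindFirstFree(low, high)) {
            System.out.println("held ports: " + s1.getLocalPort() + ", " + s2.getLocalPort());
        }
    }
}
```

Holding the bound socket closes the window, but the component that opens the listener then has to receive that socket rather than a bare port number. Whether the web app path in the trace allows that is not addressed here.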