[ https://issues.apache.org/jira/browse/SLIDER-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285354#comment-15285354 ]

Jonathan Maron commented on SLIDER-1121:
----------------------------------------

I suppose in that case you'd just have to call WebApp.port() to get the 
actual value of the bound port (and overwrite any config items that used 
the requested port, if the two differ).
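
To make that concrete, here is a rough sketch of the "bind first, then read
back the real port" idea. It is not the Slider or Hadoop code path: it uses
a plain java.net.ServerSocket in place of the web app, and the class name,
the config key "example.web.port" and the Map standing in for the
configuration are all made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;

public class BoundPortSketch {

    /** Hypothetical config key; not a real Slider or Hadoop property. */
    static final String PORT_KEY = "example.web.port";

    /**
     * Binds to the requested port on the wildcard address. A requested port
     * of 0 asks the OS to pick a free port. The open socket is returned so
     * the caller can read the port that was actually bound.
     */
    static ServerSocket bind(int requestedPort) throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress(requestedPort));
        return socket;
    }

    /**
     * Overwrites the configured port when it differs from the bound port.
     * Returns true if the config entry was changed.
     */
    static boolean reconcile(Map<String, String> config, int boundPort) {
        String bound = Integer.toString(boundPort);
        if (bound.equals(config.get(PORT_KEY))) {
            return false;
        }
        config.put(PORT_KEY, bound);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> config = new HashMap<>();
        config.put(PORT_KEY, "0"); // requested port: let the OS choose

        int requested = Integer.parseInt(config.get(PORT_KEY));
        try (ServerSocket socket = bind(requested)) {
            // Read the port from the bound socket, never from the request.
            boolean changed = reconcile(config, socket.getLocalPort());
            System.out.println("bound port = " + config.get(PORT_KEY)
                + ", config updated = " + changed);
        }
    }
}
```

In the AM the same shape would presumably apply with WebApp.port() as the
source of the bound value; anything published before the bind (registry
entries, tracking URL and so on) would need the same reconciliation.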

> Slider AM has a race condition in port allocation
> -------------------------------------------------
>
>                 Key: SLIDER-1121
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1121
>             Project: Slider
>          Issue Type: Bug
>    Affects Versions: Slider 0.90.2
>            Reporter: Sidharta Seethana
>            Assignee: Billie Rinaldi
>            Priority: Critical
>             Fix For: Slider 0.91
>
>         Attachments: SLIDER-1121.1.patch, SLIDER-1121.2.patch
>
>
> /cc [~vinodkv], [~gsaha]
> When two (or more) slider AMs are launched on a given node, it looks like 
> both AMs could attempt to bind to the same port, resulting in AM crash(es). 
> See below for an example. 
> {code}
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Cluster provider type is 
> agent
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: RM is at 
> y016.boo.hoo.com/172.26.32.116:8030
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM for ID 26
> 16/05/13 02:34:29 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool 
> size is 500
> 16/05/13 02:34:29 INFO impl.ContainerManagementProtocolProxy: 
> yarn.client.max-cached-nodemanagers-proxies : 0
> 16/05/13 02:34:29 INFO ipc.CallQueueManager: Using callQueue class 
> java.util.concurrent.LinkedBlockingQueue
> 16/05/13 02:34:29 INFO ipc.Server: Starting Socket Reader #1 for port 1025
> 16/05/13 02:34:29 INFO ipc.Server: IPC Server Responder: starting
> 16/05/13 02:34:29 INFO ipc.Server: IPC Server listener on 1025: starting
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: AM Server is listening at 
> y053.boo.hoo.com:1025
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Starting Yarn registry
> 16/05/13 02:34:29 INFO imps.CuratorFrameworkImpl: Starting
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client 
> environment:host.name=y053.boo.hoo.com
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client 
> environment:java.version=1.8.0_60
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client 
> environment:java.vendor=Oracle Corporation
> 16/05/13 02:34:29 INFO zookeeper.ZooKeeper: Client 
> environment:java.home=/usr/jdk64/jdk1.8.0_60/jre
> {code}
> {code}
> at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
>       ... 10 more
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Triggering shutdown of the 
> AM: stop:  exit code = 56, FAILED: Port in use: 0.0.0.0:1025;
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Process has exited with 
> exit code 0 mapped to 0 -ignoring
> 16/05/13 02:34:29 INFO workflow.WorkflowCompositeService: Child service 
> completed Service RoleLaunchService in state RoleLaunchService: STOPPED
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Setting stopInitiated flag 
> to true
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Container release timeout 
> in millis = 0
> 16/05/13 02:34:29 INFO state.AppState: Releasing 1 containers
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Application completed. 
> Signalling finish to RM
> 16/05/13 02:34:29 INFO appmaster.SliderAppMaster: Unregistering AM 
> status=FAILED message=Port in use: 0.0.0.0:1025
> 16/05/13 02:34:29 INFO impl.AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> Exception: Port in use: 0.0.0.0:1025
> 16/05/13 02:34:29 ERROR main.ServiceLauncher: Exception: Port in use: 
> 0.0.0.0:1025
> java.net.BindException: Port in use: 0.0.0.0:1025
>       at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:920)
>       at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
>       at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:274)
>       at org.apache.slider.server.appmaster.SliderAppMaster.deployWebApplication(SliderAppMaster.java:1106)
>       at org.apache.slider.server.appmaster.SliderAppMaster.createAndRunCluster(SliderAppMaster.java:992)
>       at org.apache.slider.server.appmaster.SliderAppMaster.runService(SliderAppMaster.java:580)
>       at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
>       at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
>       at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
> {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
