[ https://issues.apache.org/jira/browse/GIRAPH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142785#comment-13142785 ]

Avery Ching commented on GIRAPH-72:
-----------------------------------

Yes, this is a possible problem. In the past, I've tried to grab all the map
slots of a given task tracker via the appropriate memory configuration. Right
now, it's kind of nice to have the ports correspond to the task partition for
debuggability. Would love to hear any other ideas.
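To make the collision concrete, here is a minimal sketch that assumes each worker binds to a base port plus its task partition, with a base of 30000 as the port in the stack trace below suggests. `PartitionPortSketch` and `portForPartition` are hypothetical names used only for illustration. They are not Giraph's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a worker's RPC port = fixed base port + its task partition.
public class PartitionPortSketch {
    // Assumption: 30000 as the base, matching the port in the reported BindException.
    static final int BASE_PORT = 30000;

    // Deterministic mapping, so port 30003 always means "partition 3" when debugging.
    static int portForPartition(int basePort, int taskPartition) {
        return basePort + taskPartition;
    }

    public static void main(String[] args) {
        // Two independent jobs, each with task partitions 0..2.
        List<Integer> jobA = new ArrayList<>();
        List<Integer> jobB = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            jobA.add(portForPartition(BASE_PORT, p));
            jobB.add(portForPartition(BASE_PORT, p));
        }
        // Keep only the ports both jobs want; any of these collides if the
        // matching partitions of both jobs land on the same host.
        jobA.retainAll(jobB);
        System.out.println("Ports claimed by both jobs: " + jobA);
        // prints "Ports claimed by both jobs: [30000, 30001, 30002]"
    }
}
```

The same determinism that makes ports easy to map back to partitions is what causes the bug. Partition 0 of every job wants the same port, so two jobs sharing a machine are guaranteed to clash whenever their matching partitions are co-located.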
                
> Running multiple Giraph jobs on the same cluster can lead to port collisions
> ----------------------------------------------------------------------------
>
>                 Key: GIRAPH-72
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-72
>             Project: Giraph
>          Issue Type: Bug
>          Components: lib, zookeeper
>    Affects Versions: 0.70.0
>         Environment: production hadoop cluster, in-process ZK.
>            Reporter: Jake Mannix
>
> Had a Giraph mini-hackathon at work today, and lots of us launched test jobs
> simultaneously, which often ran into the following collision:
> ------
> startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
> 2-Nov-2011 23:40:08
> java.net.BindException: Problem binding to <hostname>/<hostIP>:30000 : Address already in use
>       at org.apache.hadoop.ipc.Server.bind(Server.java:196)
>       at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:259)
>       at org.apache.hadoop.ipc.Server.<init>(Server.java:1039)
>       at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:492)
>       at org.apache.hadoop.ipc.RPC.getServer(RPC.java:454)
>       at org.apache.giraph.comm.RPCCommunications.getRPCServer(RPCCommunications.java:99)
>       at org.apache.giraph.comm.BasicRPCCommunications.<init>(BasicRPCCommunications.java:362)
>       at org.apache.giraph.comm.RPCCommunications.<init>(RPCCommunications.java:71)
>       at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:570)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>       at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.net.BindException: Address already in use
>       at sun.nio.ch.Net.bind(Native Method)
>       at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>       at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>       at org.apache.hadoop.ipc.Server.bind(Server.java:194)
>       ... 12 more
> ----
> The job then simply hung. At a bare minimum, I'd imagine it should catch this
> exception and let the task die quickly so it can be retried on another
> machine. Better yet, it could accept a command-line argument at startup
> (passed into the Configuration) that decides which ports to use. Best of all
> would be something automagic that allows multiple GraphMappers on the same
> machine without manually picking ports (pick one at random and store it in
> ZooKeeper? But then what about the in-process ZooKeeper...)
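Here is a minimal sketch of the fail-fast idea, using a plain `java.net.ServerSocket` as a stand-in for the Hadoop RPC server. It tries the partition-derived port first. If that port is taken, it walks upward a bounded number of times and then throws instead of hanging. The bounded walk to nearby ports is an assumption added here and was not proposed in the issue. `PortBindSketch`, `bindWithRetry`, and `maxAttempts` are hypothetical names. In a real job, `basePort` would be the value read from the Configuration, which is the report's second suggestion.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch of the issue's proposals; not Giraph's actual RPC setup code.
public class PortBindSketch {
    static final int MAX_PORT = 65535;

    // Tries basePort + taskPartition, then the next ports up, at most maxAttempts
    // times. Throws an IOException when every candidate is taken, so the task
    // fails quickly and can be rescheduled instead of hanging.
    static ServerSocket bindWithRetry(int basePort, int taskPartition, int maxAttempts)
            throws IOException {
        int first = basePort + taskPartition;
        BindException lastFailure = null;
        for (int i = 0; i < maxAttempts && first + i <= MAX_PORT; i++) {
            ServerSocket socket = new ServerSocket(); // created unbound
            boolean bound = false;
            try {
                socket.bind(new InetSocketAddress(first + i));
                bound = true;
                return socket;
            } catch (BindException e) {
                lastFailure = e; // port in use, e.g. by another job's worker
            } finally {
                if (!bound) {
                    socket.close(); // release the failed socket before retrying
                }
            }
        }
        throw new IOException("No free port among " + maxAttempts
                + " candidates starting at " + first, lastFailure);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for another job's worker: let the OS pick any free port.
        try (ServerSocket otherJob = new ServerSocket(0)) {
            int taken = otherJob.getLocalPort();
            // This worker prefers the same port (partition 0), so it must fall back.
            try (ServerSocket mine = bindWithRetry(taken, 0, 10)) {
                System.out.println("other job: " + taken
                        + ", this worker: " + mine.getLocalPort());
            }
        }
    }
}
```

Binding to port 0, as `otherJob` does above, lets the OS choose any free port and removes collisions entirely. The catch is that the chosen port then has to be published somewhere other workers can read it, and the report suggests ZooKeeper for this. It also gives up the port-to-partition mapping that the comment above values for debugging.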

--
This message is automatically generated by JIRA.
