Worker ports are not synched properly with its peers
----------------------------------------------------

                 Key: GIRAPH-154
                 URL: https://issues.apache.org/jira/browse/GIRAPH-154
             Project: Giraph
          Issue Type: Bug
          Components: bsp
    Affects Versions: 0.2.0
            Reporter: Zhiwei Gu
            Assignee: Zhiwei Gu


When a worker tries multiple ports to set up the RPC server, the port it 
finally binds is not synced with its peer workers properly, so the peer 
workers send messages to the default port instead.
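
To make the failure mode concrete, here is a minimal sketch (not Giraph's 
actual code) of binding with retries and then advertising the port that was 
really bound. In the logs below the server starts on 36061 "on bind attempt 1" 
while the registered workerInfo carries port=35061, which equals the base port 
34900 plus MRpartition 161, so the advertised value appears to be the default 
rather than the bound one. The class name, method names, and the 
maxAttempts/portIncrement parameters are all hypothetical, and a plain 
java.net.ServerSocket stands in for the Hadoop RPC server.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortBindingSketch {
    /** Holds the open socket together with the port it actually listens on. */
    public static final class BoundServer {
        public final ServerSocket socket;
        public final int port;
        public final int bindAttempt;

        BoundServer(ServerSocket socket, int port, int bindAttempt) {
            this.socket = socket;
            this.port = port;
            this.bindAttempt = bindAttempt;
        }
    }

    /**
     * Tries basePort + taskPartition first, then steps by portIncrement
     * until a bind succeeds or maxAttempts is exhausted (assumed scheme).
     */
    public static BoundServer bindWithRetries(int basePort, int taskPartition,
            int maxAttempts, int portIncrement) throws IOException {
        int defaultPort = basePort + taskPartition;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = defaultPort + attempt * portIncrement;
            if (candidate > 65535) {
                break; // ran out of valid port numbers
            }
            try {
                ServerSocket socket = new ServerSocket(candidate);
                // Report the port the socket really bound, not the default.
                return new BoundServer(socket, socket.getLocalPort(), attempt);
            } catch (BindException e) {
                // Port already in use; try the next candidate.
            }
        }
        throw new IOException("No free port found starting at " + defaultPort);
    }

    /** Builds the string peers would read; it must carry the bound port. */
    public static String workerInfo(String hostname, int taskPartition, int port) {
        return "Worker(hostname=" + hostname + ", MRpartition=" + taskPartition
                + ", port=" + port + ")";
    }

    public static void main(String[] args) throws IOException {
        // Occupy a port so that the first bind attempt is forced to fail.
        try (ServerSocket blocker = new ServerSocket(0)) {
            int defaultPort = blocker.getLocalPort();
            BoundServer server = bindWithRetries(defaultPort, 0, 20, 1);
            try {
                System.out.println("default port: " + defaultPort);
                System.out.println("bound port: " + server.port
                        + " on bind attempt " + server.bindAttempt);
                System.out.println(workerInfo("localhost", 0, server.port));
            } finally {
                server.socket.close();
            }
        }
    }
}
```

The point of the sketch is only that workerInfo is built from the value 
returned by the bind loop. Building it from basePort + taskPartition would 
reproduce the mismatch reported here whenever the first bind attempt fails.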

Here are some logs:

############################################################################
Base port: 34900
############################################################################

############################################################################
log for worker 161:
############################################################################
IPC Server handler 98 on 36061: starting
BasicRPCCommunications: Started RPC communication server: 
gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 
flush threads on bind attempt 1
IPC Server handler 99 on 36061: starting
setup: Registering health of this worker...
getJobState: Job state already exists 
(/_hadoopBsp/job_201203130609_14838/_masterJobState)
getApplicationAttempt: Node 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
getApplicationAttempt: Node 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
registerHealth: Created my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161
 and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, 
MRpartition=161, port=35061)
process: partitionAssignmentsReadyChanged (partitions are assigned)
startSuperstep: Ready for computation on superstep -1 since worker selection 
and vertex range assignments are done in 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 0 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 1 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 2 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 3 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 4 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 5 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 6 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 7 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 8 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 9 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 10 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 11 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 12 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 13 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 14 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 15 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 16 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 17 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 18 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 19 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 20 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 21 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 22 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 23 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 24 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 25 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 26 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 27 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 28 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 29 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 30 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 31 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 32 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 33 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 34 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 35 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 36 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 37 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 38 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 39 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 40 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 41 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 42 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 43 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 44 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 45 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 46 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 47 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 48 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 49 time(s).
PriviledgedActionException as:job_201203130609_14838 (auth:SIMPLE) 
cause:java.net.ConnectException: Call to 
gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed on connection 
exception: java.net.ConnectException: Connection refused
connectAllRPCProxys: Failed on attempt 0 of 5 to connect to 
(id=33,cur=Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, 
port=35061),prev=null,ckpt_file=null)
java.net.ConnectException: Call to 
gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed on connection 
exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
        at org.apache.hadoop.ipc.Client.call(Client.java:1071)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy8.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:420)
        at 
org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:159)
        at 
org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:155)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
        at 
org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:153)
        at 
org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:51)
        at 
org.apache.giraph.comm.BasicRPCCommunications.startPeerConnectionThread(BasicRPCCommunications.java:599)
        at 
org.apache.giraph.comm.BasicRPCCommunications.connectAllRPCProxys(BasicRPCCommunications.java:542)
        at 
org.apache.giraph.comm.BasicRPCCommunications.setup(BasicRPCCommunications.java:513)
        at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:550)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
        at org.apache.hadoop.ipc.Client.call(Client.java:1046)
        ... 25 more


############################################################################
log for worker 154:
############################################################################
PriviledgedActionException as:job_201203130609_14838 (auth:SIMPLE) 
cause:java.net.ConnectException: Call to 
gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed on connection 
exception: java.net.ConnectException: Connection refused
connectAllRPCProxys: Failed on attempt 4 of 5 to connect to 
(id=33,cur=Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, 
port=35061),prev=null,ckpt_file=null)
java.net.ConnectException: Call to 
gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061 failed on connection 
exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
        at org.apache.hadoop.ipc.Client.call(Client.java:1071)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy8.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:420)
        at 
org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:159)
        at 
org.apache.giraph.comm.RPCCommunications$1.run(RPCCommunications.java:155)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
        at 
org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:153)
        at 
org.apache.giraph.comm.RPCCommunications.getRPCProxy(RPCCommunications.java:51)
        at 
org.apache.giraph.comm.BasicRPCCommunications.startPeerConnectionThread(BasicRPCCommunications.java:599)
        at 
org.apache.giraph.comm.BasicRPCCommunications.connectAllRPCProxys(BasicRPCCommunications.java:542)
        at 
org.apache.giraph.comm.BasicRPCCommunications.setup(BasicRPCCommunications.java:513)
        at 
org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:550)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
        at org.apache.hadoop.ipc.Client.call(Client.java:1046)
        ... 25 more


