[ 
https://issues.apache.org/jira/browse/HAMA-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724907#comment-13724907
 ] 

MaoYuan Xian commented on HAMA-789:
-----------------------------------

{code}
child() {
       ....
       final BSPTask task = (BSPTask) umbilical.getTask(taskid);
-      int peerPort = umbilical.getAssignedPortNum(taskid);
+      int peerPort = Constants.DEFAULT_PEER_PORT;
+      peerPort = BSPNetUtils.getNextAvailable(peerPort);
}
{code}
This part of the code should be all right, because the peerPort value is not 
actually used by the subsequent code: the newly added function 
initializeMessaging() finds a random port and overwrites it.

However, it seems we need one additional line of code, as shown below:
{code}
    ...
    // This function call may change the peer address
    initializeMessaging();
    + conf.setInt(Constants.PEER_PORT, peerAddress.getPort());
    ...
    initializeSyncService(superstep, state);
    ...
{code}

Without the "conf.setInt(Constants.PEER_PORT, peerAddress.getPort())" call, the 
function initializeSyncService, which is called afterwards, will use the wrong 
port, and several children may register under the same ZooKeeper node.
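
To illustrate the ordering, here is a minimal, self-contained sketch. It is not Hama code: a plain Map stands in for the job configuration, the key name "bsp.peer.port" and the default 61000 are assumptions, and bindMessagingSocket() and syncNodeName() are hypothetical stand-ins for initializeMessaging() and initializeSyncService(). It only shows that the port actually bound has to be written back before anything derives a registration name from the configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hama code: a Map stands in for the job
// configuration and PEER_PORT_KEY for Constants.PEER_PORT.
final class PeerPortSketch {
  static final String PEER_PORT_KEY = "bsp.peer.port"; // assumed key name

  // Stand-in for initializeMessaging(): bind to port 0 so the OS picks a
  // free port, and keep the socket open so no other process can take it.
  static ServerSocket bindMessagingSocket() throws IOException {
    ServerSocket socket = new ServerSocket();
    socket.bind(new InetSocketAddress(0));
    return socket;
  }

  // Stand-in for initializeSyncService(): derives the registration name
  // from whatever port the configuration holds at call time.
  static String syncNodeName(String host, Map<String, Integer> conf) {
    return host + ":" + conf.get(PEER_PORT_KEY);
  }

  public static void main(String[] args) throws IOException {
    Map<String, Integer> conf = new HashMap<>();
    conf.put(PEER_PORT_KEY, 61000); // stale default (assumed value)

    try (ServerSocket socket = bindMessagingSocket()) {
      // The extra line proposed above: publish the port actually bound.
      conf.put(PEER_PORT_KEY, socket.getLocalPort());
      // Only now is it safe to derive the sync-service registration name.
      System.out.println(syncNodeName("localhost", conf));
    }
  }
}
```

If the conf.put call inside the try block is dropped, every child on the same host would register under the stale default port, which is the collision described above.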
                
> BspPeer launched fail because port is bound by others
> -----------------------------------------------------
>
>                 Key: HAMA-789
>                 URL: https://issues.apache.org/jira/browse/HAMA-789
>             Project: Hama
>          Issue Type: Bug
>          Components: bsp core
>    Affects Versions: 0.6.2
>            Reporter: MaoYuan Xian
>            Assignee: Suraj Menon
>             Fix For: 0.6.3
>
>         Attachments: HAMA-789.patch
>
>
> In GroomServer, we call BSPNetUtils.getNextAvailable to assign the BSPPeer 
> listening port. After finding an available port, the GroomServer releases 
> the port and launches the BSPPeer (Child), and the child then listens on 
> this port. However, between the GroomServer releasing the port and the peer 
> listening on it, if another process in the operating system happens to bind 
> to the same port, the BSPPeer will fail to start with an "Address already 
> in use" exception.
