[ https://issues.apache.org/jira/browse/TWILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453506#comment-15453506 ]

ASF GitHub Bot commented on TWILL-173:
--------------------------------------

Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/twill/pull/9#discussion_r77082082
  
    --- Diff: twill-core/src/main/java/org/apache/twill/internal/kafka/EmbeddedKafkaServer.java ---
    @@ -65,9 +72,19 @@ protected void startUp() throws Exception {
             if (rootCause instanceof ZkTimeoutException) {
               // Potentially caused by race condition bug described in TWILL-139.
               LOG.warn("Timeout when connecting to ZooKeeper from KafkaServer. Attempt number {}.", tries, rootCause);
    +        } else if (rootCause instanceof BindException) {
    +          LOG.warn("Kafka failed to bind to port {}. Attempt number {}.", kafkaConfig.port(), tries, rootCause);
             } else {
               throw e;
             }
    +
    +        // Do a random sleep of < 200ms
    +        TimeUnit.MILLISECONDS.sleep(new Random().nextInt(200) + 1L);
    +
    +        // Generate a new port for the Kafka
    +        int port = Networks.getRandomPort();
    +        Preconditions.checkState(port > 0, "Failed to get random port.");
    +        properties.setProperty("port", Integer.toString(port));
    --- End diff --
    
    Currently the port is set in the ApplicationMasterService. It would probably
    be better to leave it empty (or set it to 0) so that this logic is triggered.
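
A minimal sketch of what treating an empty or zero port as "pick one for me" could look like. This is an illustration only, not the code in the pull request: the class `PortDefaultSketch` and the helper `resolvePort` are hypothetical names, and `new ServerSocket(0)` stands in for Twill's `Networks.getRandomPort()`.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Properties;

final class PortDefaultSketch {

  // Hypothetical helper: keeps an explicitly configured positive port,
  // otherwise picks a free port and writes it back into the properties.
  static int resolvePort(Properties properties) throws IOException {
    String configured = properties.getProperty("port", "").trim();
    int port = configured.isEmpty() ? 0 : Integer.parseInt(configured);
    if (port > 0) {
      return port;
    }
    // Binding to port 0 asks the OS for any free port (assumption: this is
    // a stand-in for Networks.getRandomPort()).
    try (ServerSocket socket = new ServerSocket(0)) {
      port = socket.getLocalPort();
    }
    properties.setProperty("port", Integer.toString(port));
    return port;
  }

  public static void main(String[] args) throws IOException {
    Properties properties = new Properties();
    properties.setProperty("port", "0");
    System.out.println("Resolved port: " + resolvePort(properties));
  }
}
```

With this shape, a caller that leaves "port" unset or at 0 gets a generated port, while an explicit positive value is left untouched.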


> Application Master failed with BindException occasionally
> ---------------------------------------------------------
>
>                 Key: TWILL-173
>                 URL: https://issues.apache.org/jira/browse/TWILL-173
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Terence Yim
>             Fix For: 0.8.0
>
>
> When the AM starts the embedded Kafka server, it first generates a random
> port (by creating a server socket) and then provides that port for the Kafka
> server to bind to. It is possible that, after the random port is acquired and
> before the Kafka server binds to it, another process on the same box takes
> that port.
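
The race described above, and the retry approach taken in the diff, can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not Twill's implementation: `BindRetrySketch`, `findFreePort`, and `bindWithRetry` are hypothetical names, and a plain `ServerSocket` stands in for the Kafka server.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Random;
import java.util.concurrent.TimeUnit;

final class BindRetrySketch {

  // Asks the OS for a free port by binding to port 0, then releases it.
  // Once the probe socket is closed, another process may grab the port:
  // this window is the race described in the issue.
  static int findFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  // Binds a listener, choosing a fresh port after every BindException.
  static ServerSocket bindWithRetry(int maxTries) throws IOException, InterruptedException {
    if (maxTries < 1) {
      throw new IllegalArgumentException("maxTries must be at least 1");
    }
    Random random = new Random();
    BindException last = null;
    for (int tries = 1; tries <= maxTries; tries++) {
      int port = findFreePort();
      ServerSocket server = new ServerSocket();
      try {
        server.bind(new InetSocketAddress(port));
        return server;
      } catch (BindException e) {
        // The port was taken between probing and binding; back off briefly.
        server.close();
        last = e;
        TimeUnit.MILLISECONDS.sleep(random.nextInt(200) + 1L); // 1 to 200 ms
      } catch (IOException e) {
        server.close();
        throw e;
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    try (ServerSocket server = bindWithRetry(5)) {
      System.out.println("Bound to port " + server.getLocalPort());
    }
  }
}
```

Retrying does not close the window between probing and binding; it only makes a repeated collision unlikely, which is why the number of attempts is bounded.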



