[ https://issues.apache.org/jira/browse/TWILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453506#comment-15453506 ]

ASF GitHub Bot commented on TWILL-173:
--------------------------------------

GitHub user chtyim commented on a diff in the pull request:

    https://github.com/apache/twill/pull/9#discussion_r77082082

    --- Diff: twill-core/src/main/java/org/apache/twill/internal/kafka/EmbeddedKafkaServer.java ---
    @@ -65,9 +72,19 @@ protected void startUp() throws Exception {
               if (rootCause instanceof ZkTimeoutException) {
                 // Potentially caused by race condition bug described in TWILL-139.
                 LOG.warn("Timeout when connecting to ZooKeeper from KafkaServer. Attempt number {}.", tries, rootCause);
    +          } else if (rootCause instanceof BindException) {
    +            LOG.warn("Kafka failed to bind to port {}. Attempt number {}.", kafkaConfig.port(), tries, rootCause);
               } else {
                 throw e;
               }
    +
    +          // Do a random sleep of < 200ms
    +          TimeUnit.MILLISECONDS.sleep(new Random().nextInt(200) + 1L);
    +
    +          // Generate a new port for the Kafka
    +          int port = Networks.getRandomPort();
    +          Preconditions.checkState(port > 0, "Failed to get random port.");
    +          properties.setProperty("port", Integer.toString(port));
    --- End diff --

    Currently the port is set in ApplicationMasterService. It is probably better to leave it empty (or set it to 0) so that this logic is triggered.

> Application Master failed with BindException occasionally
> ---------------------------------------------------------
>
>                 Key: TWILL-173
>                 URL: https://issues.apache.org/jira/browse/TWILL-173
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Terence Yim
>             Fix For: 0.8.0
>
>
> When the AM starts the embedded Kafka server, it first generates a random port
> (by creating a server socket) and then provides that port to the Kafka server
> to bind to. It is possible that, after the random port is acquired and before
> the Kafka server binds to it, another process on the same host takes that port.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
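The race in the issue description, and the retry-on-BindException approach in the diff, can be reproduced with plain java.net sockets. This is a minimal sketch, not Twill's code: RandomPortRetry, bindServer (a hypothetical stand-in for starting the embedded Kafka server), startWithRetry and maxTries are illustrative names. getRandomPort is an assumed re-implementation in the spirit of Networks.getRandomPort, and ThreadLocalRandom replaces new Random() for the back-off.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

final class RandomPortRetry {

  // Chooses a "free" port the way the issue describes: bind a ServerSocket to
  // port 0, read the OS-assigned port, then close the socket. Nothing reserves
  // the port after close(), so another process may take it before we bind again.
  static int getRandomPort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      return -1;
    }
  }

  // Hypothetical stand-in for starting the Kafka server on a given port.
  static ServerSocket bindServer(int port) throws IOException {
    ServerSocket server = new ServerSocket();
    try {
      server.bind(new InetSocketAddress(port));
      return server;
    } catch (IOException e) {
      server.close(); // don't leak the unbound socket
      throw e;
    }
  }

  // Retries only on BindException, sleeping 1-200 ms and picking a new random
  // port before each new attempt, similar to the patch under review.
  static ServerSocket startWithRetry(int initialPort, int maxTries)
      throws IOException, InterruptedException {
    int port = initialPort;
    for (int tries = 1; ; tries++) {
      try {
        return bindServer(port);
      } catch (BindException e) {
        if (tries >= maxTries) {
          throw e;
        }
        TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextInt(200) + 1L);
        port = getRandomPort();
        if (port <= 0) {
          throw new IllegalStateException("Failed to get random port.");
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulate the race: a "squatter" binds the chosen port first.
    int port = getRandomPort();
    try (ServerSocket squatter = bindServer(port);
         ServerSocket server = startWithRetry(port, 5)) {
      System.out.println("squatter holds port " + squatter.getLocalPort());
      System.out.println("server fell back to port " + server.getLocalPort());
    }
  }
}
```

The retry does not close the window between choosing a port and binding to it; it only recovers after a bind loses the race, which is why maxTries bounds the number of attempts.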