[ https://issues.apache.org/jira/browse/TWILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453529#comment-15453529 ]
ASF GitHub Bot commented on TWILL-173:
--------------------------------------

Github user albertshau commented on a diff in the pull request:

    https://github.com/apache/twill/pull/9#discussion_r77083133

    --- Diff: twill-core/src/main/java/org/apache/twill/internal/kafka/EmbeddedKafkaServer.java ---
    @@ -65,9 +72,19 @@ protected void startUp() throws Exception {
               if (rootCause instanceof ZkTimeoutException) {
                 // Potentially caused by race condition bug described in TWILL-139.
                 LOG.warn("Timeout when connecting to ZooKeeper from KafkaServer. Attempt number {}.", tries, rootCause);
    +          } else if (rootCause instanceof BindException) {
    +            LOG.warn("Kafka failed to bind to port {}. Attempt number {}.", kafkaConfig.port(), tries, rootCause);
               } else {
                 throw e;
               }
    +
    +          // Do a random sleep of < 200ms
    +          TimeUnit.MILLISECONDS.sleep(new Random().nextInt(200) + 1L);
    +
    +          // Generate a new port for the Kafka
    +          int port = Networks.getRandomPort();
    +          Preconditions.checkState(port > 0, "Failed to get random port.");
    +          properties.setProperty("port", Integer.toString(port));
    --- End diff --

    Should we only do this if it's originally set to 0 or left empty? Without reading the code, I would expect it to connect to the port I set, or not connect at all. Or is this not a concern because the port is not used to connect?


> Application Master failed with BindException occasionally
> ---------------------------------------------------------
>
>                 Key: TWILL-173
>                 URL: https://issues.apache.org/jira/browse/TWILL-173
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Terence Yim
>             Fix For: 0.8.0
>
>
> When the AM starts the embedded Kafka, it first generates a random port (by
> creating a server socket) and then provides that port for the Kafka
> server to bind to. It is possible that after the random port is acquired and
> before the Kafka server binds to it, another process on the same box takes
> that port.
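
For illustration only, here is a minimal sketch of the race described in the issue, combined with the behavior suggested in the review comment: pick a new random port and retry only when the port was originally 0 or left empty, and fail immediately when the user asked for a specific port. This is not the code in the pull request. The class and method names (`PortRetrySketch`, `pickFreePort`, `isPortUnset`, `bindWithRetry`) are hypothetical, and a plain `ServerSocket` stands in for the Kafka server so the example runs without Kafka or Twill on the classpath.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.Properties;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual EmbeddedKafkaServer implementation.
class PortRetrySketch {

  // Asks the OS for a free port by binding to port 0, then releases it.
  // Once the socket is closed, another process may claim the port before
  // the real server binds to it. That window is the race in TWILL-173.
  static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }

  // True if the "port" property is missing, empty, or 0, i.e. the user
  // did not request a specific port.
  static boolean isPortUnset(Properties props) {
    String value = props.getProperty("port", "").trim();
    return value.isEmpty() || "0".equals(value);
  }

  // Binds a ServerSocket (a stand-in for starting Kafka). On BindException,
  // retries with a fresh random port only if the port was originally unset;
  // an explicitly configured port is never replaced.
  static ServerSocket bindWithRetry(Properties props, int maxTries)
      throws IOException, InterruptedException {
    boolean randomPort = isPortUnset(props);
    for (int tries = 1; ; tries++) {
      if (randomPort) {
        props.setProperty("port", Integer.toString(pickFreePort()));
      }
      int port = Integer.parseInt(props.getProperty("port").trim());
      try {
        return new ServerSocket(port);
      } catch (BindException e) {
        if (!randomPort || tries >= maxTries) {
          throw e;
        }
        // Random sleep of 1 to 200 ms before the next attempt.
        TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextInt(200) + 1L);
      }
    }
  }
}
```

With this shape, a caller that sets `port` explicitly gets either that port or a `BindException`, while a caller that leaves it unset gets up to `maxTries` attempts on different random ports. Whether that distinction matters for the embedded Kafka in Twill depends on whether anything relies on a user-supplied port, which is the question the review comment raises.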
-- This message was sent by Atlassian JIRA (v6.3.4#6332)