[ https://issues.apache.org/jira/browse/TWILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453489#comment-15453489 ]
ASF GitHub Bot commented on TWILL-173:
--------------------------------------
Github user albertshau commented on a diff in the pull request:
https://github.com/apache/twill/pull/9#discussion_r77081138
--- Diff: twill-core/src/main/java/org/apache/twill/internal/kafka/EmbeddedKafkaServer.java ---
@@ -65,9 +72,19 @@ protected void startUp() throws Exception {
       if (rootCause instanceof ZkTimeoutException) {
         // Potentially caused by race condition bug described in TWILL-139.
         LOG.warn("Timeout when connecting to ZooKeeper from KafkaServer. Attempt number {}.", tries, rootCause);
+      } else if (rootCause instanceof BindException) {
+        LOG.warn("Kafka failed to bind to port {}. Attempt number {}.", kafkaConfig.port(), tries, rootCause);
       } else {
         throw e;
       }
+
+      // Do a random sleep of < 200ms
+      TimeUnit.MILLISECONDS.sleep(new Random().nextInt(200) + 1L);
+
+      // Generate a new port for the Kafka
+      int port = Networks.getRandomPort();
+      Preconditions.checkState(port > 0, "Failed to get random port.");
+      properties.setProperty("port", Integer.toString(port));
--- End diff --
So is "port" required to be in the Properties passed to KafkaConfig? Is
there any place where a port is set in the Properties passed to the
constructor and then used somewhere else?
> Application Master failed with BindException occasionally
> ---------------------------------------------------------
>
> Key: TWILL-173
> URL: https://issues.apache.org/jira/browse/TWILL-173
> Project: Apache Twill
> Issue Type: Bug
> Components: core, yarn
> Affects Versions: 0.6.0-incubating, 0.7.0-incubating
> Reporter: Terence Yim
> Fix For: 0.8.0
>
>
> When the AM starts the embedded Kafka, it first generates a random port (by
> creating a server socket) and then provides that port for the Kafka
> server to bind to. It is possible that, after the random port is acquired and
> before the Kafka server binds to it, another process on the same box takes
> that port.
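
The race described above can be sketched outside of Twill with plain sockets. This is a minimal illustration, not the Twill implementation: the class and method names (PortRetrySketch, pickRandomPort, bindWithRetry) are hypothetical, and a java.net.ServerSocket stands in for the Kafka server. It shows why the probed port can be lost between the probe and the bind, and how retrying with a fresh port on BindException works around it.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical sketch; a plain ServerSocket stands in for the Kafka server.
class PortRetrySketch {

  /**
   * Asks the OS for a free ephemeral port and releases it immediately.
   * Once this method returns, the port is free again, so another process
   * may take it before the caller binds to it.
   */
  static int pickRandomPort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  /**
   * Picks a port and binds to it. If the bind fails because the port was
   * taken in the meantime, picks a new port and tries again.
   */
  static ServerSocket bindWithRetry(int maxTries) throws IOException {
    BindException last = null;
    for (int tries = 1; tries <= maxTries; tries++) {
      int port = pickRandomPort();
      try {
        return new ServerSocket(port);
      } catch (BindException e) {
        // Lost the race for this port; remember the failure and retry.
        last = e;
      }
    }
    throw new IOException("Could not bind after " + maxTries + " attempts", last);
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket server = bindWithRetry(5)) {
      System.out.println("Bound to port " + server.getLocalPort());
    }
  }
}
```

Retrying narrows the window but does not close it. Letting the server itself bind to port 0 and then reading back the assigned port would avoid the probe step entirely, where the server supports that.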