[ 
https://issues.apache.org/jira/browse/KAFKA-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231324#comment-17231324
 ] 

Tom Bentley commented on KAFKA-10713:
-------------------------------------

The problem is this regex in {{Utils}}:
{noformat}
HOST_PORT_PATTERN = 
Pattern.compile(".*?\\[?([0-9a-zA-Z\\-%._:]*)\\]?:([0-9]+)");
{noformat}

The initial {{.*?}}, although a non-greedy quantifier, matches everything up to 
and including the last semicolon, because {{;}} is not in the host character 
class and so the rest of the expression cannot match any sooner. 
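
To see this in isolation, here is a minimal sketch that applies the pattern above with a full {{matches()}} call. The class {{HostPortDemo}} and its {{getHost}} helper are stand-ins for illustration, written on the assumption that {{Utils.getHost()}} returns group 1 of a full match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostPortDemo {
    // Same pattern as quoted above from Utils.
    static final Pattern HOST_PORT_PATTERN =
        Pattern.compile(".*?\\[?([0-9a-zA-Z\\-%._:]*)\\]?:([0-9]+)");

    // Illustrative helper (assumed behaviour): group 1 of a full match, else null.
    static String getHost(String address) {
        Matcher matcher = HOST_PORT_PATTERN.matcher(address);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // The lazy prefix has to swallow "kafka-0:9092;kafka-1:9092;" before
        // the host group can match, because ';' is not in the host class.
        System.out.println(getHost("kafka-0:9092;kafka-1:9092;kafka-2:9092")); // prints "kafka-2"
        System.out.println(getHost("kafka-0:9092"));                           // prints "kafka-0"
    }
}
```
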
This pattern is supposed to match
{code}
host:port and protocol:\\host:port 
{code}
according to the code comment. [RFC 
2396|https://tools.ietf.org/html/rfc2396#section-3.1] says a URL scheme has to 
match {{scheme = alpha *( alpha | digit | "+" | "-" | "." )}}. That's ASCII 
alpha. Given the historic laxity in parsing the protocol/scheme part, _perhaps_ 
we should allow any Unicode letter, rather than limiting to ASCII alpha, to 
avoid breaking anyone who has been using non-ASCII characters.
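
As a rough sketch of what a stricter pattern could look like (an assumption for discussion, not a proposed patch): the leading {{.*?}} is replaced by an optional scheme prefix that allows any Unicode letter, and the scheme is assumed to be separated from the host by {{://}}. The names {{StrictHostPortSketch}}, {{STRICT_HOST_PORT}} and {{getHost}} are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictHostPortSketch {
    // Hypothetical pattern: optional scheme (a Unicode letter, then letters,
    // ASCII digits, '+', '.', '-') followed by "://", then the original
    // host and port groups unchanged.
    static final Pattern STRICT_HOST_PORT = Pattern.compile(
        "(?:\\p{L}[\\p{L}0-9+.\\-]*://)?\\[?([0-9a-zA-Z\\-%._:]*)\\]?:([0-9]+)");

    // Returns the host group on a full match, else null.
    static String getHost(String address) {
        Matcher matcher = STRICT_HOST_PORT.matcher(address);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(getHost("kafka-0:9092"));              // prints "kafka-0"
        System.out.println(getHost("PLAINTEXT://kafka-0:9092"));  // prints "kafka-0"
        System.out.println(getHost("[::1]:9092"));                // prints "::1"
        System.out.println(getHost("kafka-0:9092;kafka-1:9092")); // prints "null"
    }
}
```

With this shape the semicolon-separated string no longer matches at all, so the caller gets {{null}} rather than a silently truncated host. It also rejects any other arbitrary prefix the old pattern tolerated, which may or may not be acceptable.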

> Surprising behaviour when bootstrap servers are separated by semicolons
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-10713
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10713
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Mickael Maison
>            Assignee: Tom Bentley
>            Priority: Major
>
> When creating a Kafka client with {{bootstrap.servers}} set to 
> "kafka-0:9092;kafka-1:9092;kafka-2:9092", it has a strange behaviour.
> For one thing, there's no warning or error messages. The client will connect and 
> start working. However, it will only use the hostname after the last 
> semicolon as bootstrap server!
> The configuration {{bootstrap.servers}} is defined as a {{List}} in 
> {{AbstractConfig}}. So from a configuration point of view, 
> "kafka-0:9092;kafka-1:9092;kafka-2:9092" is a single entry.
> Then, {{Utils.getHost()}} returns "kafka-2" when parsing that string.
> {code:java}
> assertEquals("kafka-2", getHost("kafka-1:9092;kafka-1:9092;kafka-2:9092"));
> {code}
> So the client ends up with a single bootstrap server! 
> I believe semicolons are not valid characters in hostnames/domain names, so we 
> should be able to provide better validation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
