[jira] [Commented] (KAFKA-7725) Add a delay for further CG rebalances, beyond KIP-134 group.initial.rebalance.delay.ms

2018-12-29 Thread Boyang Chen (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730800#comment-16730800 ]

Boyang Chen commented on KAFKA-7725:


Thanks [~astubbs] for sharing this issue. I will add it to the KIP-345 Jira
list.

> Add a delay for further CG rebalances, beyond KIP-134 
> group.initial.rebalance.delay.ms
> --
>
> Key: KAFKA-7725
> URL: https://issues.apache.org/jira/browse/KAFKA-7725
> Project: Kafka
>  Issue Type: New Feature
>  Components: clients, consumer, core
>Affects Versions: 2.1.0
>Reporter: Antony Stubbs
>Priority: Major
>
> KIP-134 group.initial.rebalance.delay.ms was a good start, but there are much
> bigger problems after a system is up and running, when consumers can leave
> and join in large numbers, causing rebalance storms. One example is
> Mesosphere deploying new versions of an app: say there are 10 instances,
> then 10 more instances are deployed with the new version, then the old 10 are
> scaled down. Ideally this would cause 1 or 2 rebalances instead of 20.
> The trade-off is that if the delay is 5 seconds, every consumer joining
> within that window would extend it by another 5 seconds, potentially causing
> partitions to never be processed. To mitigate this, either a maximum rebalance
> delay could also be added, or additional consumers joining would not extend
> the rebalance delay, so that it is always at most 5 seconds.
> Related: [KIP-345: Introduce static membership protocol to reduce consumer 
> rebalances|https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances]
> KAFKA-7018: persist memberId for consumer restart
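The trade-off described above can be sketched as a debounce with a hard cap. This is a hypothetical illustration, not Kafka code. The `CappedRebalanceDelay` class, its method names, and the millisecond timestamps passed in by the caller are all assumptions, chosen so the logic can be checked without a broker. Each join pushes the deadline out by `joinDelayMs`, but never past the window start plus `maxDelayMs`.

```java
/**
 * Hypothetical sketch (not part of Kafka): decides when a pending rebalance
 * may start. Each join extends the deadline by joinDelayMs, capped at
 * windowStart + maxDelayMs so a steady stream of joins cannot stall it forever.
 */
class CappedRebalanceDelay {
    private final long joinDelayMs;
    private final long maxDelayMs;
    private long windowStartMs = -1; // -1 means no rebalance is pending
    private long deadlineMs = -1;

    CappedRebalanceDelay(long joinDelayMs, long maxDelayMs) {
        if (joinDelayMs <= 0 || maxDelayMs < joinDelayMs) {
            throw new IllegalArgumentException("require 0 < joinDelayMs <= maxDelayMs");
        }
        this.joinDelayMs = joinDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Records a member join at nowMs and returns the updated deadline. */
    long onMemberJoin(long nowMs) {
        if (windowStartMs < 0) {
            windowStartMs = nowMs; // the first join opens the delay window
        }
        long extended = nowMs + joinDelayMs;
        long cap = windowStartMs + maxDelayMs;
        deadlineMs = Math.min(extended, cap);
        return deadlineMs;
    }

    /** True once the deadline has passed; also resets the window for the next round. */
    boolean shouldRebalance(long nowMs) {
        if (windowStartMs >= 0 && nowMs >= deadlineMs) {
            windowStartMs = -1;
            deadlineMs = -1;
            return true;
        }
        return false;
    }
}
```

For example, take `joinDelayMs = 5000` and `maxDelayMs = 15000`. Joins at 0, 4000, 8000 and 12000 ms move the deadline to 5000, 9000 and 13000 ms, and then to 15000 ms, where the cap stops any further extension. Setting `maxDelayMs` equal to `joinDelayMs` models the second option from the description, in which later joins do not extend the window at all.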



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6863) Kafka clients should try to use multiple DNS resolved IP addresses if the first one fails

2018-12-29 Thread Chris Bogan (JIRA)


 [ https://issues.apache.org/jira/browse/KAFKA-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bogan updated KAFKA-6863:
---
Attachment: www.apache.orglicensesLICENSE-2.0.pdf

> Kafka clients should try to use multiple DNS resolved IP addresses if the 
> first one fails
> -
>
> Key: KAFKA-6863
> URL: https://issues.apache.org/jira/browse/KAFKA-6863
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 2.1.0
>
> Attachments: www.apache.orglicensesLICENSE-2.0.pdf
>
>
> Currently Kafka clients resolve a symbolic hostname using
>   {{new InetSocketAddress(String hostname, int port)}}
> which picks only one IP address even if the DNS has multiple records for the 
> hostname, as it calls
>  {{InetAddress.getAllByName(host)[0]}}
> In environments where DNS maps a hostname to multiple IPs, e.g. in clouds 
> where the IPs point to external load balancers, it would be preferable for 
> the client, on failing to connect to one of the IPs, to try the other ones 
> before giving up on the connection.
>  
>  
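Here is a minimal sketch of the requested behaviour. It assumes a caller-supplied connect callback, so no real sockets are needed. It uses the standard `InetAddress.getAllByName` to keep every resolved address rather than only the first, then tries each one in order. The `MultiAddressConnector` class, its `Connector` interface, and the `connectAny` method are hypothetical. They are not the Kafka client's actual implementation.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class MultiAddressConnector {
    /** Hypothetical connection callback; a real client would open a socket here. */
    interface Connector<T> {
        T connect(InetSocketAddress address) throws IOException;
    }

    /** Keeps every address DNS returns for host, not just element [0]. */
    static List<InetSocketAddress> resolveAll(String host, int port) throws UnknownHostException {
        List<InetSocketAddress> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.add(new InetSocketAddress(address, port));
        }
        return result;
    }

    /** Tries each address in order and returns the first successful connection. */
    static <T> T connectAny(List<InetSocketAddress> addresses, Connector<T> connector)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress address : addresses) {
            try {
                return connector.connect(address);
            } catch (IOException e) {
                last = e; // remember the failure and move on to the next IP
            }
        }
        throw last != null ? last : new IOException("no addresses to try");
    }
}
```

The essential difference from the current behaviour is passing the full result of `InetAddress.getAllByName` rather than index `[0]`. The rest is a plain try-next-on-failure loop that rethrows the last error only after every address has failed.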



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7755) Kubernetes - Kafka clients are resolving DNS entries only one time

2018-12-29 Thread Chris Bogan (JIRA)


 [ https://issues.apache.org/jira/browse/KAFKA-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bogan updated KAFKA-7755:
---
Attachment: pom.xml

> Kubernetes - Kafka clients are resolving DNS entries only one time
> --
>
> Key: KAFKA-7755
> URL: https://issues.apache.org/jira/browse/KAFKA-7755
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.1.0, 2.2.0, 2.1.1
> Environment: Kubernetes
>Reporter: Loïc Monney
>Priority: Blocker
> Attachments: pom.xml
>
>
> *Introduction*
>  Since 2.1.0, Kafka clients support multiple DNS-resolved IP addresses 
> if the first one fails. This change was introduced by 
> https://issues.apache.org/jira/browse/KAFKA-6863. However, this DNS resolution 
> is now performed only once by the clients. This is not a problem if all 
> brokers have fixed IP addresses, but it is definitely an issue when 
> Kafka brokers run on top of Kubernetes. Indeed, new Kubernetes pods 
> receive a different IP address, so once all brokers have been 
> restarted, clients can no longer reconnect to any broker.
> *Impact*
>  Everyone running Kafka 2.1 or later on top of Kubernetes is impacted when a 
> rolling restart is performed.
> *Root cause*
>  Since https://issues.apache.org/jira/browse/KAFKA-6863, Kafka clients 
> resolve DNS entries only once.
> *Proposed solution*
>  In 
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/ClusterConnectionStates.java#L368]
>  Kafka clients should perform DNS resolution again once all IP addresses 
> have been "used" (i.e. when _index_ is back to 0).
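The proposed fix can be sketched as an address cursor that repeats the DNS lookup whenever the index wraps back to 0. `AddressCursor` and its `Resolver` hook are hypothetical stand-ins, not the real `ClusterConnectionStates` code. The resolver is injected so the behaviour can be exercised without live DNS. `dnsResolver()` shows an assumed production wiring via `InetAddress.getAllByName`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

class AddressCursor {
    /** Hypothetical resolver hook so DNS can be replaced in tests. */
    interface Resolver {
        List<InetAddress> resolve(String host) throws UnknownHostException;
    }

    /** Assumed production resolver: every address the JVM returns for host. */
    static Resolver dnsResolver() {
        return h -> Arrays.asList(InetAddress.getAllByName(h));
    }

    private final String host;
    private final Resolver resolver;
    private List<InetAddress> addresses;
    private int index = 0;

    AddressCursor(String host, Resolver resolver) throws UnknownHostException {
        this.host = host;
        this.resolver = resolver;
        this.addresses = resolver.resolve(host);
        if (addresses.isEmpty()) {
            throw new UnknownHostException(host);
        }
    }

    /** Address to use for the next connection attempt. */
    InetAddress current() {
        return addresses.get(index);
    }

    /**
     * Advances after a failed attempt. When the index wraps back to 0, every
     * known address has been tried, so DNS is queried again to pick up brokers
     * that came back with new IPs (e.g. restarted Kubernetes pods).
     */
    void moveToNext() throws UnknownHostException {
        index = (index + 1) % addresses.size();
        if (index == 0) {
            List<InetAddress> fresh = resolver.resolve(host);
            if (!fresh.isEmpty()) {
                addresses = fresh; // keep the old list if the lookup returned nothing
            }
        }
    }
}
```

Re-resolving only on wraparound keeps the common path (trying the next known IP) free of DNS lookups. It still forces a fresh lookup after every known address has failed, which is the situation after a full rolling restart of broker pods. Note that the JVM's own DNS cache (`networkaddress.cache.ttl`) can still return stale entries to a real resolver.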



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7728) Add JoinReason to the join group request for better rebalance handling

2018-12-29 Thread Boyang Chen (JIRA)


[ https://issues.apache.org/jira/browse/KAFKA-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730594#comment-16730594 ]

Boyang Chen commented on KAFKA-7728:


[~enether] Thanks for the thoughts! I think compatibility should be 
considered, since the JoinGroupRequest version will be bumped. I have already 
come up with some common join reasons for a potential enum type:
{code:java}
public enum JoinGroupReason {
  BLIND("blind"),                          // Join request from a start-up consumer
  SELF_META_CHANGE("self_meta_change"),    // The consumer metadata has changed
  TOPIC_META_CHANGE("topic_meta_change");  // The topic metadata has changed (must come from the leader)

  private final String name;               // string form of the reason

  JoinGroupReason(String name) {
    this.name = name;
  }
}
{code}
The self metadata change might be trivial to support now, but I think it would 
be better to discuss more scenarios before we finalize anything. Let's 
brainstorm more join reasons that would help the broker make 
decisions.

> Add JoinReason to the join group request for better rebalance handling
> --
>
> Key: KAFKA-7728
> URL: https://issues.apache.org/jira/browse/KAFKA-7728
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Mayuresh Gharat
>Priority: Major
>  Labels: consumer, mirror-maker, needs-kip
>
> Recently [~mgharat] and I discussed the current rebalance logic for handling 
> the leader's join group request. So far we blindly trigger a rebalance when 
> the leader rejoins. The caveat is that KIP-345 does not cover this case, 
> and if a consumer group uses a strategy other than sticky assignment, such as 
> round robin, the redundant rebalance could still shuffle the topic partitions 
> among consumers (for example, in a MirrorMaker application).
> I checked the broker side, and here is what we currently do:
>  
> {code:java}
> if (group.isLeader(memberId) || !member.matches(protocols)) {
>   // force a rebalance if a member has changed metadata or if the leader sends JoinGroup.
>   // The latter allows the leader to trigger rebalances for changes affecting assignment
>   // which do not affect the member metadata (such as topic metadata changes for the consumer)
>   ...
> }{code}
> Based on the broker logic, we only need to trigger a rebalance on leader 
> rejoin when the topic metadata has changed. I also looked at the 
> ConsumerCoordinator code on the client side and found the metadata 
> monitoring logic here:
> {code:java}
> public boolean rejoinNeededOrPending() {
>     ...
>     // we need to rejoin if we performed the assignment and metadata has changed
>     if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot))
>         return true;
>     ...
> }{code}
>  I think that instead of just returning true, we could introduce a new enum 
> field in the join group request, called JoinReason, to indicate the purpose 
> of the rejoin. The broker would then know whether the leader is rejoining 
> because of a topic metadata change: if so, it triggers a rebalance as usual; 
> if not, it should not trigger one. This way we avoid a full rebalance when 
> the leader is merely restarted during a rolling bounce.
>  
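Here is a rough sketch of the proposal that covers both sides. The client picks a reason instead of returning a bare `true`. The broker skips the rebalance for a leader rejoin unless the reason is a topic metadata change. All names here (`JoinReason`, `JoinReasonSketch`, `chooseReason`, `shouldRebalance`, `currentShouldRebalance`) are hypothetical. The enum values follow those suggested in the comment above, and metadata snapshots are modelled as plain objects compared with `equals`. The broker condition is reduced to the two checks from the quoted coordinator snippet.

```java
/** Hypothetical join reasons, mirroring the values proposed in the comment. */
enum JoinReason { BLIND, SELF_META_CHANGE, TOPIC_META_CHANGE }

class JoinReasonSketch {
    /**
     * Client side: instead of returning true, report why the member rejoins.
     * assignmentSnapshot is non-null only on the member that performed the
     * assignment, i.e. the leader.
     */
    static JoinReason chooseReason(Object assignmentSnapshot, Object metadataSnapshot,
                                   boolean selfMetadataChanged) {
        if (assignmentSnapshot != null && !assignmentSnapshot.equals(metadataSnapshot)) {
            return JoinReason.TOPIC_META_CHANGE;
        }
        if (selfMetadataChanged) {
            return JoinReason.SELF_META_CHANGE;
        }
        return JoinReason.BLIND; // a start-up consumer, e.g. one restarted in a rolling bounce
    }

    /** Broker side today: any leader rejoin or changed member metadata forces a rebalance. */
    static boolean currentShouldRebalance(boolean isLeader, boolean memberMetadataChanged) {
        return isLeader || memberMetadataChanged;
    }

    /** Broker side as proposed: a leader rejoin only counts if topic metadata changed. */
    static boolean shouldRebalance(boolean isLeader, boolean memberMetadataChanged,
                                   JoinReason reason) {
        if (memberMetadataChanged) {
            return true; // unchanged: member metadata changes still need a new assignment
        }
        return isLeader && reason == JoinReason.TOPIC_META_CHANGE;
    }
}
```

Consider a leader that rejoins during a rolling bounce (`BLIND`, no member metadata change). It triggers a rebalance under the current rule but not under the proposed one, which is the redundant rebalance the description wants to avoid.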



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)