> On May 20, 2015, 5:15 p.m., Onur Karaman wrote:
> > I only did a brief skim. This optimization tries to switch consumers over to a new coordinator without a rebalance. From my understanding, the consumers would detect a coordinator failure, discover the new coordinator to work with, and try heartbeating that new coordinator without a rebalance.
> >
> > So it seems to me that putting the logic in handleJoinGroup isn't right, as the rebalance is what we're trying to avoid. The code should be in handleHeartbeat. It should look up zk for the group info, add it to CoordinatorMetadata, and start up a DelayedHeartbeat for every consumer of that group.
> >
> > **More importantly: given that this is just an optimization, and we haven't even seen the performance hit without this, I think KAFKA-2017 should be very low priority.**
> >
> > The following are higher priority:
> > 1. Getting the consumer to properly handle error codes of the join group and heartbeat responses.
> > 2. Getting the consumer to detect coordinator failures and switch over to another coordinator (my KAFKA-1334 patch just had the coordinator detect consumer failures). A nice benefit of completing this first is that if we decide that the rebalances on coordinator failover are an actual issue, this would greatly facilitate testing any coordinator failover logic. Right now, it's unclear how this rb's logic can be tested.

I added a ticket for 2: [KAFKA-2208](https://issues.apache.org/jira/browse/KAFKA-2208)

- Onur

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34450/#review84539
-----------------------------------------------------------

On May 20, 2015, 4:13 p.m., Guozhang Wang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34450/
> -----------------------------------------------------------
>
> (Updated May 20, 2015, 4:13 p.m.)
>
>
> Review request for kafka.
>
>
> Bugs: KAFKA-2017
>     https://issues.apache.org/jira/browse/KAFKA-2017
>
>
> Repository: kafka
>
>
> Description
> -------
>
> 1. Upon receiving join-group, if the group metadata cannot be found in the local cache try to read it from ZK;
> 2. Upon completing rebalance, update the ZK with new group registry or delete the registry if the group becomes empty
>
>
> Diffs
> -----
>
>   core/src/main/scala/kafka/coordinator/ConsumerCoordinator.scala af06ad45cdc46ac3bc27898ebc1a5bd5b1c7b19e
>   core/src/main/scala/kafka/coordinator/ConsumerGroupMetadata.scala 47bdfa7cc86fd4e841e2b1d6bfd40f1508e643bd
>   core/src/main/scala/kafka/coordinator/CoordinatorMetadata.scala c39e6de34ee531c6dfa9107b830752bd7f8fbe59
>   core/src/main/scala/kafka/utils/ZkUtils.scala 2618dd39b925b979ad6e4c0abd5c6eaafb3db5d5
>
> Diff: https://reviews.apache.org/r/34450/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Guozhang Wang
>
>
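
For readers following the thread, the two steps in the review request's description (fall back to the registry on a local cache miss, then write or delete the registry entry once a rebalance completes) can be sketched roughly as below. This is an illustrative sketch only, not the code in the patch: the patch itself is Scala, and every name here (`GroupMetadata`, `FakeRegistry`, `Coordinator`, `get_or_load_group`, `on_rebalance_complete`) is hypothetical. An in-memory dict stands in for ZooKeeper, and heartbeat scheduling is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class GroupMetadata:
    """Hypothetical, minimal view of a consumer group."""
    group_id: str
    consumer_ids: Set[str] = field(default_factory=set)


class FakeRegistry:
    """Hypothetical stand-in for the ZooKeeper group registry."""

    def __init__(self) -> None:
        self._groups: Dict[str, Set[str]] = {}

    def read(self, group_id: str) -> Optional[GroupMetadata]:
        members = self._groups.get(group_id)
        if members is None:
            return None
        return GroupMetadata(group_id, set(members))

    def write(self, group: GroupMetadata) -> None:
        self._groups[group.group_id] = set(group.consumer_ids)

    def delete(self, group_id: str) -> None:
        self._groups.pop(group_id, None)


class Coordinator:
    """Hypothetical coordinator holding a local cache of group metadata."""

    def __init__(self, registry: FakeRegistry) -> None:
        self.registry = registry
        self.cache: Dict[str, GroupMetadata] = {}

    def get_or_load_group(self, group_id: str) -> Optional[GroupMetadata]:
        # Step 1: on a local cache miss, try to read the group from the registry.
        group = self.cache.get(group_id)
        if group is None:
            group = self.registry.read(group_id)
            if group is not None:
                self.cache[group_id] = group
        return group

    def on_rebalance_complete(self, group: GroupMetadata) -> None:
        # Step 2: persist the new membership, or delete the entry if the group is empty.
        if group.consumer_ids:
            self.cache[group.group_id] = group
            self.registry.write(group)
        else:
            self.cache.pop(group.group_id, None)
            self.registry.delete(group.group_id)
```

In this sketch the cache-miss lookup is a standalone method, so it could be called from either a join-group or a heartbeat handler; which handler should own it is the question raised in the review above.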