[ https://issues.apache.org/jira/browse/KAFKA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697662#comment-14697662 ]
Andrew Olson commented on KAFKA-2172:
-------------------------------------

[~jjkoshy] I've implemented a new assignment algorithm, similar to what Bryan described above, that appears to work reasonably well across a wide variety of scenarios; see KAFKA-2435.

> Round-robin partition assignment strategy too restrictive
> ---------------------------------------------------------
>
>                 Key: KAFKA-2172
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2172
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>
> The round-robin partition assignment strategy was introduced for the
> high-level consumer starting with 0.8.2.1. This appears to be a very
> attractive feature, but it has an unfortunate restriction that prevents it
> from being easily used: it requires all consumers in the consumer group to
> have identical topic regex selectors and the same number of consumer
> threads.
> It turns out this is not always the case for our deployments. It's not
> unusual to run multiple consumers within a single process (with different
> topic selectors), or we might have multiple processes dedicated to
> different topic subsets. Granted, we could change these to use a separate
> group id for each sub-topic selector (but unfortunately, that's easier said
> than done).
> In several cases, we do at least have a separate client.id set for each
> sub-consumer, so it would be incrementally better if the requirement were
> loosened so that topic selection only has to be identical within each
> groupid/clientid pair.
> But if we want to do a rolling restart to deploy a new consumer config,
> the cluster will likely be in a state where the consumers cannot all share
> a single config until the rolling restart completes across all nodes. This
> results in a consumer outage while the rolling restart is happening.
> Finally, it's especially problematic if we want to canary a new version for
> a period before rolling it out to the whole cluster.
> I'm not sure why this restriction should exist (it does not exist for the
> 'range' assignment strategy). It seems it could be made to work reasonably
> well with heterogeneous topic selection and heterogeneous thread counts.
> The documentation states that "The round-robin partition assignor lays out
> all the available partitions and all the available consumer threads. It
> then proceeds to do a round-robin assignment from partition to consumer
> thread."
> If the assignor can "lay out all the available partitions and all the
> available consumer threads", it should be able to assign partitions
> uniformly to the available threads. For each partition, if the next thread
> belongs to a consumer that has not selected that partition's topic, just
> move on to the next available thread that has, and so on.
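The skip-ahead suggestion in the last quoted paragraph can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Kafka's assignor code: the function name `round_robin_with_skip`, the thread ids such as `a-0`, and the representation of partitions as `(topic, partition)` tuples are all hypothetical, and partitions and threads are sorted lexicographically only to make the output deterministic. A shared cursor walks the threads in a circle; for each partition it skips threads whose consumer did not select that topic. When every thread selects every topic, no thread is ever skipped and the sketch reduces to the plain round-robin layout described in the documentation quote.

```python
from typing import Dict, List, Set, Tuple

TopicPartition = Tuple[str, int]


def round_robin_with_skip(
    partitions: List[TopicPartition],
    subscriptions: Dict[str, Set[str]],
) -> Dict[str, List[TopicPartition]]:
    """Assign partitions round-robin, skipping threads whose consumer
    did not select the partition's topic (hypothetical sketch)."""
    threads = sorted(subscriptions)  # deterministic thread order (assumption)
    assignment: Dict[str, List[TopicPartition]] = {t: [] for t in threads}
    if not threads:
        return assignment

    cursor = 0  # index of the next thread in the circular order
    for topic, partition in sorted(partitions):
        # Try each thread at most once, starting from the cursor.
        for step in range(len(threads)):
            candidate = threads[(cursor + step) % len(threads)]
            if topic in subscriptions[candidate]:
                assignment[candidate].append((topic, partition))
                # Resume after the thread that just received a partition.
                cursor = (cursor + step + 1) % len(threads)
                break
        # If no thread selected this topic, the partition stays unassigned.
    return assignment


if __name__ == "__main__":
    parts = [("t1", p) for p in range(3)] + [("t2", p) for p in range(3)]
    subs = {
        "a-0": {"t1", "t2"},  # hypothetical consumer "a": two threads, both topics
        "a-1": {"t1", "t2"},
        "b-0": {"t2"},        # hypothetical consumer "b": one thread, only t2
    }
    for thread, owned in round_robin_with_skip(parts, subs).items():
        print(thread, owned)
    # a-0 [('t1', 0), ('t1', 2), ('t2', 2)]
    # a-1 [('t1', 1), ('t2', 0)]
    # b-0 [('t2', 1)]
```

With these sample subscriptions the split comes out 3/2/1 even though a 2/2/2 split exists (b-0 could take two of the t2 partitions), so simple skipping keeps every assignment valid for heterogeneous selectors but does not by itself guarantee an even balance.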