Thanks for the KIP, Gokul!

I like the overall premise - I think it's more user-friendly to have
configs for this than to have users implement their own config policy, so
unless it's very complex to implement, it seems worth it.
I agree that having the topic policy on the CreatePartitions path makes
sense as well.

Multi-tenancy was a good point. It would be interesting to see how easy it
is to extend the max partition limit to a per-user basis. Perhaps this can
be done in a follow-up KIP, as a natural extension of the feature.

I'm wondering whether there's a need to enforce this on internal topics,
though. Given that they're internal and critical to the functioning of
Kafka, I believe we'd rather always ensure they're created, regardless of
whether that would exceed some user-set limit. It also raises a question of
forward compatibility - what happens if a user's cluster is at its maximum
partition capacity, yet a new release of Kafka introduces a new internal
topic (e.g. KIP-500)?

Best,
Stanislav

On Fri, Apr 24, 2020 at 2:39 PM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> Hi Tom.
>
> With KIP-578, we are not trying to model the load on each partition, and
> come up with an exact limit on what the cluster or broker can handle in
> terms of number of partitions. We understand that not all partitions are
> equal, and the actual load per partition varies based on the message size,
> throughput, whether the broker is a leader for that partition or not etc.
>
> What we are trying to achieve with KIP-578 is to disallow a pathological
> number of partitions that will surely put the cluster in bad shape. For
> example, in KIP-578's appendix, we have described a case where we could not
> delete a topic with 30k partitions, because the brokers could not
> handle all the work that needed to be done. We have also described how
> a producer performance test with 10k partitions observed basically 0
> throughput. In these cases, having a limit on number of partitions
> would allow the cluster to produce a graceful error message at topic
> creation time, and prevent the cluster from entering a pathological state.
> These are not just hypotheticals. We definitely see many of these
> pathological cases happen in production, and we would like to avoid them.
>
> The actual limit on number of partitions is something we do not want to
> suggest in the KIP. The limit will depend on various tests that owners of
> their clusters will have to perform, including perf tests, identifying
> topic creation / deletion times, etc. For example, the tests we did for the
> KIP-578 appendix were enough to convince us that we should not have
> anywhere close to 10k partitions on the setup we describe there.
>
> What we want to do with KIP-578 is provide the flexibility to set a limit
> on number of partitions based on tests cluster owners choose to perform.
> Cluster owners can do the tests however often they wish and dynamically
> adjust the limit on number of partitions. For example, we found in our
> production environment that we don't want to have more than 1k partitions
> on an m5.large EC2 broker instance, or more than 300 partitions on a
> t3.medium EC2 broker, for typical produce / consume use cases.
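>
> As an illustration (not part of the KIP itself), here is a rough sketch of
> how an operator might adjust such a limit at runtime with the Java Admin
> client, assuming the proposed max.broker.partitions ends up being exposed
> as a dynamic cluster-wide broker config; the value of 1000 is just the
> m5.large figure from above:
>
>     import java.util.Collections;
>     import java.util.Properties;
>     import org.apache.kafka.clients.admin.Admin;
>     import org.apache.kafka.clients.admin.AdminClientConfig;
>     import org.apache.kafka.clients.admin.AlterConfigOp;
>     import org.apache.kafka.clients.admin.ConfigEntry;
>     import org.apache.kafka.common.config.ConfigResource;
>
>     public class AdjustPartitionLimit {
>       public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
>             "localhost:9092");
>         try (Admin admin = Admin.create(props)) {
>           // An empty broker id targets the cluster-wide dynamic default.
>           ConfigResource clusterDefault =
>               new ConfigResource(ConfigResource.Type.BROKER, "");
>           // max.broker.partitions is the config proposed by the KIP,
>           // not an existing Kafka config.
>           AlterConfigOp setLimit = new AlterConfigOp(
>               new ConfigEntry("max.broker.partitions", "1000"),
>               AlterConfigOp.OpType.SET);
>           admin.incrementalAlterConfigs(Collections.singletonMap(
>                   clusterDefault, Collections.singletonList(setLimit)))
>               .all().get();
>         }
>       }
>     }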
>
> Cluster owners are free to not configure the limit on number of partitions
> if they don't want to spend the time coming up with a limit. The limit
> defaults to INT32_MAX, which is basically infinity in this context, and
> should be practically backwards compatible with current behavior.
>
> Further, the limit on number of partitions should not come in the way of
> rebalancing tools under normal operation. For example, if the partition
> limit per broker is set to 1k, unless the number of partitions comes close
> to 1k, there should be no impact on rebalancing tools. Only when the number
> of partitions comes close to 1k, will rebalancing tools be impacted, but at
> that point, the cluster is already at its limit of functioning (per some
> definition that was used to set the limit in the first place).
>
> Finally, I want to end this long email by pointing out that the partition
> assignment algorithm itself does not consider the load on various
> partitions before assigning partitions to brokers. In other words, it
> treats all partitions as equal. The idea of having a limit on the number
> of partitions is not misaligned with this tenet.
>
> Thanks.
>
> On Tue, Apr 21, 2020 at 9:39 AM Tom Bentley <tbent...@redhat.com> wrote:
>
> > Hi Gokul,
> >
> > the partition assignment algorithm needs to be aware of the partition
> > > limits.
> > >
> >
> > I agree, if you have limits then anything doing reassignment would need
> > some way of knowing what they were. But the thing is that I'm not really
> > sure how you would decide what the limits ought to be.
> >
> >
> > > To illustrate this, imagine that you have 3 brokers (1, 2 and 3),
> > > with 10, 20 and 30 partitions each respectively, and a limit of 40
> > > partitions on each broker enforced via a configurable policy class (the
> > > one you recommended). While the policy class may accept a topic creation
> > > request for 11 partitions with a replication factor of 2 each (because
> > > it is satisfiable), the non-pluggable partition assignment algorithm (in
> > > AdminUtils.assignReplicasToBrokers and a few other places) has to know
> > > not to assign the 11th partition to broker 3 because it would run out of
> > > partition capacity otherwise.
> > >
> >
> > I know this is only a toy example, but I think it also serves to
> > illustrate my point above. How has a limit of 40 partitions been arrived
> > at? In real life different partitions will impart a different load on a
> > broker, depending on all sorts of factors (which topics they're for, the
> > throughput and message size for those topics, etc). By saying that a
> > broker should not have more than 40 partitions assigned I think you're
> > making a big assumption that all partitions have the same weight. You're
> > also limiting the search space for finding an acceptable assignment.
> > Cluster balancers usually use some kind of heuristic optimisation
> > algorithm for figuring out assignments of partitions to brokers, and it
> > could be that the best (or at least a good enough) solution requires
> > assigning the least loaded 41 partitions to one broker.
> >
> > The point I'm trying to make here is that whatever limit is chosen, it's
> > probably been chosen fairly arbitrarily. Even if it's been chosen based
> > on some empirical evidence of how a particular cluster behaves, it's
> > likely that that evidence will become obsolete as the cluster evolves to
> > serve the needs of the business running it (e.g. some hot topic gets
> > repartitioned, messages get compressed with some new algorithm, some new
> > topics need to be created). For this reason I think the problem you're
> > trying to solve via policy (whether that was implemented in a pluggable
> > way or not) is really better solved by automating the cluster balancing
> > and having that cluster balancer be able to reason about when the cluster
> > has too few brokers for the number of partitions, rather than placing
> > some limit on the sizing and shape of the cluster up front and then
> > hobbling the cluster balancer to work within that.
> >
> > I think it might be useful to describe in the KIP how users would be
> > expected to arrive at values for these configs (both on day 1 and in an
> > evolving production cluster), when this solution might be better than
> > using a cluster balancer, and/or why cluster balancers can't be trusted
> > to avoid overloading brokers.
> >
> > Kind regards,
> >
> > Tom
> >
> >
> > On Mon, Apr 20, 2020 at 7:22 PM Gokul Ramanan Subramanian <
> > gokul24...@gmail.com> wrote:
> >
> > > This is a good reference, Tom. I did not consider this approach at all.
> > > I am happy to learn about it now.
> > >
> > > However, I think that partition limits are not "yet another" policy
> > > configuration. Instead, they are fundamental to partition assignment,
> > > i.e. the partition assignment algorithm needs to be aware of the
> > > partition limits. To illustrate this, imagine that you have 3 brokers
> > > (1, 2 and 3), with 10, 20 and 30 partitions each respectively, and a
> > > limit of 40 partitions on each broker enforced via a configurable policy
> > > class (the one you recommended). While the policy class may accept a
> > > topic creation request for 11 partitions with a replication factor of 2
> > > each (because it is satisfiable), the non-pluggable partition assignment
> > > algorithm (in AdminUtils.assignReplicasToBrokers and a few other places)
> > > has to know not to assign the 11th partition to broker 3 because it
> > > would run out of partition capacity otherwise.
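> > >
> > > To make the numbers above concrete, here is a rough, self-contained
> > > sketch (not the actual AdminUtils code, which works differently) of the
> > > kind of capacity-aware check the assignment step would need. The broker
> > > ids, counts, the limit of 40 and the least-loaded-first strategy are
> > > purely illustrative:
> > >
> > >     import java.util.*;
> > >
> > >     public class CapacityAwareAssignmentSketch {
> > >       public static void main(String[] args) {
> > >         Map<Integer, Integer> replicasPerBroker =
> > >             new HashMap<>(Map.of(1, 10, 2, 20, 3, 30));
> > >         int limit = 40, newPartitions = 11, replicationFactor = 2;
> > >
> > >         // A topic-level policy check would accept the request, since
> > >         // the 22 new replicas fit within the cluster's spare capacity.
> > >         int spare = replicasPerBroker.values().stream()
> > >             .mapToInt(c -> limit - c).sum();
> > >         System.out.println("Request satisfiable: "
> > >             + (spare >= newPartitions * replicationFactor));
> > >
> > >         // ...but the assignment itself must still skip any broker that
> > >         // is already at its limit, or it may overload broker 3.
> > >         for (int p = 0; p < newPartitions; p++) {
> > >           List<Integer> replicas = new ArrayList<>();
> > >           replicasPerBroker.entrySet().stream()
> > >               .filter(e -> e.getValue() < limit)    // capacity-aware
> > >               .sorted(Map.Entry.comparingByValue()) // least loaded first
> > >               .limit(replicationFactor)
> > >               .forEach(e -> replicas.add(e.getKey()));
> > >           if (replicas.size() < replicationFactor) {
> > >             throw new IllegalStateException(
> > >                 "Not enough brokers with spare partition capacity");
> > >           }
> > >           replicas.forEach(b -> replicasPerBroker.merge(b, 1, Integer::sum));
> > >           System.out.println("Partition " + p + " -> brokers " + replicas);
> > >         }
> > >       }
> > >     }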
> > >
> > > To achieve the ideal end that you are imagining (and I can totally
> > > understand where you are coming from vis-a-vis the extensibility of
> > > your solution wrt the one in the KIP), we would need to extract the
> > > partition assignment logic itself into a pluggable class, for which we
> > > could then provide a custom implementation. I am afraid that would add
> > > complexity that I am not sure we want to undertake.
> > >
> > > Do you see sense in what I am saying?
> > >
> > > Thanks.
> > >
> > > On Mon, Apr 20, 2020 at 3:59 PM Tom Bentley <tbent...@redhat.com>
> wrote:
> > >
> > > > Hi Gokul,
> > > >
> > > > Leaving aside the question of how Kafka scales, I think the proposed
> > > > solution, limiting the number of partitions in a cluster or
> > > > per-broker, is a policy which ought to be addressable via the
> > > > pluggable policies (e.g. create.topic.policy.class.name).
> > > > Unfortunately, although there's a policy for topic creation, it's
> > > > currently not possible to enforce a policy on partition increase. It
> > > > would be more flexible to be able to enforce this kind of thing via a
> > > > pluggable policy, and it would also avoid the situation where
> > > > different people each want to have a config which addresses some
> > > > specific use case or problem that they're experiencing.
> > > >
> > > > Quite a while ago I proposed KIP-201 to solve this issue of policies
> > > > being easily circumvented, but it didn't really make any progress.
> > > > I've looked at it again in some detail more recently and I think
> > > > something might be possible following the work to make all ZK writes
> > > > happen on the controller.
> > > >
> > > > Of course, this is just my take on it.
> > > >
> > > > Kind regards,
> > > >
> > > > Tom
> > > >
> > > > On Thu, Apr 16, 2020 at 11:47 AM Gokul Ramanan Subramanian <
> > > > gokul24...@gmail.com> wrote:
> > > >
> > > > > Hi.
> > > > >
> > > > > For the sake of expediting the discussion, I have created a
> > > > > prototype PR: https://github.com/apache/kafka/pull/8499. Eventually,
> > > > > (if and) when the KIP is accepted, I'll modify this to add the full
> > > > > implementation and tests etc. in there.
> > > > >
> > > > > I would appreciate it if a Kafka committer could share their
> > > > > thoughts, so that I can more confidently start the voting thread.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > On Thu, Apr 16, 2020 at 11:30 AM Gokul Ramanan Subramanian <
> > > > > gokul24...@gmail.com> wrote:
> > > > >
> > > > > > Thanks for your comments Alex.
> > > > > >
> > > > > > The KIP proposes using two configurations, max.partitions and
> > > > > > max.broker.partitions. It does not enforce their use. The default
> > > > > > values are pretty large (INT32_MAX), and therefore should be
> > > > > > non-intrusive.
> > > > > >
> > > > > > In multi-tenant environments and in partition assignment and
> > > > > > rebalancing, the admin could (a) use the default values, which
> > > > > > would yield similar behavior to now, (b) set very high values that
> > > > > > they know are sufficient, or (c) dynamically re-adjust the values
> > > > > > should the business requirements change. Note that the two
> > > > > > configurations are cluster-wide, so they can be updated without
> > > > > > restarting the brokers.
> > > > > >
> > > > > > The quota system in Kafka seems to be geared towards limiting
> > > > > > traffic for specific clients or users, or in the case of
> > > > > > replication, for leaders and followers. The quota configuration
> > > > > > itself is very similar to the one introduced in this KIP, i.e.
> > > > > > just a few configuration options to specify the quota. The main
> > > > > > difference is that the quota system is far more heavy-weight
> > > > > > because it needs to be applied to traffic that is flowing in/out
> > > > > > constantly, whereas in this KIP we want to limit the number of
> > > > > > partition replicas, which gets modified rarely by comparison in a
> > > > > > typical cluster.
> > > > > >
> > > > > > Hope this addresses your comments.
> > > > > >
> > > > > > On Thu, Apr 9, 2020 at 12:53 PM Alexandre Dupriez <
> > > > > > alexandre.dupr...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Gokul,
> > > > > >>
> > > > > >> Thanks for the KIP.
> > > > > >>
> > > > > >> From what I understand, the objective of the new configuration is
> > > > > >> to protect a cluster from an overload driven by an excessive
> > > > > >> number of partitions, independently from the load handled on the
> > > > > >> partitions themselves. As such, the approach decouples the
> > > > > >> data-path load from the number of units of distribution of
> > > > > >> throughput, and intends to avoid the degradation of performance
> > > > > >> exhibited in the test results provided with the KIP by setting an
> > > > > >> upper bound on that number.
> > > > > >>
> > > > > >> Couple of comments:
> > > > > >>
> > > > > >> 900. Multi-tenancy - one concern I would have with a cluster- and
> > > > > >> broker-level configuration is that it is possible for a user to
> > > > > >> consume a large proportion of the allocatable partitions within
> > > > > >> the configured limit, leaving other users with too few partitions
> > > > > >> to satisfy their requirements.
> > > > > >>
> > > > > >> 901. Quotas - an approach in Apache Kafka to set up an upper
> > > > > >> bound on resource consumption is via client/user quotas. Could
> > > > > >> this framework be leveraged to add this limit?
> > > > > >>
> > > > > >> 902. Partition assignment - one potential problem with the new
> > > > > >> repartitioning scheme is that if a subset of brokers have reached
> > > > > >> their maximum number of assignable partitions, yet their data
> > > > > >> path is under-loaded, new topics and/or partitions will be
> > > > > >> assigned exclusively to other brokers, which could increase the
> > > > > >> likelihood of data-path load imbalance. Fundamentally, the
> > > > > >> isolation of the constraint on the number of partitions from the
> > > > > >> data-path throughput can have conflicting requirements.
> > > > > >>
> > > > > >> 903. Rebalancing - as a corollary to 902, external tools used to
> > > > > >> balance ingress throughput may adopt an incremental approach in
> > > > > >> partition re-assignment to redistribute load, and could hit the
> > > > > >> limit on the number of partitions on a broker when a (too)
> > > > > >> conservative limit is used, thereby over-constraining the
> > > > > >> objective function and reducing the migration path.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Alexandre
> > > > > >>
> > > > > >> On Thu, Apr 9, 2020 at 00:19, Gokul Ramanan Subramanian
> > > > > >> <gokul24...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > Hi. Requesting you to take a look at this KIP and provide
> > > > > >> > feedback.
> > > > > >> >
> > > > > >> > Thanks. Regards.
> > > > > >> >
> > > > > >> > On Wed, Apr 1, 2020 at 4:28 PM Gokul Ramanan Subramanian <
> > > > > >> > gokul24...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Hi.
> > > > > >> > >
> > > > > >> > > I have opened KIP-578, intended to provide a mechanism to
> > > > > >> > > limit the number of partitions in a Kafka cluster. Kindly
> > > > > >> > > provide feedback on the KIP, which you can find at
> > > > > >> > >
> > > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-578%3A+Add+configuration+to+limit+number+of+partitions
> > > > > >> > >
> > > > > >> > > I want to specially thank Stanislav Kozlovski who helped in
> > > > > >> > > formulating some aspects of the KIP.
> > > > > >> > >
> > > > > >> > > Many thanks,
> > > > > >> > >
> > > > > >> > > Gokul.
> > > > > >> > >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>


