Re: kafka streams partition assignor strategy for version 2.5.1 - does it use sticky assignment

2023-04-16 Thread Pushkar Deole
Thanks, John... however, I have a few more questions:

How does this configuration work along with the static group membership
protocol? Or does it work only with dynamic group membership, and not work
well when static membership is configured?

Secondly, I gather that Streams doesn't immediately trigger a rebalance when
the stream is closed on the instance that is being shut down; the rebalance
happens only once the session timeout (session.timeout.ms) expires. So how
does the no-downtime configuration you mentioned work, given that there will
be downtime until session.timeout.ms expires?
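
For context, this is roughly the kind of configuration I am asking about (just
a minimal sketch; the class name, application id, bootstrap servers, pod name
and timeout value below are placeholders, not our real settings):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StaticMembershipConfigSketch {

    // podName would be the stable identity of the kubernetes pod / instance.
    static Properties buildConfig(String podName) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder

        // Static membership: a fixed group.instance.id per instance, so the broker
        // keeps that member's assignment until session.timeout.ms expires instead of
        // rebalancing as soon as the instance drops off.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG), podName);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 30_000);

        return props;
    }
}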

On Sat, Apr 15, 2023 at 8:13 PM John Roesler  wrote:

> Hi Pushkar,
>
> In 2.5, Kafka Streams used an assignor that tried to strike a compromise
> between stickiness and workload balance, so you would observe some
> stickiness, but not all the time.
>
> In 2.6, we introduced the "high availability task assignor" (see KIP-441
> https://cwiki-test.apache.org/confluence/display/KAFKA/KIP-441%3A+Smooth+Scaling+Out+for+Kafka+Streams).
> This assignor is guaranteed to always assign tasks to the instance that is
> most caught up (typically, this would be the instance that was already the
> active processor, which is equivalent to stickiness). In the case of losing
> an instance (e.g., the pod gets replaced), any standby replica would be
> considered "most caught up" and would take over processing with very little
> downtime.
>
> The new assignor achieves balance over time by "warming up" tasks in the
> background on other instances and then swaps the assignment over to them
> when they are caught up.
>
> So, if you upgrade Streams, you should be able to configure at least one
> standby task and then be able to implement the "rolling replacement"
> strategy you described. If you are willing to wait until Streams gradually
> balances the assignment over time after each replacement, then you can
> cycle out the whole cluster without ever having downtime or developing
> workload skew. Note that there are several configuration parameters you can
> adjust to speed up the warm-up process:
> https://cwiki-test.apache.org/confluence/display/KAFKA/KIP-441%3A+Smooth+Scaling+Out+for+Kafka+Streams#KIP441:SmoothScalingOutforKafkaStreams-Parameters
> .
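>
> For illustration, a rough sketch of those settings (the class and method
> names and the values here are only made-up examples, not recommendations;
> see the KIP page above for the defaults and exact semantics) could look like:
>
> import java.util.Properties;
> import org.apache.kafka.streams.StreamsConfig;
>
> public class Kip441TuningSketch {
>     static Properties warmupTuning() {
>         Properties props = new Properties();
>         // At least one standby replica, so a caught-up copy of each task's
>         // state already exists elsewhere when an instance is lost:
>         props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
>         // How many extra "warm-up" replicas may be assigned at once:
>         props.put(StreamsConfig.MAX_WARMUP_REPLICAS_CONFIG, 4);
>         // Maximum lag (in offsets) at which an instance still counts as caught up:
>         props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, 10_000L);
>         // How often to trigger a probing rebalance to check warm-up progress:
>         props.put(StreamsConfig.PROBING_REBALANCE_INTERVAL_MS_CONFIG, 60_000L);
>         return props;
>     }
> }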
>
> I hope this helps!
> -John
>
> On 2023/04/14 17:41:19 Pushkar Deole wrote:
> > Any inputs on below query?
> >
> > On Wed, Apr 12, 2023 at 2:22 PM Pushkar Deole 
> wrote:
> >
> > > Hi All,
> > >
> > > We are using version 2.5.1 of kafka-streams, with 3 application
> > > instances deployed as 3 kubernetes pods.
> > > The application consumes from multiple topics, each with 6 partitions.
> > > I would like to know whether Streams uses a sticky partition assignor
> > > strategy internally, since we can't set one externally on Streams.
> > >
> > > My scenario is this, during rolling upgrades:
> > > Step 1: One new pod comes up, so there are 4 pods, with some partitions
> > > assigned to the newly created pod, and k8s then deletes one of the
> > > older pods. So it is pod1, pod2, pod3 (older) and pod4 (newer); then
> > > pod1 is deleted, leaving ultimately pod2, pod3, pod4.
> > >
> > > Step 2: K8s then repeats the same for another old pod, i.e. it creates
> > > a new pod and then deletes an old pod. So it is pod2, pod3, pod4, pod5;
> > > then pod2 is deleted, leaving ultimately pod3, pod4 and pod5.
> > >
> > > The question I have here is: will Kafka Streams try to stick with the
> > > partitions assigned to the newly created pods across all these
> > > rebalances? That is, will the partitions assigned to pod4 in step 1
> > > still be retained during step 2 when another older pod gets deleted, or
> > > are the partitions reshuffled on every rebalance whenever older pods
> > > get deleted? In other words, during step 2, when pod2 is deleted, will
> > > the partitions assigned to pod4 in step 1 also be reshuffled, or will
> > > they stay where they are, with only the newly freed partitions being
> > > reassigned?
> > >
> > >
> >
>

