Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

Bruno Cadonna Mon, 31 Oct 2022 11:15:30 -0700

Hi Sophie,

Thank you for the KIP!

1.

I do not understand how autoscaling should work with a Streams topology
with a stateful sub-topology that reads from the input topics. The
simplest example is a topology that consists of only one stateful
sub-topology. As far as I understand the upstream producer would route
existing keys to different partitions after the partition expansion than
before the expansion. That means Streams would -- in general -- not read
the same keys with the same stream thread after the expansion. I think
you proposed the solution to this in your last e-mail with the following:


<quote>

Essentially whoever is responsible for calculating how many partitions
are needed should also be responsible for directing whichever new keys
are supposed to go into those new partitions, then pass it along to the
upstream producer to encode in the record itself.

</quote>

But I am not 100% sure if you really meant what I understand. If I
understand it correctly, you propose that the user is responsible for
producing the records with existing keys to the same partitions as before
the expansion upstream. I think that is important information that
should be pointed out in the KIP.
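
To illustrate the concern in point 1, here is a minimal sketch of how
hash-modulo routing remaps existing keys when the partition count grows.
It assumes CRC32 as a stand-in for the producer's actual hash function,
and the key names and the function name are made up for illustration.

```python
import zlib


def default_partition(key: str, num_partitions: int) -> int:
    """Hash-modulo routing. CRC32 is only a stand-in for the real hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical key space: 1000 user keys.
keys = [f"user-{i}" for i in range(1000)]

# Keys whose partition differs between a 2-partition and a 4-partition topic.
moved = [k for k in keys if default_partition(k, 2) != default_partition(k, 4)]

print(f"{len(moved)} of {len(keys)} keys change partition when going from 2 to 4")
```

Any key in `moved` would be read by a different task after the expansion,
which is why a stateful sub-topology cannot rely on this kind of routing.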

2.

I would log an error and shut down the Streams application if a custom
partitioner is used anywhere in the topology. I think that would make
the limitations clearer and would reduce perceived unexpected behavior
by the users. Are there any specific reasons you propose to ignore it
and log a warning?


Best,
Bruno

On 28.10.22 04:51, Sophie Blee-Goldman wrote:

Thanks all! I'll try to address everything but don't hesitate to call me
out if anything is missed

Colt/Lucas:

Thanks for clarifying, I think I understand your example now. Something I
didn't think to mention
earlier but hopefully clears up how this would be used in practice is that
the partitioning decision/
logic doesn't need to -- and perhaps explicitly should not be -- internal
to the StaticStreamPartitioner
interface alone. I would imagine a realistic scenario would have the
partition essentially determined
upstream of the actual application, specifically integrated with whatever
system (or person) is
making the decision to add new partition(s) in the first place. Then the
partitioner is just reading out
some field in the record key/value, possibly doing some translation to
derive the final partition number
from something like a userId if it's not encoded directly, and not actually
computing anything itself.
Does that make sense? Essentially whoever is responsible for calculating
how many partitions are
needed should also be responsible for directing whichever new keys are
supposed to go into those
new partitions, then pass it along to the upstream producer to encode in
the record itself.
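
A minimal sketch of that idea, assuming the upstream system writes the
target partition into the record value. The `AccountEvent` type, its
`target_partition` field and the function name are hypothetical, and the
fallback of using the userId directly as the partition is only one
possible translation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccountEvent:
    user_id: int
    payload: str
    # Hypothetical field: set upstream by whoever decides on new partitions.
    target_partition: Optional[int] = None


def static_partition(key: str, value: AccountEvent, num_partitions: int) -> int:
    """Read the partition out of the record instead of computing it here."""
    if value.target_partition is not None:
        partition = value.target_partition
    else:
        # Assumed translation: one partition per user, numbered by userId.
        partition = value.user_id
    if not 0 <= partition < num_partitions:
        raise ValueError(f"partition {partition} does not exist yet")
    return partition
```

Because the result never depends on `num_partitions` (it is only used as
a bounds check), the same record maps to the same partition before and
after an expansion.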

In sum, I second what Lucas said about your scenario actually being a good
example of one way
to approach implementing static partitioning, ie based on time. It's just
that the semantics/logic to
interpret the target partition based on time would be external to the
application and not isolated in
the actual StaticStreamPartitioner class. Imo this makes perfect sense, as
something like IQ is
also going to be situated outside of the Streams application itself, so
presumably it can talk to
the system that is responsible for the partitioning logic for any partition
information it needs.

Bill/Sagar:

I've been going back and forth a lot on whether to open this feature up to
stateless applications or
even stateful ones as well, but feel like I've settled on having it
targeted towards both (but only) the
stateless and statically partitioned cases. Bill, my only concern about the
stateless apps was the
possibility for trouble when repartitioning a stateless application that
feeds into a stateful application
downstream. But now that I think about it, users would actually need to
ensure that any/all apps
in that pipeline could handle partition increases, so it would be
impossible for someone to mess up
something downstream with corrupted partitioning because any changes to the
output topics would
of course mean changing the input topics of those downstream apps, and they
would just shut down
if not set up to handle this -- that's the whole point of this KIP. So
I'm +1 on including the stateless folks

As for stateful applications, I feel pretty strongly that we should
discourage users from trying to use
the autoscaling feature when state is involved. However, as I touch on
again briefly in the API discussion
below, there's no way to truly prevent someone from abusing this feature if
they are determined to. So
the idea is really for us to stress and heavily document which kinds of
applications can and cannot
enable autoscaling and/or be repartitioned without resulting in significant
corruption of the results.

As for key skew, technically anything is possible -- but (a) we're
entrusting users to make smart choices
throughout this KIP, which includes being careful with the partitioning
logic, (b) the real-world use cases
I'm aware of that requested this feature were not even susceptible to skew
from repartitioning as their
architecture involved giving each key its own partition, and (c) if key
skew is going to become a problem,
I would consider that a question for the KIP that introduced partition
increases, not an issue with a KIP
that's just trying to make Streams compatible with this ability :)
But yes, it's always a possibility and nonetheless fair to be concerned.
It's worth calling out in the docs
somewhere and trying to help users avoid problems with this.

Walker:

Thanks, yes you are right that there will not be a default implementation
provided, and also right that
this should have been explicitly called out in the KIP. I've added a note
to address this.

That said, since we're expanding the feature to include/allow stateless
applications as well, I've
been mulling over a few possible alternatives or modifications to the
currently proposed APIs.

1. We could expand the scope of the new config to enable setting a default
partitioner across the application regardless of the static condition and
autoscaling feature. But
if the user passes in a custom partitioner that does implement the new
StaticStreamPartitioner
interface, then autoscaling will be enabled. Some further options within
this scenario:
   a. Would we still lock down the partitioning and prevent the static
partitioner from being overridden?
       My personal preference is "yes", though it is a bit awkward to have
different semantics depending
       on what kind of partitioner is passed in. Therefore I'd propose to
always enforce any partitioner
       that's passed in as the default, and not allow topology-level
overrides. Imo this would also make
       the new config safer from user error due to accidental
discrepancies throughout the topology
   b. How should we expose the feature for stateless apps? We could just
offer an OOTB implementation
       for stateless apps, which could implement the StreamPartitioner
interface directly to circumvent the
       awkwardness of implementing an interface whose condition (staticness)
it doesn't meet. The downside
       is that some stateless apps may still want customized partitioning
logic. Of course they can just extend
       the class, but again it just feels slightly awkward due to the
interface/class asymmetry. Alternatively, the
       StatelessStreamPartitioner could be an interface in parallel to the
StaticStreamPartitioner. However, I
       anticipate that the vast majority of stateless apps which may want
this feature do not use a custom
       partitioner, and would be annoyed at having to implement one just to
unlock autoscaling. So if we did
       go this route, we'd probably need a default implementation anyways.
       That last option would probably be the best user experience, even if
slightly more work for us/me to
       add.
2. Another option is to keep the config semantics the same but change the
name to something like
'autoscaling.partitioner.class'. Then we can do something similar to what's
discussed in 1b, with my
preference being to accept either a StaticStreamPartitioner OR
implementation of a StatelessStreamPartitioner
interface, for which an OOTB default partitioner would also be provided.
3. One last open question here is whether we should try enforcing the
statelessness of applications that try
to enable autoscaling via whatever API we land on for the stateless case.
Personally I'm in favor of this, and
users who really want to get around our roadblocks and muck up a stateful
app could still get through via
the static partitioner. This check would just be an additional guardrail
from accidental misuses, not intentional ones

What do you all think? Any strong preferences or concerns about any of
these API options? Should we expand
the config to be useful for any app with custom partitioning, or keep it
focused on the autoscaling feature? I do
worry a bit that when some users see a new config about enabling
autoscaling, they'll get excited and blindly plug
in the OOTB assignor to try it out without really understanding its
limitations and intended use. Maybe that's just
paranoid, I certainly hope so. Anyways I look forward to hearing all your
opinions on the public interface here.

Whew, that was a long one, but thanks again to everyone who's joined the
discussion so far! You've really helped
me to clarify my thoughts and vision for this feature. Looking forward to
your replies

Cheers,
Sophie

On Tue, Oct 25, 2022 at 1:45 PM Walker Carlson
<wcarl...@confluent.io.invalid> wrote:

Hey Sophie,

Thanks for the KIP. I think this could be useful for a lot of cases. I also
think that this could cause a lot of confusion.

Just to make sure we are doing our best to prevent people from
misusing this feature, I wanted to clarify a couple of things.
1) There will be only an interface and no "default" implementation that a
user can plug in for the static partitioner. I am considering when it comes
to testing we want to make sure that we do not make our testing
implementation available to a user.
2) If a user wanted to use auto scaling for a stateless application it
should be as easy as implementing the StaticStreamPartitioner. Their
implementation could even just wrap the default partitioner if they wanted,
right?  I can't think of any way we could detect and then warn them about
the output topic not being partitioned by keys if that were to happen, can
you?

Overall this looks good to me!

Walker

On Tue, Oct 25, 2022 at 12:27 PM Bill Bejeck <bbej...@gmail.com> wrote:

Hi Sophie,

Thanks for the KIP! I think this is a worthwhile feature to add.  I have
two main questions about how this new feature will work.


    1. You mention that for stateless applications auto-scaling is a stickier
    situation.  But I was thinking that the auto-scaling would actually benefit
    stateless applications the most, let me explain my thinking.  Let's say you
    have a stateless Kafka Streams application with one input topic and 2
    partitions, meaning you're limited to at most 2 stream threads.  In order
    to increase the throughput, you increase the number of partitions of the
    source topic to 4, so you can run 4 stream threads.  In this case would the
    auto-scaling feature automatically increase the number of tasks from 2 to
    4?  Since the application is stateless, say using a filter then a map for
    example, the partition for the record doesn't matter, so it seems that
    stateless applications would stand to gain a great deal.
    2. For stateful applications I can see the immediate benefit from
    autoscaling and static partitioning.  But again going with a partition
    expansion for increased throughput example, what would be the mitigation
    strategy for a stateful application that eventually wants to take advantage
    of the increased number of partitions? Otherwise keeping all keys on their
    original partition means you could end up with "key skew" due to not
    allowing keys to distribute out to the new partitions.

One last comment, the KIP states "only the key, rather than the key and
value, are passed in to the partitioner", but the interface has it taking a
key and a value as parameters.  Based on your comments earlier in this
thread I was thinking that the text needs to be updated.

Thanks,
Bill

On Fri, Oct 21, 2022 at 12:21 PM Lucas Brutschy
<lbruts...@confluent.io.invalid> wrote:

Hi all,

thanks, Sophie, this makes sense. I suppose then the way to help the user
not apply this in the wrong setting is having good documentation and one
or two examples of good use cases.

I think Colt's time-based partitioning is a good example of how to use
this. It actually doesn't have to be time, the same will work with any
monotonically increasing identifier. I.e. the new partitions will only get
records for users with a "large" user ID greater than some user ID
threshold hardcoded in the static partitioner. At least in this restricted
use-case, lookups by user ID would still be possible.
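
A minimal sketch of such a threshold-based partitioner, assuming numeric
user IDs that only ever increase. The tier boundaries and partition
counts below are made-up values, and the names are hypothetical.

```python
import bisect

# Each tier: (first user ID in the tier, partition count when those IDs were issued).
# Hypothetical values: 2 partitions originally, 4 from user ID 10_000 onwards.
TIERS = [(0, 2), (10_000, 4)]
_TIER_STARTS = [start for start, _ in TIERS]


def static_partition(user_id: int) -> int:
    """Map a user ID to a partition that never changes after later expansions."""
    if user_id < 0:
        raise ValueError("user IDs are assumed to be non-negative")
    # Find the last tier whose starting ID is <= user_id.
    tier = bisect.bisect_right(_TIER_STARTS, user_id) - 1
    partition_count = TIERS[tier][1]
    return user_id % partition_count
```

Old IDs stay on the original two partitions, while only IDs at or above
the threshold can land on the added ones. A lookup by user ID can call
the same function to find the partition that holds the data.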

Cheers,
Lucas

On Fri, Oct 21, 2022 at 5:37 PM Colt McNealy <c...@littlehorse.io> wrote:

Sophie,

Regarding item "3" (my last paragraph from the previous email), perhaps I
should give a more general example now that I've had more time to clarify
my thoughts:

In some stateful applications, certain keys have to be findable without any
information about when the relevant data was created. For example, if I'm
running a word-count app and I want to use Interactive Queries to find the
count for "foo", I would need to know whether "foo" first arrived before or
after time T before I could find the correct partition to look up the data.

In this case, I don't think static partitioning is possible. Is this
use-case a non-goal of the KIP, or am I missing something?

Colt McNealy
*Founder, LittleHorse.io*


On Thu, Oct 20, 2022 at 6:37 PM Sophie Blee-Goldman
<sop...@confluent.io.invalid> wrote:

Thanks for the responses guys! I'll get the easy stuff out of the way
first:

1) Fixed the KIP so that StaticStreamPartitioner extends StreamPartitioner

2) I totally agree with you Colt, the record value might have valuable (no
pun) information in it that is needed to compute the partition without
breaking the static constraint. As in my own example earlier, maybe the
userId is a field in the value and not the key itself. Actually it was that
exact thought that made me do a U-turn on this but I forgot to update the
thread
3) Colt, I'm not sure I follow what you're trying to say in that last
paragraph, can you expand?
4) Lucas, it's a good question as to what kind of guard-rails we could put
up to enforce or even detect a violation of static partitioning. Most
likely Streams would need to track every key to partition mapping in an
internal state store, but we have no guarantee the key space is bounded and
the store wouldn't grow out of control. Mostly however I imagine users
would be frustrated to find out there's a secret, extra state store taking
up space when you enable autoscaling, and it's not even to provide
functionality but just to make sure users aren't doing something wrong.

I wish I had a better idea, but sadly I think the only practical solution
here is to try and make this condition as clear and obvious and easy to
understand as possible, perhaps by providing an example of what does and
does not satisfy the constraint in the javadocs. I'll work on that
5) I covered a bit above the impracticality of storing a potentially
unbounded keyspace, which as you mention would need to be shared by all
partitioners as well, so would agree that this feels insurmountable. I'm
leaning towards only enabling this feature for the static partitioning case
at least in the first iteration, and we can see how things go from there --
for example, are people generally able to implement it correctly? If we
find that the feature is working well and users are hungry for more, then
it would be relatively straightforward to open things up to stateless
applications, or even stateful applications which can withstand some
"blips" in the logic/correctness.

That said, *technically* the feature would be able to be turned on for any
such case as it is, since as discussed above it's difficult to place true
guardrails around the feature that can enforce static partitioning. Perhaps
we could put a short note in the StaticStreamPartitioner docs that explains
how and when it's safe to break the static requirement, but that we
recommend against doing so.

Thoughts?

-Sophie

On Thu, Oct 20, 2022 at 8:11 AM Colt McNealy <c...@littlehorse.io> wrote:

Sophie,

Thank you for your detailed response. That makes sense (one partition per
user seems like a lot of extra metadata if you've got millions of users,
but I'm guessing that was just for illustrative purposes).

In this case I'd like to question one small detail in your KIP. The
StaticStreamPartitioner takes in just the key and not the value...in an
application I've been working on, the "value" is a long-lived entity
(spanning hundreds of records over several days) that has timestamp
information about the creation of the entity inside of it. The ID itself is
provided by the end-user of the system and as such isn't guaranteed to have
timestamp info.

This is quite a corner case, but if the StaticStreamPartitioner interface
were allowed to peek at the record value, it would be trivial to implement
logic as follows:
```
entity = deserialize(record.value())

if entity.created_before(T):
   return hash(key) % old_partitions
else:
   return hash(key) % new_partitions
```
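
The pseudocode above could be made runnable roughly as follows. This is
only a sketch under several assumptions: the value is JSON with an
ISO-8601 `created_at` field, CRC32 stands in for the real key hash, and
the cutover time and partition counts are made-up values.

```python
import json
import zlib
from datetime import datetime, timezone

# Hypothetical values: time T and the partition counts before and after it.
CUTOVER = datetime(2022, 11, 1, tzinfo=timezone.utc)
OLD_PARTITIONS = 1
NEW_PARTITIONS = 2


def stable_hash(key: str) -> int:
    """Deterministic hash; CRC32 is only a stand-in for the real one (assumption)."""
    return zlib.crc32(key.encode("utf-8"))


def partition_for(key: str, value: bytes) -> int:
    """Pick the partition from the entity's creation time stored in the value."""
    entity = json.loads(value)
    created = datetime.fromisoformat(entity["created_at"])
    count = OLD_PARTITIONS if created < CUTOVER else NEW_PARTITIONS
    return stable_hash(key) % count
```

Since the creation time of an entity never changes, every record for that
entity keeps resolving to the same partition, even though the key alone
carries no timestamp.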

That said, you're a rockstar architect and have seen a lot more system
design than I have (I'm 23 and only 3 years out of school...you implemented
cooperative rebalancing 😀). So don't make that decision unless you can see
other use-cases where it is appropriate.

Additionally, for my own use-case I'm not sure if static partitioning alone
(as opposed to re-partitioning and re-playing the changelogs into new
stores) would enable auto-scaleout because my system uses Kafka Streams as
the data store *and* a secondary index...for example, when a user wants to
look up all entities where the variable `user_email==f...@bar.com`, we have
an index store that has keys partitioned by and prefixed with
`user_email==f...@bar.com`. Entities with that email (for example) could
come before or after time T.

Anyways, that's just my twopence, if I were a voting committer I'd vote for
this KIP as-is.

Cheers,
Colt McNealy
*Founder, LittleHorse.io*


On Wed, Oct 19, 2022 at 4:07 PM Sophie Blee-Goldman
<sop...@confluent.io.invalid> wrote:

Thanks for your questions, I would say that your understanding sounds
correct based on what you described but I'll try to add some clarity. The
basic idea is that, as you said, any keys that are processed before time T
will go to partition 1. All of those keys should then continue to be routed
to partition 1 for the remainder of the app's lifetime, if you care about
maintaining correct history/"state" for that key (I'll come back to this in
the next paragraph). After the time T, new keys that weren't processed
prior to T may be routed to either partition, provided they are similarly
mapped to the same partition forever after. It's up to the user to enforce
this, perhaps by trying to keep track of all keys but that is likely to be
impractical. This feature is generally more targeted at cases where the
partition mapping is "obvious" enough to compute without needing to
maintain a history of all keys and their original partition: for example,
imagine an application that processes user account information. You can
scale out to a partition per user, and add a new partition each time
someone opens a new account. When they open that account they get a userID
number, starting with #0 and counting up from there. In that case, the
partition for any records pertaining to a given account would just be its
userID.

I hope that clears up the kind of intended use case we're targeting with
this feature. That said, another important and equally viable use case that
I neglected to mention in the KIP is fully stateless applications.
Technically this feature can produce correct results for applications that
are at least one of (a) statically partitioned, or (b) completely
stateless. However, the stateless case is a bit stickier since even if the
Streams application itself doesn't care about maintaining the same mapping
of key to partition, it could for example be feeding into a downstream
application which *does* need to maintain state, and which would wind up
"losing" the history for any keys that changed partition.

I kind of felt like opening this feature up to stateless applications would
be asking for trouble and make it too easy for people to shoot themselves
in the foot. That said, I'm open to discussion on this point if you feel
like the benefits here outweigh the risks. I'm also happy to consider
modifying the API so that it could naturally be expanded to include
stateless applications in the future, even if we decide against allowing
that use case in the first iteration of the feature.

Thoughts?

Sophie

On Wed, Oct 19, 2022 at 7:46 AM Colt McNealy <c...@littlehorse.io> wrote:

Sophie,

Thank you for the KIP! Choosing the number of partitions in a Streams app
is a tricky task because of how difficult it is to re-partition; I'm glad
you're working on an improvement. I've got two questions:

First, `StaticStreamPartitioner` is an interface that we (Streams users)
must implement, I'm trying to understand how it would work. For example,
let's say there's some point in time 'T' before which we have 1 partition.
Then we decide to increase the partition count to 2 at time T. From my
understanding, all keys that had passed through the Streams app before time
T must end up on partition 1 if they appear again in the input topics; but
any new keys are allowed to be sent to partition 2. Is that correct? And
(pardon the naive question) how is this achieved without keeping track of
all keys that have been seen at any point?

Secondly, will this feature work with applications that use interactive
queries?

Thank you very much,
Colt McNealy
*Founder, LittleHorse.io*


On Tue, Oct 18, 2022 at 9:34 PM Sophie Blee-Goldman
<sop...@confluent.io.invalid> wrote:

Hey all,

I'd like to propose a new autoscaling feature for Kafka Streams
applications which can follow the constraint of static partitioning. For
further details please refer to the KIP document:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams


This feature will be targeted for 3.4 but may not be fully implemented
until the following release, 3.5.

Please give this a read and let me know what you think!

Cheers,
Sophie
