Hey Folks

I have a few questions around the operations of stateful processing while
scaling nodes up/down, and a possible KIP in question #4. Most of them have
to do with task processing during rebuilding of state stores after scaling
nodes up.

Scenario:
Single node/thread, processing 2 topics (10 partitions each):
User event topic (events) - ie: key:userId, value: ProductId
Product topic (entity) - ie: key: ProductId, value: productData

My topology looks like this:

KTable productTable = ... //materialize from product topic

KStream output = userStream
    .map(x => (x.value, x.key) ) //Swap the key and value around
    .join(productTable, ... ) //Joiner is not relevant here
    .to(...)  //Send it to some output topic


Here are my questions:
1) If I scale the processing node count up, partitions will be rebalanced
to the new node. Does processing continue as normal on the original node,
while the new node's processing is paused as the internal state stores are
rebuilt/reloaded? From my reading of the code (and own experience) I
believe this to be the case, but I am just curious in case I missed
something.

2) What happens to the userStream map task? Will the new node be able to
process this task while the state store is rebuilding/reloading? My reading
of the code suggests that this map process will be paused on the new node
while the state store is rebuilt. The effect of this is that it will lead
to a delay in events reaching the original node's partitions, which will be
seen as late-arriving events. Am I right in this assessment?

3) How does scaling up work with standby state-store replicas? From my
reading of the code, it appears that scaling a node up will result in a
reabalance, with the state assigned to the new node being rebuilt first
(leading to a pause in processing). Following this, the standy replicas are
populated. Am I correct in this reading?

4) If my reading in #3 is correct, would it be possible to pre-populate the
standby stores on scale-up before initiating active-task transfer? This
would allow seamless scale-up and scale-down without requiring any pauses
for rebuilding state. I am interested in kicking this off as a KIP if so,
but would appreciate any JIRAs or related KIPs to read up on prior to
digging into this.


Thanks

Adam Bellemare

Reply via email to