[ 
https://issues.apache.org/jira/browse/KAFKA-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381278#comment-16381278
 ] 

Guozhang Wang commented on KAFKA-6555:
--------------------------------------

First of all, as I mentioned before, this change itself deserves a KIP 
discussion given its impact and scope, so that we can bring it to a broader 
audience's attention for feedback.

And here is my two cents: although this ticket may be orthogonal to KAFKA-6144 
and KAFKA-6145, they are all trying to tackle the same general issue: during a 
rebalance, the stores may be unavailable for queries while they are restoring. 
How long that unavailability lasts depends on whether or not you have standby 
replicas, whether the rebalance is triggered by fail-over or scaling out, etc. 
But ultimately we'd like to reduce that unavailability window as much as 
possible by trading off some data consistency.

My take then is that, if eventually we are going to support KAFKA-6145 (which I 
think we should), then the case of scaling out should be well covered, and 
hence I'd prefer to just read from the restoring active task for simplicity, 
and additionally make it configurable as Damian suggested. So the situation 
becomes:

1. For the scale-out scenario, KAFKA-6145 will make sure we have a 
close-to-latest replica when the actual rebalance happens, so this scenario is 
effectively reduced to a "controlled fail-over" scenario.
2. For the fail-over scenario, we can assume that the restoring state is the 
closest to latest, and hence have a knob to allow users to read stale data from 
it during restoration to reduce the unavailability gap.

I.e. we would drop KAFKA-6144, and in the end only do this JIRA and 
KAFKA-6145.
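
To make the "knob" idea in point 2 a bit more concrete, here is a minimal 
sketch of what I have in mind. It is a toy model only and does not use any 
Kafka Streams classes: `QueryableStore`, `allowStaleReads`, `apply` and 
`markRunning` are all hypothetical names made up for illustration, and the 
actual config name and placement would be up to the KIP discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StaleReadSketch {
    // Simplified task states; the real StreamTask lifecycle has more of them.
    enum TaskState { RESTORING, RUNNING }

    // Hypothetical store wrapper, NOT a Kafka Streams class.
    static final class QueryableStore {
        private final Map<String, Long> data = new HashMap<>();
        private final boolean allowStaleReads; // the hypothetical "knob"
        private TaskState state = TaskState.RESTORING;

        QueryableStore(boolean allowStaleReads) {
            this.allowStaleReads = allowStaleReads;
        }

        // Applies one changelog record, during restoration or normal processing.
        void apply(String key, long value) {
            data.put(key, value);
        }

        // Marks restoration as finished.
        void markRunning() {
            state = TaskState.RUNNING;
        }

        // Serves the read if the task is RUNNING, or if it is still restoring
        // and the user opted in to possibly stale data; otherwise rejects it.
        Optional<Long> get(String key) {
            if (state != TaskState.RUNNING && !allowStaleReads) {
                throw new IllegalStateException(
                    "store is restoring and stale reads are disabled");
            }
            return Optional.ofNullable(data.get(key));
        }
    }

    public static void main(String[] args) {
        QueryableStore store = new QueryableStore(true);
        store.apply("k", 1L);                    // partially restored state
        System.out.println(store.get("k"));      // served although still RESTORING
        store.markRunning();
        System.out.println(store.get("k"));      // served as usual once RUNNING
    }
}
```

With the knob off, the behavior stays as it is today (no queries until the 
task is RUNNING); with it on, callers accept that a read during restoration 
may return an older value or nothing at all.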

> Making state store queryable during restoration
> -----------------------------------------------
>
>                 Key: KAFKA-6555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Ashish Surana
>            Assignee: Ashish Surana
>            Priority: Major
>
> State stores in Kafka Streams are currently only queryable when the 
> StreamTask is in the RUNNING state. The idea is to make them queryable even 
> in the RESTORATION (PARTITION_ASSIGNED) state, as the time spent on 
> restoration can be huge, and making the data inaccessible during this time 
> amounts to downtime that is unacceptable for many applications.
> When the active partition goes down, one of the following occurs:
>  # One of the standby replica partitions gets promoted to active: the 
> replica task has to restore the remaining state from the changelog topic 
> before it can become RUNNING. The time taken for this depends on how far the 
> replica is lagging behind. During this restoration time the state store for 
> that partition is currently not queryable, resulting in downtime for the 
> partition. We can make the state store partition queryable for the data 
> already present in the state store.
>  # When there is no replica or standby task, the active task will be started 
> on one of the existing nodes. That node has to build the entire state from 
> the changelog topic, which can take a lot of time depending on how big the 
> changelog topic is, and keeping the state store not queryable during this 
> time means downtime for the partition.
> It's a very important improvement, as it could improve the availability of 
> microservices developed using Kafka Streams.
> I am working on a patch for this change. Any feedback or comments are welcome.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
