Thanks Bill, John and Matthias. Glad you guys joined this discussion. I got a lot out of the discussion.
I would like to update KIP-216 base on John's suggestion to remove the category. --- Vito On Fri, Jan 17, 2020 at 2:30 AM Matthias J. Sax <matth...@confluent.io> wrote: > > Nevertheless, if we omit the categorization, it’s moot. > > Ack. > > I am fine to remove the middle tier. As John pointed out, it might be > weird to have only one concrete exception type per category. We can also > explain in detail how to handle each exception in their JavaDocs. > > > -Matthias > > On 1/16/20 6:38 AM, Bill Bejeck wrote: > > Vito, > > > > Thanks for the updates, the KIP LGTM. > > > > -Bill > > > > On Wed, Jan 15, 2020 at 11:31 PM John Roesler <vvcep...@apache.org> > wrote: > > > >> Hi Vito, > >> > >> Haha, your archive game is on point! > >> > >> What Matthias said in that email is essentially what I figured was the > >> rationale. It makes sense, but the point I was making is that this > really > >> doesn’t seem like a good way to structure a production app. On the other > >> hand, considering the exception fatal has a good chance of avoiding a > >> frustrating debug session if you just forgot to call start. > >> > >> Nevertheless, if we omit the categorization, it’s moot. > >> > >> It would be easy to add a categorization layer later if we want it, but > >> not very easy to change it if we get it wrong. > >> > >> Thanks for your consideration! > >> -John > >> > >> On Wed, Jan 15, 2020, at 21:14, Vito Jeng wrote: > >>> Hi John, > >>> > >>> About `StreamsNotStartedException is strange` -- > >>> The original idea came from Matthias, two years ago. :) > >>> You can reference here: > >>> > >> > https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc5...@confluent.io%3e > >>> > >>> About omitting the categorization -- > >>> It looks reasonable. I'm fine with omitting the categorization but not > >> very > >>> sure it is a good choice. > >>> Does any other folks provide opinion? > >>> > >>> > >>> Hi, folks, > >>> > >>> Just update the KIP-216, please take a look. > >>> > >>> --- > >>> Vito > >>> > >>> > >>> On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <v...@is-land.com.tw> wrote: > >>> > >>>> > >>>> Hi, folks, > >>>> > >>>> Thank you suggestion, really appreciate it. :) > >>>> I understand your concern. I'll merge StreamsNotRunningException and > >>>> StateStoreNotAvailableException. > >>>> > >>>> > >>>> --- > >>>> Vito > >>>> > >>>> > >>>> On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcep...@apache.org> > >> wrote: > >>>> > >>>>> Hey Vito, > >>>>> > >>>>> Yes, thanks for the KIP. Sorry the discussion has been so long. > >>>>> Hopefully, we can close it out soon. > >>>>> > >>>>> I agree we can drop StreamsNotRunningException in favor of > >>>>> just StateStoreNotAvailableException. > >>>>> > >>>>> Unfortunately, I have some higher-level concerns. The value > >>>>> of these exceptions is that they tell you how to handle the > >>>>> various situations that can arise while querying a distributed > >>>>> data store. > >>>>> > >>>>> Ideally, as a caller, I should be able to just catch "retriable" or > >>>>> "fatal" and handle them appropriately. Otherwise, there's no > >>>>> point in having categories, and we should just have all the > >>>>> exceptions extend InvalidStateStoreException. > >>>>> > >>>>> Presently, it's not possible to tell from just the > >>>>> "retriable"/"fatal" distinction what to do. You can tell > >>>>> from the descriptions of the various exceptions. E.g.: > >>>>> > >>>>> Retriable: > >>>>> * StreamsRebalancingException: the exact same call > >>>>> should just be retried until the rebalance is complete > >>>>> * StateStoreMigratedException: the store handle is > >>>>> now invalid, so you need to re-discover the instance > >>>>> and get a new handle on that instance. In other words, > >>>>> the query itself may be valid, but the particular method > >>>>> invocation on this particular instance has encountered > >>>>> a fatal exception. > >>>>> > >>>>> Fatal: > >>>>> * UnknownStateStoreException: this is truly fatal. No amount > >>>>> of retrying or re-discovering is going to get you a handle on a > >>>>> store that doesn't exist in the cluster. > >>>>> * StateStoreNotAvailableException: this is actually recoverable, > >>>>> since the store might exist in the cluster, but isn't available > on > >>>>> this particular instance (which is shut down or whatever). > >>>>> > >>>>> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine > >>>>> with omitting the categorization and just having 5 subclasses > >>>>> of InvalidStateStoreException. Each of them would tell you > >>>>> how to handle them, and it's not too many to really > >>>>> understand and handle each one. > >>>>> > >>>>> If you really want to have a middle tier, I'd recommend: > >>>>> * RetryableStateStoreException: the exact same call > >>>>> should be repeated. > >>>>> * RecoverableStateStoreException: the store handle > >>>>> should be discarded and the caller should re-discover > >>>>> the location of the store and repeat the query on the > >>>>> correct instance. > >>>>> * FatalStateStoreException: the query/request is totally > >>>>> invalid and will never succeed. > >>>>> > >>>>> However, attempting to categorize the proposed exceptions > >>>>> reveals even problems with this categorization: > >>>>> Retriable: > >>>>> * StreamsRebalancingException > >>>>> Recoverable: > >>>>> * StateStoreMigratedException > >>>>> * StreamsNotRunningException > >>>>> Fatal: > >>>>> * UnknownStateStoreException > >>>>> > >>>>> But StreamsNotStartedException is strange... It means that > >>>>> one code path got a handle on a specific KafkaStreams object > >>>>> instance and sent it a query before another code path > >>>>> invoked the start() method on the exact same object instance. > >>>>> It seems like the most likely scenario is that whoever wrote > >>>>> the program just forgot to call start() before querying, in > >>>>> which case, retrying isn't going to help, and a fatal exception > >>>>> is more appropriate. I.e., it sounds like a "first 15 minutes > >>>>> experience" problem, and making it fatal would be more > >>>>> helpful. Even in a production context, there's no reason not > >>>>> to sequence your application startup such that you don't > >>>>> accept queries until after Streams is started. Thus, I guess > >>>>> I'd categorize it under "fatal". > >>>>> > >>>>> Regardless of whether you make it fatal or retriable, you'd > >>>>> still have a whole category with only one exception in it, > >>>>> and the other two categories only have two exceptions. > >>>>> Plus, as you pointed out in the KIP, you can't get all > >>>>> exceptions in all cases anyway: > >>>>> * store() can only throw NotStarted, NotRunning, > >>>>> and Unknown > >>>>> * actual store queries can only throw Rebalancing, > >>>>> Migrated, and NotRunning > >>>>> > >>>>> Thus, in practice also, there are exactly three categories > >>>>> and also exactly three exception types. It doesn't seem > >>>>> like there's a great advantage to the categories here. To > >>>>> avoid the categorization problem and also to clarify what > >>>>> exceptions can actually be thrown in different circumstances, > >>>>> it seems like we should just: > >>>>> * get rid of the middle tier and make all the exceptions > >>>>> extend InvalidStateStoreException > >>>>> * drop StateStoreNotAvailableException in favor of > >>>>> StreamsNotRunningException > >>>>> * clearly document on all public methods which exceptions > >>>>> need to be handled > >>>>> > >>>>> How do you feel about this? > >>>>> Thanks, > >>>>> -John > >>>>> > >>>>> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote: > >>>>>> Thanks for KIP Vito. > >>>>>> > >>>>>> Overall the KIP LGTM, but I'd have to agree with others on merging > >> the > >>>>>> `StreamsNotRunningException` and `StateStoreNotAvailableException` > >>>>> classes. > >>>>>> > >>>>>> Since in both cases, the thread state is in `PENDING_SHUTDOWN || > >>>>>> NOT_RUNNING || ERROR` I'm not even sure how we could distinguish > >> when to > >>>>>> use the different > >>>>>> exceptions. Maybe a good middle ground would be to have a detailed > >>>>>> exception message. > >>>>>> > >>>>>> The KIP freeze is close, so I think if we can agree on this, we can > >>>>> wrap up > >>>>>> the voting soon. > >>>>>> > >>>>>> Thanks, > >>>>>> Bill > >>>>>> > >>>>>> On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax < > >> matth...@confluent.io> > >>>>>> wrote: > >>>>>> > >>>>>>> Vito, > >>>>>>> > >>>>>>> It's still unclear to me what the advantage is, to have both > >>>>>>> `StreamsNotRunningException` and > >> `StateStoreNotAvailableException`? > >>>>>>> > >>>>>>> For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / > >> ERROR` > >>>>>>> and thus, for a user point of view, why does it matter if the > >> store is > >>>>>>> closed on not? I don't understand why/how this information would > >> be > >>>>>>> useful? Do you have a concrete example in mind how a user would > >> react > >>>>>>> differently to both exceptions? > >>>>>>> > >>>>>>> > >>>>>>> @Vinoth: about `StreamsRebalancingException` -- to me, it seems > >> best > >>>>> to > >>>>>>> actually do this on a per-query basis, ie, have an overload > >>>>>>> `KafkaStreams#store(...)` that takes a boolean flag that allow to > >>>>>>> _disable_ the exception and opt-in to query a active store during > >>>>>>> recovery. However, as KIP-535 actually introduces this change in > >>>>>>> behavior, I think KIP-216 should not cover this, but KIP-535 > >> should be > >>>>>>> updated. I'll follow up on the other KIP thread to raise this > >> point. > >>>>>>> > >>>>>>> > >>>>>>> -Matthias > >>>>>>> > >>>>>>> On 1/11/20 12:26 AM, Vito Jeng wrote: > >>>>>>>> Hi, Matthias & Vinoth, > >>>>>>>> > >>>>>>>> Thanks for the feedback. > >>>>>>>> > >>>>>>>>> What is still unclear to me is, what we gain by having both > >>>>>>>>> `StreamsNotRunningException` and > >>>>> `StateStoreNotAvailableException`. Both > >>>>>>>>> exception are thrown when KafkaStreams is in state > >>>>> PENDING_SHUTDOWN / > >>>>>>>>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know > >> if the > >>>>>>>>> state store is closed on not -- I can't query it anyway? Maybe > >> I > >>>>> miss > >>>>>>>>> something thought? > >>>>>>>> > >>>>>>>> Yes, both `StreamsNotRunningException` and > >>>>>>>> `StateStoreNotAvailableException` are fatal exception. > >>>>>>>> But `StateStoreNotAvailableException` is fatal exception about > >> state > >>>>>>> store > >>>>>>>> related. > >>>>>>>> I think it would be helpful that if user need to distinguish > >> these > >>>>> two > >>>>>>>> different case to handle it. > >>>>>>>> > >>>>>>>> I'm not very sure, does that make sense? > >>>>>>>> > >>>>>>>> > >>>>>>>> --- > >>>>>>>> Vito > >>>>>>>> > >>>>>>>> > >>>>>>>> On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar < > >> vin...@apache.org> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> +1 on merging `StreamsNotRunningException` and > >>>>>>>>> `StateStoreNotAvailableException`, both exceptions are fatal > >>>>> anyway. IMO > >>>>>>>>> its best to have these exceptions be about the state store > >> (and not > >>>>>>> streams > >>>>>>>>> state), to easier understanding. > >>>>>>>>> > >>>>>>>>> Additionally, KIP-535 allows for querying of state stores in > >>>>> rebalancing > >>>>>>>>> state. So do we need the StreamsRebalancingException? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2020/01/09 03:38:11, "Matthias J. Sax" < > >> matth...@confluent.io> > >>>>>>> wrote: > >>>>>>>>>> Sorry that I dropped the ball on this... > >>>>>>>>>> > >>>>>>>>>> Thanks for updating the KIP. Overall LGTM now. Feel free to > >> start > >>>>> a > >>>>>>> VOTE > >>>>>>>>>> thread. > >>>>>>>>>> > >>>>>>>>>> What is still unclear to me is, what we gain by having both > >>>>>>>>>> `StreamsNotRunningException` and > >>>>> `StateStoreNotAvailableException`. > >>>>>>> Both > >>>>>>>>>> exception are thrown when KafkaStreams is in state > >>>>> PENDING_SHUTDOWN / > >>>>>>>>>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know > >> if > >>>>> the > >>>>>>>>>> state store is closed on not -- I can't query it anyway? > >> Maybe I > >>>>> miss > >>>>>>>>>> something thought? > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -Matthias > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 11/3/19 6:07 PM, Vito Jeng wrote: > >>>>>>>>>>> Sorry for the late reply, thanks for the review. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> About `StateStoreMigratedException`: > >>>>>>>>>>>> > >>>>>>>>>>>> Why is it only thrown if the state is REBALANCING? A store > >>>>> might be > >>>>>>>>>>>> migrated during a rebalance, and Kafka Streams might resume > >>>>> back to > >>>>>>>>>>>> RUNNING state and afterward somebody tries to use an old > >> store > >>>>>>> handle. > >>>>>>>>>>>> Also, if state is REBALANCING, should we throw > >>>>>>>>>>>> `StreamThreadRebalancingException`? Hence, I think > >>>>>>>>>>>> `StateStoreMigratedException` does only make sense during > >>>>> `RUNNING` > >>>>>>>>> state. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Thank you point this, already updated. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Why do we need to distinguish between > >>>>>>> `KafkaStreamsNotRunningException` > >>>>>>>>>>>> and `StateStoreNotAvailableException`? > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> `KafkaStreamsNotRunningException` may be caused by various > >>>>> reasons, I > >>>>>>>>> think > >>>>>>>>>>> it would be helpful that the > >>>>>>>>>>> user can distinguish whether it is caused by the state store > >>>>> closed. > >>>>>>>>>>> (Maybe I am wrong...) > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Last, why do we distinguish between `KafkaStreams` instance > >> and > >>>>>>>>>>>> `StreamsThread`? To me, it seems we should always refer to > >> the > >>>>>>>>> instance, > >>>>>>>>>>>> because that is the level of granularity in which we > >>>>> enable/disable > >>>>>>>>> IQ atm. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Totally agree. Do you mean the naming of state store > >> exceptions? > >>>>>>>>>>> I don't have special reason to distinguish these two. > >>>>>>>>>>> Your suggestion look more reasonable for the exception > >> naming. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Last, for `StateStoreMigratedException`, I would add that a > >> user > >>>>> need > >>>>>>>>> to > >>>>>>>>>>>> rediscover the store and cannot blindly retry as the store > >>>>> handle is > >>>>>>>>>>>> invalid and a new store handle must be retrieved. That is a > >>>>>>> difference > >>>>>>>>>>>> to `StreamThreadRebalancingException` that allows for > >> "blind" > >>>>> retries > >>>>>>>>>>>> that either resolve (if the store is still on the same > >> instance > >>>>> after > >>>>>>>>>>>> rebalancing finishes, or changes to > >>>>> `StateStoreMigratedException` if > >>>>>>>>> the > >>>>>>>>>>>> store was migrated away during rebalancing). > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Nice, it's great! Thank you. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> The KIP already updated, please take a look. :) > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax < > >>>>>>> matth...@confluent.io > >>>>>>>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Any update on this KIP? > >>>>>>>>>>>> > >>>>>>>>>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote: > >>>>>>>>>>>>> Sorry for the late reply. The 2.4 deadline kept us quite > >> busy. > >>>>>>>>>>>>> > >>>>>>>>>>>>> About `StateStoreMigratedException`: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Why is it only thrown if the state is REBALANCING? A store > >>>>> might be > >>>>>>>>>>>>> migrated during a rebalance, and Kafka Streams might resume > >>>>> back to > >>>>>>>>>>>>> RUNNING state and afterward somebody tries to use an old > >> store > >>>>>>>>> handle. > >>>>>>>>>>>>> Also, if state is REBALANCING, should we throw > >>>>>>>>>>>>> `StreamThreadRebalancingException`? Hence, I think > >>>>>>>>>>>>> `StateStoreMigratedException` does only make sense during > >>>>> `RUNNING` > >>>>>>>>>>>> state. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Why do we need to distinguish between > >>>>>>>>> `KafkaStreamsNotRunningException` > >>>>>>>>>>>>> and `StateStoreNotAvailableException`? > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Last, why do we distinguish between `KafkaStreams` > >> instance and > >>>>>>>>>>>>> `StreamsThread`? To me, it seems we should always refer to > >> the > >>>>>>>>> instance, > >>>>>>>>>>>>> because that is the level of granularity in which we > >>>>> enable/disable > >>>>>>>>> IQ > >>>>>>>>>>>> atm. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Last, for `StateStoreMigratedException`, I would add that a > >>>>> user > >>>>>>>>> need to > >>>>>>>>>>>>> rediscover the store and cannot blindly retry as the store > >>>>> handle is > >>>>>>>>>>>>> invalid and a new store handle must be retrieved. That is a > >>>>>>>>> difference > >>>>>>>>>>>>> to `StreamThreadRebalancingException` that allows for > >> "blind" > >>>>>>> retries > >>>>>>>>>>>>> that either resolve (if the store is still on the same > >> instance > >>>>>>> after > >>>>>>>>>>>>> rebalancing finishes, or changes to > >>>>> `StateStoreMigratedException` if > >>>>>>>>> the > >>>>>>>>>>>>> store was migrated away during rebalancing). > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote: > >>>>>>>>>>>>>> My bad. The short link `https://shorturl.at/CDNT9` > <https://shorturl.at/CDNT9> > >> <https://shorturl.at/CDNT9> > >>>>> <https://shorturl.at/CDNT9> > >>>>>>> <https://shorturl.at/CDNT9> > >>>>>>>>> <https://shorturl.at/CDNT9> > >>>>>>>>>>>> <https://shorturl.at/CDNT9> > >>>>>>>>>>>>>> <https://shorturl.at/CDNT9> seems incorrect. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Please use the following instead: > >> https://shorturl.at/bkKQU > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> --- > >>>>>>>>>>>>>> Vito > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng < > >>>>> v...@is-land.com.tw> > >>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, Matthias! > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> About `StreamThreadNotStartedException`: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you for explanation. I agree with your opinion. > >>>>>>>>>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw > >>>>>>>>>>>>>>> `StreamThreadNotStartedException`. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> For the case that corresponding thread crashes after we > >>>>> handed out > >>>>>>>>> the > >>>>>>>>>>>>>>> store handle. We may throw > >> `KafkaStreamsNotRunningException` > >>>>> or > >>>>>>>>>>>>>>> `StateStoreMigratedException`. > >>>>>>>>>>>>>>> In `StreamThreadStateStoreProvider`, we would throw > >>>>>>>>>>>>>>> `KafkaStreamsNotRunningException` when stream thread is > >> not > >>>>>>>>> running( > >>>>>>>>>>>>>>> https://shorturl.at/CDNT9) or throw > >>>>> `StateStoreMigratedException` > >>>>>>>>> when > >>>>>>>>>>>>>>> store is closed(https://shorturl.at/hrvAN). So I think > >> we > >>>>> do not > >>>>>>>>> need > >>>>>>>>>>>> to > >>>>>>>>>>>>>>> add a new type for this case. Does that make sense? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> About `KafkaStreamsNotRunningException` vs > >>>>>>>>>>>>>>> `StreamThreadNotRunningException`: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I understand your point. I rename > >>>>>>>>> `StreamThreadNotRunningException` to > >>>>>>>>>>>>>>> `KafkaStreamsNotRunningException`. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> About check unknown state store names: > >>>>>>>>>>>>>>> Thank you for the hint. I add a new type > >>>>>>>>> `UnknownStateStoreException` > >>>>>>>>>>>> for > >>>>>>>>>>>>>>> this case. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Also, we should still have fatal exception > >>>>>>>>>>>>>>> `StateStoreNotAvailableException`? Not sure why you > >> remove > >>>>> it? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you point this, already add it again. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> The KIP already updated, please take a look. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> --- > >>>>>>>>>>>>>>> Vito > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > >