Hi Bill, Great questions; I'll do my best to reply inline:
On Fri, May 6, 2022 at 3:21 PM Bill Bejeck <bbej...@gmail.com> wrote: > Hi Jim, > > Thanks for the KIP. I have a couple of meta-questions as well: > > 1) Regarding pausing only a subset of running instances, I'm thinking there > may be a use case for pausing all of them. > Would it make sense to also allow for pausing all instances by adding a > method `pauseAll()` or something similar? > Honestly, I'm indifferent on this point. Presently, I think what I have proposed is the minimal change to get the ability to pause and resume processing. If adding a 'pauseAll()' is required, I'd be happy to do that! >From Guozhang's email, it sounds like this would require using the rebalance protocol to trigger the coordination. Would there be enough room in that approach to indicate that a named topology is to be paused across all nodes? > 2) Would pausing affect standby tasks? For example, imagine there are 3 > instances A, B, and C. > A user elects to pause instance C only but it hosts the standby tasks > for A. > Would the standby tasks on the paused application continue to read from > the changelog topic? > Yes, standby tasks would continue reading from the changelog topic. All consumers would continue reading to avoid getting dropped from their consumer groups. Cheers, Jim > Thanks! > Bill > > > On Fri, May 6, 2022 at 2:44 PM Jim Hughes <jhug...@confluent.io.invalid> > wrote: > > > Hi Guozhang, > > > > Thanks for the feedback; responses inline below: > > > > On Fri, May 6, 2022 at 1:09 PM Guozhang Wang <wangg...@gmail.com> wrote: > > > > > Hello Jim, > > > > > > Thanks for the proposed KIP. I have some meta questions about it: > > > > > > 1) Would an instance always pause/resume all of its current owned > > > topologies (i.e. the named topologies), or are there any scenarios > where > > we > > > only want to pause/resume a subset of them? > > > > > > > An instance may wish to pause some of its named topologies. I was unsure > > what to say about named topologies in the KIP since they seem to be an > > internal detail at the moment. > > > > I intend to add to KafkaStreamsNamedTopologyWrapper methods like: > > public void pauseNamedTopology(final String topologyToPause) > > public boolean isNamedTopologyPaused(final String topology) > > public void resumeNamedTopology(final String topologyToResume) > > > > > > > > > 2) From a user's perspective, do we want to always issue a > `pause/resume` > > > to all the instances or not? For example, we can define the semantics > of > > > the function as "you only need to call this function on any of the > > > application's instances, and all instances would then pause (via the > > > rebalance error codes)", or as "you would call this function for all > the > > > instances of an application". Which one are you referring to? > > > > > > > My initial intent is that one would call this function on any instances > of > > the application that one wishes to pause. This should allow more control > > (in case one wanted to pause a portion of the instances). On the other > > hand, this approach would put more work on the implementer to coordinate > > calling pause or resume across instances. > > > > If the other option is more suitable, happy to do that instead. > > > > > > > 3) With EOS, there's a transaction timeout which would determine how > > long a > > > transaction can stay idle before it's force-aborted on the broker > side. I > > > think when a pause is issued, that means we'd need to immediately > commit > > > the current transaction for EOS since we do not know how long we could > > > pause for. Is that right? If yes could you please clarify that in the > doc > > > as well. > > > > > > > Good point. My intent is for pause() to wait for the next iteration > > through `runOnce()` and then only skip over the processing for paused > tasks > > in `taskManager.process(numIterations, time)`. > > > > Do commits live inside that call or do they live across/outside of it? > In > > the former case, I think there shouldn't be any issues with EOS. > > Otherwise, we may need to work through some details to get EOS right. > > > > Once we figure that out, I can update the KIP. > > > > Thanks, > > > > Jim > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > On Wed, May 4, 2022 at 10:51 AM Jim Hughes > <jhug...@confluent.io.invalid > > > > > > wrote: > > > > > > > Hi all, > > > > > > > > I have written up a KIP for adding the ability to pause and resume > the > > > > processing of a topology in AK Streams. The KIP is here: > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832 > > > > > > > > Thanks in advance for your feedback! > > > > > > > > Cheers, > > > > > > > > Jim > > > > > > > > > > > > > -- > > > -- Guozhang > > > > > >