Hi David and Justine, Thank you both for the detailed feedback.
David, 1. That makes sense. I revised the "Reading new fields" section with how we can downgrade to the highest known version and that this was confirmed via unit testing. I also attempted to dive deeper into using tagged fields and the rejected alternative. Please let me know what you think. 2. Under "Restrictions and Guidelines" I updated the paragraph to clearly state that we want to use tagged fields across all record types introduced in KIP-848 including OffsetCommitValue. 3. Would it be possible to bump the OffsetCommitValue record version to make it flexible along with the changes to parse with the highest known version? I'm not sure I understand why we cannot make both changes together. 4. I completely missed this. Added some notes at the end of "Restrictions and Guidelines". Unfortunately I can't think of a solution at the moment. Will get back to you. 5. I have a section under "Identifying New Record Types" that discusses this: > We can automate the cleanup by writing tombstones when the coordinator reads unrecognizable records. This may add duplicate work if tombstones were already added but not yet pruned by the log cleaner. This is a sure way to delete any unknown record types even if the operator does not follow the steps. 6. Thanks, I have expanded on the section on transactional offset commits. As for log compaction, my understanding was that we can control the process by forcing compaction. Is my understanding incorrect? 7. We throw an exception if ConfigResource.type is unknown which is true in our case because KIP-848 introduces a new GROUP resource type. We will need to add some sort of safeguard if the dynamic configs are not deleted before the server downgrade. I'll give you an update once I update the section. Justine, On the transactional offset commits, I have updated the KIP to reflect your points after our offline discussion (much appreciated). It seems that the work required to abort from the server side is fairly big and will require additional investigation if we are to go down this path. As for downgrade limitations, I missed that part so thanks for the catch (along with David). Unfortunately, the proposed design won't allow downgrades even after the new record types are deleted because the existing OffsetCommitValue record is not flexible and will not be able to parse the topicId tagged field. I'll think more about this and get back to you. Best, Jeff On Mon, Mar 20, 2023 at 2:16 PM Justine Olshan <[email protected]> wrote: > Hey Jeff and David, > > Thanks for the KIP! I was also looking into this a bit since I may want to > change the record format for KIP-890 as well (finally implementing the > record change from KIP-360 to better support the epoch bump. This will > potentially be helpful for me to implement that work. > > I discussed some aspects with David offline. Namely, the alternative > solution to rewrite the records in the old format upon downgrade. One plus > to this approach is that it should trigger compaction for all the existing > (new format) records. > The main issue we ran into was how to handle ongoing transactions. (Ie, > could we abort and rewrite with the old format) I think that's one of the > main downsides. I think it could potentially be possible to do this, but > we'd definitely need a server-side mechanism to abort and rewrite the > records. > > I think the trouble I was running into with the current solution is that we > can only downgrade to a version slightly before the new group coordinator. > If someone was running 3.2 and they upgrade to 3.6, but find a bug in 3.4 > (or any version between 3.5 and the one they upgraded from), then they can > only downgrade to 3.5. This puts us in a difficult spot. I guess in this > scenario, they will need to wait for the new format records to get cleared > away before downgrading again. Is that correct? > > Thanks, > Justine > > On Thu, Mar 16, 2023 at 10:41 AM David Jacot <[email protected]> > wrote: > > > Hi Jeff, > > > > Thanks for the KIP! I am really glad that we are finally addressing this > > gap in KIP-848. I have a few general comments. > > > > 1. Overall, I feel like the important bits are not bold enough in the > KIP. > > I think that it is good to explain the overall upgrade/downgrade process > > and to highlight where the issues are but I think that the core bits > should > > give more details. For instance, we should explain why relying on tag > > fields works to ignore fields added in future releases. My understanding > is > > that it works because the buffer for the tagged fields is serialized at > the > > end so reading with the old version, which is a prefix of the new one, > > works. > > > > 2. Moreover, it would be great if we could make the principle more > general. > > My hope is that we can keep reusing the principles introduced in the KIP > in > > future releases as well. For instance, let's say that we need to add a > new > > field to one of the new records introduced by KIP-848 or that we will > have > > to introduce a new record type as well. Would it work for those cases as > > well? > > > > 3. Regarding enabling support for tagged fields for the OffsetCommitValue > > record, it would be great if you could give more details on the steps to > > get there in the KIP. My understanding is that we would have to do the > > following: 1) Update the code which reads the records to fail back to the > > highest known version if the version stored in the log is unknown. Let's > > say that we do this in AK 3.5. 2) We need to turn on tagged fields for > the > > record. I think that we can only do this in AK 3.6+. > > > > 4. I may have missed this part but we should clearly explain the drawback > > of the proposed approach as well. Say that we enable tagged fields for > > OffsetCommitValue in AK 3.6. This means that it won't be possible to > > downgrade a cluster from 3.6 to a version earlier than 3.5. This is a > > significant limitation in my opinion because, I think, users don't > > necessarily upgrade to all versions. > > > > 5. In the proposal, it is not clear about whether the old software will > > delete unknown records or not. It is true that new records will be > deleted > > when the group is downgraded but this only works if the operator respects > > the process. > > > > 6. It would be great if we could extend the rejected alternative. The > > alternative sounds clearly better when you read it so we should really > > explain the reason to reject it. 1) One issue that you mention is that > the > > log must be compacted before downgrading and we don't really control this > > process. 2) Transactions may be difficult to handle. I suppose that it is > > possible to handle them though. Have you thought about this? > > > > 7. For the new dynamic configs, what happens if they are kept and the > > quorum controller is downgraded? > > > > Best, > > David > > > > On Thu, Mar 16, 2023 at 12:56 AM Jeff Kim <[email protected] > > > > wrote: > > > > > Hi folks, > > > > > > I would like to start a discussion thread for KIP-915: Next Gen Group > > > Coordinator Downgrade Path which proposes the downgrade design for the > > new > > > group coordinator introduced in KIP-848 > > > < > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol > > > > > > > . > > > > > > KIP: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-915%3A+Next+Gen+Group+Coordinator+Downgrade+Path > > > > > > Thanks, > > > Jeff > > > > > >
