[ 
https://issues.apache.org/jira/browse/KAFKA-16543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hudeqi updated KAFKA-16543:
---------------------------
    Description: 
In the `cleanupGroupMetadata` method, tombstone messages are written to delete 
the group's MetadataKey only when the group is in the Dead state and the 
generation is greater than 0. The comment indicates: 'We avoid writing the 
tombstone when the generationId is 0, since this group is only using Kafka for 
offset storage.' This means that groups that only use Kafka for offset storage 
should not be deleted. However, some clients, for example Flink, commit 
offsets with a generationId equal to -1. If ApiKeys.DELETE_GROUPS is then 
called to delete such a group, its group metadata will never be deleted. Yet, 
the logic above has already cleaned up commitKey by writing tombstone messages 
with `removedOffsets`. Therefore, the actual manifestation is: the group no 
longer exists (since the offsets have been cleaned up, there is no possibility 
of adding the group back to the `groupMetadataCache` unless offsets are 
committed again with the same group name), but the corresponding group metadata 
information still exists in __consumer_offsets. This leads to the problem that 
deleting the group does not completely clean up its related information.

The group's state is set to Dead only in the following three situations:
1. The group information is unloaded
2. The group is deleted by ApiKeys.DELETE_GROUPS
3. All offsets of the group have expired or been removed.

Therefore, since the group is already in the Dead state and has been removed 
from the `groupMetadataCache`, why not directly clean up all the information of 
the group, even if it is only used for storing offsets?
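
To make the difference concrete, here is a minimal Java sketch of the two 
conditions discussed above. It is a simplified model and not the actual Kafka 
source: the class `GroupCleanupSketch`, the enum `GroupState`, and both method 
names are hypothetical, and only the "Dead and generation greater than 0" check 
is taken from the description.

```java
// Simplified, hypothetical model of the tombstone decision; not Kafka code.
final class GroupCleanupSketch {

    // Hypothetical subset of group states, enough for this illustration.
    enum GroupState { EMPTY, STABLE, DEAD }

    // Behavior as described in this issue: the group metadata tombstone is
    // written only when the group is Dead AND its generation is greater than 0.
    static boolean writesTombstoneCurrent(GroupState state, int generationId) {
        return state == GroupState.DEAD && generationId > 0;
    }

    // Behavior suggested in this issue (an assumption about the eventual fix):
    // once the group is Dead, always write the tombstone.
    static boolean writesTombstoneProposed(GroupState state, int generationId) {
        return state == GroupState.DEAD;
    }

    public static void main(String[] args) {
        // An offset-only group, e.g. one committing with generationId -1.
        int offsetOnlyGeneration = -1;
        System.out.println("current:  "
                + writesTombstoneCurrent(GroupState.DEAD, offsetOnlyGeneration));
        System.out.println("proposed: "
                + writesTombstoneProposed(GroupState.DEAD, offsetOnlyGeneration));
    }
}
```

Under the first condition a Dead group with generationId -1 or 0 keeps its 
group metadata record in __consumer_offsets even though its offsets have 
already been tombstoned; under the second, both are cleaned up together.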

  was:
In the `cleanupGroupMetadata` method, tombstone messages are written to delete 
the group's MetadataKey only when the group is in the Dead state and the 
generation is greater than 0. The comment indicates: 'We avoid writing the 
tombstone when the generationId is 0, since this group is only using Kafka for 
offset storage.' This means that groups that only use Kafka for offset storage 
should not be deleted. However, some clients, for example Flink, commit 
offsets with a generationId equal to -1. If ApiKeys.DELETE_GROUPS is then 
called to delete such a group, its group metadata will never be deleted. Yet, 
the logic above has already cleaned up commitKey by writing tombstone messages 
with removedOffsets. Therefore, the actual manifestation is: the group no 
longer exists (since the offsets have been cleaned up, there is no possibility 
of adding the group back to the `groupMetadataCache` unless offsets are 
committed again with the same group name), but the corresponding group metadata 
information still exists in __consumer_offsets. This leads to the problem that 
deleting the group does not completely clean up its related information.

The group's state is set to Dead only in the following three situations:
1. The group information is unloaded
2. The group is deleted by ApiKeys.DELETE_GROUPS
3. All offsets of the group have expired or been removed.

Therefore, since the group is already in the Dead state and has been removed 
from the `groupMetadataCache`, why not directly clean up all the information of 
the group, even if it is only used for storing offsets?


> There may be ambiguous deletions in the `cleanupGroupMetadata` when the 
> generation of the group is less than or equal to 0
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-16543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16543
>             Project: Kafka
>          Issue Type: Bug
>          Components: group-coordinator
>    Affects Versions: 3.6.2
>            Reporter: hudeqi
>            Assignee: hudeqi
>            Priority: Major
>
> In the `cleanupGroupMetadata` method, tombstone messages are written to delete 
> the group's MetadataKey only when the group is in the Dead state and the 
> generation is greater than 0. The comment indicates: 'We avoid writing the 
> tombstone when the generationId is 0, since this group is only using Kafka 
> for offset storage.' This means that groups that only use Kafka for offset 
> storage should not be deleted. However, some clients, for example Flink, 
> commit offsets with a generationId equal to -1. If ApiKeys.DELETE_GROUPS is 
> then called to delete such a group, its group metadata will never be 
> deleted. Yet, the logic above has already cleaned up commitKey 
> by writing tombstone messages with `removedOffsets`. Therefore, the actual 
> manifestation is: the group no longer exists (since the offsets have been 
> cleaned up, there is no possibility of adding the group back to the 
> `groupMetadataCache` unless offsets are committed again with the same group 
> name), but the corresponding group metadata information still exists in 
> __consumer_offsets. This leads to the problem that deleting the group does 
> not completely clean up its related information.
>
> The group's state is set to Dead only in the following three situations:
> 1. The group information is unloaded
> 2. The group is deleted by ApiKeys.DELETE_GROUPS
> 3. All offsets of the group have expired or been removed.
>
> Therefore, since the group is already in the Dead state and has been removed 
> from the `groupMetadataCache`, why not directly clean up all the information 
> of the group, even if it is only used for storing offsets?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
