dajac commented on PR #17444: URL: https://github.com/apache/kafka/pull/17444#issuecomment-2411286258
> Agree, hashCode() is not good enough for it. We should use something like md5. Also, the hash function should generate the same result for the same data in a different order, because a different order doesn't impact the target assignment result.

MurmurHash would be better in my opinion, and I think that we already use it in the code base. Regarding the order, we may have to sort the racks in order to be consistent.

> Do you mean that we just want to have a hash value in ConsumerGroupPartitionMetadataValue, not a list of TopicMetadata?

Exactly. I think that we could even think about removing that record and putting the hash into ConsumerGroupMetadataValue or somewhere else. Would it be possible?

> Since you mention controller in option 1, I'm not quite sure whether we want to store epoch in TopicImage or TopicMetadata. For a new epoch in TopicImage, we may need to store rack information in it as well, or the epoch can't represent all changes (e.g. rack change). However, the TopicImage is not only used in the group coordinator, it's also used in KRaft. It may not be a good idea to couple them. For a new epoch in TopicMetadata, we still need to calculate subscription metadata when MetadataImage is updated. To save storage resources, we don't want to store duplicated information. Eventually, this way may not be more efficient than option 2.

If you update the epoch based on replica changes, it would also catch the rack changes, because you can only change the rack when you restart the broker. However, I agree that the coupling is not ideal. This is actually what I pointed out in the Jira.
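
To make the ordering point concrete, here is a rough sketch of a topic hash that stays stable when partitions or racks are listed in a different order. Everything in it is an assumption for illustration: `TopicHashSketch`, `topicHash`, and the `partition id -> racks` map shape are hypothetical, not existing Kafka APIs. It also uses the JDK's MD5 (`java.security.MessageDigest`) as a stand-in, because the JDK does not ship a MurmurHash implementation; the actual change could swap in whichever Murmur variant the code base already provides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Kafka.
final class TopicHashSketch {

    // partitionRacks: partition id -> racks hosting that partition's replicas
    // (an assumed shape, chosen to keep the example small).
    static long topicHash(String topicName, Map<Integer, List<String>> partitionRacks) {
        MessageDigest digest;
        try {
            // MD5 is a stand-in for MurmurHash here; every JDK is required to provide it.
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }

        update(digest, topicName);

        // TreeMap iterates partitions in ascending id order, whatever the input map's order.
        for (Map.Entry<Integer, List<String>> entry : new TreeMap<>(partitionRacks).entrySet()) {
            update(digest, Integer.toString(entry.getKey()));

            // Sort a copy of the racks so that replica order does not affect the hash.
            List<String> racks = new ArrayList<>(entry.getValue());
            Collections.sort(racks);
            for (String rack : racks) {
                update(digest, rack);
            }
        }

        // Fold the first 8 bytes of the digest into a long.
        byte[] bytes = digest.digest();
        long hash = 0L;
        for (int i = 0; i < 8; i++) {
            hash = (hash << 8) | (bytes[i] & 0xFF);
        }
        return hash;
    }

    // Length-prefix each field so that ("ab", "c") and ("a", "bc") hash differently.
    private static void update(MessageDigest digest, String field) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        digest.update(bytes);
    }
}
```

With something along these lines, two views of the same topic that differ only in rack or partition ordering produce the same value, while an actual rack change produces a different one, so a single stored hash could stand in for the full list of TopicMetadata when detecting changes.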
