gharris1727 commented on PR #13913: URL: https://github.com/apache/kafka/pull/13913#issuecomment-1639053363
The original MM2 KIPs use very few words to describe very large parts of its functionality, often leaving things under-specified, and I think that is the case here. I don't think the original proposal gives us enough to decide for or against this change.

Personally, I think that if a user can get themselves into a situation where:

1. ACL sync is enabled,
2. they can externally observe some difference between the source and target, and
3. the difference has existed for more than twice the sync interval,

then they are reasonable to conclude that the system is misbehaving: either the source or target system is unhealthy, MM2 is unhealthy, or MM2 has a bug.

If caching can cause the above situation, I don't think caching is a viable solution. I'd be interested in trying other strategies, such as:

1. Intentionally un-batching these requests so as to spread them evenly across the poll interval
2. Performing a read-before-write on the target to replace (potentially expensive?) write calls with read calls
3. Waiting for previous requests to finish before initiating subsequent ones
4. Exponentially backing off after failures

@hudeqi In your environment, are you noticing load on the source system from the ACL reads? Do you have more MM2 instances connected to the target cluster or to the source cluster? I'm wondering whether (2) would actually help, or whether reads and writes cost approximately the same.
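The four strategies and the "more than 2x the sync interval" rule of thumb can be made concrete with a small, self-contained Java sketch. Everything in it is hypothetical: `AclSyncPacing` and its methods are illustrative names that do not exist in MirrorMaker 2. ACL bindings are modeled as plain strings rather than Kafka's `AclBinding`, so the example compiles without the Kafka client library. The sync interval here stands in for whatever drives the ACL poll loop (for example `sync.topic.acls.interval.seconds`).

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical helpers sketching the ACL-sync pacing ideas discussed above.
 * Not part of MirrorMaker 2; ACL bindings are modeled as plain strings
 * instead of org.apache.kafka.common.acl.AclBinding.
 */
final class AclSyncPacing {

    private AclSyncPacing() {
    }

    /** Strategy 1: start offsets that spread n requests evenly across one poll interval. */
    static List<Duration> spreadOffsets(int n, Duration interval) {
        List<Duration> offsets = new ArrayList<>(n);
        long intervalMs = interval.toMillis();
        for (int i = 0; i < n; i++) {
            // Request i starts i/n of the way through the interval.
            offsets.add(Duration.ofMillis(intervalMs * i / n));
        }
        return offsets;
    }

    /** Strategy 2: read the target first, then write only the bindings it lacks. */
    static Set<String> bindingsToWrite(Set<String> desired, Set<String> presentOnTarget) {
        Set<String> missing = new HashSet<>(desired);
        missing.removeAll(presentOnTarget);
        return missing;
    }

    /** Strategy 3: only start a new sync round once the previous one has completed. */
    static boolean mayStartNextSync(CompletableFuture<?> previousRound) {
        return previousRound == null || previousRound.isDone();
    }

    /** Strategy 4: the delay doubles with each consecutive failure, capped at maxDelay. */
    static Duration backoff(int consecutiveFailures, Duration base, Duration maxDelay) {
        if (consecutiveFailures <= 0) {
            return Duration.ZERO;
        }
        long capMs = maxDelay.toMillis();
        long delayMs = base.toMillis();
        // Stop doubling once the cap is reached, which also keeps the value from overflowing.
        for (int i = 1; i < consecutiveFailures && delayMs < capMs; i++) {
            delayMs *= 2;
        }
        return Duration.ofMillis(Math.min(delayMs, capMs));
    }

    /** Rule of thumb: a divergence lasting more than 2x the sync interval suggests a fault. */
    static boolean looksUnhealthy(Duration divergedFor, Duration syncInterval) {
        return divergedFor.compareTo(syncInterval.multipliedBy(2)) > 0;
    }
}
```

In this sketch, strategy 3 skips a round while the previous one is still in flight instead of queueing another behind it. Queueing would be an equally valid design, but skipping is what keeps request volume bounded when the target is slow. Strategy 2 only pays off if reads really are cheaper than writes, which is the open question for the environment in question.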