[ https://issues.apache.org/jira/browse/GEODE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McMahon reassigned GEODE-6607:
-----------------------------------

    Assignee: Ryan McMahon

Possible client subscription data inconsistency due to race between retrieving filter info and distributing event
-----------------------------------------------------------------------------------------------------------------

                Key: GEODE-6607
                URL: https://issues.apache.org/jira/browse/GEODE-6607
            Project: Geode
         Issue Type: Bug
         Components: client queues
           Reporter: Ryan McMahon
           Assignee: Ryan McMahon
           Priority: Major
         Time Spent: 1h 50m
 Remaining Estimate: 0h

A client can miss events from a subscription (either CQ or register interest) in the following scenario:

Four servers form a cluster, with redundant copies set to 2 for client subscriptions. The client's primary subscription endpoint is on server 1, and its redundant copies are on servers 2 and 3. Server 2 is killed or lost to a network partition, so we attempt to restore redundancy by copying the client queue from server 3 to server 4.

When server 4 gets the client queue from server 3, two things happen. First, it requests the client's filter info, which represents the CQ and register-interest info. Second, it performs the GII (get initial image) to obtain the image of the queue.

A race can occur when an event is distributed across the cluster while server 4 is initializing the client queue. If server 4 processes the distributed event before the filter info is retrieved, the event does not match the client subscription filter, because that filter does not exist yet on server 4. Then, if server 3 processes the event after GII has started, the event is not part of the client queue image. The event is therefore never added to the client queue on server 4 and is lost.

We have a special queue for handling events while a client is initializing, but it sits at too low a level (MessageDispatcher) to handle this scenario.
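The interleaving above can be replayed deterministically in a few lines. The sketch below is hypothetical and single-threaded: `QueueHost`, `interestKeys` and `replayRace` are illustrative names, not Geode classes, and a set of keys stands in for the real CQ/register-interest filter info. It only shows why the event ends up in neither path into server 4's queue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, single-threaded replay of the GEODE-6607 interleaving.
 * None of these classes are Geode APIs.
 */
public class SubscriptionRaceDemo {

    /** A server holding one copy of a client's subscription queue (hypothetical). */
    static final class QueueHost {
        // Stands in for the client's CQ / register-interest filter info.
        final Set<String> interestKeys = new HashSet<>();
        final List<String> queue = new ArrayList<>();

        /** Queues the event only if it matches the filter known at this moment. */
        void distribute(String key) {
            if (interestKeys.contains(key)) {
                queue.add(key);
            }
        }
    }

    /** Replays the problematic ordering and returns server 4's queue afterwards. */
    static List<String> replayRace(String key) {
        QueueHost server3 = new QueueHost(); // existing redundant copy (GII provider)
        QueueHost server4 = new QueueHost(); // new redundant copy being initialized
        server3.interestKeys.add(key);

        // 1. The event reaches server 4 before filter info is retrieved: no match, dropped.
        server4.distribute(key);

        // 2. Server 4 retrieves the client's filter info from server 3.
        server4.interestKeys.addAll(server3.interestKeys);

        // 3. GII starts: server 4 snapshots server 3's queue, which lacks the event so far.
        List<String> image = new ArrayList<>(server3.queue);

        // 4. The event reaches server 3 after the snapshot; it lands only in server 3's queue.
        server3.distribute(key);

        // 5. Server 4 finishes initialization with the stale image.
        server4.queue.addAll(image);
        return server4.queue;
    }

    public static void main(String[] args) {
        List<String> q = replayRace("key-1");
        System.out.println("server 4 queue after init: " + q); // prints "server 4 queue after init: []"
    }
}
```

Steps 1 and 4 are the two halves of the race: either one alone is harmless, but together they leave server 4's copy of the queue without the event.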
One possible solution is to move this special queue to a higher level (CacheClientNotifier or CacheClientProxy), so that the event is queued before we even attempt to get filter info. Then, when initialization finishes, we drain the queue, check whether each event matches the initialized client's filter, and send it along if so. A similar solution could be implemented on the GII provider side, but it might be messier.
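A minimal sketch of the proposed higher-level queue, assuming a single object that sees every distributed event before any filter matching (the role the text suggests for CacheClientNotifier or CacheClientProxy). `InitializingQueueSketch`, `Event`, `distribute` and `finishInitialization` are hypothetical names, not Geode's API. The duplicate suppression by event id is also an assumption added here: an event may appear both in the GII image and in the buffer, and the sketch keeps only the first copy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: buffer events above the filter check while the client
 * queue is initializing, then drain and filter them once both the filter info
 * and the GII image are in place. Not Geode's actual implementation.
 */
public class InitializingQueueSketch {

    /** Minimal event: a unique id plus the key it modifies (hypothetical). */
    public static final class Event {
        public final long id;
        public final String key;

        public Event(long id, String key) {
            this.id = id;
            this.key = key;
        }
    }

    private final Set<String> interestKeys = new HashSet<>();    // stands in for CQ / interest filter info
    private final List<Event> queue = new ArrayList<>();         // this server's copy of the client queue
    private final Set<Long> queuedIds = new HashSet<>();         // assumed duplicate suppression by event id
    private final Deque<Event> initBuffer = new ArrayDeque<>();  // events seen while initializing
    private boolean initializing = true;

    /** Called for every distributed event, before any filter matching. */
    public synchronized void distribute(Event event) {
        if (initializing) {
            initBuffer.add(event); // the filter is unknown yet, so keep everything
        } else {
            enqueueIfMatches(event);
        }
    }

    /** Called once the filter info has been retrieved and GII has delivered the image. */
    public synchronized void finishInitialization(Collection<String> filterKeys, List<Event> image) {
        interestKeys.addAll(filterKeys);
        for (Event e : image) {
            if (queuedIds.add(e.id)) {
                queue.add(e);
            }
        }
        initializing = false;
        // Drain events that arrived during initialization, now that the filter is known.
        while (!initBuffer.isEmpty()) {
            enqueueIfMatches(initBuffer.poll());
        }
    }

    /** Adds the event only if it matches the filter and has not been queued already. */
    private void enqueueIfMatches(Event event) {
        if (interestKeys.contains(event.key) && queuedIds.add(event.id)) {
            queue.add(event);
        }
    }

    /** Returns a snapshot of the queued events. */
    public synchronized List<Event> queue() {
        return new ArrayList<>(queue);
    }
}
```

With this in place, the event that was dropped in the race scenario is held in `initBuffer` and enqueued during the drain. The sketch appends buffered events after the image and does not attempt to reconstruct their original order relative to it; a real fix would also need to decide how ordering and conflation interact with the drained events.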