[ https://issues.apache.org/jira/browse/GEODE-8644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460939#comment-17460939 ]
Xiaojian Zhou commented on GEODE-8644:
--------------------------------------

The root cause is as follows. When a CME happens, AbstractRegionMap calls notifyTimestampsToGateways(), which enqueues a gateway event with the UPDATE_VERSION operation. On the server that holds the secondary queue, this event is ignored and handleSecondaryEvent() is not called. On the server that holds the primary queue, however, the event is still queued and an unprocessed token is added for it. Because no corresponding event ever arrives at the secondary queue to trigger removal of that token, the token is leaked every time this scenario occurs.

This code and behavior are very old, dating back to at least 8.2. We did not find the problem earlier for two reasons: 1) it is a rare race, and 2) we had no test that specifically exercised unprocessedToken draining until GEODE-7643 introduced one.

There are several ways to fix it. One alternative is to not enqueue this kind of event into the primary queue, as the secondary queue already does. However, that alternative changes the current logic and assumptions, so it is risky. Instead, I chose only to stop adding this kind of event to unprocessedTokens. This fix is very conservative.

> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely()
> intermittently fails when queues drain too slowly
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8644
>                 URL: https://issues.apache.org/jira/browse/GEODE-8644
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Benjamin P Ross
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI, needsTriage, pull-request-available
>
> Currently the test
> SerialGatewaySenderQueueDUnitTest.unprocessedTokensMapShouldDrainCompletely()
> relies on a 2-second delay to allow the queues to finish draining after
> the put operation completes. If the queues take longer than 2 seconds to drain,
> the test fails.
> We should change the test to wait for the queues to be
> empty with a long timeout, in case the queues never fully drain.
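The leak described in the comment can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Geode's implementation: `TokenTracker`, `GatewayEvent`, the `Op` enum, `leakedTokens`, and `waitUntil` are all hypothetical names, and a single `Set` stands in for the real unprocessedTokens bookkeeping. Following the comment, the primary side adds a token for every queued event and the matching event arriving at the secondary removes it. Because the secondary ignores UPDATE_VERSION events, their tokens are never removed. The sketch also shows the conservative fix (the event is still queued, but no token is recorded for it) and a poll-with-timeout helper in the spirit of the issue's proposal for the test.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

public class UnprocessedTokenLeakSketch {

  // Hypothetical stand-ins for gateway operations; not Geode's Operation class.
  enum Op { UPDATE, UPDATE_VERSION }

  static final class GatewayEvent {
    final long id;
    final Op op;

    GatewayEvent(long id, Op op) {
      this.id = id;
      this.op = op;
    }
  }

  // Hypothetical tracker: the primary side adds a token per queued event,
  // and the matching event arriving at the secondary removes it.
  static final class TokenTracker {
    final Set<Long> unprocessedTokens = new HashSet<>();
    private final boolean applyFix;

    TokenTracker(boolean applyFix) {
      this.applyFix = applyFix;
    }

    // Primary queue holder: the event is queued and, normally, a token is added.
    void primaryEnqueue(GatewayEvent e) {
      if (applyFix && e.op == Op.UPDATE_VERSION) {
        return; // conservative fix: still queued, but no token is tracked
      }
      unprocessedTokens.add(e.id);
    }

    // Secondary queue holder: UPDATE_VERSION events are ignored, so they
    // never remove their token; all other events clear theirs.
    void secondaryReceive(GatewayEvent e) {
      if (e.op == Op.UPDATE_VERSION) {
        return;
      }
      unprocessedTokens.remove(e.id);
    }
  }

  // Sends one ordinary update plus one UPDATE_VERSION event (as produced
  // after a CME) through both sides and returns the tokens left behind.
  static int leakedTokens(boolean applyFix) {
    TokenTracker tracker = new TokenTracker(applyFix);
    GatewayEvent update = new GatewayEvent(1L, Op.UPDATE);
    GatewayEvent versionOnly = new GatewayEvent(2L, Op.UPDATE_VERSION);
    tracker.primaryEnqueue(update);
    tracker.primaryEnqueue(versionOnly);
    tracker.secondaryReceive(update);
    tracker.secondaryReceive(versionOnly);
    return tracker.unprocessedTokens.size();
  }

  // Polls a condition until it holds or the timeout expires, instead of
  // sleeping for a fixed delay and checking only once.
  static boolean waitUntil(BooleanSupplier condition, Duration timeout) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() - deadline < 0) {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return condition.getAsBoolean();
  }

  public static void main(String[] args) {
    System.out.println("tokens leaked without fix: " + leakedTokens(false)); // 1
    System.out.println("tokens leaked with fix: " + leakedTokens(true)); // 0

    // A leaked token never drains, so even a generous timeout cannot help.
    TokenTracker leaky = new TokenTracker(false);
    leaky.primaryEnqueue(new GatewayEvent(3L, Op.UPDATE_VERSION));
    boolean drained = waitUntil(leaky.unprocessedTokens::isEmpty, Duration.ofMillis(200));
    System.out.println("leaky tracker drained: " + drained); // false
  }
}
```

The model reflects the trade-off the comment describes. The UPDATE_VERSION event is still enqueued on the primary, so existing queue behavior is unchanged, and only token creation is skipped. The last check also shows why a longer wait in the test alone would not fix the failure: a token that nothing removes keeps the map non-empty until the timeout expires.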