[ https://issues.apache.org/jira/browse/GEODE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444387#comment-16444387 ]
ASF subversion and git services commented on GEODE-5056:
--------------------------------------------------------

Commit 6cd24a877c6a19a1243d9582dd7b1f7edbe2859c in geode's branch refs/heads/release/1.6.0 from Xiaojian Zhou
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6cd24a8 ]

GEODE-5056: when found the dropped events at primary sender, send (#1794) QueueRemovalMessage for it

(cherry picked from commit f7bb77c89a3d19673e8929275fc6c407a4b382bd)


> ParallelGatewaySenderOperationsDUnitTest.testParallelPropagationSenderStartAfterStop_Scenario2 intermittently fail
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-5056
>                 URL: https://issues.apache.org/jira/browse/GEODE-5056
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: xiaojian zhou
>            Assignee: xiaojian zhou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.6.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> After fixing GEODE-4942, I found at least one race condition that is not covered.
>
> [vm6] [debug 2018/04/11 16:47:35.189 PDT <PartitionedRegion Message Processor2> tid=110] WAN: On primary bucket 57, setting the seq number as 1357
>
> [vm7] [info 2018/04/11 16:47:35.150 PDT <RMI TCP Connection(1)-10.118.19.25> tid=19] Started ParallelGatewaySender{id=ln,remoteDsId=2,isRunning =true}
>
> [vm7] [debug 2018/04/11 16:47:35.189 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared ordered uid=7 port=59148> tid=95] WAN: On secondary bucket 57, setting the seq number as 1357
> [vm7] [debug 2018/04/11 16:47:35.190 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared ordered uid=7 port=59148> tid=95] Key : ----> 1357
> [vm6] [debug 2018/04/11 16:47:35.190 PDT <PartitionedRegion Message Processor2> tid=110] register dropped event for primary queue. BucketId is 57, shadowKey is 1357, prQ is /ln_PARALLEL_GATEWAY_SENDER_QUEUE
>
> ----- Note: vm6's sender is restarted and cleans up the map before the QueueRemovalMessage is sent out for the map.
> [vm6] [info 2018/04/11 16:47:35.249 PDT <RMI TCP Connection(1)-10.118.19.25> tid=19] Started ParallelGatewaySender{id=ln,remoteDsId=2,isRunning =true}
> [vm6] [debug 2018/04/11 16:47:35.437 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] BatchRemovalThread about to query the batch removal map {/ln_PARALLEL_GATEWAY_SENDER_QUEUE={96=[1396], 2=[1402], 83=[1383], 6=[1406], 71=[1371], 87=[1387], 73=[1373], 90=[1390], 77=[1377], 94=[1394]}}
> [vm6] [debug 2018/04/11 16:47:35.753 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] BatchRemovalThread about to query the batch removal map {/ln_PARALLEL_GATEWAY_SENDER_QUEUE={49=[1449], 65=[1465], 83=[1483], 53=[1453], 71=[1471], 87=[1487], *57=[1457]*, 73=[1473], 77=[1477], 62=[1462]}}
> ---- shadowKey 1457 was created after the sender was restarted
>
> [vm6] [debug 2018/04/11 16:47:35.438 PDT <BatchRemovalThread for GatewaySender_ln_0> tid=118] Sending (ParallelQueueRemovalMessage@2344969b processorId=0 sender=10.118.19.25(27489)<v3>:32781) to 3 peers ([10.118.19.25(27492)<v4>:32783@4(GEODE 1.6.0), 10.118.19.25(27485)<v2>:32779@1(GEODE 1.6.0), 10.118.19.25(27482)<v1>:32778@2(GEODE 1.6.0)]) via tcp/ip
> [vm7] [debug 2018/04/11 16:47:35.439 PDT <P2P message reader for 10.118.19.25(27489)<v3>:32781 shared unordered uid=4 port=59119> tid=52] Received message 'ParallelQueueRemovalMessage@11583f5b processorId=0 sender=10.118.19.25(27489)<v3>:32781' from <10.118.19.25(27489)<v3>:32781>
>
> i.e. the dropped key (bucket 57, shadowKey 1357) was in the map, but the sender was closed and cleared the map before a QueueRemovalMessage was sent for it.
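
The race described in the issue can be illustrated with a small, single-threaded sketch. Everything in it is hypothetical: the class `DroppedEventRaceSketch`, its methods, and the `sendImmediately` flag are invented for illustration and are not Geode's classes or API. The "immediate" mode reflects one reading of the commit title (send the removal message when the primary finds the dropped event), not a description of the actual change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of a primary sender; NOT Geode's real classes. */
public class DroppedEventRaceSketch {
    /** bucketId -> shadow keys of dropped events not yet announced to the secondaries. */
    private final Map<Integer, List<Long>> pendingRemovals = new HashMap<>();
    /** Stand-in for the removal messages that actually reached the peers. */
    private final List<Long> removalsSent = new ArrayList<>();
    /** Assumption: true models "notify peers as soon as the event is dropped". */
    private final boolean sendImmediately;

    public DroppedEventRaceSketch(boolean sendImmediately) {
        this.sendImmediately = sendImmediately;
    }

    /** The primary drops an event and records (or announces) its shadow key. */
    public void registerDroppedEvent(int bucketId, long shadowKey) {
        if (sendImmediately) {
            removalsSent.add(shadowKey);
        } else {
            pendingRemovals.computeIfAbsent(bucketId, b -> new ArrayList<>()).add(shadowKey);
        }
    }

    /** Restarting the sender wipes the pending map, as in the note in the log. */
    public void restartSender() {
        pendingRemovals.clear();
    }

    /** Models the periodic batch removal pass: flush whatever is still in the map. */
    public void runBatchRemoval() {
        for (List<Long> keys : pendingRemovals.values()) {
            removalsSent.addAll(keys);
        }
        pendingRemovals.clear();
    }

    public boolean secondaryWasNotified(long shadowKey) {
        return removalsSent.contains(shadowKey);
    }

    public static void main(String[] args) {
        // Deferred mode: the restart happens between the drop and the batch pass.
        DroppedEventRaceSketch deferred = new DroppedEventRaceSketch(false);
        deferred.registerDroppedEvent(57, 1357L);
        deferred.restartSender();
        deferred.runBatchRemoval();
        System.out.println("deferred notified=" + deferred.secondaryWasNotified(1357L));

        // Immediate mode: same interleaving, but the key was already announced.
        DroppedEventRaceSketch immediate = new DroppedEventRaceSketch(true);
        immediate.registerDroppedEvent(57, 1357L);
        immediate.restartSender();
        immediate.runBatchRemoval();
        System.out.println("immediate notified=" + immediate.secondaryWasNotified(1357L));
    }
}
```

In deferred mode the program prints `deferred notified=false`: key 1357 is lost with the cleared map, so a secondary would keep the event in its queue. In immediate mode it prints `immediate notified=true`, because nothing depends on the map surviving the restart.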