[
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352894#comment-17352894
]
Barrett Oglesby commented on GEODE-9299:
----------------------------------------
If I simulate this behavior with a sleep on key=5 in Put65, I see the same
extra event in the queue.
Keys 0-4 are processed normally in servers 1 and 2:
Server 1:
{noformat}
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=0
ServerConnection on port 57561 Thread 1:
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=0; shadowKey=113
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=1
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=1; shadowKey=114
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=2
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=2; shadowKey=115
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=3
ServerConnection on port 57561 Thread 1:
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=3; shadowKey=116
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=4
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=8 dom #2 port=57607: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=4; shadowKey=117
{noformat}
Server 2:
{noformat}
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=0; shadowKey=113
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=1; shadowKey=114
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=2; shadowKey=115
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=3; shadowKey=116
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=4; shadowKey=117
{noformat}
The ServerConnection thread in server 1 sleeps before processing key=5:
{noformat}
ServerConnection on port 57561 Thread 1: Put65.cmdExecute processing key=5
ServerConnection on port 57561 Thread 1: Put65.cmdExecute sleeping key=5
{noformat}
The client times out and fails over to server2 and retries key=5 and continues
with keys 6-9. Notice the event with key=5 has shadowKey=118. Thats the key in
the queue.
{noformat}
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing retried
key=5
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=5; shadowKey=118
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=6
ServerConnection on port 57587 Thread 2:
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=6; shadowKey=119
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=7
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=7; shadowKey=120
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=8
ServerConnection on port 57587 Thread 2:
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=8; shadowKey=121
ServerConnection on port 57587 Thread 2: Put65.cmdExecute processing key=9
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=10 dom #2 port=57668: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=9; shadowKey=122
{noformat}
Server 1 enqueues keys 5-9:
{noformat}
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=5; shadowKey=118
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=6; shadowKey=119
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=7; shadowKey=120
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=8; shadowKey=121
P2P message reader for 10.166.145.22(ln-2:85040)<v28>:41003 unshared ordered
uid=10 dom #1 port=57664: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=9; shadowKey=122
{noformat}
The ServerConnection thread in server 1 wakes up and processes key=5.
DistributedEventTracker.hasSeenEvent realizes its a duplicate for the data
region. That doesn't precent the enqueueing of the event. Notice the event is
the same EventID, but a different shadowKey (231). So, this is a second event
in the queue for the same event.
{noformat}
ServerConnection on port 57561 Thread 1: Put65.cmdExecute woke up key=5
ServerConnection on port 57561 Thread 1: DistributedEventTracker.hasSeenEvent
event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_5;key=5;id=EventID[id=25
bytes;threadID=1;sequenceID=5];tailKey=231]; highestSequenceNumber=5
ServerConnection on port 57561 Thread 1:
ParallelGatewaySenderEventProcessor.enqueueEvent put dataKey=5; shadowKey=231
{noformat}
Server 2 processes that duplicate event the same way:
{noformat}
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: DistributedEventTracker.hasSeenEvent
event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_5;key=5;id=EventID[id=25
bytes;threadID=1;sequenceID=5];routing=;tailKey=231]; highestSequenceNumber=5
P2P message reader for 10.166.145.22(ln-1:85023)<v27>:41002 unshared ordered
uid=8 dom #1 port=57606: ParallelGatewaySenderEventProcessor.enqueueEvent put
dataKey=5; shadowKey=231
{noformat}
Dumping the queues on each server shows 11 events in each queue with 2 events
for key=5.
Server 1:
{noformat}
The queue for parallel GatewaySender ny contains the following 6 primary
entries:
queueKey=113; region=/data; operation=CREATE; eventKey=0;
value=0; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10000|1;sequenceID=0;bucketID=0]
queueKey=116; region=/data; operation=CREATE; eventKey=3;
value=3; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10003|1;sequenceID=3;bucketID=3]
queueKey=118; region=/data; operation=CREATE; eventKey=5;
value=5; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5]
queueKey=120; region=/data; operation=CREATE; eventKey=7;
value=7; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10007|1;sequenceID=7;bucketID=7]
queueKey=122; region=/data; operation=CREATE; eventKey=9;
value=9; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10009|1;sequenceID=9;bucketID=9]
queueKey=231; region=/data; operation=UPDATE; eventKey=5;
value=5; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5]
The queue for parallel GatewaySender ny contains the following 5 secondary
entries:
queueKey=114; region=/data; operation=CREATE; eventKey=1;
value=1; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10001|1;sequenceID=1;bucketID=1]
queueKey=115; region=/data; operation=CREATE; eventKey=2;
value=2; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10002|1;sequenceID=2;bucketID=2]
queueKey=117; region=/data; operation=CREATE; eventKey=4;
value=4; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10004|1;sequenceID=4;bucketID=4]
queueKey=119; region=/data; operation=CREATE; eventKey=6;
value=6; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10006|1;sequenceID=6;bucketID=6]
queueKey=121; region=/data; operation=CREATE; eventKey=8;
value=8; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10008|1;sequenceID=8;bucketID=8]
{noformat}
Server 2:
{noformat}
The queue for parallel GatewaySender ny contains the following 5 primary
entries:
queueKey=114; region=/data; operation=CREATE; eventKey=1;
value=1; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10001|1;sequenceID=1;bucketID=1]
queueKey=115; region=/data; operation=CREATE; eventKey=2;
value=2; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10002|1;sequenceID=2;bucketID=2]
queueKey=117; region=/data; operation=CREATE; eventKey=4;
value=4; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10004|1;sequenceID=4;bucketID=4]
queueKey=119; region=/data; operation=CREATE; eventKey=6;
value=6; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10006|1;sequenceID=6;bucketID=6]
queueKey=121; region=/data; operation=CREATE; eventKey=8;
value=8; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10008|1;sequenceID=8;bucketID=8]
The queue for parallel GatewaySender ny contains the following 6 secondary
entries:
queueKey=113; region=/data; operation=CREATE; eventKey=0;
value=0; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10000|1;sequenceID=0;bucketID=0]
queueKey=116; region=/data; operation=CREATE; eventKey=3;
value=3; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10003|1;sequenceID=3;bucketID=3]
queueKey=118; region=/data; operation=CREATE; eventKey=5;
value=5; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5]
queueKey=120; region=/data; operation=CREATE; eventKey=7;
value=7; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10007|1;sequenceID=7;bucketID=7]
queueKey=122; region=/data; operation=CREATE; eventKey=9;
value=9; possibleDuplicate=false;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10009|1;sequenceID=9;bucketID=9]
queueKey=231; region=/data; operation=UPDATE; eventKey=5;
value=5; possibleDuplicate=true;
eventId=EventID[10.166.145.22(:loner):57600:533faaaf;threadID=0x10005|1;sequenceID=5;bucketID=5]
{noformat}
If I start the remote site, I see logging in
DistributedEventTracker.hasSeenEvent that ignores the duplicate event, and the
region size is correct:
{noformat}
ServerConnection on port 5417 Thread 8: DistributedEventTracker.hasSeenEvent
event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_5;key=5;id=EventID[id=25
bytes;threadID=0x10005|1;sequenceID=5;bucketId=5];tailKey=231];
highestSequenceNumber=5
Region name=data; size=10
{noformat}
I think the way to address this issue is to change the equals check to greater
than or equals.
> CI Failure:
> WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover >
> testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-9299
> URL: https://issues.apache.org/jira/browse/GEODE-9299
> Project: Geode
> Issue Type: Bug
> Components: wan
> Affects Versions: 1.15.0
> Reporter: Hale Bales
> Assignee: Barrett Oglesby
> Priority: Major
>
> {code:java}
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from_v1.12.2]
> FAILED
> java.lang.AssertionError: expected:<100> but was:<101>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.stopSenderAndVerifyEvents(WANRollingUpgradeDUnitTest.java:227)
> at
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover(WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.java:98)
> {code}
> CI Failure:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/229#B
> Artifacts Available here:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0253/test-results/upgradeTest/1621635640/
--
This message was sent by Atlassian Jira
(v8.3.4#803005)