[
https://issues.apache.org/jira/browse/GEODE-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352893#comment-17352893
]
Barrett Oglesby commented on GEODE-9299:
----------------------------------------
The failing assertion is verifying the number of entries in the local secondary
queues is 100 (which matches the number of puts). Instead, it is 101.
{noformat}
int localServer1QueueSize = localServer1.invoke(() ->
getQueueRegionSize(senderId, false));
int localServer2QueueSize = localServer2.invoke(() ->
getQueueRegionSize(senderId, false));
assertEquals(numPuts, localServer1QueueSize + localServer2QueueSize);
{noformat}
Here is some logging that shows the behavior in this test.
Client Starts:
{noformat}
[vm3_v1.12.2] [info 2021/05/21 21:12:16.982 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Received method:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40@59079e6c
[vm3_v1.12.2] [info 2021/05/21 21:12:17.599 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Using
org.apache.geode.logging.log4j.internal.impl.Log4jLoggingProvider from
ServiceLoader for service org.apache.geode.logging.internal.spi.LoggingProvider
[vm3_v1.12.2] [info 2021/05/21 21:12:24.490 GMT <poolTimer-DEFAULT-3>
tid=0x32] Updating membership port. Port changed from 0 to 46166. ID is now
7e72072330df(13685:loner):0:6094c590
[vm3_v1.12.2] [info 2021/05/21 21:12:24.526 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Got result: null
[vm3_v1.12.2] from
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$146/0x00000001008afc40@59079e6c
(took 7538 ms)
{noformat}
Client does 100 puts in 22069ms with a SocketTimeoutException:
{noformat}
[vm3_v1.12.2] [info 2021/05/21 21:12:24.567 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Received method:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840@2e8f97c1
[vm3_v1.12.2] [warn 2021/05/21 21:12:42.233 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Pool unexpected socket timed out on client
connection=Pooled Connection to 7e72072330df:21250:
Connection[7e72072330df:21250]@93891194)
[vm3_v1.12.2] [info 2021/05/21 21:12:46.638 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Got result: null
[vm3_v1.12.2] from
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$339/0x0000000100959840@2e8f97c1
(took 22069 ms)
{noformat}
The SocketTimeoutException means the client retried the put. That ends up being
2 puts for the same event.
Server 1 returns secondary queue size:
{noformat}
[vm1_v1.12.2] [info 2021/05/21 21:12:46.668 GMT <RMI TCP
Connection(3)-172.17.0.2> tid=0x22] Received method:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040@79d1c376
[vm1_v1.12.2] [info 2021/05/21 21:12:47.598 GMT <RMI TCP
Connection(3)-172.17.0.2> tid=0x22] Got result: null
[vm1_v1.12.2] from
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$520/0x0000000100ad1040@79d1c376
(took 929 ms)
{noformat}
Server 2 returns secondary queue size:
{noformat}
[vm2_v1.12.2] [info 2021/05/21 21:12:47.617 GMT <RMI TCP
Connection(3)-172.17.0.2> tid=0x22] Received method:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40@751350b6
[vm2_v1.12.2] [info 2021/05/21 21:12:47.782 GMT <RMI TCP
Connection(3)-172.17.0.2> tid=0x22] Got result: null
[vm2_v1.12.2] from
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40.run
with 0 args on object:
org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest$$Lambda$517/0x0000000100ae2c40@751350b6
(took 161 ms)
{noformat}
The assertEquals check fails right after this, and the test shuts down.
Here is some more detail.
Server 1 buckets are created:
{noformat}
[vm1_v1.12.2] [info 2021/05/21 21:12:24.771 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
[vm1_v1.12.2] [info 2021/05/21 21:12:24.847 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:25.418 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
[vm1_v1.12.2] [info 2021/05/21 21:12:25.439 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:26.012 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
[vm1_v1.12.2] [info 2021/05/21 21:12:26.055 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:26.823 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
[vm1_v1.12.2] [info 2021/05/21 21:12:26.842 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:28.299 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
[vm1_v1.12.2] [info 2021/05/21 21:12:28.317 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:42.903 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
[vm1_v1.12.2] [info 2021/05/21 21:12:42.929 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:43.102 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
[vm1_v1.12.2] [info 2021/05/21 21:12:43.107 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:43.661 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
[vm1_v1.12.2] [info 2021/05/21 21:12:43.692 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:44.095 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
[vm1_v1.12.2] [info 2021/05/21 21:12:44.098 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
completed
[vm1_v1.12.2] [info 2021/05/21 21:12:44.366 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
[vm1_v1.12.2] [info 2021/05/21 21:12:44.386 GMT <ServerConnection on port
27499 Thread 1> tid=0x81] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
completed
{noformat}
Server 2 buckets are created:
{noformat}
[vm2_v1.12.2] [info 2021/05/21 21:12:24.643 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
[vm2_v1.12.2] [info 2021/05/21 21:12:24.657 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_0
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:25.506 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
[vm2_v1.12.2] [info 2021/05/21 21:12:25.539 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_1
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:25.935 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
[vm2_v1.12.2] [info 2021/05/21 21:12:25.939 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_2
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:27.963 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
[vm2_v1.12.2] [info 2021/05/21 21:12:28.107 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_3
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:28.364 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
[vm2_v1.12.2] [info 2021/05/21 21:12:28.410 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_4
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:42.804 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
[vm2_v1.12.2] [info 2021/05/21 21:12:42.809 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_5
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:43.179 GMT <Pooled Waiting Message
Processor 2> tid=0x7a] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
[vm2_v1.12.2] [info 2021/05/21 21:12:43.240 GMT <Pooled Waiting Message
Processor 2> tid=0x7a] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_6
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:43.543 GMT <Pooled Waiting Message
Processor 3> tid=0x7c] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
[vm2_v1.12.2] [info 2021/05/21 21:12:43.550 GMT <Pooled Waiting Message
Processor 3> tid=0x7c] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_7
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:44.132 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
[vm2_v1.12.2] [info 2021/05/21 21:12:44.176 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_8
completed
[vm2_v1.12.2] [info 2021/05/21 21:12:44.297 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initializing region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
[vm2_v1.12.2] [info 2021/05/21 21:12:44.300 GMT <Pooled Waiting Message
Processor 1> tid=0x39] Initialization of region
_B__testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from__v1.12.2]__region_9
completed
{noformat}
There is a big gap between buckets 4 and 5 being created.
Server 2 has several interesting messages in the gap between 21:12:28 and
21:12:42:
{noformat}
[vm2_v1.12.2] [warn 2021/05/21 21:12:35.588 GMT <StatSampler> tid=0x3f]
Statistics sampling thread detected a wakeup delay of 5627 ms, indicating a
possible resource issue. Check the GC, memory, and CPU statistics.
[vm2_v1.12.2] [warn 2021/05/21 21:12:42.715 GMT <ServerConnection on port
21250 Thread 1> tid=0x6f] Server connection from
[identity(172.17.0.2(13685:loner):46166:6094c590,connection=1; port=46166]:
connection disconnect detected by EOF.
{noformat}
And, the client timed out at that time:
{noformat}
[vm3_v1.12.2] [warn 2021/05/21 21:12:42.233 GMT <RMI TCP
Connection(2)-172.17.0.2> tid=0x22] Pool unexpected socket timed out on client
connection=Pooled Connection to 7e72072330df:21250:
Connection[7e72072330df:21250]@93891194)
{noformat}
So, server 2 had a resource issue (CPU or GC most likely) that caused the
client to timeout.
> CI Failure:
> WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover >
> testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-9299
> URL: https://issues.apache.org/jira/browse/GEODE-9299
> Project: Geode
> Issue Type: Bug
> Components: wan
> Affects Versions: 1.15.0
> Reporter: Hale Bales
> Assignee: Barrett Oglesby
> Priority: Major
>
> {code:java}
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover
> > testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover[from_v1.12.2]
> FAILED
> java.lang.AssertionError: expected:<100> but was:<101>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at
> org.apache.geode.cache.wan.WANRollingUpgradeDUnitTest.stopSenderAndVerifyEvents(WANRollingUpgradeDUnitTest.java:227)
> at
> org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.testSecondaryEventsNotReprocessedAfterOldSiteMemberFailover(WANRollingUpgradeSecondaryEventsNotReprocessedAfterOldSiteMemberFailover.java:98)
> {code}
> CI Failure:
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/229#B
> Artifacts Available here:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.15.0-build.0253/test-results/upgradeTest/1621635640/
--
This message was sent by Atlassian Jira
(v8.3.4#803005)