[ https://issues.apache.org/jira/browse/IGNITE-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495894#comment-17495894 ]
Ignite TC Bot commented on IGNITE-16589:
----------------------------------------

{panel:title=Branch: [pull/9839/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/9839/head] Base: [master] : New Tests (32)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}Cache 6{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=6426493]]
* {color:#013220}IgniteCacheTestSuite6: CacheNoAffinityExchangeTest.testNoAffinityOnJoiningClientNode - PASSED{color}
* {color:#013220}IgniteCacheTestSuite6: CacheNoAffinityExchangeTest.testNoAffinityOnClientCacheStart - PASSED{color}

{color:#00008b}PDS 2{color} [[tests 30|https://ci.ignite.apache.org/viewLog.html?buildId=6426534]]
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testReadBeforeGracefulShutdown[specificConsistentId=true,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testReadAllKeys[specificConsistentId=true,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testCdcSingleton[specificConsistentId=false,walMode=BACKGROUND,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testMultiNodeConsumption[specificConsistentId=false,walMode=BACKGROUND,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testReadAllKeys[specificConsistentId=false,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testCdcSingleton[specificConsistentId=true,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testMultiNodeConsumption[specificConsistentId=true,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testReReadWhenStateWasNotStored[specificConsistentId=true,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testCdcSingleton[specificConsistentId=false,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testMultiNodeConsumption[specificConsistentId=false,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: CdcSelfTest.testReReadWhenStateWasNotStored[specificConsistentId=false,walMode=LOG_ONLY,metricExporter=org.apache.ignite.cdc.CdcSelfTest$$Lambda$17/1457157255@10f7c76,pageSz=8,192] - PASSED{color}
...
and 19 new tests
{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6426579&buildTypeId=IgniteTests24Java8_RunAll]

> Failure handler kills server node on getting affinity from old topology
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-16589
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16589
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>             Fix For: 2.13
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In general, treating the following exception as a critical failure seems to
> be overkill:
> {code:java}
> [2022-02-21 10:34:53,347][ERROR][aff-#300%cache.CacheNoAffinityExchangeTest0%][IgniteTestResources]
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]],
> failureCtx=FailureContext [type=CRITICAL_ERROR,
> err=java.lang.IllegalStateException: Getting affinity for too old topology
> version that is already out of history [locNode=TcpDiscoveryNode
> [id=0917cb9d-2825-46eb-b210-1e2846f00000, consistentId=127.0.0.1:47500,
> addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500],
> discPort=47500, order=1, intOrder=1, lastExchangeTime=1645428893228,
> loc=true, ver=2.13.0#20220218-sha1:7e63c212, isClient=false],
> grp=client-cache, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0],
> lastAffChangeTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=2],
> head=AffinityTopologyVersion [topVer=8, minorTopVer=0],
> history=[AffinityTopologyVersion [topVer=7, minorTopVer=0],
> AffinityTopologyVersion [topVer=8, minorTopVer=0]]]]]
> java.lang.IllegalStateException: Getting affinity for too old topology
> version that is already out of history [locNode=TcpDiscoveryNode
> [id=0917cb9d-2825-46eb-b210-1e2846f00000, consistentId=127.0.0.1:47500,
> addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500],
> discPort=47500, order=1, intOrder=1, lastExchangeTime=1645428893228,
> loc=true, ver=2.13.0#20220218-sha1:7e63c212, isClient=false],
> grp=client-cache, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0],
> lastAffChangeTopVer=AffinityTopologyVersion [topVer=1, minorTopVer=2],
> head=AffinityTopologyVersion [topVer=8, minorTopVer=0],
> history=[AffinityTopologyVersion [topVer=7, minorTopVer=0],
> AffinityTopologyVersion [topVer=8, minorTopVer=0]]]
>     at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:849)
>     at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:796)
>     at org.apache.ignite.internal.processors.cache.CacheGroupContext.processAffinityAssignmentRequest0(CacheGroupContext.java:1130)
>     at org.apache.ignite.internal.processors.cache.CacheGroupContext.processAffinityAssignmentRequest(CacheGroupContext.java:1116)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
>     at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> It seems that a much more suitable way to handle this would be to
> kill/restart the client node, return an error for the client to handle, or
> attempt to remap the operation on the new topology. For now, it is not clear
> why exactly the server node must be shut down after this exception, and the
> documentation doesn't explain how to avoid this.
> The root cause of the issue is that the AffinityRequest from the client node
> refers to an "old" topology version, which has already been wiped on the
> server side.
> A possible scenario is the following:
> - the client wants to get a proxy for a cache that is already started on
> server nodes
> - the client started a custom exchange task (see
> GridCacheProcessor.processCustomExchangeTask)
> - before sending the AffinityRequest, the client node hung, for example due
> to a long GC pause
> - the cluster topology changed multiple times during that GC pause (the
> number of changes is enough to clean up the old history of affinity
> assignments)
> - the server node received the AffinityRequest from the client and,
> unfortunately, cannot process it in the right way because of a lack of
> history.
> IMHO we can respond to the client with an "empty" AffinityResponse that
> states the cause of the problem.
> In that case, the client node may try to reconnect to the cluster (if PME is
> in progress) or retry the operation.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
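
The proposal in the issue description (answer a stale AffinityRequest with an "empty" AffinityResponse instead of raising a critical error) can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical: `AffinityHistorySketch`, `History`, and `Response` are simplified stand-ins invented for illustration, not Ignite's real classes, and the history is keyed by the major topology version only. The history size of 2 and the versions 1 to 8 are chosen to mirror the log above (`history=[7, 8]`, requested `topVer=2`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, simplified model of the proposal; none of these types are Ignite's real classes. */
public class AffinityHistorySketch {
    /** Simplified stand-in for AffinityResponse: carries either an assignment or an error. */
    public static final class Response {
        public final List<String> assignment; // null when the response is "empty"
        public final String error;            // null on success

        public Response(List<String> assignment, String error) {
            this.assignment = assignment;
            this.error = error;
        }

        public boolean isEmpty() {
            return assignment == null;
        }
    }

    /** Bounded affinity history keyed by major topology version (minor versions omitted for brevity). */
    public static final class History {
        private final int maxSize;
        private final TreeMap<Long, List<String>> byVer = new TreeMap<>();

        public History(int maxSize) {
            this.maxSize = maxSize;
        }

        /** Records the assignment for a new topology version and wipes the oldest entries. */
        public void onTopologyChange(long topVer, List<String> assignment) {
            byVer.put(topVer, assignment);

            while (byVer.size() > maxSize)
                byVer.pollFirstEntry();
        }

        /** Answers a request without throwing: a stale version yields an "empty" response with a cause. */
        public Response processRequest(long topVer) {
            List<String> aff = byVer.get(topVer);

            if (aff != null)
                return new Response(aff, null);

            return new Response(null, "Affinity for topology version " + topVer
                + " is not available in history " + byVer.keySet());
        }
    }

    public static void main(String[] args) {
        History hist = new History(2);

        // Eight topology changes happen while the client is stuck in a long pause.
        for (long ver = 1; ver <= 8; ver++)
            hist.onTopologyChange(ver, Arrays.asList("node-" + ver));

        // The late request refers to version 2, which has already been wiped.
        Response stale = hist.processRequest(2);
        System.out.println("empty=" + stale.isEmpty() + ", error=" + stale.error);

        // A request for the current version is served normally.
        Response cur = hist.processRequest(8);
        System.out.println("empty=" + cur.isEmpty() + ", assignment=" + cur.assignment);
    }
}
```

In this sketch the server side never throws for a stale version, so nothing reaches a failure handler; the requester inspects `isEmpty()` and decides whether to reconnect or retry, which is the behaviour the description suggests for the client node.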