[ https://issues.apache.org/jira/browse/IGNITE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444326#comment-16444326 ]
Anton Vinogradov edited comment on IGNITE-7791 at 4/19/18 4:32 PM: ------------------------------------------------------------------- [~dpavlov], My idea is that current logic is correct. Only one issue that {{caches.clear();}} inside {{org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#initCachesOnLocalJoin}} kills survived cache group's descriptors (ignite-sys-cache it this reproducer). This kill affects only *rare* operation like delayed AffinityChangeRequest, that's why it was not fixed during BLT development. {{caches.clear();}} removal fixes the reproducer Maxim created and that's the fix I'm proposing. Solution merged to master fixes reproducer too, but looks like some kind of hotfix without any warranty that logic is not broken now. As far as I understand, [~Mmuzaf] proposed this solution not as final, but with question "Will this breaks something?" [~agoncharuk], could you please finally check that logic related to {{locJoinCachesCtx.removeSurvivedCacheGroups(survivedCacheGrps)}} was removed properly and this removal will not cause issues at production/design? was (Author: avinogradov): [~dpavlov], My idea is that current logic is correct. Only one issue that {{caches.clear();}} inside {{org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#initCachesOnLocalJoin}} kills survived cache group's descriptors (ignite-sys-cache it this reproducer). This kill affects only *rare* operation like delayed AffinityChangeRequest, that's why it was not fixed during BLT development. {{caches.clear();}} removal fixes the reproducer Maxim created and that's the fix I'm proposing. Solution merged to master fixes reproducer too, but looks like some kind of hotfix without any warranty that logic is not broken now. As far as I understand, [~Mmuzaf] proposed this solution not as final, but with question "Will this breaks something?" [~agoncharuk], could you please finally check that logic related to {{locJoinCachesCtx.removeSurvivedCacheGroups(survivedCacheGrps);}} was removed properly and this removal will not cause issues at production/design? > Ignite Client Nodes: failed test > IgniteClientReconnectCacheTest.testReconnectCacheDestroyedAndCreated() > ------------------------------------------------------------------------------------------------------- > > Key: IGNITE-7791 > URL: https://issues.apache.org/jira/browse/IGNITE-7791 > Project: Ignite > Issue Type: Bug > Reporter: Sergey Chugunov > Assignee: Maxim Muzafarov > Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.6 > > Attachments: IgniteClientReconnectCacheDelayExchangeTest.java > > > Test is flaky, success rate: 32.4%, test history is > [here|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-9174769196124217030&tab=testDetails] > Reproducible locally. > Test fails on waiting for disco cache content: > {noformat} > junit.framework.AssertionFailedError > at junit.framework.Assert.fail(Assert.java:55) > at junit.framework.Assert.assertTrue(Assert.java:22) > at junit.framework.Assert.assertTrue(Assert.java:31) > at junit.framework.TestCase.assertTrue(TestCase.java:201) > at > org.apache.ignite.internal.IgniteClientReconnectCacheTest.checkCacheDiscoveryData(IgniteClientReconnectCacheTest.java:1414) > at > org.apache.ignite.internal.IgniteClientReconnectCacheTest.testReconnectCacheDestroyedAndCreated(IgniteClientReconnectCacheTest.java:897) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at junit.framework.TestCase.runTest(TestCase.java:176) > at > org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2001) > at > org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:133) > at > org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1916) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This fail may be caused by another exception earlier in the log: > {noformat} > [2018-02-22 > 14:32:57,972][ERROR][exchange-worker-#3025%internal.IgniteClientReconnectCacheTest3%][GridDhtPartitionsExchangeFuture] > Failed to reinitialize local partitions (preloading will be stopped): > GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, > minorTopVer=1], discoEvt=DiscoveryCustomEvent > [customMsg=CacheAffinityChangeMessage > [id=3cea94db161-760b7b18-b776-4ee7-ae5a-64f9494eaa36, > topVer=AffinityTopologyVersion [topVer=3, minorTopVer=0], exchId=null, > partsMsg=null, exchangeNeeded=true], affTopVer=AffinityTopologyVersion > [topVer=6, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode > [id=1b7f7330-69d8-4bd0-b2e9-717736100000, addrs=[127.0.0.1], > sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, > lastExchangeTime=1519299177395, loc=false, ver=2.5.0#19700101-sha1:00000000, > isClient=false], topVer=6, nodeId8=658aeb36, msg=null, > type=DISCOVERY_CUSTOM_EVT, tstamp=1519299177971]], nodeId=1b7f7330, > evt=DISCOVERY_CUSTOM_EVT] > java.lang.AssertionError: 236160867 > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$CachesInfo.group(CacheAffinitySharedManager.java:2636) > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$8.applyx(CacheAffinitySharedManager.java:989) > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$8.applyx(CacheAffinitySharedManager.java:983) > at > org.apache.ignite.internal.util.lang.IgniteInClosureX.apply(IgniteInClosureX.java:38) > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllCacheGroups(CacheAffinitySharedManager.java:1118) > at > org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onChangeAffinityMessage(CacheAffinitySharedManager.java:983) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAffinityChangeRequest(GridDhtPartitionsExchangeFuture.java:992) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:648) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2337) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)