[ https://issues.apache.org/jira/browse/IGNITE-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nikolay Izhikov updated IGNITE-8098:
------------------------------------
    Fix Version/s:     (was: 2.7)
                   2.8

> Getting affinity for topology version earlier than affinity is calculated
> because of data race
> ----------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-8098
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8098
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.3
>            Reporter: Andrey Aleksandrov
>            Priority: Minor
>             Fix For: 2.8
>
>
> From time to time, an Ignite cluster with services throws the following
> exception while some nodes are restarting:
> java.lang.IllegalStateException: Getting affinity for topology version
> earlier than affinity is calculated [locNode=TcpDiscoveryNode
> [id=c770dbcf-2908-442d-8aa0-bf26a2aecfef, addrs=[10.44.162.169, 127.0.0.1],
> sockAddrs=[clrv0000041279.ic.ing.net/10.44.162.169:56500, /127.0.0.1:56500],
> discPort=56500, order=11, intOrder=8, lastExchangeTime=1520931375337,
> loc=true, ver=2.3.3#20180213-sha1:f446df34, isClient=false],
> grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=13,
> minorTopVer=0], head=AffinityTopologyVersion [topVer=15, minorTopVer=0],
> history=[AffinityTopologyVersion [topVer=11, minorTopVer=0],
> AffinityTopologyVersion [topVer=11, minorTopVer=1], AffinityTopologyVersion
> [topVer=12, minorTopVer=0], AffinityTopologyVersion [topVer=15,
> minorTopVer=0]]]
> The cause of this issue appears to be a data race in the GridServiceProcessor
> class.
> How to reproduce:
> 1) To simulate the data race, modify the following place in the source code:
> Class: GridServiceProcessor
> Method: @Override public void onEvent(final DiscoveryEvent evt, final
> DiscoCache discoCache) {
> Place:
> ....
> try {
>     svcName.set(dep.configuration().getName());
>     ctx.cache().internalCache(UTILITY_CACHE_NAME).context().affinity().
>         affinityReadyFuture(topVer).get();
>     //HERE (between GET and REASSIGN) add a delay, for example Thread.sleep(100).
>     //try {
>     //    Thread.sleep(100);
>     //}
>     //catch (InterruptedException e1) {
>     //    e1.printStackTrace();
>     //}
>
>     reassign(dep, topVer);
> }
> catch (IgniteCheckedException ex) {
>     if (!(e instanceof ClusterTopologyCheckedException))
>         LT.error(log, ex, "Failed to do service reassignment (will retry): " +
>             dep.configuration().getName());
>     retries.add(dep);
> }
> ...
> 2) After that, simulate repeated node start/shutdown iterations. To reproduce
> the issue, I used GridServiceProcessorBatchDeploySelfTest (the timeout on
> future.get must be increased to avoid a timeout error).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)