[ https://issues.apache.org/jira/browse/IGNITE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482176#comment-14482176 ]
Alexey Goncharuk edited comment on IGNITE-647 at 4/6/15 11:20 PM:
------------------------------------------------------------------

Initial investigation results:
# Node 1 is started without cache "A" configuration. Its start is not finished yet; onKernalStart() has not been called.
# Node 2 is started with cache "A" configuration, configured with the fair affinity function.
# During discovery data exchange, node 1 collects info about cache "A" and adds it to registered caches.
# Node 1 continues its start process, iterates over registered caches and adds cache "A" to the list of cache contexts.
# Node 1 initiates partition map exchange. Since cache "A" is already in the list of started caches, node 1 calculates nodes for cache "A" on topology version 1, but on topology version 1 cache "A" did not exist. This causes the affinity fetch future to hang.

Proposed solution: add the topology version to the discovery exchange process so that the local node does not start caches received from nodes started after it (steps 3, 4).


was (Author: agoncharuk):
Initial investigation results:
# Node 1 is started without cache "A" configuration. Its start is not finished yet; onKernalStart() has not been called.
# Node 2 is started with cache "A" configuration, configured with the fair affinity function.
# During discovery data exchange, node 1 collects info about cache "A" and adds it to registered caches.
# Node 1 continues its start process, iterates over registered caches and adds cache "A" to the list of cache contexts.
# Node 1 initiates partition map exchange. Since cache "A" is already in the list of started caches, node 1 calculates nodes for cache "A" on topology version 1, but on topology version 1 cache "A" did not exist. This causes the affinity fetch future to hang.

Proposed solution: add the topology version to the discovery exchange process so that the local node does not start caches received from nodes started after it.

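
A rough sketch of the proposed solution from the comment above, for illustration only. It assumes each cache descriptor received during discovery data exchange carries the topology version at which its owning node joined; the class and field names (DiscoveryCacheFilter, ReceivedCache, startTopVer) and the second cache "B" are hypothetical and are not Ignite APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Ignite classes. */
public class DiscoveryCacheFilter {
    /** Cache info received via discovery, tagged with the topology version of the node that sent it. */
    static final class ReceivedCache {
        final String name;
        final long startTopVer;

        ReceivedCache(String name, long startTopVer) {
            this.name = name;
            this.startTopVer = startTopVer;
        }
    }

    /**
     * Returns the names of caches the local node may add to its cache contexts during its own start.
     * Caches that came from nodes which joined after the local node are skipped (steps 3, 4).
     */
    static List<String> cachesToStart(long locJoinTopVer, List<ReceivedCache> received) {
        List<String> res = new ArrayList<>();

        for (ReceivedCache c : received) {
            // Only caches that already existed on the local node's join topology version are started.
            if (c.startTopVer <= locJoinTopVer)
                res.add(c.name);
        }

        return res;
    }

    public static void main(String[] args) {
        List<ReceivedCache> received = new ArrayList<>();

        // Node 1 joined on topology version 1; cache "A" arrived with node 2 on version 2.
        received.add(new ReceivedCache("A", 2));
        // Hypothetical cache that already existed on version 1.
        received.add(new ReceivedCache("B", 1));

        System.out.println(cachesToStart(1, received)); // prints [B]
    }
}
```

With such a filter, node 1 would not calculate affinity for cache "A" on topology version 1, so the affinity fetch described in step 5 would not be requested for a version on which the cache did not exist.
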
> org.apache.ignite.IgniteCacheAffinitySelfTest.testAffinity() hangs
> ------------------------------------------------------------------
>
>                 Key: IGNITE-647
>                 URL: https://issues.apache.org/jira/browse/IGNITE-647
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Yakov Zhdanov
>            Assignee: Alexey Goncharuk
>            Priority: Blocker
>             Fix For: sprint-4
>
>         Attachments: threaddump.txt
>
>
> 1-2 runs out of ~10 local runs hung for me

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)