[ 
https://issues.apache.org/jira/browse/IGNITE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482176#comment-14482176
 ] 

Alexey Goncharuk edited comment on IGNITE-647 at 4/6/15 11:20 PM:
------------------------------------------------------------------

Initial investigation results:

# Node 1 is started without a configuration for cache "A". Its start has not 
finished yet: onKernalStart() has not been called.
# Node 2 is started with a configuration for cache "A" that uses the fair 
affinity function.
# During discovery data exchange node 1 collects info about cache "A" and adds 
it to the registered caches.
# Node 1 continues its start process, iterates over the registered caches and 
adds cache "A" to the list of cache contexts.
# Node 1 initiates partition map exchange. Since cache "A" is already in the 
list of started caches, node 1 calculates nodes for cache "A" on topology 
version 1, but on topology version 1 cache "A" did not exist. This causes the 
affinity fetch future to hang.
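
To illustrate step 5, here is a toy model of why the future never completes. It is only a sketch: AffinityHangSketch, futureFor() and onAffinityCalculated() are hypothetical names and not Ignite internals. The assumption is that an affinity assignment is published per cache and topology version, so a request for a version on which the cache never existed has nobody to complete it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of the hang; none of these names are Ignite APIs. */
public class AffinityHangSketch {
    /** Affinity futures keyed by "cacheName@topologyVersion". */
    private final Map<String, CompletableFuture<String>> futs = new ConcurrentHashMap<>();

    /** Returns the (possibly still pending) affinity future for a cache and version. */
    CompletableFuture<String> futureFor(String cache, long topVer) {
        return futs.computeIfAbsent(cache + "@" + topVer, k -> new CompletableFuture<>());
    }

    /** Publishes the affinity of a cache for one topology version. */
    void onAffinityCalculated(String cache, long topVer, String assignment) {
        futureFor(cache, topVer).complete(assignment);
    }

    /** Returns true if the future completes within the given timeout. */
    static boolean completesInTime(CompletableFuture<String> fut, long millis) {
        try {
            fut.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AffinityHangSketch node1 = new AffinityHangSketch();

        // Cache "A" first exists on topology version 2, when node 2 joins.
        node1.onAffinityCalculated("A", 2, "assignment");

        // Asking for version 2 succeeds.
        System.out.println(completesInTime(node1.futureFor("A", 2), 100)); // prints "true"

        // Asking for version 1 waits forever; the timeout stands in for the hang.
        System.out.println(completesInTime(node1.futureFor("A", 1), 100)); // prints "false"
    }
}
```

In the test the wait has no timeout, which would explain why testAffinity() hangs instead of failing.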

Proposed solution: add the topology version to the discovery data exchange so 
that the local node does not start caches received from nodes that started 
after it (steps 3, 4).
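
A minimal sketch of the proposed check, assuming each cache descriptor received during discovery data exchange carries the topology version on which its sender joined. DiscoveryCacheFilterSketch, ReceivedCache and cachesToStart() are hypothetical names used only for illustration; this is not the actual patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of the proposed filter; all names are hypothetical, not Ignite APIs. */
public class DiscoveryCacheFilterSketch {
    /** Cache info received during discovery data exchange (assumed shape). */
    static final class ReceivedCache {
        final String name;
        final long senderJoinTopVer;

        ReceivedCache(String name, long senderJoinTopVer) {
            this.name = name;
            this.senderJoinTopVer = senderJoinTopVer;
        }
    }

    /**
     * Returns the names of caches the local node may start during its own start:
     * only those received from nodes that joined no later than the local node.
     */
    static List<String> cachesToStart(long localJoinTopVer, List<ReceivedCache> received) {
        List<String> res = new ArrayList<>();

        for (ReceivedCache c : received) {
            if (c.senderJoinTopVer <= localJoinTopVer)
                res.add(c.name);
        }

        return res;
    }

    public static void main(String[] args) {
        List<ReceivedCache> fromNode2 =
            Collections.singletonList(new ReceivedCache("A", 2));

        // Node 1 joined on version 1; node 2 (owner of "A") joined on version 2.
        System.out.println(cachesToStart(1, fromNode2)); // prints "[]"

        // A node that joined on version 2 or later may start "A" right away.
        System.out.println(cachesToStart(2, fromNode2)); // prints "[A]"
    }
}
```

The sketch covers only the filtering decision in steps 3 and 4. How a skipped cache gets started later on node 1 is outside its scope.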


was (Author: agoncharuk):
Initial investigation results:

# Node 1 is started without cache "A" configuration. Start is not finished yet, 
onKernalStart() is not called.
# Node 2 is started with cache "A" configuration, configured with fair affinity 
function.
# During discovery data exchange node 1 collects info about cache "A" and adds 
it to registered caches.
# Node 1 continues start process, iterates over registered caches and adds 
cache "A" to the list of cache contexts.
# Node 1 initiates partition map exchange. Since cache "A" is already in the 
list of started caches, node 1 calculates nodes for cache "A" on topology 
version 1, but on topology version 1 cache "A" did not exist. This causes the 
affinity fetch future to hang.

Proposed solution: add topology version to discovery exchange process so that 
local node does not start caches received from nodes started after it.

> org.apache.ignite.IgniteCacheAffinitySelfTest.testAffinity() hangs
> ------------------------------------------------------------------
>
>                 Key: IGNITE-647
>                 URL: https://issues.apache.org/jira/browse/IGNITE-647
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Yakov Zhdanov
>            Assignee: Alexey Goncharuk
>            Priority: Blocker
>             Fix For: sprint-4
>
>         Attachments: threaddump.txt
>
>
> 1-2 out of ~10 local runs hung for me



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)