[ https://issues.apache.org/jira/browse/IGNITE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651879#comment-14651879 ]

Denis Magda commented on IGNITE-1189:
-------------------------------------

While debugging, I found that there is no issue in either the tests or the 
{{IgniteEx.stop()}} implementation.
The tests fail with the error in the title because a preceding test, 
{{IgniteCacheAtomicNodeRestartTest.testRestartWithPutTenNodesTwoBackups}}, 
fails with a timeout exception.

According to the logs, {{testRestartWithPutTenNodesTwoBackups}} hangs due to a 
deadlock:
1) Thread_1 acquired the first lock, which protects {{GridCacheConcurrentMap}}, 
and is now blocked trying to acquire the second lock (on {{GridCacheMapEntry}}) 
in a call to {{GridCacheMapEntry.obsolete}};
2) Thread_2 is trying to acquire the lock that protects {{GridCacheConcurrentMap}}.

It seems that Thread_2 earlier acquired the lock on the {{GridCacheMapEntry}} 
instance that Thread_1 is waiting for, and now wants synchronized access to 
{{GridCacheConcurrentMap}}. However, the dump contains no useful information 
to confirm this or to point to another cause.
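
To make the suspected lock-order inversion concrete, here is a minimal Java sketch. It is only an illustration of the hypothesis above, not the Ignite code: {{mapLock}} and {{entryLock}} are hypothetical placeholder objects standing in for the {{GridCacheConcurrentMap}} and {{GridCacheMapEntry}} locks, and it assumes both are plain Java monitors, which has not been verified here. Under that assumption the JVM can report the cycle through {{ThreadMXBean.findMonitorDeadlockedThreads()}}.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderDeadlockSketch {
    /**
     * Starts two daemon threads that take two monitors in opposite order and
     * returns how many threads the JVM reports as deadlocked (0 if none is
     * reported within roughly five seconds).
     */
    static int reproduceAndDetect() {
        // Hypothetical stand-ins; these are NOT the real Ignite objects.
        final Object mapLock = new Object();
        final Object entryLock = new Object();
        // Makes sure each thread holds its first lock before either asks for its second.
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(mapLock, entryLock, firstLocksHeld), "Thread_1");
        Thread t2 = new Thread(() -> lockInOrder(entryLock, mapLock, firstLocksHeld), "Thread_2");
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock is found
                if (ids != null)
                    return ids.length;
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    /** Acquires 'first', waits until both threads hold their first lock, then requests 'second'. */
    private static void lockInOrder(Object first, Object second, CountDownLatch firstLocksHeld) {
        synchronized (first) {
            firstLocksHeld.countDown();
            try {
                firstLocksHeld.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached in this sketch: the other thread already holds 'second'.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + reproduceAndDetect());
    }
}
```

If the real code has this shape, the usual remedy is to make every code path take the two locks in the same order; whether that applies to Ignite still has to be confirmed by the code-flow analysis.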

I will analyze the code flow to find where the deadlock occurs.

> Ignite instance with this name has already been started
> -------------------------------------------------------
>
>                 Key: IGNITE-1189
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1189
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>            Reporter: Denis Magda
>            Assignee: Denis Magda
>
> I see this issue from time to time on TeamCity in different tests.
> In general, the stack trace looks like this one:
> {noformat}
> org.apache.ignite.IgniteCheckedException: Ignite instance with this name has already been started: distributed.IgniteCacheAtomicNodeRestartTest0
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:920)
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:477)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:683)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:667)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:644)
>     at org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.startGrids(GridCacheAbstractNodeRestartSelfTest.java:158)
>     at org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.testRestart(GridCacheAbstractNodeRestartSelfTest.java:177)
>     at org.apache.ignite.internal.processors.cache.distributed.near.GridCachePartitionedNodeRestartTest.testRestart(GridCachePartitionedNodeRestartTest.java:58)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at junit.framework.TestCase.runTest(TestCase.java:176)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1624)
>     at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:70)
>     at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:1562)
> {noformat}
> I tried to reproduce it locally but failed. According to TC, the issue is 
> reproducible with {{IgniteCacheAtomicNodeRestartTest}}.
> My understanding is that the bug appears because of:
> 1) An issue in the test;
> 2) {{IgniteEx.stop()}} not removing a grid from its internal map when the 
> grid's state is not equal to {{STARTED}} during the stop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
