[ https://issues.apache.org/jira/browse/IGNITE-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031767#comment-15031767 ]
Denis Magda commented on IGNITE-1977:
-------------------------------------

Finished improving and reworking IgniteSemaphore failover tests.

Presently we have the following situation:
- both testSemaphoreFailoverSafe and testSemaphoreNonFailoverSafe always pass;
- the rest of the tests fail from time to time, and I've "muted" them deliberately by inserting "fail("https://issues.apache.org/jira/browse/IGNITE-1977")" in the doTestSemaphore() method implementation.

Vladislav, would you mind taking a look at the failing tests and fixing the implementation of IgniteSemaphore? You need to check them against all the suites that extend GridCacheAbstractDataStructuresFailoverSelfTest.

> IgniteSemaphore's failover related tests lead to the deadlock
> -------------------------------------------------------------
>
>                 Key: IGNITE-1977
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1977
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Magda
>            Assignee: Denis Magda
>         Attachments: ignite-1977.patch
>
>
> All {{IgniteSemaphore}} related tests from {{GridCacheAbstractDataStructuresFailoverSelfTest}} may cause a deadlock which leads to the whole suite hanging.
> The threads are waiting for the following condition:
> {noformat}
> "topology-change-thread-3" prio=6 tid=0x000000001d98d800 nid=0x2b20 waiting on condition [0x000000002066f000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x0000000798149948> (a org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl$Sync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>       at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl.acquire(GridCacheSemaphoreImpl.java:538)
>       at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl.acquire(GridCacheSemaphoreImpl.java:525)
>       at org.apache.ignite.internal.processors.cache.datastructures.GridCacheAbstractDataStructuresFailoverSelfTest$7.apply(GridCacheAbstractDataStructuresFailoverSelfTest.java:571)
>       at org.apache.ignite.internal.util.lang.GridAbsClosure.run(GridAbsClosure.java:50)
>       at org.apache.ignite.testframework.GridTestUtils$7.call(GridTestUtils.java:967)
>       at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
> {noformat}
> Probably the semaphore is not properly released when a node leaves the topology abruptly.
> In addition, the tests should be rewritten to follow the approach used by other data structures and atomics in this suite: using {{ConstantTopologyChangeWorker}} and its descendants.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
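
For illustration only (this is not Ignite code): the cause suspected in the description above, a permit that is never released once its holder is gone, can be reproduced with a plain java.util.concurrent.Semaphore. The class and method names below are hypothetical, and a terminated thread merely stands in for a node that left the topology abruptly; GridCacheSemaphoreImpl itself is not exercised here.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a JDK semaphore standing in for the distributed one.
class LostPermitSketch {
    /**
     * Returns true if a waiter obtains the single permit within timeoutMs
     * after the holder thread has terminated without calling release().
     */
    static boolean waiterGetsPermit(long timeoutMs) {
        final Semaphore sem = new Semaphore(1);

        // The "holder" takes the only permit and then simply ends,
        // mimicking a node that disappears without releasing it.
        Thread holder = new Thread(() -> sem.acquireUninterruptibly());
        holder.start();

        try {
            holder.join();

            // A plain acquire() would park here indefinitely, which matches the
            // WAITING (parking) state in the thread dump. A timed tryAcquire
            // is used instead so the sketch terminates.
            return sem.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter got permit: " + waiterGetsPermit(200)); // prints "waiter got permit: false"
    }
}
```

The JDK semaphore has no notion of permit ownership, so the permit stays consumed after the holder ends. A failover-safe distributed semaphore would need to detect the departed holder and return its permits, which appears to be the step that is not happening in the failing tests.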