[ 
https://issues.apache.org/jira/browse/IGNITE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Muzafarov updated IGNITE-15146:
-------------------------------------
    Priority: Blocker  (was: Critical)

> Checking the snapshot creates a large number of unused threads that do not 
> terminate.
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-15146
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15146
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.11
>            Reporter: Pavel Pereslegin
>            Assignee: Maxim Muzafarov
>            Priority: Blocker
>             Fix For: 2.12
>
>
> Each new run of snapshot verification creates dozens of new threads that do 
> not terminate after the procedure is complete. Over time, this can lead to an 
> OutOfMemoryError and node failure.
> {code:java}
>     @Test
>     public void testClusterSnapshotCheckMultipleTimes() throws Exception {
>         IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
>         startClientGrid();
>
>         ignite.snapshot().createSnapshot(SNAPSHOT_NAME)
>             .get();
>
>         // Thread.activeCount() is only an estimate for the current thread group.
>         int activeThreadsCntBefore = Thread.activeCount();
>         int iterations = 10;
>
>         // Run the snapshot check repeatedly and wait for each run to finish.
>         for (int i = 0; i < iterations; i++)
>             snp(ignite).checkSnapshot(SNAPSHOT_NAME).get();
>
>         int createdThreads = Thread.activeCount() - activeThreadsCntBefore;
>
>         // Fails if the checks leave behind at least one extra thread per iteration.
>         assertTrue("Threads created: " + createdThreads, createdThreads < iterations);
>     }
> {code}
> The reproducer shows that 10 snapshot checks add approximately *250* new
> threads.
> The dump of a "leaked" thread looks like this:
> {noformat}
> "binary-metadata-writer-#2208" #2249 prio=5 os_prio=0 tid=0x00007f9974087000 nid=0x65b38 waiting on condition [0x00007f986cf9c000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <merged>(a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>       at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>       at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body0(BinaryMetadataFileStore.java:460)
>       at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body(BinaryMetadataFileStore.java:441)
>       at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
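
The dump above shows the leaked thread parked in LinkedBlockingQueue.take(). A thread in that state stays alive until it is interrupted or receives a task that tells it to exit. The following is a minimal, self-contained sketch of that pattern and is not Ignite code: the AsyncWriter class, its stop() method, the liveThreads() helper, and the "sketch-writer-" thread name prefix are all hypothetical names introduced for illustration. It assumes the leak comes from workers that are started but never stopped, and it counts threads by name prefix, which is more targeted than the Thread.activeCount() estimate used in the reproducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WriterThreadLeakSketch {
    /** Hypothetical stand-in for a queue-driven async writer (not the Ignite class). */
    static final class AsyncWriter implements Runnable {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        private final Thread thread;

        AsyncWriter(String name) {
            thread = new Thread(this, name);
            thread.start();
        }

        @Override public void run() {
            try {
                // Parks in take() while the queue is empty, as in the dump above.
                while (!Thread.currentThread().isInterrupted())
                    queue.take().run();
            }
            catch (InterruptedException ignored) {
                // Interrupted while waiting: fall through and let the thread end.
            }
        }

        /** Interrupts the worker and waits for it to terminate. */
        void stop() {
            thread.interrupt();

            try {
                thread.join();
            }
            catch (InterruptedException e) {
                // Preserve the caller's interrupt status.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Counts live threads whose name starts with the given prefix. */
    static long liveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
            .count();
    }

    public static void main(String[] args) {
        String prefix = "sketch-writer-";
        List<AsyncWriter> writers = new ArrayList<>();

        // Each "check" starts a worker; without stop() these would never exit.
        for (int i = 0; i < 10; i++)
            writers.add(new AsyncWriter(prefix + i));

        System.out.println("Live before stop: " + liveThreads(prefix));

        for (AsyncWriter w : writers)
            w.stop();

        System.out.println("Live after stop: " + liveThreads(prefix));
    }
}
```

In a test like the reproducer, a helper such as liveThreads("binary-metadata-writer") could replace the Thread.activeCount() difference, so that the assertion only reacts to threads of the kind seen in the dump.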



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
