[ https://issues.apache.org/jira/browse/IGNITE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Muzafarov updated IGNITE-15146:
-------------------------------------
    Priority: Blocker  (was: Critical)

> Checking the snapshot creates a large number of unused threads that do not terminate.
> -------------------------------------------------------------------------------------
>
>                 Key: IGNITE-15146
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15146
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.11
>            Reporter: Pavel Pereslegin
>            Assignee: Maxim Muzafarov
>            Priority: Blocker
>             Fix For: 2.12
>
>
> Each run of snapshot verification creates dozens of new threads that do not
> terminate after the procedure completes. Over time, this can lead to an
> OutOfMemoryError and node failure.
> {code:java}
> @Test
> public void testClusterSnapshotCheckMultipleTimes() throws Exception {
>     IgniteEx ignite = startGridsWithCache(3, dfltCacheCfg, CACHE_KEYS_RANGE);
>
>     startClientGrid();
>
>     ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get();
>
>     // Baseline thread count taken before any snapshot check runs.
>     int activeThreadsCntBefore = Thread.activeCount();
>
>     int iterations = 10;
>
>     // Each check is expected to release every thread it starts.
>     for (int i = 0; i < iterations; i++)
>         snp(ignite).checkSnapshot(SNAPSHOT_NAME).get();
>
>     int createdThreads = Thread.activeCount() - activeThreadsCntBefore;
>
>     // Some slack is allowed, but the count must not grow with every iteration.
>     assertTrue("Threads created: " + createdThreads, createdThreads < iterations);
> }
> {code}
> The reproducer shows that 10 snapshot checks add *{color:#de350b}~250{color}* new threads.
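The reproducer measures the leak with `Thread.activeCount()`, which the JDK documents as an estimate that covers only the current thread's group. A narrower check counts live threads by name prefix, for example the `binary-metadata-writer-` prefix seen in the thread dump. The sketch below is a minimal, self-contained illustration under that assumption. `LeakedWorkerDemo`, `startWorker` and `stopWorker` are hypothetical names, not Ignite's `BinaryMetadataAsyncWriter` or `GridWorker`. It shows the general pattern: a worker parked in `LinkedBlockingQueue.take()` on an empty queue never exits on its own, so whoever starts it must also stop it. Here the stop signal is an interrupt. The actual fix in Ignite may differ.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for an async writer worker: a named thread that
 * blocks on LinkedBlockingQueue.take() until it is interrupted.
 */
public class LeakedWorkerDemo {
    /** Name prefix borrowed from the thread dump, for illustration only. */
    static final String PREFIX = "binary-metadata-writer-";

    /** Counts live threads whose names start with the given prefix. */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
            .count();
    }

    /** Starts a worker that runs queued tasks and waits forever unless interrupted. */
    static Thread startWorker(int idx, LinkedBlockingQueue<Runnable> queue) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted())
                    queue.take().run(); // Parks here while the queue is empty.
            }
            catch (InterruptedException ignored) {
                // The interrupt is the shutdown signal; fall through and let the thread exit.
            }
        }, PREFIX + "#" + idx);

        t.setDaemon(true);
        t.start();

        return t;
    }

    /** Interrupts a worker and waits up to 5 seconds for it to terminate. */
    static void stopWorker(Thread t) throws InterruptedException {
        t.interrupt();
        t.join(TimeUnit.SECONDS.toMillis(5));
    }

    public static void main(String[] args) throws Exception {
        long before = countThreads(PREFIX);

        // Simulate 10 "checks", each of which starts its own worker.
        Thread[] workers = new Thread[10];
        for (int i = 0; i < workers.length; i++)
            workers[i] = startWorker(i, new LinkedBlockingQueue<>());

        // Without a stop call, every check would leave one parked thread behind.
        System.out.println("Parked workers: " + (countThreads(PREFIX) - before));

        for (Thread w : workers)
            stopWorker(w);

        System.out.println("After stop: " + (countThreads(PREFIX) - before));
    }
}
```

In a test, asserting that `countThreads(PREFIX)` returns to its baseline after each check ties a failure to the specific worker type rather than to unrelated threads that other components start or stop concurrently.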
> The dump of a "leaked" thread looks like this:
> {noformat}
> "binary-metadata-writer-#2208" #2249 prio=5 os_prio=0 tid=0x00007f9974087000 nid=0x65b38 waiting on condition [0x00007f986cf9c000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <merged> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body0(BinaryMetadataFileStore.java:460)
>         at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataFileStore$BinaryMetadataAsyncWriter.body(BinaryMetadataFileStore.java:441)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)