[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584498#comment-16584498 ]
Jeff Jirsa commented on CASSANDRA-14653:
----------------------------------------

On which version was this observed?

> The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14653
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14653
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>         Environment: Cassandra nodes: 3 nodes, 330G physical memory per node,
> and four data directories (SSD) per node.
>            Reporter: Peter Xie
>            Priority: Major
>
> We use Cassandra as the backend storage for JanusGraph. While loading a large
> data set (~2 billion vertices, ~10 billion edges), we ran into some problems.
>
> At first we used STCS as the compaction strategy, but hit the exception below.
> We checked that "max memory lock" is unlimited and "file map count" is
> 1 million; these values should be enough for loading the data. We eventually
> found that the problem is caused by Cassandra consuming all of the virtual
> memory, so no additional virtual memory is available to the compaction task
> and the exception below is thrown.
> {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.java:74 - OutOfMemory error letting the JVM handle the error:
> java.lang.OutOfMemoryError: Map failed
> {quote}
> So we changed the compaction strategy to LCS, which seems to resolve the
> virtual memory problem. But we found another problem: many sstables that have
> already been compacted are still retained on disk. These old sstables consume
> so much disk space that not enough is left for the real data. We also found
> that many files named like "mc_txn_compaction_xxx.log" are created under the
> data directory.
> After some investigation, we found that this problem is caused by the
> "NonPeriodicTasks" thread pool. This pool always uses only one thread to
> process the cleanup tasks after compaction.
> This thread pool is instantiated with class
> DebuggableScheduledThreadPoolExecutor, which inherits from class
> ScheduledThreadPoolExecutor.
> Reading the code of DebuggableScheduledThreadPoolExecutor, we found that it
> uses an unbounded task queue with a core pool size of 1. I think using an
> unbounded queue here is wrong: with an unbounded queue, the thread pool never
> adds threads even when many tasks are waiting in the queue, because an
> unbounded queue is never full. I think a bounded queue should be used here, so
> that more threads are created when the cleanup load is heavy.
> {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority)
> {
>     super(corePoolSize, new NamedThreadFactory(threadPoolName, priority));
>     setRejectedExecutionHandler(rejectedExecutionHandler);
> }
>
> public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory)
> {
>     super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory);
> }
> {quote}
> Below is an example of a cleanup task after compaction. There is a delay of
> roughly three hours before file "mc-56525" is removed.
> {quote}
> TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db')
> ..........
> TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore
> ............
> TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier
> TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier
> TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed
> {quote}
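
The queueing behaviour described in the issue can be reproduced outside Cassandra with a small standalone sketch. It uses only the JDK's ScheduledThreadPoolExecutor; the class name SingleThreadQueueDemo, the method run, and the task counts are made up for illustration and are not Cassandra code. One caveat, taken from the JDK Javadoc for ScheduledThreadPoolExecutor: that class acts as a fixed-size pool of corePoolSize threads with an unbounded queue, and adjusting maximumPoolSize has no useful effect on it, so in this sketch the number of worker threads is varied through corePoolSize only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class for illustration; not part of Cassandra.
public class SingleThreadQueueDemo {

    /**
     * Submits `tasks` blocking tasks to a ScheduledThreadPoolExecutor with the
     * given core pool size and returns { largest pool size, tasks still queued },
     * sampled while every worker thread is held busy.
     */
    static int[] run(int corePoolSize, int tasks) {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(corePoolSize);
        CountDownLatch release = new CountDownLatch(1);
        // Counts down once for each task that has actually started running.
        CountDownLatch started = new CountDownLatch(Math.min(corePoolSize, tasks));
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // hold the worker thread until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(); // all available workers are now blocked in a task
            return new int[] { pool.getLargestPoolSize(), pool.getQueue().size() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the held and queued tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] one = run(1, 8);
        System.out.println("core=1: largest pool size " + one[0] + ", queued " + one[1]);
        int[] four = run(4, 8);
        System.out.println("core=4: largest pool size " + four[0] + ", queued " + four[1]);
    }
}
```

With a core pool size of 1 the pool never grows past one thread and the remaining seven tasks wait in the queue; with a core pool size of 4, four tasks run and four wait. The sketch only illustrates the executor's sizing rule. Whether running the post-compaction cleanup on more than one thread is safe in Cassandra is a separate question that it does not address.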