[ https://issues.apache.org/jira/browse/CASSANDRA-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346613#comment-15346613 ]
Nimi Wariboko Jr. commented on CASSANDRA-11373: ----------------------------------------------- I think I may have hit this issue with 3.3. I tried upgrading to 3.7, but the node is still stuck in a compaction loop. It looks like the fix doesn't account for already aborted compactions? > Cancelled compaction leading to infinite loop in compaction strategy > getNextBackgroundTask > ------------------------------------------------------------------------------------------ > > Key: CASSANDRA-11373 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11373 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Reporter: Eduard Tudenhoefner > Assignee: Marcus Eriksson > Fix For: 2.2.6, 3.0.5, 3.5 > > Attachments: jstack.txt > > > Our test is basically running *nodetool repair* on specific keyspaces (such > as keyspace1) and the test is also triggering *nodetool compact keyspace1 > standard1* in the background. > And so it looks like running major compactions & repairs lead to that issue > when using *LCS*. > Below is an excerpt from the *thread dump* (the rest is attached) > {code} > "CompactionExecutor:2" #33 daemon prio=1 os_prio=4 tid=0x00007f5363e64f10 > nid=0x3c4e waiting for monitor entry [0x00007f53340d8000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:252) > - waiting to lock <0x00000006c9362c80> (a > org.apache.cassandra.db.compaction.CompactionStrategyManager) > at > org.apache.cassandra.db.lifecycle.Tracker.notifySSTableRepairedStatusChanged(Tracker.java:434) > at > org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:550) > at > org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:465) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Locked ownable synchronizers: > - <0x00000006c9362ca8> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > "CompactionExecutor:1" #32 daemon prio=1 os_prio=4 tid=0x00007f5363e618b0 > nid=0x3c4d runnable [0x00007f5334119000] > java.lang.Thread.State: RUNNABLE > at com.google.common.collect.Iterators$7.computeNext(Iterators.java:650) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at com.google.common.collect.Iterators.addAll(Iterators.java:361) > at com.google.common.collect.Iterables.addAll(Iterables.java:354) > at > org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:589) > at > org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:349) > - locked <0x00000006d0f7a6a8> (a > org.apache.cassandra.db.compaction.LeveledManifest) > at > org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:98) > - locked <0x00000006d0f7a568> (a > org.apache.cassandra.db.compaction.LeveledCompactionStrategy) > at > org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:95) > - locked <0x00000006c9362c80> (a > org.apache.cassandra.db.compaction.CompactionStrategyManager) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:257) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > *CPU usage is at 100%* > {code} > top -p 15386 > top - 12:12:40 up 1:28, 1 user, load average: 1.08, 1.11, 1.16 > Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie > %Cpu(s): 0.3 us, 0.0 sy, 12.4 ni, 87.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 > st > KiB Mem: 16433792 total, 8947336 used, 7486456 free, 89552 buffers > KiB Swap: 0 total, 0 used, 0 free. 3326796 cached Mem > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 15386 automat+ 20 0 7891448 5.004g 290184 S 102.9 31.9 80:07.06 java > {code} > *ttop shows that the compaction thread consumes all the CPU* > {code} > $ java -jar sjk.jar ttop -p 15386 > Monitoring threads ... > 2016-03-17T12:17:13.514+0000 Process summary > process cpu=126.34% > application cpu=102.81% (user=102.46% sys=0.35%) > other: cpu=23.53% > heap allocation rate 375mb/s > [000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler > [000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer > [000005] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher > [000012] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Accept-7199 > [000013] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP Accept-0 > [000015] user= 0.00% sys= 0.00% alloc= 476b/s - > AsyncAppender-Worker-ASYNCDEBUGLOG > [000016] user= 0.00% sys= 0.05% alloc= 1070b/s - ScheduledTasks:1 > [000017] user= 0.00% sys= 0.00% alloc= 33b/s - EXPIRING-MAP-REAPER:1 > [000018] user= 0.00% sys= 0.02% alloc= 1932b/s - Background_Reporter:1 > [000022] user= 0.00% sys= 0.00% alloc= 0b/s - MemtablePostFlush:1 > [000023] user= 0.00% sys= 0.00% alloc= 0b/s - MemtableReclaimMemory:1 > [000026] user= 0.00% sys= 0.00% alloc= 0b/s - SlabPoolCleaner > [000027] user= 0.00% sys= 0.00% alloc= 0b/s - PERIODIC-COMMIT-LOG-SYNCER > [000028] user= 0.00% sys= 0.00% alloc= 0b/s - COMMIT-LOG-ALLOCATOR > [000029] user= 0.00% sys= 0.01% alloc= 7086b/s - OptionalTasks:1 > [000030] user= 0.00% sys= 0.00% alloc= 0b/s - Reference-Reaper:1 > [000031] user= 0.00% sys= 0.00% alloc= 0b/s - > Strong-Reference-Leak-Detector:1 > [000032] user=99.45% sys= 0.07% alloc= 374mb/s - CompactionExecutor:1 > [000033] user= 0.00% sys= 0.00% alloc= 0b/s - CompactionExecutor:2 > [000036] user= 0.00% sys= 0.00% alloc= 0b/s - NonPeriodicTasks:1 > [000037] user= 0.00% sys= 0.00% alloc= 0b/s - LocalPool-Cleaner:1 > [000041] user= 0.00% sys= 0.00% alloc= 0b/s - IndexSummaryManager:1 > [000043] user= 0.00% sys= 0.01% alloc= 2705b/s - GossipTasks:1 > [000044] user= 0.00% sys= 0.00% alloc= 0b/s - ACCEPT-/10.200.182.146 > [000045] user= 0.00% sys= 0.01% alloc= 2283b/s - BatchlogTasks:1 > [000055] user= 0.00% sys= 0.02% alloc= 9494b/s - GossipStage:1 > [000056] user= 0.00% sys= 0.00% alloc= 0b/s - AntiEntropyStage:1 > [000057] user= 0.00% sys= 0.00% alloc= 0b/s - MigrationStage:1 > [000058] user= 0.00% sys= 0.00% alloc= 0b/s - MiscStage:1 > [000067] user= 0.00% sys= 0.02% alloc= 2445b/s - > MessagingService-Incoming-/10.200.182.144 > [000068] user= 0.00% sys= 0.01% alloc= 968b/s - > MessagingService-Outgoing-/10.200.182.144 > [000069] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Outgoing-/10.200.182.144 > [000070] user= 0.00% sys= 0.02% alloc= 512b/s - > MessagingService-Outgoing-/10.200.182.144 > [000072] user= 0.00% sys= 0.00% alloc= 0b/s - NanoTimeToCurrentTimeMillis > updater > [000073] user= 0.00% sys= 0.02% alloc= 3113b/s - > MessagingService-Incoming-/10.200.182.144 > [000074] user= 0.00% sys= 0.00% alloc= 0b/s - PendingRangeCalculator:1 > [000075] user= 0.00% sys= 0.41% alloc= 66kb/s - SharedPool-Worker-1 > [000076] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-2 > [000077] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-3 > [000078] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-4 > [000079] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-5 > [000080] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-6 > [000081] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-7 > [000082] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-8 > [000084] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-2 > [000085] user= 0.00% sys= 0.00% alloc= 181b/s - HintsWriteExecutor:1 > [000091] user= 0.00% sys= 0.00% alloc= 0b/s - PO-thread-0 > [000092] user= 0.00% sys= 0.00% alloc= 0b/s - > NodeHealthPlugin-Scheduler-thread-0 > [000093] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-1 > [000094] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-2 > [000097] user= 0.00% sys= 0.00% alloc= 0b/s - Lease RemoteMessageServer > acceptor-2-1 > [000104] user= 0.00% sys= 0.00% alloc= 0b/s - RemoteMessageClient > worker-4-1 > [000120] user= 0.00% sys= 0.00% alloc= 0b/s - RemoteMessageClient > connection limiter - 0 > [000121] user= 0.00% sys= 0.00% alloc= 0b/s - threadDeathWatcher-5-1 > [000122] user= 0.00% sys= 0.00% alloc= 0b/s - PO-thread scheduler > [000123] user= 0.00% sys= 0.00% alloc= 0b/s - JOB-TRACKER > [000124] user= 0.00% sys= 0.01% alloc= 1276b/s - TASK-TRACKER > [000127] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-1 > [000128] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-2 > [000129] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-3 > [000130] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-4 > [000131] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-5 > [000132] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-6 > [000133] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-7 > [000134] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-8 > [000135] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-9 > [000136] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-10 > [000137] user= 0.19% sys=-0.18% alloc= 0b/s - epollEventLoopGroup-6-11 > [000138] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-12 > [000139] user= 0.19% sys=-0.19% alloc= 0b/s - epollEventLoopGroup-6-13 > [000140] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-14 > [000141] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-15 > [000142] user= 0.00% sys= 0.00% alloc= 0b/s - epollEventLoopGroup-6-16 > [000143] user= 0.19% sys=-0.04% alloc= 13kb/s - Thread-7 > [000144] user= 0.00% sys= 0.00% alloc= 0b/s - taskCleanup > [000145] user= 0.00% sys= 0.00% alloc= 0b/s - DseGossipStateUpdater > [000146] user= 0.00% sys= 0.00% alloc= 0b/s - DestroyJavaVM > [000149] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-10 > [000150] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-11 > [000151] user= 0.00% sys= 0.00% alloc= 0b/s - Directory/File cleanup > thread > [000153] user= 0.00% sys= 0.00% alloc= 0b/s - pool-15-thread-1 > [000190] user= 0.00% sys= 0.00% alloc= 0b/s - pool-18-thread-1 > [000215] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-3 > [000217] user= 0.00% sys= 0.00% alloc= 0b/s - RMI Scheduler(0) > [000220] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP > Connection(335)-10.200.182.146 > [000222] user= 0.00% sys= 0.00% alloc= 0b/s - pool-10-thread-4 > [000223] user= 0.00% sys= 0.00% alloc= 0b/s - taskCleanup > [000224] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-69 > [000225] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-70 > [000227] user= 0.00% sys= 0.00% alloc= 0b/s - pool-19-thread-1 > [000254] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-9 > [000255] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-11 > [000256] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-10 > [000269] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-13 > [000270] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-12 > [000272] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-14 > [000273] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-15 > [000274] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-18 > [000275] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-19 > [000276] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-20 > [000277] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-17 > [000278] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-16 > [000279] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-22 > [000280] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-21 > [000281] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-23 > [000282] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-24 > [000283] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-25 > [000284] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-26 > [000285] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-27 > [000286] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-28 > [000287] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-29 > [000288] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-30 > [000289] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-31 > [000290] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-32 > [000296] user= 0.00% sys= 0.00% alloc= 1970b/s - pool-2-thread-1 > [000297] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-33 > [000298] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-34 > [000302] user= 0.00% sys= 0.01% alloc= 1576b/s - > MessagingService-Incoming-/10.200.182.145 > [000303] user= 0.00% sys= 0.00% alloc= 451b/s - > MessagingService-Outgoing-/10.200.182.145 > [000304] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Outgoing-/10.200.182.145 > [000305] user= 0.00% sys= 0.01% alloc= 206b/s - > MessagingService-Outgoing-/10.200.182.145 > [000308] user= 0.00% sys= 0.00% alloc= 424b/s - > MessagingService-Incoming-/10.200.182.145 > [000314] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamingTransferTaskTimeouts:1 > [000324] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Outgoing-/10.200.182.146 > [000325] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Outgoing-/10.200.182.146 > [000326] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Outgoing-/10.200.182.146 > [000328] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Incoming-/10.200.182.146 > [000329] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-35 > [000330] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-37 > [000331] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-36 > [000332] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-39 > [000333] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-38 > [000334] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-42 > [000335] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-41 > [000336] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-40 > [000337] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-46 > [000338] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-44 > [000339] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-45 > [000340] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-43 > [000341] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-47 > [000342] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-48 > [000343] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-50 > [000344] user= 0.00% sys= 0.00% alloc= 0b/s - SharedPool-Worker-49 > [000375] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:1 > [000376] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:2 > [000406] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Incoming-/10.200.182.145 > [000408] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Incoming-/10.200.182.146 > [000409] user= 0.00% sys= 0.00% alloc= 0b/s - > MessagingService-Incoming-/10.200.182.144 > [000415] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:3 > [000418] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:4 > [000435] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:5 > [000438] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:6 > [000439] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:7 > [000444] user= 0.00% sys= 0.00% alloc= 0b/s - > StreamConnectionEstablisher:8 > [000687] user= 0.00% sys= 0.00% alloc= 0b/s - JMX server connection > timeout 687 > [000688] user= 2.44% sys= 0.16% alloc= 1380kb/s - RMI TCP > Connection(401)-10.200.182.146 > [000694] user= 0.00% sys= 0.00% alloc= 0b/s - Attach Listener > [000726] user= 0.00% sys= 0.00% alloc= 0b/s - RMI TCP > Connection(400)-10.200.182.146 > [000743] user=-0.00% sys=-0.16% alloc=-109800b/s - MemtableFlushWriter:112 > [000745] user= 0.00% sys= 0.00% alloc= 0b/s - MemtableFlushWriter:113 > [000746] user= 0.00% sys= 0.03% alloc= 4295b/s - JMX server connection > timeout 746 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)