[ https://issues.apache.org/jira/browse/CASSANDRA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-12689: ------------------------------------- Component/s: Materialized Views > All MutationStage threads blocked, kills server > ----------------------------------------------- > > Key: CASSANDRA-12689 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12689 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths, Materialized Views > Reporter: Benjamin Roth > Assignee: Benjamin Roth > Priority: Critical > Fix For: 3.0.10, 3.10 > > > Under heavy load (e.g. due to repair during normal operations), a lot of > NullPointerExceptions occur in MutationStage. Unfortunately, the log is not > very chatty, trace is missing: > {noformat} > 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught > exception on thread Thread[MutationStage-1,5,main]: {} > 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null > {noformat} > Then, after some time, in most cases ALL threads in MutationStage pools are > completely blocked. This leads to piling up pending tasks until server runs > OOM and is completely unresponsive due to GC. Threads will NEVER unblock > until server restart. Even if load goes completely down, all hints are > paused, and no compaction or repair is running. Only restart helps. > I can understand that pending tasks in MutationStage may pile up under heavy > load, but tasks should be processed and dequeud after load goes down. This is > definitively not the case. This looks more like a an unhandled exception > leading to a stuck lock. > Stack trace from jconsole, all Threads in MutationStage show same trace. > {noformat} > Name: MutationStage-48 > State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266 > Total blocked: 137 Total waited: 138.513 > {noformat} > Stack trace: > {noformat} > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137) > org.apache.cassandra.db.Mutation.apply(Mutation.java:227) > org.apache.cassandra.db.Mutation.apply(Mutation.java:241) > org.apache.cassandra.hints.Hint.apply(Hint.java:96) > org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:91) > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) > org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org