[jira] [Comment Edited] (CASSANDRA-20147) Race in BatchCommitLogService in 4.1+

Elliott Sims (Jira) Mon, 16 Dec 2024 16:52:05 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906206#comment-17906206
 ]


Elliott Sims edited comment on CASSANDRA-20147 at 12/17/24 12:51 AM:
---------------------------------------------------------------------

Interesting that it was present in 2013.  We ran 2.2 and 3.0.x for quite a 
while in production.  It does look slightly different overall, so maybe it was 
a rarer race with how the flushing code was structured before.  Or possibly the 
combination of newer hardware and bottlenecks removed elsewhere in the 
application makes it way more likely to happen.  We do have a fairly large 
number of mutation concurrency/threads (512 today, probably 128 or 256 
pre-3.11) combined with batch mode and bare-metal hardware.  

Maybe not the cleanest route, but I think just eating the exception in 
requestExtraSync() might be good enough:
{code:java}
     {
         // note: cannot simply invoke executor.interrupt() as some filesystems don't like it (jimfs, at least)
         syncRequested = true;
-        haveWork.release(1);
+        try {
+            haveWork.drainPermits();
+            haveWork.release(1);
+        }
+        catch (IllegalArgumentException e) {
+            // This isn't a lock, it's waking up a thread that may already be awake, which is OK
+        }
     } {code}
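
One caveat on the diff above, offered as a sketch rather than a reviewed patch: the stack trace in the report shows {{java.lang.Error: Maximum permit count exceeded}}, and as far as I can tell {{Semaphore.release(int)}} only throws {{IllegalArgumentException}} for a negative argument, so that catch clause probably would not fire on the overflow. The standalone example below uses a plain {{java.util.concurrent.Semaphore}} as a stand-in for {{haveWork}} (the class and method names are hypothetical, not Cassandra code) to show which throwable surfaces and that draining before the release leaves exactly one permit.

```java
import java.util.concurrent.Semaphore;

public class PermitOverflowSketch {
    // Hypothetical helper: reports what a release does on the given semaphore.
    static String releaseOutcome(Semaphore haveWork) {
        try {
            haveWork.release(1);
            return "released";
        } catch (IllegalArgumentException e) {
            // Only reached for a negative permit argument, which never happens here.
            return "IllegalArgumentException";
        } catch (Error e) {
            // The permit counter overflow is reported as a plain java.lang.Error.
            return "Error: " + e.getMessage();
        }
    }

    // Hypothetical helper mirroring the drainPermits() + release(1) idea from the diff.
    static int drainThenRelease(Semaphore haveWork) {
        haveWork.drainPermits(); // reset the counter to 0
        haveWork.release(1);     // leave one permit to wake the sync thread
        return haveWork.availablePermits();
    }

    public static void main(String[] args) {
        // A semaphore whose counter is already at the maximum.
        Semaphore saturated = new Semaphore(Integer.MAX_VALUE);
        System.out.println(releaseOutcome(saturated));
        // prints "Error: Maximum permit count exceeded"

        Semaphore haveWork = new Semaphore(Integer.MAX_VALUE);
        System.out.println(drainThenRelease(haveWork)); // prints 1
    }
}
```

With the drain in place the counter should never get near the maximum, so the catch may be moot either way. Without additional synchronization, two callers can still interleave between the drain and the release, which is the small remaining race mentioned in the issue description.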


was (Author: defenestrator):
Interesting that it was present in 2013.  We ran 2.2 and 3.0.x for quite a 
while in production.  It does look slightly different overall, so maybe it was 
a rarer race with how the flushing code was structured before.  Or possibly the 
combination of newer hardware and bottlenecks removed elsewhere in the 
application makes it way more likely to happen.  We do have a fairly large 
number of mutation concurrency/threads (512 today, probably 128 or 256 
pre-3.11) combined with batch mode and bare-metal hardware.  

Maybe not the cleanest route, but I think just eating the exception in 
requestExtraSync() might be good enough:
{code:java}
     {
         // note: cannot simply invoke executor.interrupt() as some filesystems don't like it (jimfs, at least)
         syncRequested = true;
-        haveWork.release(1);
+        try {
+            haveWork.release(1);
+        }
+        catch (IllegalArgumentException e) {
+            // This isn't a lock, it's waking up a thread that may already be awake, which is OK
+        }
     } {code}

> Race in BatchCommitLogService in 4.1+
> -------------------------------------
>
>                 Key: CASSANDRA-20147
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20147
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Elliott Sims
>            Priority: Normal
>             Fix For: 4.1.x
>
>
> Saw this crash in production on multiple hosts:
> {code:java}
> ERROR [MutationStage-165] 2024-12-16 04:36:35,914 
> JVMStabilityInspector.java:68 - Exception in thread 
> Thread[MutationStage-165,5,SharedPool]
> java.lang.RuntimeException: java.lang.Error: Maximum permit count exceeded
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>         at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>         at 
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.Error: Maximum permit count exceeded
>         at 
> java.base/java.util.concurrent.Semaphore$Sync.tryReleaseShared(Semaphore.java:198)
>         at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1382)
>         at 
> java.base/java.util.concurrent.Semaphore.release(Semaphore.java:619)
>         at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.requestExtraSync(AbstractCommitLogService.java:297)
>         at 
> org.apache.cassandra.db.commitlog.BatchCommitLogService.maybeWaitForSync(BatchCommitLogService.java:40)
>         at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.finishWriteFor(AbstractCommitLogService.java:284)
>         at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:330)
>         at 
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.addToCommitLog(CassandraKeyspaceWriteHandler.java:100)
>         at 
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.beginWrite(CassandraKeyspaceWriteHandler.java:54)
>         at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:641)
>         at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:489)
>         at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:223)
>         at 
> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:63)
>         at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>         ... 6 common frames omitted {code}
> Digging through, it looks like in 4.1.0 {{AbstractCommitLogService}} was 
> reworked to use Semaphore instead of LockSupport.  From a skim of the docs, 
> {{LockSupport.unpark()}} is safe to run even if the permit's already 
> available where {{Semaphore.release()}} is not.
> I think that means that if more writes arrive during the right part of a 
> flush, it double-decrements `haveWork`.  Once it's decremented and starts 
> throwing, I think it never successfully makes it to another acquire/try
> The least-invasive fix is probably to call `haveWork.drainPermits()` in 
> `requestExtraSync()` right before `release(1)` to ensure that 
> `haveWork.permits` is 0 right before we increment it.  That may technically 
> still leave a race, but an almost-impossibly small one rather than a 
> nontrivial one.  It might also be possible to catch that specific Exception 
> and run acquire() or (probably better) drainPermits() and recover.  That does 
> feel like a weird thing to do for a semaphore exception, but should be OK in 
> this case since it's just being used to sleep the worker loop instead of 
> actual concurrency control.
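
The catch-and-recover alternative from the description could look roughly like the sketch below. It is a standalone illustration, not the Cassandra code: the class and method names are hypothetical, a plain {{java.util.concurrent.Semaphore}} stands in for {{haveWork}}, and matching on the message text from the stack trace is an assumption that may be fragile across JDK versions. It also shows the underlying difference in semantics: semaphore permits accumulate with every release, whereas the {{LockSupport}} javadoc states that its per-thread permit does not accumulate.

```java
import java.util.concurrent.Semaphore;

public class WakeOrResetSketch {
    // Assumption: message text copied from the stack trace in the report.
    static final String OVERFLOW = "Maximum permit count exceeded";

    /** Returns true if the plain release worked, false if the counter had to be reset. */
    static boolean wakeOrReset(Semaphore haveWork) {
        try {
            haveWork.release(1);
            return true;
        } catch (Error e) {
            if (!OVERFLOW.equals(e.getMessage())) {
                throw e; // unrelated Error, do not swallow it
            }
            haveWork.drainPermits(); // counter overflowed: reset it to 0
            haveWork.release(1);     // and leave a single wake-up permit
            return false;
        }
    }

    public static void main(String[] args) {
        // Permits pile up when nothing acquires them between releases.
        Semaphore haveWork = new Semaphore(0);
        for (int i = 0; i < 3; i++) {
            wakeOrReset(haveWork);
        }
        System.out.println(haveWork.availablePermits()); // prints 3

        // At the maximum, the plain release fails and the recovery path runs.
        Semaphore saturated = new Semaphore(Integer.MAX_VALUE);
        System.out.println(wakeOrReset(saturated));       // prints false
        System.out.println(saturated.availablePermits()); // prints 1
    }
}
```

Compared with draining on every call, this only touches the counter after it has already overflowed, at the cost of catching an {{Error}}, which is the part that feels odd for a semaphore.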



