[jira] [Comment Edited] (CASSANDRA-20147) Race in BatchCommitLogService in 4.1+

Benedict Elliott Smith (Jira) Mon, 16 Dec 2024 13:08:10 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906197#comment-17906197
 ]


Benedict Elliott Smith edited comment on CASSANDRA-20147 at 12/16/24 9:07 PM:
------------------------------------------------------------------------------

Thanks for the bug report. I think it's possible the problem is slightly 
different. It looks like we may be overflowing the maximum number of 
representable Semaphore permits (i.e. we have around 2B permits). I will be 
honest, I am surprised that these are the semantics of a Semaphore. I would 
expect you could specify a maximum number of permits and have them saturate. It 
is also surprising that I've never encountered this edge case before.

It's worth noting this bug appears to have existed since 2013, except for a 
brief period seemingly 2018 to 2021. It's also possible this was a bad port 
from an older branch where the change to `LockSupport` wasn't made. It might be 
better to either revert to LockSupport, or a light abstraction around it (i.e. 
introduce an interface with acquire/release representing a Semaphore with a 
single saturating permit, and default to LockSupport)





was (Author: benedict):
Thanks for the bug report. I think it's possible the problem is slightly 
different. It looks like we may be overflowing the maximum number of 
representable Semaphore permits (i.e. we have around 2B permits). I will be 
honest, I am surprised that these are the semantics of a Semaphore. I would 
expect you could specify a maximum number of permits and have them saturate. It 
is also surprising that I've never encountered this edge case before.

It's worth noting this bug appears to have existed since 2013, except for a 
brief period seemingly 2018 to 2021. It's also possible this was a bad port 
from an older branch where the change to `LockSupport` wasn't made. It might be 
better to either revert to LockSupport, or a light abstraction around it (i.e. 
introduce an interface with acquire/release representing a Semaphore with a 
single saturating permit, and default to LockSupport)

The suggestion to drain seems likely to solve this problem in all practical 
cases, though it would seem more sensible to drain on the commit log thread 
after it returns from `haveWork.acquire`, but it might be nice to move back to 
using a saturating semaphore with semantics equivalent to 
{{LockSupport.park/unpark}}.



> Race in BatchCommitLogService in 4.1+
> -------------------------------------
>
>                 Key: CASSANDRA-20147
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20147
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Elliott Sims
>            Priority: Normal
>             Fix For: 4.1.x
>
>
> Saw this crash in production on multiple hosts:
> {code:java}
> ERROR [MutationStage-165] 2024-12-16 04:36:35,914 
> JVMStabilityInspector.java:68 - Exception in thread 
> Thread[MutationStage-165,5,SharedPool]
> java.lang.RuntimeException: java.lang.Error: Maximum permit count exceeded
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>         at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>         at 
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.Error: Maximum permit count exceeded
>         at 
> java.base/java.util.concurrent.Semaphore$Sync.tryReleaseShared(Semaphore.java:198)
>         at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1382)
>         at 
> java.base/java.util.concurrent.Semaphore.release(Semaphore.java:619)
>         at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.requestExtraSync(AbstractCommitLogService.java:297)
>         at 
> org.apache.cassandra.db.commitlog.BatchCommitLogService.maybeWaitForSync(BatchCommitLogService.java:40)
>         at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.finishWriteFor(AbstractCommitLogService.java:284)
>         at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:330)
>         at 
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.addToCommitLog(CassandraKeyspaceWriteHandler.java:100)
>         at 
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.beginWrite(CassandraKeyspaceWriteHandler.java:54)
>         at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:641)
>         at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:489)
>         at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:223)
>         at 
> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:63)
>         at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>         ... 6 common frames omitted {code}
> Digging through, it looks like in 4.1.0 {{AbstractCommitLogService}} was 
> reworked to use Semaphore instead of LockSupport.  From a skim of the docs, 
> {{LockSupport.unpark()}} is safe to run even if the permit's already 
> available where {{Semaphore.release()}} is not.
> I think that means that if more writes arrive during the right part of a 
> flush, it double-decrements `haveWork`.  Once it's decremented and starts 
> throwing, I think it never successfully makes it to another acquire/try
> The least-invasive fix is probably to call `haveWork.drainPermits()` in 
> `requestExtraSync()` right before `release(1)` to ensure that 
> `havework.permits` is 0 right before we increment it.  That may technically 
> still leave a race, but an almost-impossibly small one rather than a 
> nontrivial one.  It might also be possible to catch that specific Exception 
> and run acquire() or (probably better) drainPermits() and recover.  That does 
> feel like a weird thing to do for a semaphore exception, but should be OK in 
> this case since it's just being used to sleep the worker loop instead of 
> actual concurrency control.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (CASSANDRA-20147) Race in BatchCommitLogService in 4.1+

Reply via email to