[
https://issues.apache.org/jira/browse/CASSANDRA-20147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906197#comment-17906197
]
Benedict Elliott Smith edited comment on CASSANDRA-20147 at 12/16/24 9:07 PM:
------------------------------------------------------------------------------
Thanks for the bug report. I think it's possible the problem is slightly
different. It looks like we may be overflowing the maximum number of
representable Semaphore permits (i.e. we have around 2B permits). I will be
honest, I am surprised that these are the semantics of a Semaphore. I would
expect you could specify a maximum number of permits and have them saturate. It
is also surprising that I've never encountered this edge case before.
It's worth noting this bug appears to have existed since 2013, except for a
brief period seemingly 2018 to 2021. It's also possible this was a bad port
from an older branch where the change to `LockSupport` wasn't made. It might be
better to either revert to LockSupport, or a light abstraction around it (i.e.
introduce an interface with acquire/release representing a Semaphore with a
single saturating permit, and default to LockSupport)
was (Author: benedict):
Thanks for the bug report. I think it's possible the problem is slightly
different. It looks like we may be overflowing the maximum number of
representable Semaphore permits (i.e. we have around 2B permits). I will be
honest, I am surprised that these are the semantics of a Semaphore. I would
expect you could specify a maximum number of permits and have them saturate. It
is also surprising that I've never encountered this edge case before.
It's worth noting this bug appears to have existed since 2013, except for a
brief period seemingly 2018 to 2021. It's also possible this was a bad port
from an older branch where the change to `LockSupport` wasn't made. It might be
better to either revert to LockSupport, or a light abstraction around it (i.e.
introduce an interface with acquire/release representing a Semaphore with a
single saturating permit, and default to LockSupport)
The suggestion to drain seems likely to solve this problem in all practical
cases, though it would seem more sensible to drain on the commit log thread
after it returns from `haveWork.acquire`, but it might be nice to move back to
using a saturating semaphore with semantics equivalent to
{{LockSupport.park/unpark}}.
> Race in BatchCommitLogService in 4.1+
> -------------------------------------
>
> Key: CASSANDRA-20147
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20147
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Commit Log
> Reporter: Elliott Sims
> Priority: Normal
> Fix For: 4.1.x
>
>
> Saw this crash in production on multiple hosts:
> {code:java}
> ERROR [MutationStage-165] 2024-12-16 04:36:35,914
> JVMStabilityInspector.java:68 - Exception in thread
> Thread[MutationStage-165,5,SharedPool]
> java.lang.RuntimeException: java.lang.Error: Maximum permit count exceeded
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
> at
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
> at
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.Error: Maximum permit count exceeded
> at
> java.base/java.util.concurrent.Semaphore$Sync.tryReleaseShared(Semaphore.java:198)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1382)
> at
> java.base/java.util.concurrent.Semaphore.release(Semaphore.java:619)
> at
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.requestExtraSync(AbstractCommitLogService.java:297)
> at
> org.apache.cassandra.db.commitlog.BatchCommitLogService.maybeWaitForSync(BatchCommitLogService.java:40)
> at
> org.apache.cassandra.db.commitlog.AbstractCommitLogService.finishWriteFor(AbstractCommitLogService.java:284)
> at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:330)
> at
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.addToCommitLog(CassandraKeyspaceWriteHandler.java:100)
> at
> org.apache.cassandra.db.CassandraKeyspaceWriteHandler.beginWrite(CassandraKeyspaceWriteHandler.java:54)
> at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:641)
> at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:489)
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:223)
> at
> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:63)
> at
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
> ... 6 common frames omitted {code}
> Digging through, it looks like in 4.1.0 {{AbstractCommitLogService}} was
> reworked to use Semaphore instead of LockSupport. From a skim of the docs,
> {{LockSupport.unpark()}} is safe to run even if the permit's already
> available where {{Semaphore.release()}} is not.
> I think that means that if more writes arrive during the right part of a
> flush, it double-decrements `haveWork`. Once it's decremented and starts
> throwing, I think it never successfully makes it to another acquire/try
> The least-invasive fix is probably to call `haveWork.drainPermits()` in
> `requestExtraSync()` right before `release(1)` to ensure that
> `havework.permits` is 0 right before we increment it. That may technically
> still leave a race, but an almost-impossibly small one rather than a
> nontrivial one. It might also be possible to catch that specific Exception
> and run acquire() or (probably better) drainPermits() and recover. That does
> feel like a weird thing to do for a semaphore exception, but should be OK in
> this case since it's just being used to sleep the worker loop instead of
> actual concurrency control.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]