[jira] [Commented] (CASSANDRA-17524) Schema mutations may not be completed on drain

Jon Meredith (Jira) Wed, 06 Apr 2022 15:48:04 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-17524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518502#comment-17518502 ]


Jon Meredith commented on CASSANDRA-17524:
------------------------------------------

While working on the patch, I found that the in-JVM dtests were not 
intercepting the {{MigrationCoordinator}} uptime function as they do for 
{{MigrationManager}}. I've fixed that and added a check so that 
{{GossiperHelper}} verifies the schema arrives on the target instance.

> Schema mutations may not be completed on drain
> ----------------------------------------------
>
>                 Key: CASSANDRA-17524
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17524
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Startup and Shutdown
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.1, 3.0.x, 3.11.x, 4.0.x
>
>
> The drain logic (invoked explicitly with nodetool or from the JVM
> shutdown hook) closes down executor stages that can create mutations (counter,
> view, mutation) before closing down the commitlog. The gossip
> stage also commits schema mutations, and should be treated the same way.
> The messaging service is shut down as part of drain, so no new Gossip
> messages should be received. However, any messages still queued in the
> executor can run after the commitlog allocator has been shut down as
> part of drain, leaving the gossip stage thread hanging indefinitely while
> it waits for a new segment that never arrives.
> Here is an example from an in-JVM dtest, showing an update to the peers table 
> as it shuts down.
> {code:java}
> park:-1, Unsafe (jdk.internal.misc)
> park:323, LockSupport (java.util.concurrent.locks)
> await:289, WaitQueue$Standard$AbstractSignal (org.apache.cassandra.utils.concurrent)
> await:282, WaitQueue$Standard$AbstractSignal (org.apache.cassandra.utils.concurrent)
> awaitUninterruptibly:186, Awaitable$Defaults (org.apache.cassandra.utils.concurrent)
> awaitUninterruptibly:259, Awaitable$AbstractAwaitable (org.apache.cassandra.utils.concurrent)
> awaitAvailableSegment:283, AbstractCommitLogSegmentManager (org.apache.cassandra.db.commitlog)
> advanceAllocatingFrom:257, AbstractCommitLogSegmentManager (org.apache.cassandra.db.commitlog)
> allocate:55, CommitLogSegmentManagerStandard (org.apache.cassandra.db.commitlog)
> add:282, CommitLog (org.apache.cassandra.db.commitlog)
> beginWrite:50, CassandraKeyspaceWriteHandler (org.apache.cassandra.db)
> applyInternal:622, Keyspace (org.apache.cassandra.db)
> apply:506, Keyspace (org.apache.cassandra.db)
> apply:215, Mutation (org.apache.cassandra.db)
> apply:220, Mutation (org.apache.cassandra.db)
> apply:229, Mutation (org.apache.cassandra.db)
> executeInternalWithoutCondition:644, ModificationStatement (org.apache.cassandra.cql3.statements)
> executeLocally:635, ModificationStatement (org.apache.cassandra.cql3.statements)
> executeInternal:431, QueryProcessor (org.apache.cassandra.cql3)
> updateTokens:804, SystemKeyspace (org.apache.cassandra.db)
> updateTokenMetadata:2941, StorageService (org.apache.cassandra.service)
> handleStateNormal:3057, StorageService (org.apache.cassandra.service)
> onChange:2498, StorageService (org.apache.cassandra.service)
> markAsShutdown:607, Gossiper (org.apache.cassandra.gms)
> doVerb:39, GossipShutdownVerbHandler (org.apache.cassandra.gms)
> lambda$new$0:78, InboundSink (org.apache.cassandra.net)
> accept:-1, 581110313 (org.apache.cassandra.net.InboundSink$$Lambda$2638)
> accept:64, InboundSink$Filtered (org.apache.cassandra.net)
> accept:50, InboundSink$Filtered (org.apache.cassandra.net)
> accept:97, InboundSink (org.apache.cassandra.net)
> accept:45, InboundSink (org.apache.cassandra.net)
> run:433, InboundMessageHandler$ProcessMessage (org.apache.cassandra.net)
> run:124, ExecutionFailure$1 (org.apache.cassandra.concurrent)
> runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
> run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
> run:30, FastThreadLocalRunnable (io.netty.util.concurrent)
> run:829, Thread (java.lang)
> {code}
> This causes an exception during shutdown for the in-JVM dtest as it is
> unable to shut down {{Stage.GOSSIP}}, but does not prevent regular
> shutdown for Cassandra as the executors are not stopped. The schema update
> would be lost, even though a graceful shutdown was requested.
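
The ordering problem described above can be illustrated with a minimal, self-contained sketch. This is not Cassandra code: {{ToyLog}}, {{drain}} and {{DrainOrderSketch}} are hypothetical names, a single-threaded executor stands in for the gossip stage, and the toy log throws where the real commitlog would block. It only shows the general idea of letting queued tasks finish before closing the log they write to.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DrainOrderSketch {
    /** Hypothetical stand-in for a commit log: accepts writes until closed. */
    static final class ToyLog {
        private final List<String> entries = new CopyOnWriteArrayList<>();
        private volatile boolean closed = false;

        void append(String entry) {
            // The real commitlog would block waiting for a segment; the toy version fails fast.
            if (closed) throw new IllegalStateException("log closed: " + entry);
            entries.add(entry);
        }

        void close() { closed = true; }

        int size() { return entries.size(); }
    }

    /** Drains the writing stage first, then closes the log; returns the number of entries kept. */
    static int drain(ExecutorService writerStage, ToyLog log) throws InterruptedException {
        writerStage.shutdown(); // reject new tasks; tasks already queued still run
        if (!writerStage.awaitTermination(10, TimeUnit.SECONDS))
            throw new IllegalStateException("stage did not drain in time");
        log.close();            // safe now: nothing left that could write
        return log.size();
    }

    public static void main(String[] args) throws Exception {
        ToyLog log = new ToyLog();
        ExecutorService stage = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 5; i++) {
            final int n = i;
            stage.execute(() -> log.append("schema-mutation-" + n));
        }
        System.out.println("persisted=" + drain(stage, log)); // prints "persisted=5"
    }
}
```

If the two steps in {{drain}} were swapped, any task still queued when the log closed would fail in this sketch, which corresponds to the stuck gossip stage thread in the trace above.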



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
