This deserves a JIRA
On Tue, Dec 12, 2023 at 8:30 AM Sebastian Marsching <sebast...@marsching.com> wrote:

> Hi,
>
> while upgrading our production cluster from C* 3.11.14 to 4.1.3, we
> experienced the issue that some SELECT queries failed due to supposedly no
> replica being available. The system logs on the C* nodes were full of
> messages like the following one:
>
> ERROR [ReadStage-1] 2023-12-11 13:53:57,278 JVMStabilityInspector.java:68 - Exception in thread Thread[ReadStage-1,5,SharedPool]
> java.lang.IllegalStateException: [channel_data_id, control_system_type, server_id, decimation_levels] is not a subset of [channel_data_id]
>         at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:593)
>         at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:523)
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:231)
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:205)
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:137)
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:125)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:95)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:80)
>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:182)
>         at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
>         at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:337)
>         at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:63)
>         at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>         at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>         at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
>
> This problem only persisted while the cluster had a mix of 3.11.14 and
> 4.1.3 nodes. As soon as the last node was updated, the problem disappeared
> immediately, so I suspect that it was somehow caused by the unavoidable
> schema inconsistency during the upgrade.
>
> I just wanted to give everyone who hasn’t upgraded yet a heads up, so that
> they are aware that this problem might exist. Interestingly, it seems like
> not all queries involving the affected table were affected by this problem.
> As far as I am aware, no schema changes have ever been made to the affected
> table, so I am pretty certain that the schema inconsistencies were purely
> related to the upgrade process.
>
> We hadn’t noticed this problem when testing the upgrade on our test
> cluster because there we first did the upgrade and then ran the test
> workload. So, if you are worried you might be affected by this problem as
> well, you might want to run your workload on the test cluster while having
> mixed versions.
>
> I did not investigate the cause further because simply completing the
> upgrade process seemed like the quickest option to get the cluster fully
> operational again.
>
> Cheers,
> Sebastian
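
For anyone who wants to follow the suggestion of running the test workload while the cluster is in a mixed-version state, here is a minimal sketch of how one might confirm that state first. It assumes you have already collected one (address, release_version, schema_version) row per node, for example from the release_version and schema_version columns of system.local and system.peers. The function name summarize_cluster and the sample addresses and schema IDs are hypothetical and only serve as illustration.

```python
from collections import defaultdict


def summarize_cluster(rows):
    """Group nodes by release version and by schema version.

    rows: iterable of (address, release_version, schema_version) tuples,
    one per node. How the rows are collected is left to the caller.
    """
    by_release = defaultdict(list)
    by_schema = defaultdict(list)
    for address, release_version, schema_version in rows:
        by_release[release_version].append(address)
        # Schema versions are UUIDs; normalise to str so they compare cleanly.
        by_schema[str(schema_version)].append(address)
    return {
        "mixed_versions": len(by_release) > 1,
        "schema_disagreement": len(by_schema) > 1,
        "by_release": dict(by_release),
        "by_schema": dict(by_schema),
    }


if __name__ == "__main__":
    # Hypothetical sample data: one node upgraded, two still on 3.11.14.
    sample = [
        ("10.0.0.1", "4.1.3", "schema-b"),
        ("10.0.0.2", "3.11.14", "schema-a"),
        ("10.0.0.3", "3.11.14", "schema-a"),
    ]
    summary = summarize_cluster(sample)
    if summary["mixed_versions"]:
        print("Mixed versions present; run the test workload now:")
        for version, nodes in sorted(summary["by_release"].items()):
            print(f"  {version}: {', '.join(nodes)}")
```

If mixed_versions is true on the test cluster, that is the window in which the workload would need to run to have a chance of reproducing the failing reads described above.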