[ https://issues.apache.org/jira/browse/CASSANDRA-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538545#comment-17538545 ]

Jon Meredith commented on CASSANDRA-17601:
------------------------------------------

[~blerer] have you had a chance to take a look at the previous proposal?

I've pushed a new branch with a test and a cleaned-up fix that covers the current 
and other mixed-version cluster {{ColumnFilter}} issues. The goal is to prove 
that queries can be executed and return correct results without exceptions. 
Benjamin is correct that this fix causes a regression with digest mismatches 
due to Wildcard/SelectionColumnFilters while the cluster is in mixed mode; 
however, it resolves immediately once the upgrade completes, rather than being 
left behind in a precomputed {{{}ColumnFilterFactory{}}}.

Branch https://github.com/jonmeredith/cassandra/tree/C17601-4.0

PR [https://github.com/apache/cassandra/pull/1637]

The way that the current build system and CI infrastructure handles in-JVM 
upgrade dtests means the test requires manual work to run at the moment. Dtest 
jars are only built for the HEAD of each supported release branch, not for the 
most recently released version, and dtests are not run against the previous 
minor release on the branch because the dtest jar is overwritten.

To test, I temporarily bumped the release version on this branch to 4.0.6 and 
built dtest jars for the 3.0, 3.11 and 4.0 branches. I will see if I can 
improve how the upgrade tests run tomorrow.

I have not attempted to address any issues of schema disagreement. That seems 
best done in a more comprehensive fix that can handle all of the edge cases.

My expectation is that an application that needs to evolve its schema should 
access columns by name rather than through wildcards, perform the schema 
evolution with the old names in place, and then, once the change is live on all 
nodes, start using the new columns. Schema disagreements could cause 
exceptions, and I do not think this patch changes that behavior.

I don't have a good alternative solution to resolve the digest mismatch without 
significant refactoring of query preparation and possibly execution. At the 
time the query is prepared and the {{ColumnFilter}} is first built, the 
{{ReplicaPlan}} is not known, so the versions running on the endpoints cannot 
be used to decide which backward-compatible column filter to build. The 
prepared statement cache could be cleared whenever {{upgradeFromVersion}} 
changes, or the {{ColumnFilterFactory}} could rebuild the filter on each 
execution until the cluster stabilizes (assuming a host is never downgraded and 
no older host joins).
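
To make the second alternative concrete, here is a minimal self-contained sketch of a per-statement filter cache that rebuilds whenever the observed minimum cluster version changes. This is illustrative only, not the patch and not Cassandra's classes; every name in it is hypothetical.

{code:java}
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: cache a filter per prepared statement, keyed by the
// minimum version currently observed in the cluster, and rebuild it whenever
// that version changes (e.g. when gossip's upgrade-from version is recomputed).
final class VersionSensitiveFilterCache<F>
{
    interface FilterBuilder<T> { T build(String clusterMinVersion); }

    private static final class Entry<T>
    {
        final String builtForVersion; // cluster minimum version the filter was built against
        final T filter;
        Entry(String builtForVersion, T filter) { this.builtForVersion = builtForVersion; this.filter = filter; }
    }

    private final AtomicReference<Entry<F>> cached = new AtomicReference<>();

    F get(String clusterMinVersion, FilterBuilder<F> builder)
    {
        Entry<F> e = cached.get();
        if (e != null && e.builtForVersion.equals(clusterMinVersion))
            return e.filter;                         // cluster version unchanged: reuse
        F fresh = builder.build(clusterMinVersion);  // mixed mode or newly settled: rebuild
        cached.set(new Entry<>(clusterMinVersion, fresh));
        return fresh;
    }
}
{code}

Clearing the whole prepared statement cache on an {{upgradeFromVersion}} change would achieve the same effect with less per-statement bookkeeping, at the cost of re-preparing everything.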

> IllegalStateException with prepared queries selecting static columns in mixed 
> 3.0.x/4.x clusters
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17601
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Consistency/Coordination
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.0.x, 4.1-beta, 4.x
>
>
> Clusters containing prepared statements that partially select static columns, 
> prepared before the upgrade, will fail to execute those statements when they 
> are coordinated from the 4.x nodes until the upgrade completes.
> h2. Reproduction
> Setup (before upgrade)
> {code:java}
> CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 3};
> CREATE TABLE ks1.tbl1 (pk1 int,
> ck2 int,
> s3 int static,
> s4 int static,
> c5 int,
> PRIMARY KEY (pk1, ck2));
> INSERT INTO ks1.tbl1 (pk1, ck2, s3, s4, c5) VALUES (1, 2, 3, 4, 5);
> {code}
> Prepared Statement (prepare before upgrade)
> {code:java}
> SELECT c5, s3 FROM ks1.tbl1 WHERE pk1 = ? AND ck2 = ?;
> {code}
> Exception on 3.0.x nodes (when executing prepared statement after upgrade)
> {code:java}
> java.lang.IllegalStateException: [s3, s4] is not a subset of [s3]
> at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:566)
> at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:498)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:235)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:209)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:141)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:129)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:95)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:80)
> at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:191)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:181)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:177)
> at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
> at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:335)
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
> at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
> at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:433)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> Exception on 4.0.x nodes (when executing prepared statement after upgrade)
> {code:java}
> java.lang.IllegalStateException: [ColumnDefinition{name=s3, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}, ColumnDefinition{name=s4, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}] is not a subset of [s3]
> at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:555)
> at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:487)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:216)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:190)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:121)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:109)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:94)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
> at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:326)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:186)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:179)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:175)
> at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:75)
> at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:499)
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:194)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:137)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:167)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:122)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The root cause is that CASSANDRA-16686 changed ColumnFilters to be built and 
> deserialized based on the versions the coordinating node thinks are running 
> in the cluster, and that knowledge is always incorrect when statements are 
> re-prepared on startup and may still be incorrect while nodes are converging 
> on their final version.
> h2. Sequence of events:
> Prepared statements are persisted in {{system.prepared_statements}} to be 
> re-prepared on future startup.
> When a 4.x node starts up after the upgrade, 
> {{org.apache.cassandra.service.CassandraDaemon#setup}} calls 
> {{QueryProcessor.instance.preloadPreparedStatements}} *before* the 
> {{Gossiper}} is started by the call to {{StorageService.instance.initServer()}} 
> later in {{{}setup{}}}.
> As part of preparing statements, when possible a {{ColumnFilterFactory}} is 
> created that returns a {{ColumnFilter}} built at the time the query is 
> prepared.
> After the changes from CASSANDRA-16686, the {{ColumnFilter}} builder 
> constructs different column filter variants depending on the lowest version 
> reported in gossip by checking 
> {{{}org.apache.cassandra.gms.Gossiper#upgradeFromVersionMemoized{}}}. If this 
> runs before the Gossiper is enabled, the version used falls back to 
> {{{}SystemKeyspace.CURRENT_VERSION{}}}, causing the {{ColumnFilter}} builder to 
> create a column filter as if the cluster were fully upgraded.
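> A minimal self-contained model of this ordering problem, for illustration 
> only; none of the class or method names below are Cassandra's, they simply 
> stand in for the version check described above.
> {code:java}
> // Models the startup ordering: statements are preloaded before gossip is
> // started, so there is no peer version information and the decision falls
> // back to this node's own version.
> final class FilterFormatChooser
> {
>     static final String CURRENT_VERSION = "4.0";   // stand-in for SystemKeyspace.CURRENT_VERSION
>
>     enum Format { PRE_3_4_WILDCARD, QUERIED_STATICS_ONLY }
>
>     static Format choose(boolean gossipStarted, String minVersionInGossip)
>     {
>         // With gossip down, the minimum cluster version defaults to our own version.
>         String effectiveMin = gossipStarted ? minVersionInGossip : CURRENT_VERSION;
>         return effectiveMin.startsWith("3.") ? Format.PRE_3_4_WILDCARD
>                                              : Format.QUERIED_STATICS_ONLY;
>     }
>
>     public static void main(String[] args)
>     {
>         System.out.println(choose(false, "3.0.27")); // QUERIED_STATICS_ONLY -- chosen during preload (the bug)
>         System.out.println(choose(true,  "3.0.27")); // PRE_3_4_WILDCARD -- what a mixed cluster needs
>     }
> }
> {code}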
> For the query above, the ColumnFilter builder creates an 
> ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS filter.
> The participating 3.0.x nodes do not understand the new flag and create a 
> {{ColumnFilter}} equivalent to a {{{}WildcardColumnFilter{}}}. The participating 
> 4.x nodes do understand the new flag; however, the deserializer takes the 
> pre-3.4 path because 3.0 nodes are known to be in the cluster, and creates a 
> {{{}WildcardColumnFilter{}}}.
> The fetchedColumns sent with the ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS 
> filter contain only the queried static columns; however, the pre-3.4 sstable 
> iterator returns all regular and static columns, causing an 
> IllegalStateException when the serialized response is sent back.
> The ISE clears once all nodes in the cluster consider themselves upgraded to 
> the current version, and queries then behave as the originally prepared 
> statement intended.
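> The failing invariant can be modelled with a few self-contained lines (a 
> simplified illustration, not the real serializer): the columns present in the 
> row being serialized must be a subset of the columns the filter declared as 
> fetched.
> {code:java}
> import java.util.Set;
>
> final class SubsetCheckDemo
> {
>     // Simplified version of the check behind Columns$Serializer.serializeSubset:
>     // the row's columns must be contained in the superset declared by the filter.
>     static void serializeSubset(Set<String> rowColumns, Set<String> fetchedByFilter)
>     {
>         if (!fetchedByFilter.containsAll(rowColumns))
>             throw new IllegalStateException(rowColumns + " is not a subset of " + fetchedByFilter);
>         // ... bitmap encoding against fetchedByFilter would follow here ...
>     }
>
>     public static void main(String[] args)
>     {
>         // The filter declared only the queried static column s3, but the
>         // pre-3.4 read path hands back both statics, s3 and s4.
>         serializeSubset(Set.of("s3", "s4"), Set.of("s3")); // throws, matching the traces above
>     }
> }
> {code}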
> h2. Related Problems
> _Non-deterministic behavior of 4.0.x/4.1.x nodes_
> If prepared statements are cleared and/or freshly prepared while the cluster 
> is in mixed 3.0/4.0 mode, the pre-built ColumnFilter will remain the 
> mixed-mode variant until re-prepared after a restart or a cache clear/eviction.
> As upgradeFromVersionMemoized times out and is recalculated once the upgrade 
> reaches a single version, individual nodes make a local decision about how to 
> build and deserialize column filters.
> Nodes that update upgradeFromVersionMemoized early and coordinate requests may 
> cause the same ISE against responding nodes that still hold the previous 
> value.
> _Digest Mismatches_
> If {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} {{ColumnFilter}} s are 
> incorrectly sent to 3.0.x nodes, the list of included columns is ignored and 
> those nodes compute a different digest than the one computed locally on a 
> 4.0.x coordinator.
> h1. Proposed fix
> In discussion with [~ifesdjeen], he suggested that one way to resolve this is 
> to deprecate (or simply remove) the {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} 
> filter so it is no longer built, always selecting all static columns instead.
> This would leave just {{WildCardColumnFilter}} and {{SelectionColumnFilter}} 
> with {{ALL_COLUMNS}} or {{ONLY_QUERIED_COLUMNS}}.
> This is a potential performance regression for unusual schemas with very 
> large numbers of static columns, but that seems unlikely to matter in practice.
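> A sketch of how I read the suggestion (hypothetical names, not the committed 
> change): whenever the selection touches any static column, fetch every static 
> column rather than building the ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS 
> variant.
> {code:java}
> import java.util.Set;
>
> final class ProposedStaticFetch
> {
>     // allStatics stands in for the table's static columns, queriedStatics for
>     // the SELECT list; both are hypothetical inputs used for illustration.
>     static Set<String> staticColumnsToFetch(Set<String> allStatics, Set<String> queriedStatics)
>     {
>         // Previous behaviour: return queriedStatics (only what was selected).
>         // Proposed behaviour: if any static is selected, fetch them all, so
>         // every replica agrees on which columns a response should contain.
>         return queriedStatics.isEmpty() ? Set.of() : allStatics;
>     }
> }
> {code}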
> /cc: [~blerer] 


