[ https://issues.apache.org/jira/browse/CASSANDRA-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384645#comment-15384645 ]

Joshua McKenzie commented on CASSANDRA-12236:
---------------------------------------------

Two ways to approach this have come up in offline discussions. The first and 
less invasive method would be to suppress sending schema information about the 
cdc param on versions >= 3.8 unless cdc_enabled: true is set in 
cassandra.yaml. Specifically, we would remove the unconditional addition of the 
cdc param 
[here|https://github.com/apache/cassandra/blob/cassandra-3.8/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L514]
 and instead add it conditionally 
[here|https://github.com/apache/cassandra/blob/cassandra-3.8/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L476].
 The upgrade path for users who want to enable CDC would be: don't enable CDC 
until your entire cluster is upgraded to a version >= 3.8, then enable it and 
bounce your cluster. Since schema information is hard-coded in 
SchemaKeyspace.java and we don't actually care about the value of that param if 
CDC is not enabled on the cluster, this seems a reasonable workaround until we 
get to versioned sub-systems in Cassandra.
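
As a rough illustration of the first approach, here is a minimal sketch that 
models a schema-table row as a plain map. The class and method names 
(CdcSchemaParamSketch, tableParams) and the sample params are hypothetical and 
are not the actual SchemaKeyspace code; the only point is that the cdc column 
is written only when CDC is enabled in the yaml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the code that builds the table params of a
// schema mutation. Not the real SchemaKeyspace API.
final class CdcSchemaParamSketch {
    // cdcEnabledInYaml models cdc_enabled from cassandra.yaml;
    // cdcParam models the per-table cdc value.
    static Map<String, Object> tableParams(boolean cdcEnabledInYaml, boolean cdcParam) {
        Map<String, Object> row = new LinkedHashMap<>();
        // Illustrative params that every version understands.
        row.put("comment", "");
        row.put("gc_grace_seconds", 864000);
        // Only emit the cdc column when CDC is enabled, so that nodes on
        // versions < 3.8 never see a column they do not know about.
        if (cdcEnabledInYaml) {
            row.put("cdc", cdcParam);
        }
        return row;
    }
}
```

With cdc_enabled left off, the resulting row carries no cdc key at all, which 
is what keeps older nodes from hitting the unknown column.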

The second and more invasive method (at least in terms of the number of 
versions it touches and its potential side effects) would be to allow null 
columns during deserialization in 
[Columns.java|https://github.com/apache/cassandra/blob/cassandra-3.8/src/java/org/apache/cassandra/db/Columns.java#L433]
 if the mutation is for a schema table. This would apply to 3.0.x and 3.8+, 
and would get us back to behavior somewhat similar to 
pre-CASSANDRA-8099, in that mutations for schema tables on different versions 
would no longer interrupt inter-node communication during an upgrade by 
throwing an RTE.
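
A minimal sketch of the second approach, under the assumption that "allowing 
null columns" means skipping names the local schema does not know instead of 
throwing. The class LenientColumnsSketch and its signature are hypothetical; 
the sketch only models column-name resolution with strings and does not touch 
the real Columns serializer or the row body that would also need handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of resolving column names read off the wire against
// the locally known schema. Not the real Columns.java code.
final class LenientColumnsSketch {
    static List<String> deserialize(List<String> wireNames,
                                    Set<String> knownColumns,
                                    boolean isSchemaTable) {
        List<String> resolved = new ArrayList<>();
        for (String name : wireNames) {
            if (knownColumns.contains(name)) {
                resolved.add(name);
            } else if (isSchemaTable) {
                // Assumption: for schema tables, tolerate the unknown column
                // and skip it rather than breaking the internode connection.
                continue;
            } else {
                // Same message shape as the RTE quoted in this ticket.
                throw new RuntimeException(
                    "Unknown column " + name + " during deserialization");
            }
        }
        return resolved;
    }
}
```

In this model, a pre-3.8 node that receives a schema mutation containing cdc 
would drop that column and keep going, while mutations for ordinary tables 
would still fail on an unknown column.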

I'm by no means an expert on schema dissemination - [~iamaleksey] / 
[~slebresne]: do either of you have feedback on the two approaches above, or 
other, better ideas on this front?

> RTE from new CDC column breaks in flight queries.
> -------------------------------------------------
>
>                 Key: CASSANDRA-12236
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12236
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Priority: Blocker
>             Fix For: 3.8
>
>
> This RTE is not harmless. It will cause the internode connection to break, 
> which will cause all in-flight requests between these nodes to die or time out.
> {noformat}
>     - Due to changes in schema migration handling and the storage format after 3.0, you will
>       see error messages such as:
>          "java.lang.RuntimeException: Unknown column cdc during deserialization"
>       in your system logs on a mixed-version cluster during upgrades. This error message
>       is harmless and due to the 3.8 nodes having cdc added to their schema tables while
>       the <3.8 nodes do not. This message should cease once all nodes are upgraded to 3.8.
>       As always, refrain from schema changes during cluster upgrades.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
