[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679122#comment-17679122
 ] 

Andres de la Peña commented on CASSANDRA-17507:
-----------------------------------------------

Note that the mistake in the serialisation of protocol v3 that was introduced 
with 4.0.0 has effectively meant that we have two versions of protocol v3: the 
one used by 3.0/3.x, and the one used by 4.0/4.1/4.x.
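
For illustration only, here is a minimal sketch of why two incompatible
encodings under the same protocol version fail the way the quoted stack trace
does. It is not the Cassandra code: the class MixedV3Demo and its
sliceWithShortLength method are hypothetical stand-ins, and the byte values are
made up. It assumes a reader that expects each component as a 2-byte unsigned
length followed by the value, and shows what happens when it is handed bytes
laid out differently.

```java
import java.nio.ByteBuffer;

public class MixedV3Demo {
    // Hypothetical reader: expects [2-byte unsigned length][value bytes].
    static ByteBuffer sliceWithShortLength(ByteBuffer input, int offset) {
        int length = input.getShort(offset) & 0xFFFF; // unsigned 16-bit prefix
        ByteBuffer copy = input.duplicate();
        copy.position(offset + 2);
        // Throws IllegalArgumentException if the prefix points past capacity.
        copy.limit(offset + 2 + length);
        return copy.slice();
    }

    public static void main(String[] args) {
        // Bytes in the layout this reader expects: length 3, then 3 value bytes.
        ByteBuffer expected = ByteBuffer.wrap(new byte[] {0, 3, 'a', 'b', 'c'});
        System.out.println("decoded bytes: "
                           + sliceWithShortLength(expected, 0).remaining());

        // 15 bytes assumed to come from a different encoding. The first two
        // bytes are not a length, but the reader interprets them as 288.
        byte[] other = new byte[15];
        other[0] = 0x01;
        other[1] = 0x20;
        try {
            sliceWithShortLength(ByteBuffer.wrap(other), 0);
        } catch (IllegalArgumentException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

Nothing in the bytes themselves tells the reader which layout was used, so the
mismatch only surfaces when a bogus length is applied to the buffer.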

The proposed fix will make all new 4.0/4.1 minors use the same version of 
protocol v3 that 3.0/3.x have always used.

However, we will hit the same problem in a cluster that mixes an unpatched 
4.0/4.1/4.x node with a patched 4.0/4.1/4.x node. In other words, we are 
trading upgrade issues on 3.0.x -> 4.0.7 for upgrade issues on 4.0.8 -> 
4.0.9, etc.

I'm not sure how we could know which version of v3 (broken or unbroken) we 
should use, or whether that is possible at all.

That said, the problem occurs only when using v3, and I'd say that the old v3 
protocol is much more likely to be used in a major [3.0 | 3.x] -> [4.0 | 4.x] 
upgrade than in a [4.0 | 4.1 | 4.x] -> [4.0 | 4.1 | 4.x] upgrade. If that 
assumption is true, this fix improves things and stops the spread of the 
broken version of the v3 protocol.

What do you think? [~brandon.williams] any thoughts on this?

> IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling 
> upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17507
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17507
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Thomas Steinmaurer
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 4.x
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In a 6-node 3.11.12 test cluster - freshly set up, thus no legacy SSTables 
> etc. - with ~1 TB of SSTables on disk per node, I have been running a rolling 
> upgrade to 4.0.3. On upgraded 4.0.3 nodes I then regularly saw the following 
> exception, which disappeared once all 6 nodes were on 4.0.3. 
> Is this known? Can this be ignored? As said, this is just a test drive, but I 
> am not sure we want to have that in production, especially with a larger 
> number of nodes, where it could take some time until all are upgraded. Thanks!
> {code}
> ERROR [Native-Transport-Requests-8] 2022-03-30 11:30:24,057 ErrorMessage.java:457 - Unexpected exception during request
> java.lang.IllegalArgumentException: newLimit > capacity: (290 > 15)
>         at java.base/java.nio.Buffer.createLimitException(Buffer.java:372)
>         at java.base/java.nio.Buffer.limit(Buffer.java:346)
>         at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:1107)
>         at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:262)
>         at org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:107)
>         at org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:39)
>         at org.apache.cassandra.db.marshal.ValueAccessor.sliceWithShortLength(ValueAccessor.java:225)
>         at org.apache.cassandra.db.marshal.CompositeType.splitName(CompositeType.java:222)
>         at org.apache.cassandra.service.pager.PagingState$RowMark.decodeClustering(PagingState.java:434)
>         at org.apache.cassandra.service.pager.PagingState$RowMark.clustering(PagingState.java:388)
>         at org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:88)
>         at org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:32)
>         at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:69)
>         at org.apache.cassandra.service.pager.SinglePartitionPager.fetchPage(SinglePartitionPager.java:32)
>         at org.apache.cassandra.cql3.statements.SelectStatement$Pager$NormalPager.fetchPage(SelectStatement.java:352)
>         at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:400)
>         at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:250)
>         at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:88)
>         at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:244)
>         at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:723)
>         at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:701)
>         at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:159)
>         at org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
>         at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:86)
>         at org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:106)
>         at org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:70)
>         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}


