[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679139#comment-17679139
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/20/23 1:06 PM:


Oh, right, it's a typo, the last 4.0 affected version is 4.0.7. Just fixed it, 
thanks!


was (Author: adelapena):
Oh, right, it's a typo, the last 4.0 affected version is 4.0.7.

> IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling 
> upgrade
> ---
>
> Key: CASSANDRA-17507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Thomas Steinmaurer
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 4.x
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In a 6 node 3.11.12 test cluster - freshly set up, thus no legacy SSTables 
> etc. - with ~ 1TB SSTables on disk per node, I have been running a rolling 
> upgrade to 4.0.3. On upgraded 4.0.3 nodes I then have seen the following 
> exception regularly, which disappeared once all 6 nodes have been on 4.0.3. 
> Is this known? Can this be ignored? As said, just a test drive, but not sure 
> if we want to have that in production, especially with a larger number of 
> nodes, where it could take some time, until all are upgraded. Thanks!
> {code}
> ERROR [Native-Transport-Requests-8] 2022-03-30 11:30:24,057 
> ErrorMessage.java:457 - Unexpected exception during request
> java.lang.IllegalArgumentException: newLimit > capacity: (290 > 15)
> at java.base/java.nio.Buffer.createLimitException(Buffer.java:372)
> at java.base/java.nio.Buffer.limit(Buffer.java:346)
> at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:1107)
> at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:262)
> at 
> org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:107)
> at 
> org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:39)
> at 
> org.apache.cassandra.db.marshal.ValueAccessor.sliceWithShortLength(ValueAccessor.java:225)
> at 
> org.apache.cassandra.db.marshal.CompositeType.splitName(CompositeType.java:222)
> at 
> org.apache.cassandra.service.pager.PagingState$RowMark.decodeClustering(PagingState.java:434)
> at 
> org.apache.cassandra.service.pager.PagingState$RowMark.clustering(PagingState.java:388)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:88)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:32)
> at 
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:69)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.fetchPage(SinglePartitionPager.java:32)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement$Pager$NormalPager.fetchPage(SelectStatement.java:352)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:400)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:250)
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:88)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:244)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:723)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:701)
> at 
> org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:159)
> at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:242)
> at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:86)
> at 
> org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:106)
> at 
> org.apache.cassandra.transport.Dispatcher.lambda$dispatch$0(Dispatcher.java:70)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---

[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679122#comment-17679122
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/20/23 12:02 PM:
-

Note that the mistake in the serialisation of protocol v3 that was introduced 
with 4.0.0 has effectively meant that we have two versions of protocol v3: the 
one used by 3.0/3.x, and the one used by 4.0/4.1/4.x.

The proposed fix will make all new 4.0/4.1 minors use the same version of 
protocol v3 that 3.0/3.x have always used.

However, we will hit the same problem in a cluster that mixes an unpatched 
4.0/4.1/4.x node and a patched 4.0/4.1/4.x node. In other words, we are 
trading upgrade issues on 3.0.x -> 4.0.7 for upgrade issues on 4.0.8 -> 
4.0.9, etc.

I'm not sure how we could know which version of v3 (broken or unbroken) we 
should use, if we can.

That said, the problem occurs only when using v3, and I'd say that the old v3 
protocol is much more likely to be used on a major 3.0/3.x -> 4.0/4.1 upgrade 
than on a 4.0/4.1 -> 4.0/4.1 upgrade. If that assumption is true, we are 
improving things with this fix and stopping the spread of the broken version 
of the v3 protocol.

What do you think? [~brandon.williams] any thoughts on this?


was (Author: adelapena):
Note that the mistake in the serialisation of protocol v3 that was introduced 
with 4.0.0 has effectively meant that we have two versions of protocol v3: the 
one used by 3.0/3.x, and the one used by 4.0/4.1/4.x.

The proposed fix will make all new 4.0/4.1 minors use the same version of 
protocol v3 that 3.0/3.x have always used.

However, we will hit the same problem in a cluster that mixes an unpatched 
4.0/4.1/4.x node and a patched 4.0/4.1/4.x node. In other words, we are 
trading upgrade issues on 3.0.x -> 4.0.7 for upgrade issues on 4.0.8 -> 
4.0.9, etc.

I'm not sure how we could know which version of v3 (broken or unbroken) we 
should use, if we can.

That said, the problem occurs only when using v3, and I'd say that the old v3 
protocol is much more likely to be used on a major [3.0 | 3.x] - > [4.0 | 4.x] 
upgrade than in a [4.0 | 4.1 | 4.x] - > [4.0 | 4.1 | 4.x] upgrade. If that 
assumption is true, we are improving things with this fix, and stopping the 
spread of the broken version of the v3 protocol.

What do you think? [~brandon.williams] any thoughts on this?


[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679122#comment-17679122
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/20/23 12:01 PM:
-

Note that the mistake in the serialisation of protocol v3 that was introduced 
with 4.0.0 has effectively meant that we have two versions of protocol v3: the 
one used by 3.0/3.x, and the one used by 4.0/4.1/4.x.

The proposed fix will make all new 4.0/4.1 minors use the same version of 
protocol v3 that 3.0/3.x have always used.

However, we will hit the same problem in a cluster that mixes an unpatched 
4.0/4.1/4.x node and a patched 4.0/4.1/4.x node. In other words, we are 
trading upgrade issues on 3.0.x -> 4.0.7 for upgrade issues on 4.0.8 -> 
4.0.9, etc.

I'm not sure how we could know which version of v3 (broken or unbroken) we 
should use, if we can.

That said, the problem occurs only when using v3, and I'd say that the old v3 
protocol is much more likely to be used on a major [3.0 | 3.x] - > [4.0 | 4.x] 
upgrade than in a [4.0 | 4.1 | 4.x] - > [4.0 | 4.1 | 4.x] upgrade. If that 
assumption is true, we are improving things with this fix, and stopping the 
spread of the broken version of the v3 protocol.

What do you think? [~brandon.williams] any thoughts on this?


was (Author: adelapena):
Note that the mistake in the serialisation of protocol v3 that was introduced 
with 4.0.0 has effectively meant that we have two versions of protocol v3: the 
one used by 3.0/3.x, and the one used by 4.0/4.1/4.x.

The proposed fix will make all new 4.0/4.1 minors use the same version of 
protocol v3 that 3.0/3.x have always used.

However, we will hit the same problem in a cluster that mixes an unpatched 
4.0/4.1/4.x node and a patched 4.0/4.1/4.x node. In other words, we are 
trading upgrade issues on 3.0.x -> 4.0.7 for upgrade issues on 4.0.8 -> 
4.0.9, etc.

I'm not sure how we could know which version of v3 (broken or unbroken) we 
should use, if we can.

That said, the problem occurs only when using v3, and I'd say that the old v3 
protocol is much more likely to be used on a major [3.0 | 3.x]-> [4.0 | 4.x] 
upgrade than in a [4.0 | 4.1 | 4.x]-> [4.0 | 4.1 | 4.x] upgrade. If that 
assumption is true, we are improving things with this fix, and stopping the 
spread of the broken version of the v3 protocol.

What do you think? [~brandon.williams] any thoughts on this?


[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675629#comment-17675629
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/11/23 3:48 PM:


I can confirm that 4.1 and trunk are also affected. Here are the patches for 
all the branches:
||PR||CI||
|[4.0|https://github.com/apache/cassandra/pull/2082]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2543/workflows/cb16ec9d-8ec6-4914-a08a-92715bd15ff0]|
|[4.1|https://github.com/apache/cassandra/pull/2083]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2544/workflows/924c41ce-accb-44eb-be07-1fc678b1f4b2]|
|[trunk|https://github.com/apache/cassandra/pull/2084]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2542/workflows/16be8e22-b8d1-440f-93bd-a599b56e3093]|

I think that CI will fail for the new tests on the [4.0, 4.1] -> [4.1, trunk] 
upgrade paths. That's because our CI script [generates the dtest artifacts from 
the main 
repo|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml#L2684-L2693],
 and the branches there don't contain the serialization fix that we are 
proposing here.

Reviewers can test it locally by generating the dtest artifacts of each 
patched branch with {{ant dtest-jar}}, and copying all the generated 
{{dtest-*.jar}} files into the {{build}} directory of the tested branch.


was (Author: adelapena):
I can confirm that 4.1 and trunk are also affected. Here are the patches for 
all the branches:

||PR||CI||
|[4.0|https://github.com/apache/cassandra/pull/2082]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2543/workflows/cb16ec9d-8ec6-4914-a08a-92715bd15ff0]|
|[4.1|https://github.com/apache/cassandra/pull/2083]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2544/workflows/924c41ce-accb-44eb-be07-1fc678b1f4b2]|
|[trunk|https://github.com/apache/cassandra/pull/2084]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2542/workflows/16be8e22-b8d1-440f-93bd-a599b56e3093]|

I think that CI will fail for the new tests on the [4.0, 4.1] -> [4.1, trunk] 
upgrade paths. That's because our CI scripts generates the dtest artifacts from 
the main repo, and the branches there don't contain the serialization fix that 
we are proposing here.

Reviewers can test it locally by generating the dtest artifacts of each 
patched branch with {{ant dtest-jar}}, and copying all the generated 
{{dtest-*.jar}} files into the {{build}} directory of the tested branch.


[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17675575#comment-17675575
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/11/23 1:55 PM:


I think I have found the cause of the bug when using protocol v3.

Cassandra 3.0 and 3.x with protocol v3 and compact storage don't serialize 
single-column clusterings as single-element composites. Instead, single-column 
clustering values are written as they are, as can be seen 
[here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/LegacyLayout.java#L477-L486].

However, Cassandra 4.0 always reads and writes single-column clusterings as 
composites. This can be seen 
[here|https://github.com/apache/cassandra/blob/cassandra-4.0.3/src/java/org/apache/cassandra/service/pager/PagingState.java#L434],
 exactly where the reported exception is thrown.
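
To illustrate the mismatch described above, here is a minimal Java sketch. It is not the Cassandra code: the class {{ClusteringEncodingSketch}} and its methods are hypothetical names, and the composite layout (2-byte length, value, one end-of-component byte per component) is a simplified assumption. When a raw value is read as a composite, its first two bytes are taken as a length that overruns the buffer, which is the kind of {{IllegalArgumentException}} seen in the reported stack trace.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names do not exist in Cassandra.
final class ClusteringEncodingSketch {

    // Assumed composite layout per component: [2-byte length][value][1 end-of-component byte].
    static ByteBuffer encodeAsComposite(ByteBuffer value) {
        ByteBuffer out = ByteBuffer.allocate(2 + value.remaining() + 1);
        out.putShort((short) value.remaining());
        out.put(value.duplicate()); // duplicate so the caller's position is untouched
        out.put((byte) 0);
        out.flip();
        return out;
    }

    // Splits a composite into its component values, trusting each length prefix.
    static List<ByteBuffer> splitComposite(ByteBuffer composite) {
        ByteBuffer in = composite.duplicate();
        List<ByteBuffer> parts = new ArrayList<>();
        while (in.hasRemaining()) {
            int length = in.getShort() & 0xFFFF;
            ByteBuffer part = in.duplicate();
            // Throws IllegalArgumentException when the length overruns the buffer.
            part.limit(part.position() + length);
            parts.add(part.slice());
            in.position(in.position() + length);
            in.get(); // skip the end-of-component byte
        }
        return parts;
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.wrap("clustering-0001".getBytes(StandardCharsets.UTF_8));

        // Writer and reader agree on the composite format: the value round-trips.
        System.out.println(splitComposite(encodeAsComposite(raw)).get(0).equals(raw));

        // Writer sent the raw value (3.x style), reader expects a composite (4.0 style).
        try {
            splitComposite(raw);
        } catch (IllegalArgumentException e) {
            System.out.println("failed to split raw value: " + e);
        }
    }
}
```

With the 15-byte raw value {{clustering-0001}}, the first two bytes are read as the length 25452, so the requested slice limit (25454) exceeds the buffer capacity of 15.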

I think the solution is to modify the Cassandra 4.0 code that handles legacy 
formats so that it special-cases single-column clusterings for compact storage:
||PR||CI||
|[4.0|https://github.com/apache/cassandra/pull/2082]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2540/workflows/22cfa989-31df-4fcc-a896-f46c8d77d364]|

If the approach looks good I'll prepare patches for 4.1 and trunk, which are 
probably also affected.


was (Author: adelapena):
I think I have found the cause of the bug when using protocol v3.

Cassandra 3.0 and 3.x with protocol v3 and compact storage doesn't serialize 
single-column clusterings as single-element composites. Instead, single-column 
clusterings values are written as they are, as it can be seen 
[here|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/LegacyLayout.java#L477-L486].
 However, Cassandra 4.0 always reads clusterings as composites. This can be 
seen 
[here|https://github.com/apache/cassandra/blob/cassandra-4.0.3/src/java/org/apache/cassandra/service/pager/PagingState.java#L434],
 exactly where the reported exception is thrown.

I think the solution is to modify the Cassandra 4.0 code that handles legacy 
formats so that it special-cases single-column clusterings for compact storage:
||PR||CI||
|[4.0|https://github.com/apache/cassandra/pull/2082]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2540/workflows/22cfa989-31df-4fcc-a896-f46c8d77d364]|

If the approach looks good I'll prepare patches for 4.1 and trunk, which are 
probably also affected.


[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656119#comment-17656119
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/11/23 12:51 PM:
-

[This JVM 
dtest|https://github.com/apache/cassandra/compare/cassandra-4.0...adelapena:cassandra:17507-4.0-repro]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug 
only seems to manifest itself when the driver uses native protocol v3 instead 
of the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stack trace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{COMPACT STORAGE}}, paging 
and an old protocol version, probably the easiest workaround until we get a fix 
is setting the driver to use a more recent version of the native transport 
protocol.


was (Author: adelapena):
[This JVM 
dtest|https://github.com/apache/cassandra/compare/cassandra-4.0...adelapena:cassandra:17507-4.0]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug 
only seems to manifest itself when the driver uses native protocol v3 instead 
of the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stack trace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{COMPACT STORAGE}}, paging 
and an old protocol version, probably the easiest workaround until we get a fix 
is setting the driver to use a more recent version of the native transport 
protocol.


[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656119#comment-17656119
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/9/23 3:49 PM:
---

[This JVM 
dtest|https://github.com/apache/cassandra/compare/cassandra-4.0...adelapena:cassandra:17507-4.0]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug 
only seems to manifest itself when the driver uses native protocol v3 instead 
of the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stack trace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{COMPACT STORAGE}}, paging 
and an old protocol version, probably the easiest workaround until we get a fix 
is setting the driver to use a more recent version of the native transport 
protocol.


was (Author: adelapena):
[This JVM 
dtest|https://github.com/apache/cassandra/compare/cassandra-4.0...adelapena:cassandra:17507-4.0]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug 
only seems to manifest itself when the driver uses native protocol v3 instead 
of the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stacktrace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{{}COMPACT STORAGE{}}}, 
paging and and old protocol version, probably the easiest workaround until we 
get a fix is setting the driver to use a more recent version of the native 
transport protocol.

> IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling 
> upgrade
> ---
>
> Key: CASSANDRA-17507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 4.0.x
>
>
> In a 6 node 3.11.12 test cluster - freshly set up, thus no legacy SSTables 
> etc. - with ~ 1TB SSTables on disk per node, I have been running a rolling 
> upgrade to 4.0.3. On upgraded 4.0.3 nodes I then have seen the following 
> exception regularly, which disappeared once all 6 nodes have been on 4.0.3. 
> Is this known? Can this be ignored? As said, just a test drive, but not sure 
> if we want to have that in production, especially with a larger number of 
> nodes, where it could take some time, until all are upgraded. Thanks!
> {code}
> ERROR [Native-Transport-Requests-8] 2022-03-30 11:30:24,057 
> ErrorMessage.java:457 - Unexpected exception during request
> java.lang.IllegalArgumentException: newLimit > capacity: (290 > 15)
> at java.base/java.nio.Buffer.createLimitException(Buffer.java:372)
> at java.base/java.nio.Buffer.limit(Buffer.java:346)
> at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:1107)
> at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:262)
> at 
> org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:107)
> at 
> org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:39)
> at 
> org.apache.cassandra.db.marshal.ValueAccessor.sliceWithShortLength(ValueAccessor.java:225)
> at 
> org.apache.cassandra.db.marshal.CompositeType.splitName(CompositeType.java:222)
> at 
> org.apache.cassandra.service.pager.PagingState$RowMark.decodeClustering(PagingState.java:434)
> at 
> org.apache.cassandra.service.pager.PagingState$RowMark.clustering(PagingState.java:388)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:88)
> at 
> org.apache.cassandra.service.pager.SinglePartitionPager.nextPageReadQuery(SinglePartitionPager.java:32)
>   
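
For readers tracing the exception above: the top frames show ByteBuffer.limit 
rejecting a limit beyond the buffer's capacity, reached through 
ValueAccessor.sliceWithShortLength. The following is a minimal, self-contained 
sketch of that general failure mode, assuming a 2-byte length prefix is read 
from bytes that were not written with one. It is not Cassandra's code; the 
class ShortLengthSliceSketch and its helper methods are hypothetical names used 
for illustration only, and the byte values are chosen to mirror the 290 > 15 
figures in the log.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the Cassandra code base.
class ShortLengthSliceSketch {

    // Reads an unsigned 2-byte (big-endian) length at `offset` and returns a
    // view of that many bytes following the length prefix.
    static ByteBuffer sliceWithShortLength(ByteBuffer input, int offset) {
        int length = input.getShort(offset) & 0xFFFF;
        ByteBuffer copy = input.duplicate();
        copy.position(offset + 2);
        // Throws IllegalArgumentException when the computed limit exceeds capacity.
        copy.limit(offset + 2 + length);
        return copy.slice();
    }

    // Writes `value` preceded by its length as a 2-byte prefix.
    static ByteBuffer withShortLength(byte[] value) {
        ByteBuffer out = ByteBuffer.allocate(2 + value.length);
        out.putShort((short) value.length).put(value);
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        // Well-formed input: the prefix matches the payload, so slicing works.
        byte[] value = "clustering".getBytes(StandardCharsets.UTF_8);
        ByteBuffer prefixed = withShortLength(value);
        System.out.println("sliced bytes: " + sliceWithShortLength(prefixed, 0).remaining());

        // Input in some other layout: 15 bytes whose first two bytes happen to
        // read as 0x0120 (288), so the requested limit is 2 + 288 = 290.
        ByteBuffer raw = ByteBuffer.allocate(15);
        raw.put(0, (byte) 0x01).put(1, (byte) 0x20);
        try {
            sliceWithShortLength(raw, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints the size of the correctly prefixed slice and then reports 
the rejected one; the exact exception message depends on the JDK in use.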

[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656119#comment-17656119
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/9/23 3:17 PM:
---

[This JVM 
dtest|https://github.com/apache/cassandra/compare/cassandra-4.0...adelapena:cassandra:17507-4.0]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug only 
seems to manifest itself when the driver uses native protocol v3, instead of 
the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stacktrace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{COMPACT STORAGE}}, 
paging and an old protocol version, probably the easiest workaround until we 
get a fix is setting the driver to use a more recent version of the native 
transport protocol.


was (Author: adelapena):
[This JVM 
dtest|https://github.com/apache/cassandra/compare/trunk...adelapena:cassandra:17507-4.0?expand=1]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug only 
seems to manifest itself when the driver uses native protocol v3, instead of 
the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stacktrace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{COMPACT STORAGE}}, 
paging and an old protocol version, probably the easiest workaround until we 
get a fix is setting the driver to use a more recent version of the native 
transport protocol.


[jira] [Comment Edited] (CASSANDRA-17507) IllegalArgumentException in query code path during 3.11.12 => 4.0.3 rolling upgrade

2023-01-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656119#comment-17656119
 ] 

Andres de la Peña edited comment on CASSANDRA-17507 at 1/9/23 3:15 PM:
---

[This JVM 
dtest|https://github.com/apache/cassandra/compare/trunk...adelapena:cassandra:17507-4.0?expand=1]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug only 
seems to manifest itself when the driver uses native protocol v3, instead of 
the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stacktrace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this is actually caused by the combination of {{COMPACT STORAGE}}, 
paging and an old protocol version, probably the easiest workaround until we 
get a fix is setting the driver to use a more recent version of the native 
transport protocol.


was (Author: adelapena):
[This JVM 
dtest|https://github.com/apache/cassandra/compare/trunk...adelapena:cassandra:17507-4.0?expand=1]
 reproduces the bug. It tests a 3.x -> 4.0 rolling upgrade scenario with a 
table with {{COMPACT STORAGE}} and a query over it that uses paging. The bug only 
seems to manifest itself when the driver uses native protocol v3, instead of 
the default (v5 for 4.0 and v4 for 3.11).

The test results can be found 
[here|https://app.circleci.com/pipelines/github/adelapena/cassandra/2536/workflows/5791569d-8ea1-42b5-bacd-bd8716afaee8/jobs/25163].
 The artifacts stored for each test contain an identical stacktrace, for 
example [this 
one|https://output.circle-artifacts.com/output/job/f4cbecbc-92dd-49c8-a75d-a5a7b53bcd21/artifacts/0/stdout/fails/1/org.apache.cassandra.distributed.upgrade.CompactStoragePagingTest%23testPagingWithCompactStorageAndProtocolVersion.txt]

If this actually is caused by the combination of {{COMPACT STORAGE}}, 
paging and an old protocol version, probably the easiest workaround until we 
get a fix is setting the driver to use a most recent version of the native 
transport protocol.
