[jira] [Updated] (CASSANDRA-19803) Flakey test org.apache.cassandra.distributed.test.TransientRangeMovement2Test#testMoveForward

2024-08-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19803:

Status: Ready to Commit  (was: Review In Progress)

+1

> Flakey test 
> org.apache.cassandra.distributed.test.TransientRangeMovement2Test#testMoveForward
> -
>
> Key: CASSANDRA-19803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19803
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
>  junit.framework.AssertionFailedError: SHOULD NOT BE ON NODE: 11 -- 
> [(16,30)]: [00, 02, 04, 06, 08, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 42, 44, 46, 48]
>   at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:231)
>   at 
> org.apache.cassandra.distributed.test.TransientRangeMovement2Test.testMoveForward(TransientRangeMovement2Test.java:143)
> {code}
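
For readers unfamiliar with the failing assertion, here is a rough, hypothetical simplification of what a containment check of this kind verifies (illustrative only, not the actual TransientRangeMovementTest code): every key found on a node must fall inside one of the ranges that node is expected to hold.

{code:java}
import java.util.List;

// Hypothetical simplification of an "assert all contained" style check: every key
// observed on the node must fall in at least one of the (start, end] ranges the node
// is expected to hold; otherwise fail with a message like the one quoted above.
final class RangeAssertions
{
    static void assertAllContained(int node, List<int[]> expectedRanges, List<Integer> keysOnNode)
    {
        for (int key : keysOnNode)
        {
            boolean owned = expectedRanges.stream()
                                          .anyMatch(r -> key > r[0] && key <= r[1]);
            if (!owned)
                throw new AssertionError("SHOULD NOT BE ON NODE: " + node + " -- key " + key);
        }
    }
}
{code}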



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19803) Flakey test org.apache.cassandra.distributed.test.TransientRangeMovement2Test#testMoveForward

2024-08-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19803:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Flakey test 
> org.apache.cassandra.distributed.test.TransientRangeMovement2Test#testMoveForward
> -
>
> Key: CASSANDRA-19803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19803
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
>  junit.framework.AssertionFailedError: SHOULD NOT BE ON NODE: 11 -- 
> [(16,30)]: [00, 02, 04, 06, 08, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 42, 44, 46, 48]
>   at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:231)
>   at 
> org.apache.cassandra.distributed.test.TransientRangeMovement2Test.testMoveForward(TransientRangeMovement2Test.java:143)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19804) Flakey test upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD#test_rolling_upgrade

2024-07-31 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19804:

Status: Ready to Commit  (was: Review In Progress)

+1

> Flakey test 
> upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD#test_rolling_upgrade
> -
>
> Key: CASSANDRA-19804
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19804
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
>  ERRORS 
> 
> _ ERROR at teardown of 
> TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_rolling_upgrade _
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [[node3] 'ERROR [InternalResponseStage:3] 2024-07-26 04:35:12,345 
> MessagingService.java:509 - Cannot send the message (from:/127.0.0.3:7000, 
> type:FETCH_LOG verb:TCM_FETCH_PEER_LOG_REQ) to /127.0.0.1:7000, as messaging 
> service is shutting down', [node3] 'ERROR [InternalResponseStage:4] 
> 2024-07-26 04:35:27,412 MessagingService.java:509 - Cannot send the message 
> (from:/127.0.0.3:7000, type:FETCH_LOG verb:TCM_FETCH_PEER_LOG_REQ) to 
> /127.0.0.1:7000, as messaging service is shutting down']
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19804) Flakey test upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD#test_rolling_upgrade

2024-07-31 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19804:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Flakey test 
> upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD#test_rolling_upgrade
> -
>
> Key: CASSANDRA-19804
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19804
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
>  ERRORS 
> 
> _ ERROR at teardown of 
> TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_rolling_upgrade _
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [[node3] 'ERROR [InternalResponseStage:3] 2024-07-26 04:35:12,345 
> MessagingService.java:509 - Cannot send the message (from:/127.0.0.3:7000, 
> type:FETCH_LOG verb:TCM_FETCH_PEER_LOG_REQ) to /127.0.0.1:7000, as messaging 
> service is shutting down', [node3] 'ERROR [InternalResponseStage:4] 
> 2024-07-26 04:35:27,412 MessagingService.java:509 - Cannot send the message 
> (from:/127.0.0.3:7000, type:FETCH_LOG verb:TCM_FETCH_PEER_LOG_REQ) to 
> /127.0.0.1:7000, as messaging service is shutting down']
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19782) Host replacements no longer fully populate system.peers table

2024-07-31 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19782:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/32755cabfa2eeb99f0b8c91fc7bb53379259de54
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Host replacements no longer fully populate system.peers table
> -
>
> Key: CASSANDRA-19782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19782
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> When running harry after a host replacement, a failure happened because the 
> peers table contained the new node but not its tokens (leading to an NPE in 
> harry). I took the test 
> org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest#replaceDownedHost
>  and made one small change: log the peers table after the host replacement.
> 4.1:
> {code}
> INFO  [main]  2024-07-18 09:36:48,211 HostReplacementTest.java:107 - 
> Peers table from node1:
> [/127.0.0.3, datacenter0, --4000-8000-0003, null, rack0, 
> 4.1.5-SNAPSHOT, /127.0.0.3, 94a14fb6-2cd9-3d1d-af84-a30e257aa7b8, 
> [9223372036854775805]]
> {code}
> Trunk:
> {code}
> INFO  [main]  2024-07-18 09:38:59,568 HostReplacementTest.java:109 - 
> Peers table from node1:
> [/127.0.0.3, null, null, null, null, 5.1.0-SNAPSHOT, /127.0.0.3, 
> ----000a, null]
> {code}
> Several fields are missing
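
For context, the "one small change" described above amounts to logging node1's view of system.peers. A minimal sketch, assuming the in-jvm dtest API's executeInternal and a hypothetical helper name (this is not the actual patch):

{code:java}
import java.util.Arrays;

import org.apache.cassandra.distributed.api.ICluster;
import org.apache.cassandra.distributed.api.IInvokableInstance;

// Hypothetical helper (not the actual patch): dump node1's view of system.peers after
// the replacement, so the missing columns (data_center, rack, host_id, tokens) are
// visible in the test log, as in the 4.1 vs trunk output quoted above.
final class PeersLogger
{
    static void logPeers(ICluster<IInvokableInstance> cluster)
    {
        Object[][] rows = cluster.get(1).executeInternal(
            "SELECT peer, data_center, rack, host_id, release_version, tokens FROM system.peers");
        System.out.println("Peers table from node1: " + Arrays.deepToString(rows));
    }
}
{code}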



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19782) Host replacements no longer fully populate system.peers table

2024-07-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19782:
---

Assignee: Sam Tunnicliffe

> Host replacements no longer fully populate system.peers table
> -
>
> Key: CASSANDRA-19782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19782
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> When running harry after a host replacement, a failure happened because the 
> peers table contained the new node but not its tokens (leading to an NPE in 
> harry). I took the test 
> org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest#replaceDownedHost
>  and made one small change: log the peers table after the host replacement.
> 4.1:
> {code}
> INFO  [main]  2024-07-18 09:36:48,211 HostReplacementTest.java:107 - 
> Peers table from node1:
> [/127.0.0.3, datacenter0, --4000-8000-0003, null, rack0, 
> 4.1.5-SNAPSHOT, /127.0.0.3, 94a14fb6-2cd9-3d1d-af84-a30e257aa7b8, 
> [9223372036854775805]]
> {code}
> Trunk:
> {code}
> INFO  [main]  2024-07-18 09:38:59,568 HostReplacementTest.java:109 - 
> Peers table from node1:
> [/127.0.0.3, null, null, null, null, 5.1.0-SNAPSHOT, /127.0.0.3, 
> ----000a, null]
> {code}
> Several fields are missing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19361) fix node info NPE when ClusterMetadata is null

2024-07-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19361:

Fix Version/s: 5.x
   (was: 5.0)
   (was: 5.0-rc1)

> fix node info NPE when ClusterMetadata is null
> --
>
> Key: CASSANDRA-19361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19361
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Normal
> Fix For: 5.x
>
> Attachments: CASSANDRA-19361-stack-error.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. How
>  
> I created an ensemble with 3 nodes (it works well), then added a fourth node 
> to join the party. 
> When executing nodetool info, I get the following exception:
> {code:java}
> ➜  bin ./nodetool info
> java.lang.NullPointerException at 
> org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
>  at 
> org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)   
> ➜  bin ./nodetool info 
> WARN  [InternalResponseStage:152] 2024-02-02 11:45:15,731 
> RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when 
> sending TCM_COMMIT_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true} error: null 
> -- StackTrace -- java.lang.NullPointerException at 
> org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) at 
> jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260){code}
> Server 1 cannot execute nodetool info or the cql shell, while servers 2 and 3 can. 
> I tried to query the system-prefixed tables and attach the stack error log for 
> further debugging. I cannot find a way to recover. After deleting the data (losing 
> all data) and restarting, everything became OK.
> {code:java}
> ➜  bin ./nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load  Tokens  Owns (effective)  Host ID                        
>        Rack
> UN  127.0.0.2  ?     16      51.2%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> DN  127.0.0.4  ?     16      48.8%             
> 6d194555-f6eb-41d0-c000-0001  rack1{code}
> h3. When
>  
> It was introduced by the CEP-21 patch. In any case, the NPE check is needed to 
> prevent the exception from propagating any further.
> {code:java}
> Implementation of Transactional Cluster Metadata as described in CEP-21
> Hash: ae084237
>  
> code diff:
>  
>     public String getLocalHostId()
>      {
> -        UUID id = getLocalHostUUID();
> -        return id != null ? id.toString() : null;
> +        return getLocalHostUUID().toString();
>      }
>  
>      public UUID getLocalHostUUID()
>      {
> -        UUID id = 
> getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
> -        if (id != null)
> -            return id;
> -        // this condition is to prevent accessing the tables when the node 
> is not started yet, and in particular,
> -        // when it is not going to be started at all (e.g. when running some 
> unit tests or client tools).
> -        else if ((DatabaseDescriptor.isDaemonInitialized() || 
> DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
> -            return SystemKeyspace.getLocalHostId();
> -
> -        return null;
> +        // Metadata collector requires using local host id, and flush of 
> IndexInfo may race with
> +        // creation and initialization of cluster metadata service. Metadata 
> collector does accept
> +        // 

[jira] [Updated] (CASSANDRA-19782) Host replacements no longer fully populate system.peers table

2024-07-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19782:

Attachment: ci_summary.html

> Host replacements no longer fully populate system.peers table
> -
>
> Key: CASSANDRA-19782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19782
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> When running harry after a host replacement, a failure happened because the 
> peers table contained the new node but not its tokens (leading to an NPE in 
> harry). I took the test 
> org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest#replaceDownedHost
>  and made one small change: log the peers table after the host replacement.
> 4.1:
> {code}
> INFO  [main]  2024-07-18 09:36:48,211 HostReplacementTest.java:107 - 
> Peers table from node1:
> [/127.0.0.3, datacenter0, --4000-8000-0003, null, rack0, 
> 4.1.5-SNAPSHOT, /127.0.0.3, 94a14fb6-2cd9-3d1d-af84-a30e257aa7b8, 
> [9223372036854775805]]
> {code}
> Trunk:
> {code}
> INFO  [main]  2024-07-18 09:38:59,568 HostReplacementTest.java:109 - 
> Peers table from node1:
> [/127.0.0.3, null, null, null, null, 5.1.0-SNAPSHOT, /127.0.0.3, 
> ----000a, null]
> {code}
> Several fields are missing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19782) Host replacements no longer fully populate system.peers table

2024-07-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19782:

Test and Documentation Plan: Updated existing test
 Status: Patch Available  (was: Open)

> Host replacements no longer fully populate system.peers table
> -
>
> Key: CASSANDRA-19782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19782
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> When running harry after a host replacement, a failure happened because the 
> peers table contained the new node but not its tokens (leading to an NPE in 
> harry). I took the test 
> org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest#replaceDownedHost
>  and made one small change: log the peers table after the host replacement.
> 4.1:
> {code}
> INFO  [main]  2024-07-18 09:36:48,211 HostReplacementTest.java:107 - 
> Peers table from node1:
> [/127.0.0.3, datacenter0, --4000-8000-0003, null, rack0, 
> 4.1.5-SNAPSHOT, /127.0.0.3, 94a14fb6-2cd9-3d1d-af84-a30e257aa7b8, 
> [9223372036854775805]]
> {code}
> Trunk:
> {code}
> INFO  [main]  2024-07-18 09:38:59,568 HostReplacementTest.java:109 - 
> Peers table from node1:
> [/127.0.0.3, null, null, null, null, 5.1.0-SNAPSHOT, /127.0.0.3, 
> ----000a, null]
> {code}
> Several fields are missing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19239) jvm-dtests crash on java 17

2024-07-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867081#comment-17867081
 ] 

Sam Tunnicliffe commented on CASSANDRA-19239:
-

It looks as though this is due to a classloader leak dating back to 
CASSANDRA-16565. The {{oshi.jna}} package needed adding to the default list of 
shared packages. Kudos to [~drohrer] for tracking it down.

I've made the fix in the dtest-api lib directly (and also added the 
{{org.jboss.byteman}} packages we append to the list in {{AbstractCluster}}).

The 
[0.0.17-b9f2d0a-SNAPSHOT|https://repository.apache.org/content/groups/snapshots/org/apache/cassandra/dtest-api/0.0.17-b9f2d0a-SNAPSHOT/]
 build includes this fix; as soon as I can demonstrate that via a CI run, we can 
cut a release and update the dependency.
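
To illustrate the shared-packages idea mentioned above (hypothetical names, not the actual dtest-api surface): classes in packages matching a shared filter are loaded once by the common parent classloader rather than per test instance, so JNA/oshi handles no longer pin each instance's classloader and the leak goes away.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a shared-package filter. Packages matching the filter are
// loaded by the single parent classloader; everything else is loaded by a per-instance
// classloader that can be unloaded when the instance shuts down.
final class SharedPackages
{
    private static final List<String> SHARED = Arrays.asList(
            "org.apache.cassandra.distributed.api.", // the dtest API itself must be shared
            "oshi.jna.",                              // the package the fix above adds
            "org.jboss.byteman.");                    // appended in AbstractCluster, per above

    static Predicate<String> loadSharedFilter()
    {
        return className -> SHARED.stream().anyMatch(className::startsWith);
    }
}
{code}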

> jvm-dtests crash on java 17
> ---
>
> Key: CASSANDRA-19239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19239
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 5.x
>
> Attachments: image-2024-01-23-13-11-50-313.png, 
> image-2024-01-23-13-12-33-954.png, screenshot-1.png, screenshot-2.png
>
>
> This is a similar problem to the one mentioned in 
> https://issues.apache.org/jira/browse/CASSANDRA-15981
> I'm filing it because I've noticed the same problem on JDK17; perhaps we 
> should also disable unloading classes with CMS for JDK17. 
> However, I'm in favour of moving tests to G1 instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

  Fix Version/s: 5.x
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/dc45bb5876aafa2ce7dcfe6a3b7de0f6a9a35fda
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Low
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.
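
To illustrate the hang described above using plain java.util.concurrent (this is not Cassandra's Awaitable API, just a sketch of the pattern): an interruptible wait lets shutdown proceed, while an uninterruptible wait swallows the interrupt and keeps retrying.

{code:java}
import java.util.concurrent.CountDownLatch;

// Minimal sketch of why an uninterruptible wait can hang shutdown: the interrupt sent
// to the thread is noted but swallowed, so if the condition never becomes true the
// loop never exits.
final class AwaitExample
{
    // Interruptible: an interrupt delivered during cluster shutdown unblocks the thread.
    static void awaitInterruptibly(CountDownLatch latch) throws InterruptedException
    {
        latch.await();
    }

    // Uninterruptible: keep waiting across interrupts; if the latch is never counted
    // down (e.g. the epoch commit cannot complete because messaging is shutting down),
    // this loops forever, which is the hang reported in this ticket.
    static void awaitUninterruptibly(CountDownLatch latch)
    {
        boolean interrupted = false;
        try
        {
            while (true)
            {
                try
                {
                    latch.await();
                    return;
                }
                catch (InterruptedException e)
                {
                    interrupted = true; // remember the interrupt, but keep waiting
                }
            }
        }
        finally
        {
            if (interrupted)
                Thread.currentThread().interrupt(); // restore the flag for callers
        }
    }
}
{code}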



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Status: Ready to Commit  (was: Review In Progress)

+1s from David & Alex on the PR

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Low
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19361) fix node info NPE when ClusterMetadata is null

2024-07-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19361:

Resolution: Cannot Reproduce
Status: Resolved  (was: Triage Needed)

> fix node info NPE when ClusterMetadata is null
> --
>
> Key: CASSANDRA-19361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19361
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Normal
> Fix For: 5.0.x
>
> Attachments: CASSANDRA-19361-stack-error.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. How
>  
> I created an ensemble with 3 nodes (it works well), then added a fourth node 
> to join the party. 
> When executing nodetool info, I get the following exception:
> {code:java}
> ➜  bin ./nodetool info
> java.lang.NullPointerException at 
> org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
>  at 
> org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)   
> ➜  bin ./nodetool info 
> WARN  [InternalResponseStage:152] 2024-02-02 11:45:15,731 
> RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when 
> sending TCM_COMMIT_REQ, retrying on 
> CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true} error: null 
> -- StackTrace -- java.lang.NullPointerException at 
> org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) at 
> jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260){code}
> Server 1 cannot execute nodetool info or the cql shell, while servers 2 and 3 can. 
> I tried to query the system-prefixed tables and attach the stack error log for 
> further debugging. I cannot find a way to recover. After deleting the data (losing 
> all data) and restarting, everything became OK.
> {code:java}
> ➜  bin ./nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load  Tokens  Owns (effective)  Host ID                        
>        Rack
> UN  127.0.0.2  ?     16      51.2%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> DN  127.0.0.4  ?     16      48.8%             
> 6d194555-f6eb-41d0-c000-0001  rack1{code}
> h3. When
>  
> It was introduced by the CEP-21 patch. In any case, the NPE check is needed to 
> prevent the exception from propagating any further.
> {code:java}
> Implementation of Transactional Cluster Metadata as described in CEP-21
> Hash: ae084237
>  
> code diff:
>  
>     public String getLocalHostId()
>      {
> -        UUID id = getLocalHostUUID();
> -        return id != null ? id.toString() : null;
> +        return getLocalHostUUID().toString();
>      }
>  
>      public UUID getLocalHostUUID()
>      {
> -        UUID id = 
> getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
> -        if (id != null)
> -            return id;
> -        // this condition is to prevent accessing the tables when the node 
> is not started yet, and in particular,
> -        // when it is not going to be started at all (e.g. when running some 
> unit tests or client tools).
> -        else if ((DatabaseDescriptor.isDaemonInitialized() || 
> DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
> -            return SystemKeyspace.getLocalHostId();
> -
> -        return null;
> +        // Metadata collector requires using local host id, and flush of 
> IndexInfo may race with
> +        // creation and initialization of cluster metadata service. Metadata 
> collector does accept
> +        // null localhost 

[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Attachment: ci_summary-1.html

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Low
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Severity: Low  (was: Critical)

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Low
> Attachments: ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Reviewers: Alex Petrov, Sam Tunnicliffe
   Status: Review In Progress  (was: Patch Available)

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Reviewers: Alex Petrov  (was: Alex Petrov, Sam Tunnicliffe)

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Attachment: ci_summary.html

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt delivered while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, causing the test to fail to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19761:

Authors: Sam Tunnicliffe
Test and Documentation Plan: Added new dtest, run existing tests.
 Status: Patch Available  (was: Open)

Linked to trunk PR & attached CI summary with no new failures

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Priority: Normal
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt issued while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, so the test cluster fails to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19761) When JVM dtest is shutting down, if a new epoch is being committed the node is unable to shut down

2024-07-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19761:
---

Assignee: Sam Tunnicliffe

> When JVM dtest is shutting down, if a new epoch is being committed the node 
> is unable to shut down
> --
>
> Key: CASSANDRA-19761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19761
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
>
> The following was seen in the accord branch, but the problem is found in 
> trunk as well.
> {code}
> node1_isolatedExecutor:8:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$None.parkNanos(InterceptorOfSystemMethods.java:373)
>   
> org.apache.cassandra.simulator.systems.InterceptorOfSystemMethods$Global.parkNanos(InterceptorOfSystemMethods.java:166)
>   
> java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
>   
> java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTerminationUntil(ExecutorUtils.java:110)
>   
> org.apache.cassandra.utils.ExecutorUtils.awaitTermination(ExecutorUtils.java:100)
>   org.apache.cassandra.concurrent.Stage.shutdownAndWait(Stage.java:195)
>   
> org.apache.cassandra.distributed.impl.Instance.lambda$shutdown$44(Instance.java:975)
> {code}
> {code}
> node1_MiscStage:1:
>   java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
>   
> java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:323)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:290)
>   
> org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:283)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:306)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AsyncAwaitable.await(Awaitable.java:338)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitUninterruptibly(Awaitable.java:186)
>   
> org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitUninterruptibly(Awaitable.java:259)
>   org.apache.cassandra.tcm.log.LocalLog$Async.runOnce(LocalLog.java:710)
>   org.apache.cassandra.tcm.log.LocalLog.runOnce(LocalLog.java:404)
>   
> org.apache.cassandra.tcm.log.LocalLog.waitForHighestConsecutive(LocalLog.java:346)
>   
> org.apache.cassandra.tcm.PaxosBackedProcessor.fetchLogAndWait(PaxosBackedProcessor.java:163)
>   
> org.apache.cassandra.tcm.AbstractLocalProcessor.commit(AbstractLocalProcessor.java:109)
>   
> org.apache.cassandra.distributed.test.log.TestProcessor.commit(TestProcessor.java:61)
>   
> org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.commit(ClusterMetadataService.java:841)
>   org.apache.cassandra.tcm.Processor.commit(Processor.java:45)
>   
> org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:516)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl.lambda$updateFastPath$2(AccordFastPathCoordinator.java:208)
>   
> org.apache.cassandra.service.accord.AccordFastPathCoordinator$Impl$$Lambda$11211/0x000802441840.run(Unknown
>  Source)
> {code}
> Accord is trying to commit a new epoch, but TCM uses “awaitUninterruptibly”, 
> which ignores the thread interrupt issued while the cluster is shutting down. 
> While this is happening the instance is unable to make progress and loops 
> endlessly, so the test cluster fails to close.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19764) Corruption can occur while a field is being added to UDT clustering key

2024-07-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17865475#comment-17865475
 ] 

Sam Tunnicliffe commented on CASSANDRA-19764:
-

bq. In other words, TCM makes it practically impossible to run into the 
situation this test is meant to exercise?

That's the intention, yes. 

> Corruption can occur while a field is being added to UDT clustering key
> ---
>
> Key: CASSANDRA-19764
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19764
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/UDT
>Reporter: Branimir Lambov
>Priority: Normal
>
> CASSANDRA-15938 made some improvements in how unknown components in UDTs are 
> treated. Unfortunately this can cause corruption as soon as more than one 
> value is inserted for a partition.
> The problem can be easily shown by modifying the 
> {{FrozenUDTTest.testDivergentSchema}} test to insert two entries in the wrong 
> order:
> {code:java}
> cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) 
> VALUES (?, " + json(1, 2) + ", ? )", ConsistencyLevel.ALL,
> 1, 2);
> cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) 
> VALUES (?, " + json(1, 1) + ", ? )", ConsistencyLevel.ALL,
> 1, 1);
> {code}
> after which we can get corrupted sstable state, shown as a
> {code:java}
> java.lang.AssertionError: Lower bound [SSTABLE_LOWER_BOUND(1) ]is bigger than 
> first returned value [Row: ck=1 | i=2]
> {code}
> exception, or results like {{[[1],[2],[2],[1]]}} or {{[[2],[1],[2]]}} for 
> {{select i from x WHERE id = 1}} depending on which node we use as 
> coordinator.
> Because we don't know the type of new fields and cannot properly order 
> entries, we need to outright reject UDT keys that are not compatible with a 
> replica's schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19764) Corruption can occur while a field is being added to UDT clustering key

2024-07-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17865462#comment-17865462
 ] 

Sam Tunnicliffe commented on CASSANDRA-19764:
-

Dropping {{TCM_REPLICATION}} prevents the schema change from being pushed to 
node2, but when it receives the subsequent mutation it detects that it is 
behind the coordinator and so catches up before executing the write:
{code:java}
{INFO  [node2_MutationStage-1] node2 2024-07-12 13:32:40,474 
PeerLogFetcher.java:91 - Fetching log from /127.0.0.1:7012, at least 
Epoch{epoch=10}
{code}
If we also block that catchup, the write will timeout at {{QUORUM}} 
consistency, as expected.
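
For reference, the sketch below shows roughly how this scenario can be set up with the in-jvm dtest message filters. It is illustrative only: the keyspace and table are made up, and while {{TCM_REPLICATION}} is the verb named above, the verb assumed here for the peer-log catch-up ({{TCM_FETCH_PEER_LOG_REQ}}) is a guess rather than something verified against the patch.

{code:java}
import org.apache.cassandra.distributed.Cluster;
import org.apache.cassandra.distributed.api.ConsistencyLevel;
import org.apache.cassandra.distributed.api.Feature;
import org.apache.cassandra.net.Verb;

public class DivergentSchemaCatchupSketch
{
    public static void main(String[] args) throws Exception
    {
        try (Cluster cluster = Cluster.build(2)
                                      .withConfig(c -> c.with(Feature.NETWORK, Feature.GOSSIP))
                                      .start())
        {
            cluster.schemaChange("CREATE KEYSPACE ks WITH replication = " +
                                 "{'class': 'SimpleStrategy', 'replication_factor': 2}");
            cluster.schemaChange("CREATE TABLE ks.x (id int PRIMARY KEY, i int)");

            // Drop metadata log replication so node2 never learns about the next schema change...
            cluster.filters().verbs(Verb.TCM_REPLICATION.id).to(2).drop();
            // ...and also block the catch-up fetch node2 would otherwise perform on the write path
            // (this verb name is an assumption).
            cluster.filters().verbs(Verb.TCM_FETCH_PEER_LOG_REQ.id).from(2).drop();

            cluster.schemaChange("ALTER TABLE ks.x ADD j int");

            // node2 is behind the coordinator and cannot catch up, so this QUORUM write
            // is expected to time out instead of executing against divergent schemas.
            cluster.coordinator(1).execute("INSERT INTO ks.x (id, i, j) VALUES (1, 1, 1)",
                                           ConsistencyLevel.QUORUM);
        }
    }
}
{code}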

> Corruption can occur while a field is being added to UDT clustering key
> ---
>
> Key: CASSANDRA-19764
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19764
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/UDT
>Reporter: Branimir Lambov
>Priority: Normal
>
> CASSANDRA-15938 made some improvements in how unknown components in UDTs are 
> treated. Unfortunately this can cause corruption as soon as more than one 
> value is inserted for a partition.
> The problem can be easily shown by modifying the 
> {{FrozenUDTTest.testDivergentSchema}} test to insert two entries in the wrong 
> order:
> {code:java}
> cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) 
> VALUES (?, " + json(1, 2) + ", ? )", ConsistencyLevel.ALL,
> 1, 2);
> cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) 
> VALUES (?, " + json(1, 1) + ", ? )", ConsistencyLevel.ALL,
> 1, 1);
> {code}
> after which we can get corrupted sstable state, shown as a
> {code:java}
> java.lang.AssertionError: Lower bound [SSTABLE_LOWER_BOUND(1) ]is bigger than 
> first returned value [Row: ck=1 | i=2]
> {code}
> exception, or results like {{[[1],[2],[2],[1]]}} or {{[[2],[1],[2]]}} for 
> {{select i from x WHERE id = 1}} depending on which node we use as 
> coordinator.
> Because we don't know the type of new fields and cannot properly order 
> entries, we need to outright reject UDT keys that are not compatible with a 
> replica's schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19765) Remove accessibility to system_auth.roles salted_hash for non-superusers

2024-07-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17865355#comment-17865355
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19765 at 7/12/24 8:20 AM:
--

{quote}I wonder what would be the motivation for a superuser to grant select on 
system_auth.roles to a nonsuperuser?
{quote}
Actually, if you were to {{GRANT SELECT ON ALL KEYSPACES TO role}} that would 
include {{{}system_auth{}}}. This may not be expected, but it has always 
been the case. I have a patch that changes this, but I haven't had a chance to 
file a Jira or start a discussion about it yet. 


was (Author: beobal):
{quote}I wonder what would be the motivation for a superuser to grant select on 
system_auth.roles to a nonsuperuser?
{quote}
Actually, if you were to {{GRANT SELECT ON ALL KEYSPACES TO role}} that would 
include {{{}sytem_auth{}}}. This is may not be expected, but it has always been 
the case. I have a patch that changes this, but I haven't had chance to file a 
Jira or start a discussion about it yet. 
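
To make that claim concrete, here is a hypothetical demonstration of the behaviour described (the role name is made up and the DataStax Java driver is used only for illustration): a blanket keyspace grant is enough to read {{system_auth.roles}}, including {{salted_hash}}.

{code:java}
import com.datastax.oss.driver.api.core.CqlSession;

public class BlanketGrantIncludesSystemAuth
{
    public static void main(String[] args)
    {
        // Superuser grants SELECT on all keyspaces to an ordinary role.
        try (CqlSession admin = CqlSession.builder()
                                          .withAuthCredentials("cassandra", "cassandra")
                                          .build())
        {
            admin.execute("CREATE ROLE reporting WITH LOGIN = true AND PASSWORD = 'reporting'");
            admin.execute("GRANT SELECT ON ALL KEYSPACES TO reporting");
        }

        // The blanket grant covers system_auth as well, so this read succeeds.
        try (CqlSession reporting = CqlSession.builder()
                                              .withAuthCredentials("reporting", "reporting")
                                              .build())
        {
            reporting.execute("SELECT role, salted_hash FROM system_auth.roles")
                     .forEach(row -> System.out.println(row.getFormattedContents()));
        }
    }
}
{code}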

> Remove accessibility to system_auth.roles salted_hash for non-superusers
> 
>
> Key: CASSANDRA-19765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19765
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x
>
>
> Cassandra permits all users with SELECT on system_auth.roles to access 
> contents of the salted_hash column. This column contains a bcrypt hash, which 
> shouldn't be visible. This isn't a significant security risk at the current 
> time, but is prone to [retrospective 
> decryption|https://en.wikipedia.org/wiki/Harvest_now,_decrypt_later]. We 
> should protect this column so passwords cannot be cracked in the future.
>  
>  
> {code:java}
> $ ./bin/cqlsh -u cassandra -p cassandra
> [cqlsh 6.3.0 | Cassandra 5.1-SNAPSHOT | CQL spec 3.4.8 | Native protocol v5] 
> cassandra@cqlsh> CREATE ROLE nonsuperuser WITH LOGIN=true AND 
> PASSWORD='nonsuperuser';
> cassandra@cqlsh> GRANT SELECT ON system_auth.roles TO nonsuperuser;
> cassandra@cqlsh> exit;
> $ ./bin/cqlsh -u nonsuperuser -p nonsuperuser
> [cqlsh 6.3.0 | Cassandra 5.1-SNAPSHOT | CQL spec 3.4.8 | Native protocol v5] 
> nonsuperuser@cqlsh> SELECT * FROM system_auth.roles;
>  role         | can_login | is_superuser | member_of | salted_hash
> --+---+--+---+--
>     cassandra |      True |         True |      null | 
> $2a$10$WMg9UlR7F8Ko7LZxEyg0Ue12BoHR/Dn/0/3YtV4nRYCPcY7/5OmA6
>  nonsuperuser |      True |        False |      null | 
> $2a$10$HmHwVZRk8F904UUNMiUYi.xkVglWyKNgHMo1xJsCCKirwyb9NO/im
> (2 rows)
> {code}
>  
> Patches available:
> 3.0: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-30
> 3.11: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-311
> 4.0: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-40
> 4.1: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-41
> 5.0: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-50
> trunk: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-trunk



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19765) Remove accessibility to system_auth.roles salted_hash for non-superusers

2024-07-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17865355#comment-17865355
 ] 

Sam Tunnicliffe commented on CASSANDRA-19765:
-

{quote}I wonder what would be the motivation for a superuser to grant select on 
system_auth.roles to a nonsuperuser?
{quote}
Actually, if you were to {{GRANT SELECT ON ALL KEYSPACES TO role}} that would 
include {{{}sytem_auth{}}}. This is may not be expected, but it has always been 
the case. I have a patch that changes this, but I haven't had chance to file a 
Jira or start a discussion about it yet. 

> Remove accessibility to system_auth.roles salted_hash for non-superusers
> 
>
> Key: CASSANDRA-19765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19765
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x
>
>
> Cassandra permits all users with SELECT on system_auth.roles to access 
> contents of the salted_hash column. This column contains a bcrypt hash, which 
> shouldn't be visible. This isn't a significant security risk at the current 
> time, but is prone to [retrospective 
> decryption|https://en.wikipedia.org/wiki/Harvest_now,_decrypt_later]. We 
> should protect this column so passwords cannot be cracked in the future.
>  
>  
> {code:java}
> $ ./bin/cqlsh -u cassandra -p cassandra
> [cqlsh 6.3.0 | Cassandra 5.1-SNAPSHOT | CQL spec 3.4.8 | Native protocol v5] 
> cassandra@cqlsh> CREATE ROLE nonsuperuser WITH LOGIN=true AND 
> PASSWORD='nonsuperuser';
> cassandra@cqlsh> GRANT SELECT ON system_auth.roles TO nonsuperuser;
> cassandra@cqlsh> exit;
> $ ./bin/cqlsh -u nonsuperuser -p nonsuperuser
> [cqlsh 6.3.0 | Cassandra 5.1-SNAPSHOT | CQL spec 3.4.8 | Native protocol v5] 
> nonsuperuser@cqlsh> SELECT * FROM system_auth.roles;
>  role         | can_login | is_superuser | member_of | salted_hash
> --+---+--+---+--
>     cassandra |      True |         True |      null | 
> $2a$10$WMg9UlR7F8Ko7LZxEyg0Ue12BoHR/Dn/0/3YtV4nRYCPcY7/5OmA6
>  nonsuperuser |      True |        False |      null | 
> $2a$10$HmHwVZRk8F904UUNMiUYi.xkVglWyKNgHMo1xJsCCKirwyb9NO/im
> (2 rows)
> {code}
>  
> Patches available:
> 3.0: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-30
> 3.11: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-311
> 4.0: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-40
> 4.1: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-41
> 5.0: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-50
> trunk: 
> https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19765-salted_hash-visibility-trunk



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19711) Ignore repair requests for system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19711:

Fix Version/s: 5.x
   (was: 5.1-alpha1)

> Ignore repair requests for system_cluster_metadata
> --
>
> Key: CASSANDRA-19711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19711
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
>
> Since system_cluster_metadata is not replicated like other keyspaces we might 
> break existing repair automation if a {{nodetool repair}} is run against a 
> node not in the CMS. Just ignore the request if so.
> https://github.com/krummas/cassandra/commit/76437723acea35421ec5bf0412dcdee1411dcb6e



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19713) Disallow denylisting keys in system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19713:

  Fix Version/s: 5.x
Source Control Link: 
https://github.com/apache/cassandra/commit/82c00cc01ef4312d0d7eb7ca95c9368af75e7893
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Disallow denylisting keys in system_cluster_metadata
> 
>
> Key: CASSANDRA-19713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> https://github.com/krummas/cassandra/commit/0435a9dbc382a428864b4b329e127882d9c18419



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19712) Fix gossip status after replacement

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19712:

Fix Version/s: 5.x
   (was: 5.1-alpha1)

> Fix gossip status after replacement
> ---
>
> Key: CASSANDRA-19712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
>
> Make sure gossip status is correct for replacement node.
> https://github.com/krummas/cassandra/commit/2ed38a6273def17e6decbb8e74826b1995800d59



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19710) Avoid ClassCastException when verifying tables with reversed partitioner

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19710:

Fix Version/s: 5.x
   (was: 5.1-alpha1)

> Avoid ClassCastException when verifying tables with reversed partitioner
> 
>
> Key: CASSANDRA-19710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
>
> A few TCM tables use a custom partitioner, which causes a ClassCastException 
> when running nodetool verify on them.
> https://github.com/krummas/cassandra/commit/64897cb6382967f3e134752f5b9f223ff7daeb84



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19712) Fix gossip status after replacement

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19712:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/51ef21b6bc43d1d2fa24ff362d0411e4e248b079
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Fix gossip status after replacement
> ---
>
> Key: CASSANDRA-19712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Make sure gossip status is correct for replacement node.
> https://github.com/krummas/cassandra/commit/2ed38a6273def17e6decbb8e74826b1995800d59



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19710) Avoid ClassCastException when verifying tables with reversed partitioner

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19710:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/27c1e56e43cafc8966878ff9c48b0e566c07e32b
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Avoid ClassCastException when verifying tables with reversed partitioner
> 
>
> Key: CASSANDRA-19710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> A few TCM tables use a custom partitioner, which causes a ClassCastException 
> when running nodetool verify on them.
> https://github.com/krummas/cassandra/commit/64897cb6382967f3e134752f5b9f223ff7daeb84



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19714) Use table-specific partitioners during Paxos repair

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19714:

Fix Version/s: 5.x
   (was: 5.1-alpha1)

> Use table-specific partitioners during Paxos repair
> ---
>
> Key: CASSANDRA-19714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19714
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Partition keys in the \{{system.paxos}} table are derived from the key 
> involved in the paxos transaction. Initially, it was safe to assume that the 
> paxos table itself used the same partitioner as the tables in the 
> transactions as all distributed keyspaces and tables were configured with the 
> global partitioner. This is no longer true as the 
> \{{system_cluster_metadata.distributed_metadata_log}} has its own custom 
> partitioner. 
> Likewise, \{{PaxosRepairHistory}} and the \{{system.paxos_repair_history}} 
> table which makes that history durable map token ranges in the transacted 
> tables to ballots. Prior to CASSANDRA-19482 it was safe to assume that these 
> ranges contained tokens from the global partitioner but as this is no longer 
> the case, we must use the specific partitioner for the table in question when 
> working with ranges during paxos repair. 
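
In essence, the change means token construction during paxos repair must go through the partitioner attached to the table being transacted rather than the globally configured one. A minimal sketch of the distinction (not the actual patch; the method names here are made up):

{code:java}
import java.nio.ByteBuffer;
import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.dht.IPartitioner;
import org.apache.cassandra.dht.Token;
import org.apache.cassandra.schema.TableMetadata;

public final class PaxosTokenSketch
{
    // Before: implicitly assumes every table uses the globally configured partitioner.
    static Token globalToken(ByteBuffer partitionKey)
    {
        return DatabaseDescriptor.getPartitioner().getToken(partitionKey);
    }

    // After: derive the token with the partitioner owned by the transacted table, which may
    // differ from the global one (e.g. system_cluster_metadata.distributed_metadata_log).
    static Token tableToken(TableMetadata table, ByteBuffer partitionKey)
    {
        IPartitioner partitioner = table.partitioner;
        return partitioner.getToken(partitionKey);
    }
}
{code}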



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19711) Ignore repair requests for system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19711:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/5f78bf65dc3d60622a24d4ff8b21404b39b0a930
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Ignore repair requests for system_cluster_metadata
> --
>
> Key: CASSANDRA-19711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19711
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Since system_cluster_metadata is not replicated like other keyspaces we might 
> break existing repair automation if a {{nodetool repair}} is run against a 
> node not in the CMS. Just ignore the request if so.
> https://github.com/krummas/cassandra/commit/76437723acea35421ec5bf0412dcdee1411dcb6e



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19714) Use table-specific partitioners during Paxos repair

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19714:

Source Control Link: 
https://github.com/apache/cassandra/commit/2c003710881860bde420d6a2dc1cb71e845bdb28
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Use table-specific partitioners during Paxos repair
> ---
>
> Key: CASSANDRA-19714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19714
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Partition keys in the \{{system.paxos}} table are derived from the key 
> involved in the paxos transaction. Initially, it was safe to assume that the 
> paxos table itself used the same partitioner as the tables in the 
> transactions as all distributed keyspaces and tables were configured with the 
> global partitioner. This is no longer true as the 
> \{{system_cluster_metadata.distributed_metadata_log}} has its own custom 
> partitioner. 
> Likewise, \{{PaxosRepairHistory}} and the \{{system.paxos_repair_history}} 
> table which makes that history durable map token ranges in the transacted 
> tables to ballots. Prior to CASSANDRA-19482 it was safe to assume that these 
> ranges contained tokens from the global partitioner but as this is no longer 
> the case, we must use the specific partitioner for the table in question when 
> working with ranges during paxos repair. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19713) Disallow denylisting keys in system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19713:

Status: Ready to Commit  (was: Review In Progress)

+1

> Disallow denylisting keys in system_cluster_metadata
> 
>
> Key: CASSANDRA-19713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> https://github.com/krummas/cassandra/commit/0435a9dbc382a428864b4b329e127882d9c18419



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19711) Ignore repair requests for system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19711:

Status: Ready to Commit  (was: Review In Progress)

+1

> Ignore repair requests for system_cluster_metadata
> --
>
> Key: CASSANDRA-19711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19711
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Since system_cluster_metadata is not replicated like other keyspaces we might 
> break existing repair automation if a {{nodetool repair}} is run against a 
> node not in the CMS. Just ignore the request if so.
> https://github.com/krummas/cassandra/commit/76437723acea35421ec5bf0412dcdee1411dcb6e



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19714) Use table-specific partitioners during Paxos repair

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19714:

Status: Ready to Commit  (was: Review In Progress)

> Use table-specific partitioners during Paxos repair
> ---
>
> Key: CASSANDRA-19714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19714
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Partition keys in the \{{system.paxos}} table are derived from the key 
> involved in the paxos transaction. Initially, it was safe to assume that the 
> paxos table itself used the same partitioner as the tables in the 
> transactions as all distributed keyspaces and tables were configured with the 
> global partitioner. This is no longer true as the 
> \{{system_cluster_metadata.distributed_metadata_log}} has its own custom 
> partitioner. 
> Likewise, \{{PaxosRepairHistory}} and the \{{system.paxos_repair_history}} 
> table which makes that history durable map token ranges in the transacted 
> tables to ballots. Prior to CASSANDRA-19482 it was safe to assume that these 
> ranges contained tokens from the global partitioner but as this is no longer 
> the case, we must use the specific partitioner for the table in question when 
> working with ranges during paxos repair. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19712) Fix gossip status after replacement

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19712:

Status: Ready to Commit  (was: Review In Progress)

+1

> Fix gossip status after replacement
> ---
>
> Key: CASSANDRA-19712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Make sure gossip status is correct for replacement node.
> https://github.com/krummas/cassandra/commit/2ed38a6273def17e6decbb8e74826b1995800d59



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19710) Avoid ClassCastException when verifying tables with reversed partitioner

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19710:

Status: Ready to Commit  (was: Review In Progress)

+1

> Avoid ClassCastException when verifying tables with reversed partitioner
> 
>
> Key: CASSANDRA-19710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> A few TCM tables use a custom partitioner, which causes a ClassCastException 
> when running nodetool verify on them.
> https://github.com/krummas/cassandra/commit/64897cb6382967f3e134752f5b9f223ff7daeb84



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19713) Disallow denylisting keys in system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864704#comment-17864704
 ] 

Sam Tunnicliffe commented on CASSANDRA-19713:
-

Rebased and re-ran CI. The only failures (4) are seen already in trunk: 
\{{largecolumn_test.py::TestLargeColumn::test_cleanup}}, plus a few timeouts in 
the jvm-dtests on j17 (CASSANDRA-19239)

> Disallow denylisting keys in system_cluster_metadata
> 
>
> Key: CASSANDRA-19713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> https://github.com/krummas/cassandra/commit/0435a9dbc382a428864b4b329e127882d9c18419



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19713) Disallow denylisting keys in system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19713:

Attachment: ci_summary-1.html

> Disallow denylisting keys in system_cluster_metadata
> 
>
> Key: CASSANDRA-19713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> https://github.com/krummas/cassandra/commit/0435a9dbc382a428864b4b329e127882d9c18419



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19710) Avoid ClassCastException when verifying tables with reversed partitioner

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19710:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Avoid ClassCastException when verifying tables with reversed partitioner
> 
>
> Key: CASSANDRA-19710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> A few TCM tables use a custom partitioner, which causes a ClassCastException 
> when running nodetool verify on them.
> https://github.com/krummas/cassandra/commit/64897cb6382967f3e134752f5b9f223ff7daeb84



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19710) Avoid ClassCastException when verifying tables with reversed partitioner

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19710:

Test and Documentation Plan: new and existing tests
 Status: Patch Available  (was: Open)

> Avoid ClassCastException when verifying tables with reversed partitioner
> 
>
> Key: CASSANDRA-19710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> A few TCM tables use a custom partitioner, which causes a ClassCastException 
> when running nodetool verify on them.
> https://github.com/krummas/cassandra/commit/64897cb6382967f3e134752f5b9f223ff7daeb84



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19711) Ignore repair requests for system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19711:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Ignore repair requests for system_cluster_metadata
> --
>
> Key: CASSANDRA-19711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19711
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Since system_cluster_metadata is not replicated like other keyspaces we might 
> break existing repair automation if a {{nodetool repair}} is run against a 
> node not in the CMS. Just ignore the request if so.
> https://github.com/krummas/cassandra/commit/76437723acea35421ec5bf0412dcdee1411dcb6e



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19712) Fix gossip status after replacement

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19712:

Test and Documentation Plan: new and existing tests
 Status: Patch Available  (was: Open)

> Fix gossip status after replacement
> ---
>
> Key: CASSANDRA-19712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Make sure gossip status is correct for replacement node.
> https://github.com/krummas/cassandra/commit/2ed38a6273def17e6decbb8e74826b1995800d59



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19711) Ignore repair requests for system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19711:

Test and Documentation Plan: new and existing tests
 Status: Patch Available  (was: Open)

> Ignore repair requests for system_cluster_metadata
> --
>
> Key: CASSANDRA-19711
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19711
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Since system_cluster_metadata is not replicated like other keyspaces we might 
> break existing repair automation if a {{nodetool repair}} is run against a 
> node not in the CMS. Just ignore the request if so.
> https://github.com/krummas/cassandra/commit/76437723acea35421ec5bf0412dcdee1411dcb6e



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19712) Fix gossip status after replacement

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19712:

Reviewers: Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Fix gossip status after replacement
> ---
>
> Key: CASSANDRA-19712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19712
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Make sure gossip status is correct for replacement node.
> https://github.com/krummas/cassandra/commit/2ed38a6273def17e6decbb8e74826b1995800d59



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19714) Use table-specific partitioners during Paxos repair

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19714:

Status: Review In Progress  (was: Patch Available)

> Use table-specific partitioners during Paxos repair
> ---
>
> Key: CASSANDRA-19714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19714
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Partition keys in the \{{system.paxos}} table are derived from the key 
> involved in the paxos transaction. Initially, it was safe to assume that the 
> paxos table itself used the same partitioner as the tables in the 
> transactions as all distributed keyspaces and tables were configured with the 
> global partitioner. This is no longer true as the 
> \{{system_cluster_metadata.distributed_metadata_log}} has its own custom 
> partitioner. 
> Likewise, \{{PaxosRepairHistory}} and the \{{system.paxos_repair_history}} 
> table which makes that history durable map token ranges in the transacted 
> tables to ballots. Prior to CASSANDRA-19482 it was safe to assume that these 
> ranges contained tokens from the global partitioner but as this is no longer 
> the case, we must use the specific partitioner for the table in question when 
> working with ranges during paxos repair. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19713) Disallow denylisting keys in system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19713:

Status: Review In Progress  (was: Patch Available)

> Disallow denylisting keys in system_cluster_metadata
> 
>
> Key: CASSANDRA-19713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>
> https://github.com/krummas/cassandra/commit/0435a9dbc382a428864b4b329e127882d9c18419



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19714) Use table-specific partitioners during Paxos repair

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19714:

Test and Documentation Plan: new and existing tests
 Status: Patch Available  (was: Open)

> Use table-specific partitioners during Paxos repair
> ---
>
> Key: CASSANDRA-19714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19714
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
>
> Partition keys in the \{{system.paxos}} table are derived from the key 
> involved in the paxos transaction. Initially, it was safe to assume that the 
> paxos table itself used the same partitioner as the tables in the 
> transactions as all distributed keyspaces and tables were configured with the 
> global partitioner. This is no longer true as the 
> \{{system_cluster_metadata.distributed_metadata_log}} has its own custom 
> partitioner. 
> Likewise, \{{PaxosRepairHistory}} and the \{{system.paxos_repair_history}} 
> table which makes that history durable map token ranges in the transacted 
> tables to ballots. Prior to CASSANDRA-19482 it was safe to assume that these 
> ranges contained tokens from the global partitioner but as this is no longer 
> the case, we must use the specific partitioner for the table in question when 
> working with ranges during paxos repair. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19713) Disallow denylisting keys in system_cluster_metadata

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19713:

Test and Documentation Plan: new and existing tests
 Status: Patch Available  (was: Open)

> Disallow denylisting keys in system_cluster_metadata
> 
>
> Key: CASSANDRA-19713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>
> https://github.com/krummas/cassandra/commit/0435a9dbc382a428864b4b329e127882d9c18419



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19753) Not getting responses with concurrent stream IDs in native protocol v5

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19753:

Resolution: Not A Problem
Status: Resolved  (was: Open)

No problem [~whatyouhide], always grateful to client implementers, especially 
for the v5 support.

> Not getting responses with concurrent stream IDs in native protocol v5
> --
>
> Key: CASSANDRA-19753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Andrea Leopardi
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: xandra.log
>
>
> This is not gonna be an easy bug to report or to give a great set of repro 
> steps for, so apologies in advance. I’m one of the authors and the maintainer 
> of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for 
> Elixir.
> We noticed an issue with request timeouts in a new version of our client. 
> Just for reference, the issue is [this 
> one|https://github.com/whatyouhide/xandra/issues/356].
> After some debugging, we figured out that the issue was limited to *native 
> protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0. 
> With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm 
> running C* in a Docker container, but I've had folks reproduce this with all 
> sorts of C* setups.
> h2. The Issue
> The new version of our client in question uses concurrent requests. We assign 
> each request a sequential stream ID ({{1}}, {{2}}, ...). We behave in a 
> compliant way with [section 2.4.1.3. of the native protocol v5 
> spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9]—to
>  the best of my knowledge.
> Now, it seems like C* does not respond to all requests this way. We have a 
> [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo that 
> reproduces this. It just issues two requests in parallel (with stream IDs 
> {{1}} and {{2}}) and then keeps issuing requests as soon as there are 
> responses. Almost 100% of the time, we don't get the response on at least 
> one stream. I've also attached some debug logs that show this in case it can 
> be helpful (from the client perspective). The {{<<56, 0, 2, 67, 161, ...>>}} 
> syntax is Erlang's syntax for bytestrings, where each number is the decimal 
> value for a single byte. You can see in the logs that we never get the 
> response frame on stream ID 1. Sometimes it's stream ID 2, or 3, or whatever.
> I’m pretty short on what to do next on our end. I’ve tried shuffling around 
> the socket buffer size as well (from {{10}} bytes to {{100}} bytes) to 
> get the packets to split up in all sorts of places, but everything works as 
> expected _except_ for the requests that are not coming out of C*.
> Any other help is appreciated here, but I've started to suspect this might be 
> something with C*. It could totally not be, but I figured it was worth to 
> post out here.
> Thank you all in advance folks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/1cd0b382143ec56118105a6ed991c0803f400b18
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed to 4.1 and merged to 5.0 & trunk, thanks.

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
> Attachments: ci_summary-4.1.html, ci_summary-5.0.html, 
> ci_summary-trunk.html
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

Reviewers: Marcus Eriksson
   Status: Review In Progress  (was: Needs Committer)

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
> Attachments: ci_summary-4.1.html, ci_summary-5.0.html, 
> ci_summary-trunk.html
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

Status: Needs Committer  (was: Patch Available)

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
> Attachments: ci_summary-4.1.html, ci_summary-5.0.html, 
> ci_summary-trunk.html
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

Status: Ready to Commit  (was: Review In Progress)

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
> Attachments: ci_summary-4.1.html, ci_summary-5.0.html, 
> ci_summary-trunk.html
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19753) Not getting responses with concurrent stream IDs in native protocol v5

2024-07-09 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864325#comment-17864325
 ] 

Sam Tunnicliffe commented on CASSANDRA-19753:
-

Hi [~whatyouhide] 

from the attached log output, it looks like the client isn't correctly decoding 
V5 frames which contain multiple envelopes. In this case, the first response 
from the server contains 2 envelopes; first the response for Stream 2, followed 
by the response for Stream 1. The client only seems to process the first 
envelope in the frame, where it should continue consuming envelopes until the 
frame body is exhausted.

As an illustration, I've extracted the bytes from the log you attached and fed 
them into the java driver (using version 3.x as that's the implementation I'm 
more familiar with) and wrapped this up in a [unit 
test|https://github.com/beobal/java-driver/commit/c9e07151fda3007b00555175445078e97f870105].
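To make the expected decoding loop concrete, here is a rough sketch (not taken from Xandra or the java driver; the {{Envelope}} type and method names are placeholders I've made up) of a client draining every envelope from a frame's payload rather than stopping after the first one:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: decode every envelope contained in the payload of a
// decompressed, checksum-verified v5 frame. A single frame may carry the
// responses for several stream ids, so keep reading until the payload is empty.
final class EnvelopeDecoder
{
    // v3+ envelope header: version, flags, stream id (2 bytes), opcode, body length (4 bytes)
    static final int ENVELOPE_HEADER_LENGTH = 9;

    static List<Envelope> decodeAll(ByteBuffer framePayload)
    {
        List<Envelope> envelopes = new ArrayList<>();
        while (framePayload.remaining() >= ENVELOPE_HEADER_LENGTH)
        {
            int version = framePayload.get() & 0xFF;
            int flags = framePayload.get() & 0xFF;
            int streamId = framePayload.getShort() & 0xFFFF;
            int opcode = framePayload.get() & 0xFF;
            int bodyLength = framePayload.getInt();
            byte[] body = new byte[bodyLength];
            framePayload.get(body);
            envelopes.add(new Envelope(version, flags, streamId, opcode, body));
        }
        return envelopes;
    }

    record Envelope(int version, int flags, int streamId, int opcode, byte[] body) {}
}
{code}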

> Not getting responses with concurrent stream IDs in native protocol v5
> --
>
> Key: CASSANDRA-19753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Andrea Leopardi
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: xandra.log
>
>
> This is not gonna be an easy bug to report or to give a great set of repro 
> steps for, so apologies in advance. I’m one of the authors and the maintainer 
> of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for 
> Elixir.
> We noticed an issue with request timeouts in a new version of our client. 
> Just for reference, the issue is [this 
> one|https://github.com/whatyouhide/xandra/issues/356].
> After some debugging, we figured out that the issue was limited to *native 
> protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0. 
> With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm 
> running C* in a Docker container, but I've had folks reproduce this with all 
> sorts of C* setups.
> h2. The Issue
> The new version of our client in question uses concurrent requests. We assign 
> each request a sequential stream ID ({{1}}, {{2}}, ...). We behave in a 
> compliant way with [section 2.4.1.3. of the native protocol v5 
> spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9]—to
>  the best of my knowledge.
> Now, it seems like C* does not respond to all requests this way. We have a 
> [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo that 
> reproduces this. It just issues two requests in parallel (with stream IDs 
> {{1}} and {{2}}) and then keeps issuing requests as soon as there are 
> responses. Almost 100% of the time, we don't get the response on at least 
> one stream. I've also attached some debug logs that show this in case it can 
> be helpful (from the client perspective). The {{<<56, 0, 2, 67, 161, ...>>}} 
> syntax is Erlang's syntax for bytestrings, where each number is the decimal 
> value for a single byte. You can see in the logs that we never get the 
> response frame on stream ID 1. Sometimes it's stream ID 2, or 3, or whatever.
> I’m pretty short on what to do next on our end. I’ve tried shuffling around 
> the socket buffer size as well (from {{10}} bytes to {{100}} bytes) to 
> get the packets to split up in all sorts of places, but everything works as 
> expected _except_ for the requests that are not coming out of C*.
> Any other help is appreciated here, but I've started to suspect this might be 
> something with C*. It could totally not be, but I figured it was worth 
> posting here.
> Thank you all in advance folks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19753) Not getting responses with concurrent stream IDs in native protocol v5

2024-07-09 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19753:
---

Assignee: Sam Tunnicliffe

> Not getting responses with concurrent stream IDs in native protocol v5
> --
>
> Key: CASSANDRA-19753
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19753
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Andrea Leopardi
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: xandra.log
>
>
> This is not gonna be an easy bug to report or to give a great set of repro 
> steps for, so apologies in advance. I’m one of the authors and the maintainer 
> of [Xandra|https://github.com/whatyouhide/xandra], the Cassandra client for 
> Elixir.
> We noticed an issue with request timeouts in a new version of our client. 
> Just for reference, the issue is [this 
> one|https://github.com/whatyouhide/xandra/issues/356].
> After some debugging, we figured out that the issue was limited to *native 
> protocol v5*. With native protocol v5, the issue shows up in C* 4.1 and 5.0. 
> With native protocol v4, those versions (4.1 and 5.0) both work fine. I'm 
> running C* in a Docker container, but I've had folks reproduce this with all 
> sorts of C* setups.
> h2. The Issue
> The new version of our client in question uses concurrent requests. We assign 
> each request a sequential stream ID ({{1}}, {{2}}, ...). We behave in a 
> compliant way with [section 2.4.1.3. of the native protocol v5 
> spec|https://github.com/apache/cassandra/blob/e7cf38b5de6f804ce121e7a676576135db0c4bb1/doc/native_protocol_v5.spec#L316C1-L316C9]—to
>  the best of my knowledge.
> Now, it seems like C* does not respond to all requests this way. We have a 
> [simple test|https://github.com/whatyouhide/xandra/pull/368] in our repo that 
> reproduces this. It just issues two requests in parallel (with stream IDs 
> {{1}} and {{2}}) and then keeps issuing requests as soon as there are 
> responses. Almost 100% of the time, we don't get the response on at least 
> one stream. I've also attached some debug logs that show this in case it can 
> be helpful (from the client perspective). The {{<<56, 0, 2, 67, 161, ...>>}} 
> syntax is Erlang's syntax for bytestrings, where each number is the decimal 
> value for a single byte. You can see in the logs that we never get the 
> response frame on stream ID 1. Sometimes it's stream ID 2, or 3, or whatever.
> I’m pretty short on what to do next on our end. I’ve tried shuffling around 
> the socket buffer size as well (from {{10}} bytes to {{100}} bytes) to 
> get the packets to split up in all sorts of places, but everything works as 
> expected _except_ for the requests that are not coming out of C*.
> Any other help is appreciated here, but I've started to suspect this might be 
> something with C*. It could totally not be, but I figured it was worth 
> posting here.
> Thank you all in advance folks! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-05 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863392#comment-17863392
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19755 at 7/5/24 5:50 PM:
-

The issue is with sharing the same start time 
({{RequestTime.startedAtNanos()}}) for the timing of each individual read 
query, which was introduced by CASSANDRA-19534.
This results in accumulation so that the latency reported for query _n_ 
includes the latency of _(n-1)_ + _(n-2)_ + _(n-3)_ etc.
The fix is trivial and doesn't affect the other aspects of CASSANDRA-19534.
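As a self-contained toy (the names here are mine, not the actual StorageProxy code), the difference between the accumulating and the corrected timing looks like this:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy demonstration of the skew: the first loop reuses one start time for the
// whole group, so sample n also includes the time spent on queries 1..n-1; the
// second loop takes a fresh start time per query.
public class LatencyAccumulationDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        List<Long> accumulating = new ArrayList<>();
        long sharedStart = System.nanoTime();
        for (int i = 0; i < 3; i++)
        {
            singlePartitionRead();
            accumulating.add(System.nanoTime() - sharedStart);
        }

        List<Long> perQuery = new ArrayList<>();
        for (int i = 0; i < 3; i++)
        {
            long start = System.nanoTime();
            singlePartitionRead();
            perQuery.add(System.nanoTime() - start);
        }

        System.out.println("accumulating samples (buggy): " + accumulating);
        System.out.println("per-query samples (fixed):    " + perQuery);
    }

    // Stand-in for one of the decomposed single partition reads.
    private static void singlePartitionRead() throws InterruptedException
    {
        Thread.sleep(10);
    }
}
{code}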

|4.1 patch:|[https://github.com/apache/cassandra/pull/3404]|
|5.0 patch:|[https://github.com/apache/cassandra/pull/3405]|
|trunk patch:|[https://github.com/apache/cassandra/pull/3406]|

There are 3 failures in the 4.1 CI results, but all of them are present without 
this patch. Two are already tracked (CASSANDRA-17298, CASSANDRA-19702), but I 
couldn't find an open JIRA for the {{ReadRepairTest}} dtest failure.
The jvm17 dtest timeouts on trunk are most likely down to CASSANDRA-19239, for 
which a fix is coming soon.



was (Author: beobal):
The issue is with sharing the same start time 
({{RequestTime.startedAtNanos()}}) for the timing of each individual read 
query, which was introduced by CASSANDRA-19534.
This results in accumulation so that the latency reported for query _n_ 
includes the latency of _(n-1)_ + _(n-2)_ + _(n-3)_ etc.
The fix is trivial and doesn't affect the other aspects of CASSANDRA-19534.

|4.1 patch:|[https://github.com/apache/cassandra/pull/3404]|
|5.0 patch:|[https://github.com/apache/cassandra/pull/3405]|
|trunk patch:|[https://github.com/apache/cassandra/pull/3406]|

There are 3 failures in the 4.1 CI results, but all of them are present without 
this patch. Two are already tracked (CASSANDRA-17928, CASSANDRA-19702), but I 
couldn't find an open JIRA for the {{ReadRepairTest}} dtest failure.
The jvm17 dtest timeouts on trunk are most likely down to CASSANDRA-19239, for 
which a fix is coming soon.


> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
> Attachments: ci_summary-4.1.html, ci_summary-5.0.html, 
> ci_summary-trunk.html
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-05 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

Attachment: ci_summary-4.1.html
ci_summary-5.0.html
ci_summary-trunk.html

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
> Attachments: ci_summary-4.1.html, ci_summary-5.0.html, 
> ci_summary-trunk.html
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-05 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

Test and Documentation Plan: new and existing tests
 Status: Patch Available  (was: Open)

The issue is with sharing the same start time 
({{RequestTime.startedAtNanos()}}) for the timing of each individual read 
query, which was introduced by CASSANDRA-19534.
This results in accumulation so that the latency reported for query _n_ 
includes the latency of _(n-1)_ + _(n-2)_ + _(n-3)_ etc.
The fix is trivial and doesn't affect the other aspects of CASSANDRA-19534.

|4.1 patch:|[https://github.com/apache/cassandra/pull/3404]|
|5.0 patch:|[https://github.com/apache/cassandra/pull/3405]|
|trunk patch:|[https://github.com/apache/cassandra/pull/3406]|

There are 3 failures in the 4.1 CI results, but all of them are present without 
this patch. Two are already tracked (CASSANDRA-17928, CASSANDRA-19702), but I 
couldn't find an open JIRA for the {{ReadRepairTest}} dtest failure.
The jvm17 dtest timeouts on trunk are most likely down to CASSANDRA-19239, for 
which a fix is coming soon.


> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-05 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

Fix Version/s: 5.0-rc
   (was: 5.0.x)

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-05 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19755:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
Discovered By: User Report
Fix Version/s: 4.1.x
   5.0.x
 Severity: Normal
 Assignee: Sam Tunnicliffe
   Status: Open  (was: Triage Needed)

> Coordinator read latency metrics are inflated for some queries
> --
>
> Key: CASSANDRA-19755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.1.x, 5.0.x
>
>
> When a partition read is decomposed on the coordinator into multiple single 
> partition read queries, the latency metric captured in StorageProxy can be 
> artificially increased.
> This primarily affects reads where paging and aggregates are used or where an 
> IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19755) Coordinator read latency metrics are inflated for some queries

2024-07-05 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-19755:
---

 Summary: Coordinator read latency metrics are inflated for some 
queries
 Key: CASSANDRA-19755
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19755
 Project: Cassandra
  Issue Type: Bug
  Components: Observability/Metrics
Reporter: Sam Tunnicliffe


When a partition read is decomposed on the coordinator into multiple single 
partition read queries, the latency metric captured in StorageProxy can be 
artificially increased.
This primarily affects reads where paging and aggregates are used or where an 
IN clause selects multiple partition keys.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19130) Implement transactional table truncation

2024-07-04 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863056#comment-17863056
 ] 

Sam Tunnicliffe commented on CASSANDRA-19130:
-

To be honest, I would simply not permit truncations in a mixed version cluster. 
We already don't allow schema or topology changes in this scenario, to the 
extent that if the request is received by an upgraded node, it will reject it 
if all other nodes are not yet in a compatible state. The caveat is that if the 
coordinator is still running the previous version, there is no means to reject 
in the same way - hence the discussion [1][2] about adding an operator control 
to do this manually in a 5.0.x release and advising that upgrades go through that 
version.  

[1] 
[https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]
[2] 
https://issues.apache.org/jira/browse/CASSANDRA-19556?focusedCommentId=17848544&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17848544

> Implement transactional table truncation
> 
>
> Key: CASSANDRA-19130
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19130
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Marcus Eriksson
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TRUNCATE table should leverage cluster metadata to ensure consistent 
> truncation timestamps across all replicas. The current implementation depends 
> on all nodes being available, but this could be reimplemented as a 
> {{Transformation}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19735) Cannot correctly create keyspace statement with replication during schemaChange

2024-07-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19735:

Resolution: Not A Problem
Status: Resolved  (was: Triage Needed)

This is failing because the dtest framework has already created that keyspace 
(see {{DistributedTestBase::init}}). 
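For anyone hitting this in their own upgrade tests, a sketch of the setup step from the test above, adjusted to avoid the collision (the keyspace name below is arbitrary, not something defined by the framework):

{code:java}
.setup((cluster) -> {
    // The framework has already created distributed_test_keyspace in
    // DistributedTestBase::init, so either pick a different name or make the
    // statement idempotent with IF NOT EXISTS.
    cluster.schemaChange("CREATE KEYSPACE IF NOT EXISTS demo_upgrade_ks " +
                         "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}");
})
{code}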


> Cannot correctly create keyspace statement with replication during 
> schemaChange
> ---
>
> Key: CASSANDRA-19735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19735
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: ConfX
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> h3. What happened
> A specific schema change for creating keyspace with replications failed 
> during Cassandra upgrade testing, but can pass under Cassandra distributed 
> testing (non-upgrade).
> h3. How to reproduce:
> Put the following test under 
> {{{}cassandra/test/distributed/org/apache/cassandra/distributed/upgrade/{}}}, 
> and build dtest jars for any versions within [4.1.3, 5.0-alpha2].
> {code:java}
> package org.apache.cassandra.distributed.upgrade;
> public class demoUpgradeTest extends UpgradeTestBase {
>     @Test
>     public void demoTest() throws Throwable {
>         new TestCase()
>                 .nodes(1)
>                 .nodesToUpgrade(1)
>                 .withConfig(config -> config.with(NETWORK, GOSSIP, 
> NATIVE_PROTOCOL))
>                 .upgradesToCurrentFrom(v41)
>                 .setup((cluster) -> {
>                     cluster.schemaChange(withKeyspace("CREATE KEYSPACE %s 
> WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}"));
>                 }).runAfterNodeUpgrade((cluster, node) -> {
>                     // let's do nothing here.
>                 }).run();
>     }
> } {code}
> Run the test with
> {code:java}
> $ ant test-jvm-dtest-some -Duse.jdk11=true 
> -Dtest.name=org.apache.cassandra.distributed.upgrade.demoUpgradeTest {code}
> You will see the following failure:
> {code:java}
> [junit-timeout] Testcase: 
> demoTest(org.apache.cassandra.distributed.upgrade.demoUpgradeTest)-_jdk11:    
> Caused an ERROR
> [junit-timeout] Cannot add existing keyspace "distributed_test_keyspace"
> [junit-timeout] org.apache.cassandra.exceptions.AlreadyExistsException: 
> Cannot add existing keyspace "distributed_test_keyspace"
> [junit-timeout]     at 
> org.apache.cassandra.cql3.statements.schema.CreateKeyspaceStatement.apply(CreateKeyspaceStatement.java:78)
> [junit-timeout]     at 
> org.apache.cassandra.schema.DefaultSchemaUpdateHandler.apply(DefaultSchemaUpdateHandler.java:230)
> [junit-timeout]     at 
> org.apache.cassandra.schema.Schema.transform(Schema.java:597)
> [junit-timeout]     at 
> org.apache.cassandra.cql3.statements.schema.AlterSchemaStatement.execute(AlterSchemaStatement.java:114)
> [junit-timeout]     at 
> org.apache.cassandra.cql3.statements.schema.AlterSchemaStatement.execute(AlterSchemaStatement.java:60)
> [junit-timeout]     at 
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:122)
> [junit-timeout]     at 
> org.apache.cassandra.distributed.impl.Coordinator.unsafeExecuteInternal(Coordinator.java:103)
> [junit-timeout]     at 
> org.apache.cassandra.distributed.impl.Coordinator.lambda$executeWithResult$0(Coordinator.java:66)
> [junit-timeout]     at 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
> [junit-timeout]     at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
> [junit-timeout]     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [junit-timeout]     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [junit-timeout]     at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> [junit-timeout]     at java.base/java.lang.Thread.run(Thread.java:829) {code}
> I have tested version pairs 4.1.3_4.1.4, 4.1.4_4.1.5, 4.1.5_5.0-alpha1, and 
> 5.0-alpha1_5.0-alpha2. All of them have the same issue.
> I wrote a very similar test with Cassandra distributed test framework 
> (non-upgrade test) as below:
> {code:java}
> package org.apache.cassandra.distributed.test.streaming;
> public class LCSStreamingKeepLevelTest extends TestBaseImpl
> {
>     @Test
>     public void demoTest() throws IOException
>     {
>         try (Cluster cluster = builder().withNodes(1)
>                 .withConfig(config -> config.with(NETWORK, GOSSIP, 
> NATIVE_PROTOCOL))
>                 .start())
>         {
>             cluster.schemaChange(withKeyspace("CREATE KEYSPACE %s WITH 
> replication = {'class': 'SimpleStrategy', 'replication_factor': 2}"));
>    

[jira] [Commented] (CASSANDRA-19705) Reconfigure CMS after move/bootstrap/replacement

2024-06-25 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859836#comment-17859836
 ] 

Sam Tunnicliffe commented on CASSANDRA-19705:
-

+1

> Reconfigure CMS after move/bootstrap/replacement
> 
>
> Key: CASSANDRA-19705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19705
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The CMS placement uses SimpleStrategy/NTS to decide where it is placed to 
> make it easier to safely bounce a cluster using existing tools (with CMS 
> placement {{dc1: 3, dc2: 3}} we will   use the placements for min_token in a 
> NetworkTopologyStrategy with the same replication setting). 
> We need to reconfigure this after move/bootstrap/replacement though, since 
> the placements might have changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19705) Reconfigure CMS after move/bootstrap/replacement

2024-06-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19705:

Status: Ready to Commit  (was: Review In Progress)

Thanks, the updated PR and CI look good!

> Reconfigure CMS after move/bootstrap/replacement
> 
>
> Key: CASSANDRA-19705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19705
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The CMS placement uses SimpleStrategy/NTS to decide where it is placed to 
> make it easier to safely bounce a cluster using existing tools (with CMS 
> placement {{dc1: 3, dc2: 3}} we will   use the placements for min_token in a 
> NetworkTopologyStrategy with the same replication setting). 
> We need to reconfigure this after move/bootstrap/replacement though, since 
> the placements might have changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19705) Reconfigure CMS after move/bootstrap/replacement

2024-06-20 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19705:

Status: Review In Progress  (was: Patch Available)

> Reconfigure CMS after move/bootstrap/replacement
> 
>
> Key: CASSANDRA-19705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19705
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CMS placement uses SimpleStrategy/NTS to decide where it is placed to 
> make it easier to safely bounce a cluster using existing tools (with CMS 
> placement {{dc1: 3, dc2: 3}} we will   use the placements for min_token in a 
> NetworkTopologyStrategy with the same replication setting). 
> We need to reconfigure this after move/bootstrap/replacement though, since 
> the placements might have changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19705) Reconfigure CMS after move/bootstrap/replacement

2024-06-20 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856495#comment-17856495
 ] 

Sam Tunnicliffe commented on CASSANDRA-19705:
-

Made a suggestion on the PR, but looks good generally.

> Reconfigure CMS after move/bootstrap/replacement
> 
>
> Key: CASSANDRA-19705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19705
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CMS placement uses SimpleStrategy/NTS to decide where it is placed to 
> make it easier to safely bounce a cluster using existing tools (with CMS 
> placement {{dc1: 3, dc2: 3}} we will   use the placements for min_token in a 
> NetworkTopologyStrategy with the same replication setting). 
> We need to reconfigure this after move/bootstrap/replacement though, since 
> the placements might have changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19705) Reconfigure CMS after move/bootstrap/replacement

2024-06-20 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19705:

Test and Documentation Plan: New tests added
 Status: Patch Available  (was: Open)

> Reconfigure CMS after move/bootstrap/replacement
> 
>
> Key: CASSANDRA-19705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19705
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CMS placement uses SimpleStrategy/NTS to decide where it is placed to 
> make it easier to safely bounce a cluster using existing tools (with CMS 
> placement {{dc1: 3, dc2: 3}} we will   use the placements for min_token in a 
> NetworkTopologyStrategy with the same replication setting). 
> We need to reconfigure this after move/bootstrap/replacement though, since 
> the placements might have changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19705) Reconfigure CMS after move/bootstrap/replacement

2024-06-20 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19705:

Change Category: Semantic
 Complexity: Normal
  Fix Version/s: 5.x
  Reviewers: Sam Tunnicliffe
 Status: Open  (was: Triage Needed)

> Reconfigure CMS after move/bootstrap/replacement
> 
>
> Key: CASSANDRA-19705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19705
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CMS placement uses SimpleStrategy/NTS to decide where it is placed to 
> make it easier to safely bounce a cluster using existing tools (with CMS 
> placement {{dc1: 3, dc2: 3}} we will   use the placements for min_token in a 
> NetworkTopologyStrategy with the same replication setting). 
> We need to reconfigure this after move/bootstrap/replacement though, since 
> the placements might have changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-06-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19488:

Reviewers: Aleksey Yeschenko, Marcus Eriksson

> Ensure snitches always defer to ClusterMetadata
> ---
>
> Key: CASSANDRA-19488
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans etc and as such 
> the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and output from tools 
> like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers as 
> certain internode connection settings are variable at the DC level, so at the 
> time of connecting we want to check the location of the remote peer. Usually, 
> this is available from {{ClusterMetadata}}, but in the case of a brand 
> new node joining the cluster, nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new node 
> ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic as it can cause the 
> internode connection settings which depend on DC to be incorrectly set. 
> Internode connections are long lived and any established while the DC is 
> unknown (potentially with incorrect config) will persist indefinitely. This 
> particular issue is not directly related to TCM and is present in earlier 
> versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-06-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19488:

Attachment: ci_summary.html

> Ensure snitches always defer to ClusterMetadata
> ---
>
> Key: CASSANDRA-19488
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans etc and as such 
> the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and output from tools 
> like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers as 
> certain internode connection settings are variable at the DC level, so at the 
> time of connecting we want to check the location of the remote peer. Usually, 
> this is available from {{ClusterMetadata}}, but in the case of a brand 
> new node joining the cluster, nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new node 
> ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic as it can cause the 
> internode connection settings which depend on DC to be incorrectly set. 
> Internode connections are long lived and any established while the DC is 
> unknown (potentially with incorrect config) will persist indefinitely. This 
> particular issue is not directly related to TCM and is present in earlier 
> versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-06-19 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856271#comment-17856271
 ] 

Sam Tunnicliffe commented on CASSANDRA-19488:
-

Linked to a WIP branch for this so people can start taking a look without 
waiting for everything that's outstanding.
This completely deprecates the {{IEndpointSnitch}} interface, splitting its 
responsibilities out into a few new classes.
* {{o.a.c.locator.Locator}} is responsible for looking up DC/rack info for 
endpoints
* {{o.a.c.locator.InitialLocationProvider}} supplies the DC/rack for a new node 
to use when joining the cluster.
* {{o.a.c.locator.NodeProximity}} handles the sorting and ranking of replica 
lists
* {{o.a.c.locator.NodeAddressConfig}} is mainly to support the functionality of 
{{ReconnectableSnitchHelper}} and also {{Ec2MultiRegionSnitch}} which 
dynamically configures the broadcast address (not just the 
local/private/preferred one).

For migration and to allow us to deprecate snitches in a controlled way, it is 
still fully supported to configure using the {{endpoint_snitch}} setting 
in yaml. {{o.a.c.locator.SnitchAdapter}} acts as a facade here, presenting the 
new interfaces to calling code and delegating to the legacy snitch impl. Most 
of the in-tree snitch impls have been refactored to extract impls of the new 
interfaces so that their functionality can be used via the new 
{{initial_location_provider/node_proximity}} settings.
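To give a feel for the new configuration, a purely hypothetical example (the interface and setting names come from the list above, but the method signature, the {{Location}} constructor and the example class are my own invention, not the actual API on the branch):

{code:java}
import org.apache.cassandra.locator.InitialLocationProvider;
import org.apache.cassandra.locator.Location;

// Hypothetical sketch only; the method name and Location constructor are assumptions.
// Such a provider would be wired up via something like
//   initial_location_provider: com.example.StaticLocationProvider
// in cassandra.yaml, instead of the legacy endpoint_snitch setting.
public class StaticLocationProvider implements InitialLocationProvider
{
    @Override
    public Location initialLocation()
    {
        // A real provider might read a properties file or query a cloud
        // metadata service; here the DC and rack are simply hard-coded.
        return new Location("dc1", "rack1");
    }
}
{code}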

Additionally there is some plumbing in {{o.a.c.locator.LocatorAdapter}} to help 
with inspecting topology before cluster metadata is fully available. 
Specifically, we prefer to always categorise peers as remote if the actual 
status can't reliably be determined. The reasoning here is that it's better to 
wrongly assume a local peer is remote and apply inter-dc settings (e.g. 
compression/encryption) than the reverse. The window where this is an issue is 
rather small (just at startup) and affects only the initial connections a node 
establishes with its peers. The adapter is notified once CM is available and 
any such connections are torn down to be re-established using the correct 
metadata.

An incomplete todo list: 
* Documentation. javadoc, inline comments/annotations & user docs all need more 
work.
* Naming. I'm not wedded to most of the naming here and there are some things 
which I didn't rename at all yet:
** {{o.a.c.locator.SnitchProperties}}
** {{o.a.c.locator.DynamicEndpointSnitch}}, this now {{implements 
NodeProximity}} but we have continuity of MBean naming to consider
** {{o.a.c.locator.EndpointSnitchInfo}}, similar MBean concerns apply here
** The method names on the new {{NodeProximity}} iface are copied directly from 
{{IEndpointSnitch}}, should fix that.
* Testing. CI is currently looking ok, though not perfect; some new tests are 
needed.
** Behaviour around initial connections to peers before cluster metadata is 
available and re-connecting once it is.
** in-jvm & python dtests as well as simulator are mostly still configuring C* 
with a snitch and letting the adapter do its thing. This exercises the 
migration path and verifies existing yaml files won't break things on upgrade. 
At the same time, there are currently no tests of using the new config approach 
with {{InitialLocationProvider/NodeProximity/NodeAddressConfig}} directly. Is 
the right way to approach this to add the new settings to 
{{cassandra_latest.yaml}}?
** This patch appears to have exacerbated the issue described in 
CASSANDRA-19239. The branch includes a temporary commit which skips 
{{NativeTransportEncryptionOptionsTest}}.
** I haven't run python upgrade dtests on this for a while.
** Need to organise testing of the various supported cloud metadata services in 
their actual environments  
* There is an {{InitialLocationProvider}} that corresponds to or replaces each 
of the {{AbstractCloudMetadataServiceSnitch}} impls, but I haven't (yet) 
refactored {{PropertyFileSnitch/GossipingPropertyFileSnitch}} in the same way.


> Ensure snitches always defer to ClusterMetadata
> ---
>
> Key: CASSANDRA-19488
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans etc and as such 
> the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for 

[jira] [Created] (CASSANDRA-19714) Use table-specific partitioners during Paxos repair

2024-06-17 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-19714:
---

 Summary: Use table-specific partitioners during Paxos repair
 Key: CASSANDRA-19714
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19714
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sam Tunnicliffe


Partition keys in the {{system.paxos}} table are derived from the key involved 
in the paxos transaction. Initially, it was safe to assume that the paxos table 
itself used the same partitioner as the tables in the transactions as all 
distributed keyspaces and tables were configured with the global partitioner. 
This is no longer true as the 
{{system_cluster_metadata.distributed_metadata_log}} has its own custom 
partitioner. 


Likewise, {{PaxosRepairHistory}} and the {{system.paxos_repair_history}} 
table which makes that history durable map token ranges in the transacted 
tables to ballots. Prior to CASSANDRA-19482 it was safe to assume that these 
ranges contained tokens from the global partitioner but as this is no longer 
the case, we must use the specific partitioner for the table in question when 
working with ranges during paxos repair. 
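As a minimal sketch of the general point (illustrative only, not the actual paxos repair code), anything that decorates a key or builds ranges should take the partitioner from the table's own metadata rather than the global default:

{code:java}
import java.nio.ByteBuffer;

import org.apache.cassandra.db.DecoratedKey;
import org.apache.cassandra.dht.IPartitioner;
import org.apache.cassandra.schema.TableMetadata;

// Illustrative helper: resolve the partitioner per table instead of assuming
// the global DatabaseDescriptor.getPartitioner() applies everywhere.
public final class TablePartitioners
{
    private TablePartitioners() {}

    public static DecoratedKey decorateForTable(TableMetadata table, ByteBuffer key)
    {
        IPartitioner partitioner = table.partitioner; // may differ from the global partitioner
        return partitioner.decorateKey(key);
    }
}
{code}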



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/63648c1e86bdc31d60b80e55b4f48c55aa5e8deb
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

thanks both, committed.

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Fix Version/s: 5.1
   (was: 5.x)

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Status: Review In Progress  (was: Needs Committer)

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Status: Ready to Commit  (was: Review In Progress)

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854448#comment-17854448
 ] 

Sam Tunnicliffe commented on CASSANDRA-19692:
-

CI is still affected by the issue with jdk17 upgrade dtests, plus there are a 
couple of timeouts there that seem unrelated and which I can't repro locally. 
The python dtest failure is CASSANDRA-19697

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Status: Needs Committer  (was: Patch Available)

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Attachment: ci_summary-1.html

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-12 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854305#comment-17854305
 ] 

Sam Tunnicliffe commented on CASSANDRA-19692:
-

Rebased and pushed with review fixes, CI pending

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19692:
---

Assignee: Sam Tunnicliffe

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Attachment: ci_summary.html

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19692) ClassCastException on selection with where clause from system.local_metadata_log

2024-06-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19692:

Authors: Sam Tunnicliffe
Test and Documentation Plan: New utest added + existing tests run in CI
 Status: Patch Available  (was: Open)

There's currently an issue with jdk17 upgrade dtests, but it's unrelated to 
this patch. Likewise, the other failures appear to be related to CI infra and 
look much better locally.

> ClassCastException on selection with where clause from 
> system.local_metadata_log
> 
>
> Key: CASSANDRA-19692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> {code}
> select * from system.local_metadata_log where epoch = 1;
> NoHostAvailable: ('Unable to complete the operation against any hosts', 
> {:  message="java.lang.ClassCastException: class 
> org.apache.cassandra.dht.Murmur3Partitioner$LongToken cannot be cast to class 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> (org.apache.cassandra.dht.Murmur3Partitioner$LongToken and 
> org.apache.cassandra.dht.ReversedLongLocalPartitioner$ReversedLongLocalToken 
> are in unnamed module of loader 'app')">})
> {code}
> same select but with "limit" works.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19217) Test failure: auth_test.TestAuthUnavailable

2024-06-04 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19217:

  Fix Version/s: 5.1
 (was: 5.x)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra-dtest/commit/f8be85023ad75ead695f6f014d3cc391fbce43b2
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

thanks [~brandon.williams], committed.

> Test failure: auth_test.TestAuthUnavailable
> ---
>
> Key: CASSANDRA-19217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19217
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Jacek Lewandowski
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
>
> https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62551/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19217) Test failure: auth_test.TestAuthUnavailable

2024-06-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19217:

Test and Documentation Plan: Test only fix, ran repeated dtests.
 Status: Patch Available  (was: In Progress)

This happens if a client read/write occurs which causes a read from the 
{{system_auth.roles}} table while the rf change is in flight. 
As the caching of roles info is disabled completely by these tests, any client 
operation, including those performed automatically by the driver, will hit 
this. 
This is expected behaviour now that we are able to actually detect it, and it 
doesn't affect the running or the validity of this test. 
Once the rf change is completed, attempts to log in fail as expected with the 
UnavailableException, so I think we should just add an exclusion pattern to 
ignore these in the C* logs.
I've opened a dtest PR 
[here|https://github.com/apache/cassandra-dtest/pull/263] and run some repeated 
test pipelines 
[here|https://app.circleci.com/pipelines/github/beobal/cassandra/458/workflows/dc58b01b-5554-49ce-bf67-284ad1956350/jobs/9220],
 
[here|https://app.circleci.com/pipelines/github/beobal/cassandra/458/workflows/eec2e932-b61e-4eb0-b75e-99a4bc249fa5/jobs/9113]
 and 
[here|https://app.circleci.com/pipelines/github/beobal/cassandra/458/workflows/ca293fe6-949f-4cb8-a29d-3914fb904235/jobs/9167].
 


> Test failure: auth_test.TestAuthUnavailable
> ---
>
> Key: CASSANDRA-19217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19217
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Jacek Lewandowski
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62551/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19217) Test failure: auth_test.TestAuthUnavailable

2024-05-30 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19217:
---

Assignee: Sam Tunnicliffe

> Test failure: auth_test.TestAuthUnavailable
> ---
>
> Key: CASSANDRA-19217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19217
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Jacek Lewandowski
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62551/tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce

2024-05-28 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849992#comment-17849992
 ] 

Sam Tunnicliffe commented on CASSANDRA-19158:
-

The latest round of changes LGTM, but I think you missed the minor things (out 
of date comments / double {{;;}} etc) on {{EpochAwareDebounce.java}}. Other 
than that, I'm +1.

We talked offline about the test failure and identified that it's actually a 
problem with the test, which needs updating since it predates automatic retry 
of failed commits. We'll open a separate JIRA for that.

> Reuse native transport-driven futures in Debounce
> -
>
> Key: CASSANDRA-19158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary-2.html, ci_summary.html
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in 
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining 
> calls, when we first attempt to catch up from peer, and then from CMS.
> First of all, we should only ever use the native transport timeout-driven 
> futures returned from sendWithCallback, since they implement reasonable 
> retries under the hood and are easy to bulk-configure (i.e. you can simply 
> change the timeout in yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback 
> operations such as trying to catch up from the CMS after an unsuccessful 
> attempt to catch up from a peer.
> This should significantly simplify the code and reduce the number of 
> blocked/waiting threads.
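To make the intended shape concrete, here is a minimal sketch of chaining the
primary catch-up future with a CMS fallback via plain CompletableFuture
composition. The names ({{catchUpFromPeer}}, {{catchUpFromCms}}) are invented
stand-ins for the transport-driven futures returned by
{{RemoteProcessor#sendWithCallback}}, not the actual API:
{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative sketch only; these methods are hypothetical stand-ins for the
// futures returned by RemoteProcessor#sendWithCallback.
final class CatchUpSketch
{
    static CompletableFuture<String> catchUpFromPeer()
    {
        return CompletableFuture.failedFuture(new RuntimeException("peer timed out"));
    }

    static CompletableFuture<String> catchUpFromCms()
    {
        return CompletableFuture.completedFuture("log replayed from CMS");
    }

    // Chain the primary attempt with a fallback instead of blocking a thread on
    // each step: if the peer catch-up fails, compose the CMS catch-up onto the
    // same future rather than waiting and retrying imperatively.
    static CompletableFuture<String> catchUp()
    {
        return catchUpFromPeer()
               .exceptionallyCompose(err -> catchUpFromCms())
               .thenApply(result -> "caught up: " + result); // map the final value
    }

    public static void main(String[] args)
    {
        System.out.println(catchUp().join());
    }
}
{code}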



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Reviewers: Sam Tunnicliffe, Stefan Miklosovic  (was: Sam Tunnicliffe)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.
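As a rough illustration of what the expansion means in practice (invented
helper and option values, not the actual coordinator code path): the
coordinator fills in every table option it would otherwise apply implicitly, so
the CMS log only ever records a fully-specified statement and every replica
applies the same schema regardless of its own defaults:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only; names and default values are invented.
final class ExpandCreateTableSketch
{
    // Defaults as configured on the coordinating node (illustrative values).
    static final Map<String, String> COORDINATOR_DEFAULTS = Map.of(
            "gc_grace_seconds", "864000",
            "compression", "{'class': 'LZ4Compressor'}",
            "speculative_retry", "'99p'");

    // Merge the user's explicit options over the coordinator's defaults and
    // render a fully-specified CREATE TABLE statement for submission.
    static String expand(String table, String columns, Map<String, String> userOptions)
    {
        Map<String, String> merged = new LinkedHashMap<>(COORDINATOR_DEFAULTS);
        merged.putAll(userOptions); // explicit user options win over defaults

        StringBuilder cql = new StringBuilder("CREATE TABLE " + table + " (" + columns + ") WITH ");
        merged.forEach((k, v) -> cql.append(k).append(" = ").append(v).append(" AND "));
        return cql.substring(0, cql.length() - " AND ".length()) + ";";
    }

    public static void main(String[] args)
    {
        // The user only specified compaction; the expanded form pins the rest too.
        System.out.println(expand("ks.t", "k int PRIMARY KEY, v text",
                Map.of("compaction", "{'class': 'SizeTieredCompactionStrategy'}")));
    }
}
{code}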



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

  Fix Version/s: 5.1
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/7fe30fc313ac35b1156f5a37d2069e29cded710b
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

committed, thanks.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Status: Ready to Commit  (was: Review In Progress)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848984#comment-17848984
 ] 

Sam Tunnicliffe commented on CASSANDRA-19592:
-

{quote}Does anything else need to be done except merging?
{quote}
No, I think it just fell between Alex & me. I'll get it rebased & merged.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails

2024-05-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848624#comment-17848624
 ] 

Sam Tunnicliffe commented on CASSANDRA-19593:
-

{quote}This brings us to more general problem of transactional configuration 
which should be done as well. It is questionable if it is desirable to do it as 
part of this ticket or not, however, I would like to look into how we could do 
that as well.
{quote}
We've been working on some proposals for this, some of which were briefly 
discussed in CASSANDRA-12937. I agree with [~ifesdjeen] that this warrants its 
own CEP. I know he's been working on a document for that; I'll see if we can 
get that ready for circulation.

> Transactional Guardrails
> 
>
> Key: CASSANDRA-19593
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19593
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails, Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it is time to start to think about this more seriously. TCM is 
> getting into pretty nice shape and we might start to investigate how to do 
> this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail

2024-05-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848544#comment-17848544
 ] 

Sam Tunnicliffe commented on CASSANDRA-19556:
-

[~mck] this certainly isn't critical for 5.1/6.0; my comment was just intended 
as a counterpoint to illustrate why it might be useful in a 5.0.x.

To that point, I'd definitely think about adding _something_ to minors in 
branches with upgrade paths to current trunk. Not an actual guardrail, just a 
system property or similar to optionally disable certain operations immediately 
prior to upgrade. If we did go down that route, there is some precedent from 
back in the day for mandating a minimum minor version prior to a major upgrade 
(from {{{}NEWS.txt{}}}):
{code:java}
Upgrade to 3.0 is supported from Cassandra 2.1 versions greater or equal to 
2.1.9,
or Cassandra 2.2 versions greater or equal to 2.2.2. 
{code}
but like I said, this isn't critical for upgrading to current trunk and I'm 
definitely not advocating for anything in 5.0-rc

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled 
> guardrail
> 
>
> Key: CASSANDRA-19556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails
>Reporter: Yuqi Yan
>Assignee: Yuqi Yan
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles being 
> created (e.g. when doing a live upgrade).
> With the current implementation, the DDL guardrail won't block a query if it 
> is a no-op (e.g. CREATE TABLE ... IF NOT EXISTS where the table already 
> exists; the guardrail check is added in apply() right after all the existence 
> checks).
> I don't have a preference between blocking every DDL query or checking 
> whether it is a no-op here. It's just that some users always run 
> CREATE ... IF NOT EXISTS ... at startup, which is a no-op but will be blocked 
> by this guardrail and cause startup to fail.
>  
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]
>  
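To make the two options above concrete, a tiny sketch of the difference
(hypothetical flag and schema lookup, not the actual Guardrails framework):
{code:java}
import java.util.Set;

// Illustrative only; the flag and schema lookup are invented stand-ins.
final class DdlGuardrailSketch
{
    static boolean ddlDisabledByGuardrail = true;               // operator has blocked DDL
    static final Set<String> existingTables = Set.of("ks.tbl"); // pretend schema

    // Option 1: block every DDL statement, even no-ops.
    static void checkBlockAll()
    {
        if (ddlDisabledByGuardrail)
            throw new IllegalStateException("DDL statements are disabled by a guardrail");
    }

    // Option 2: only block statements that would actually change the schema, so
    // CREATE TABLE ... IF NOT EXISTS against an existing table still succeeds.
    static void checkBlockMutatingOnly(String table, boolean ifNotExists)
    {
        boolean noOp = ifNotExists && existingTables.contains(table);
        if (ddlDisabledByGuardrail && !noOp)
            throw new IllegalStateException("DDL statements are disabled by a guardrail");
    }

    public static void main(String[] args)
    {
        checkBlockMutatingOnly("ks.tbl", true); // no-op: allowed under option 2
        try
        {
            checkBlockAll();                    // always rejected under option 1
        }
        catch (IllegalStateException e)
        {
            System.out.println("option 1 rejects: " + e.getMessage());
        }
    }
}
{code}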



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-16 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846952#comment-17846952
 ] 

Sam Tunnicliffe commented on CASSANDRA-19592:
-

[~ifesdjeen] I'm +1 on this version, wdyt?

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Attachment: ci_summary-1.html

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Status: Review In Progress  (was: Changes Suggested)

Discussed with Alex and made a few tweaks. Pushed the latest version and 
attached updated CI summary. The single test failure is unrelated.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

  Fix Version/s: 5.1
 (was: 5.x)
Source Control Link: 
https://github.com/apache/cassandra/commit/a15b137b7c8c84773453dbe264fcd2d4b76222c0
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


