[jira] [Commented] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881819#comment-17881819 ] Paulo Motta commented on CASSANDRA-18111: - Thanks for the update. The patch is looking good so far, but I think there is still some internal snapshot logic leaking to other classes (i.e. {{{}Keyspace{}}}/ColumnFamilyStore). It would be ideal if we could centralize most if not all internal snapshot logic on the package *org.apache.cassandra.service.snapshot* as part of this effort. Added some review comments directly to the [PR|https://github.com/apache/cassandra/pull/3374#pullrequestreview-2305171293] and some other comments below. I see some internal code/tests using old snapshot methods from {{StorageService}} (for example [StandaloneUpgraderOnSStablesTest#testUpgradeSnapshot|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/tools/StandaloneUpgraderOnSStablesTest.java#L96]). Should we deprecate {{StorageService}} snapshot methods to discourage their use (similar to {{{}StorageServiceMBean{}}}) and update all uses to use {{SnapshotManager}} methods ? It looks like some internal code is still referring to {{ColumnFamilyStore}} legacy snapshot verbs (i.e. [SnapshotVerbHandler.doVerb|https://github.com/apache/cassandra/blob/fe025c7f79e76d99e0db347518a7872fd4a114bc/src/java/org/apache/cassandra/service/SnapshotVerbHandler.java#L49]) - should we update all uses to use {{SnapshotManager}} and remove {{ColumnFamilyStore}} snapshot methods in favor of {{SnapshotManager}} methods ? It looks like there are some legacy snapshot tests without assertions on [StorageServiceServerTest|https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/service/StorageServiceServerTest.java#L158]. I think we should try to add the missing assertions and move them to {{SnapshotManagerTest}} if they're not already being tested somewhere else. > Centralize all snapshot operations to SnapshotManager and cache snapshots > - > > Key: CASSANDRA-18111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18111 > Project: Cassandra > Issue Type: Improvement > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 5h > Remaining Estimate: 0h > > Everytime {{nodetool listsnapshots}} is called, all data directories are > scanned to find snapshots, what is inefficient. > For example, fetching the > {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric > can take half a second (CASSANDRA-13338). > This improvement will also allow snapshots to be efficiently queried via > virtual tables (CASSANDRA-18102). > In order to do this, we should: > a) load all snapshots from disk during initialization > b) keep a collection of snapshots on {{SnapshotManager}} > c) update the snapshots collection anytime a new snapshot is taken or cleared > d) detect when a snapshot is manually removed from disk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
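[Editor's note] As a rough illustration of the deprecate-and-delegate direction discussed in the comment above, a shim could look like the sketch below. The {{SnapshotTaker}} stand-in and the {{takeSnapshot}} signature are assumptions made for this example, not the actual SnapshotManager API from the PR.
{code:java}
import java.io.IOException;

// Hedged sketch only: StorageService keeps the old entry point for backwards
// compatibility but contains no snapshot logic of its own anymore.
public class StorageServiceSnapshotShim
{
    interface SnapshotTaker { void takeSnapshot(String tag, String... keyspaceNames) throws IOException; }

    private final SnapshotTaker snapshotManager; // would be the real SnapshotManager in Cassandra

    StorageServiceSnapshotShim(SnapshotTaker snapshotManager) { this.snapshotManager = snapshotManager; }

    /** @deprecated kept only for JMX backwards compatibility; snapshot logic lives in the snapshot package. */
    @Deprecated
    public void takeSnapshot(String tag, String... keyspaceNames) throws IOException
    {
        snapshotManager.takeSnapshot(tag, keyspaceNames); // pure delegation, no snapshot handling here
    }
}
{code}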
[jira] [Updated] (CASSANDRA-19902) Revert CASSANDRA-11537
[ https://issues.apache.org/jira/browse/CASSANDRA-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19902: Description: Looks like the seemingly harmless cosmetic patch from CASSANDRA-11537 causes the StorageServiceMBean to not be available during bootstrap. This causes commands like "nodetool netstats/status/etc" to not be available on the bootstrapping node with the following error: {code:none} - StackTrace -- javax.management.InstanceNotFoundException: org.apache.cassandra.db:type=StorageService at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1083) at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:637) {code} This ticket is just to revert CASSANDRA-11537, we can re-add the improvement of that ticket later. was: Looks like the seemingly innocent cosmetic patch from CASSANDRA-11537 causes the StorageServiceMBean to not be available during bootstrap. This causes commands like "nodetool netstats/status/etc" to not be available on the bootstrapping node with the following error: {code:none} - StackTrace -- javax.management.InstanceNotFoundException: org.apache.cassandra.db:type=StorageService at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1083) at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:637) {code} This ticket is just to revert CASSANDRA-11537, we can re-add the improvement of that ticket later. > Revert CASSANDRA-11537 > -- > > Key: CASSANDRA-19902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19902 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool > Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Normal > > Looks like the seemingly harmless cosmetic patch from CASSANDRA-11537 causes > the StorageServiceMBean to not be available during bootstrap. This causes > commands like "nodetool netstats/status/etc" to not be available on the > bootstrapping node with the following error: > {code:none} > - StackTrace -- > javax.management.InstanceNotFoundException: > org.apache.cassandra.db:type=StorageService > at > java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1083) > at > java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:637) > {code} > This ticket is just to revert CASSANDRA-11537, we can re-add the improvement > of that ticket later. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
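[Editor's note] The failure mode above can be illustrated with a minimal JMX client like nodetool would use: while the node is still bootstrapping and the MBean has not been registered, the lookup fails with the InstanceNotFoundException shown in the stack trace. The default JMX port 7199 is assumed here.
{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CheckStorageServiceMBean
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Throws javax.management.InstanceNotFoundException if the MBean was not registered yet
            System.out.println("OperationMode: " + mbs.getAttribute(name, "OperationMode"));
        }
    }
}
{code}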
[jira] [Comment Edited] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870085#comment-17870085 ] Paulo Motta edited comment on CASSANDRA-18111 at 8/1/24 5:24 AM: - (I structured this review into multiple sections to hopefully make it easier to discuss) I'd like to restate and discuss the goals of this ticket to ensure we're on the same page: * *✅ Goal 1: Improve performance of {color:#ff}+nodetool listsnapshots+{color} / {color:#ff}SELECT * FROM system_views.snapshots{color} by avoiding expensive disk traversal when listing snapshots* To validate that this goal is being achieved with the proposed patch, I created a rough benchmark comparing listsnapshots performance across the following implementations: * {*}listsnapshots_disk{*}: Old implementation fetching snapshots from disk in every call to listsnapshots * {*}listsnapshots_cached{*}: New cached implementation * {*}listsnapshots_cached_checkexists{*}: New cached implementation + check manifest file exists during fetch The benchmark consists of a simple JUnit test ([code here|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/test/unit/org/apache/cassandra/service/snapshot/SnapshotManagerTest.java]) fetching 100 snapshots of 10 tables with 10 sstables each 1000 times for each implementation. I got the following test execution times on my modest SSD laptop for each implementation: * {*}listsnapshots_disk{*}: 37 seconds * {*}listsnapshots_cached{*}: 36 milliseconds * {*}listsnapshots_cached_checkexists{*}: 4 seconds The *listsnapshots_cached* results indicate that caching snapshots greatly improves *listsnapshots* speed compared to the current *listsnapshots_disk* implementation as expected, which accomplishes *Goal 1* and justifies this patch. The additional snapshot manifest existence check from *listsnapshots_cached_checkexists* adds considerable overhead in comparison to {*}listsnapshots_cached{*}, but it's still significantly faster than the previous *listsnapshots_disk* implementation. * *⚠️ Goal 2: Consolidate / centralize "as much as possible" snapshot logic in SnapshotManager (CASSANDRA-18271)* While this patch makes progress towards this goal, there is still a considerable amount of snapshot logic in *StorageService* and {*}ColumnFamilyStore{*}. See discussion for each subsystem below: *A) StorageService:* there is some snapshot handling logic in at least the following methods: * [takeSnapshot|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L2735] * [getSnapshotDetails|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3020] * [trueSnapshotsSize|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3042] I think we could simplify a great deal of code by moving the remaining snapshot logic from StorageService to SnapshotManager and creating a dedicated [SnapshotManagerMBean|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/snapshot/SnapshotManagerMBean.java] to expose snapshot methods via JMX moving forward. WDYT ?
This would allow refactoring and simplifying some snapshot logic, for example unifying implementations of [takeMultipleTableSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2914] and [takeSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2868]. The proposal above would help retire snapshot logic from StorageService and eventually remove deprecated snapshot handling methods from StorageServiceMBean. I'm happy to take these suggestions to a follow-up ticket, but wanted to hear your thoughts on this refactoring proposal. *B) ColumnFamilyStore:* there is a fundamental coupling between ColumnFamilyStore and snapshot creation, since snapshot creation requires flushing and locking sstables while creating the hardlinks. I don't think we can fully remove this dependency but maybe there's room for further cleanup/improvement in a follow-up ticket. *⚠️* *SnapshotWatcher* I am a bit concerned by the additional complexity added by SnapshotWatcher and the reliance on the WatchService / inotify implementation to detect when a snapshot was manually removed from outside the process. How about checking if the manifest file exists periodically or during fetch if the user wants to enable this detection ? This seems relatively cheap based on the *listsnapshots_cached_checkexists* results
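[Editor's note] The rough benchmark described in the comment above boils down to timing repeated listing calls per implementation. The harness below is an illustrative sketch of that shape, not the actual SnapshotManagerTest code linked in the comment.
{code:java}
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Time N consecutive "list snapshots" calls for a given implementation. The Supplier
// would be wired to the disk-scanning, cached, or cached + existence-check variant.
public final class ListSnapshotsBench
{
    static long timeListingsMillis(Supplier<?> listSnapshots, int iterations)
    {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            listSnapshots.get(); // e.g. a call into the cached or disk-based listing
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args)
    {
        // placeholder listing; the real test fetches 100 snapshots of 10 tables
        // with 10 sstables each, 1000 times per implementation
        Supplier<Object> cached = Collections::emptyList;
        System.out.println(timeListingsMillis(cached, 1000) + " ms for 1000 listings");
    }
}
{code}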
[jira] [Commented] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870085#comment-17870085 ] Paulo Motta commented on CASSANDRA-18111: - (I structured this review into multiple sections to hopefully make it easier to discuss) I'd like to restate and discuss the goals of this ticket to ensure we're on the same page: * *✅ Goal 1: Improve performance of {color:#ff}+nodetool listsnapshots+{color} / {color:#ff}SELECT * FROM system_views.snapshots{color} by avoiding expensive disk traversal when listing snapshots* To validate this goal is being achieved with the proposed patch, I created a rough benchmark comparing listsnapshot performance in the following implementations: * {*}listsnapshots_disk{*}: Old implementation fetching snapshots from disk in every call to listsnapshots * {*}listsnapshots_cached{*}: New cached implementation * {*}listsnapshots_cached_checkexists{*}: New cached implementation + check manifest file exists during fetch The benchmark consists of a simple junit test ([code here|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/test/unit/org/apache/cassandra/service/snapshot/SnapshotManagerTest.java]) fetching 100 snapshots of 10 tables with 10 sstables each 1000 times for each implementation. I got the following test execution times in my modest SSD laptop for each implementation: * {*}listsnapshots_disk{*}: 37 seconds * {*}listsnapshots_cached{*}: 36 milliseconds * {*}listsnapshots_cached_checkexists{*}: 4 seconds The *listsnapshots_cached* results indicate that caching snapshots greatly improves *listsnapshots* speed compared to the current *listsnapshots_disk* implementation as expected, what accomplishes *Goal 1* and justifies this patch. The additional snapshot manifest existence check from *listsnapshots_cached_checkexists* adds considerable overhead in comparison to {*}listsnapshots_cached{*}, but it's still significantly faster than the previous *listsnapshots_disk* implementation. * *⚠️ Goal 2: Consolidate / centralize "as much as possible" snapshot logic in SnapshotManager (CASSANDRA-18271)* While this patch makes progress towards this goal, there is still considerable amount of snapshot logic in *StorageService* and {*}ColumnFamilyStore{*}. See discussion for each subsystem below: *A) StorageService:* there is some snapshot handling logic in at least the following methods: * [takeSnapshot|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L2745] * [getSnapshotDetails|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3062] * [trueSnapshotsSize|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/StorageService.java#L3062] I think we could simplify a great deal of code by moving remaining snapshot logic from StorageService to SnapshotManager and create a dedicated [SnapshotManagerMbean|https://github.com/pauloricardomg/cassandra/blob/CASSANDRA-18111-review-2/src/java/org/apache/cassandra/service/snapshot/SnapshotManagerMBean.java] to expose snapshot methods via JMX moving forward. WDYT ? 
This would allow refactoring and simplifying some snapshot logic, for example unifying implementations of [takeMultipleTableSnapshot|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2914] and [takeSnapshot.|https://github.com/apache/cassandra/blob/f95c1b5bb3d6f8728a00e13ca81993e12a9b14cd/src/java/org/apache/cassandra/service/StorageService.java#L2868] The proposal above would help retire snapshot logic from StorageService and eventually remove deprecated snapshot handling methods from StorageServiceMbean. I'm happy to take these suggestions to a follow-up ticket, but wanted to hear your thoughts on this refactoring proposal. *B) ColumnFamilyStore:* there is a fundamental coupling between ColumnFamilyStore and snapshot creation, since snapshot creation requires flushing and locking sstables while creating the hardlinks. I don't think we can fully remove this dependency but maybe there's room for further cleanup/improvement in a follow-up ticket. *⚠️* *SnapshotWatcher* I am a bit concerned by the additional complexity added by SnapshotWatcher and reliance on WatchService's / inotify implementation to detect when a snapshot was manually removed from outside the process. How about checking if the manifest file exists periodically or during fetch if the user wants to enable this detection ? This seems relatively cheap based in the *listsnapshots_cached_checkexists* results while being considerably simpler than
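[Editor's note] The unification of {{takeSnapshot}} and {{takeMultipleTableSnapshot}} mentioned above could converge on a single entry point. The sketch below is illustrative only; every name and parameter is an assumption for discussion, not the committed API.
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Map;

// One possible shape for a single SnapshotManager entry point that both
// StorageService#takeSnapshot and StorageService#takeMultipleTableSnapshot
// could delegate to.
interface UnifiedSnapshotApi
{
    /**
     * @param tag      snapshot name; must be non-empty and not already exist for the targets
     * @param options  free-form options as accepted over JMX today (e.g. "skipFlush", "ttl")
     * @param entities keyspace or keyspace.table names; an empty list means "all keyspaces"
     */
    void takeSnapshot(String tag, Map<String, String> options, List<String> entities) throws IOException;
}
{code}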
[jira] [Commented] (CASSANDRA-18111) Centralize all snapshot operations to SnapshotManager and cache snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867638#comment-17867638 ] Paulo Motta commented on CASSANDRA-18111: - While testing this I noticed that when a snapshot from a table is split across multiple data directories and one of the directories is manually removed, the cleanup mechanism removes the snapshot files from the other directories. When a snapshot is spread across multiple data directories I think the intent is to only stop tracking the snapshot on SnapshotManager when all snapshot subdirectories are removed? We don't want to clear additional snapshot directories if one of the subdirectories was manually removed. Alternatively we can consider that a snapshot is valid as long as the "manifest.json" exists ? This would create a requirement that all snapshots should contain a "manifest.json" to be tracked by SnapshotManager. I think this is a fair requirement, because without the manifest it's not possible to ensure whether a snapshot was partially corrupted (i.e. some files were removed from it). See example:
{code:none}
$ cat data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/manifest.json
{
  "files" : [ "oa-2-big-Data.db", "oa-1-big-Data.db" ],
  "created_at" : "2024-07-22T01:53:47.026Z",
  "expires_at" : null,
  "ephemeral" : false
}

$ ls -ltra data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
total 48
-rw-rw-r-- 2 user user 16 Jul 21 21:53 oa-1-big-Filter.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-Summary.db
-rw-rw-r-- 2 user user 20 Jul 21 21:53 oa-1-big-Index.db
-rw-rw-r-- 2 user user 10 Jul 21 21:53 oa-1-big-Digest.crc32
-rw-rw-r-- 2 user user 137 Jul 21 21:53 oa-1-big-Data.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-TOC.txt
-rw-rw-r-- 2 user user 5429 Jul 21 21:53 oa-1-big-Statistics.db
-rw-rw-r-- 2 user user 47 Jul 21 21:53 oa-1-big-CompressionInfo.db
drwxrwxr-x 3 user user 4096 Jul 21 21:53 ..
drwxrwxr-x 2 user user 4096 Jul 21 21:53 .
-rw-rw-r-- 1 user user 149 Jul 21 21:53 manifest.json

$ ls -ltra data/data2/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
total 44
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-2-big-Summary.db
-rw-rw-r-- 2 user user 20 Jul 21 21:53 oa-2-big-Index.db
-rw-rw-r-- 2 user user 16 Jul 21 21:53 oa-2-big-Filter.db
-rw-rw-r-- 2 user user 129 Jul 21 21:53 oa-2-big-Data.db
-rw-rw-r-- 2 user user 10 Jul 21 21:53 oa-2-big-Digest.crc32
-rw-rw-r-- 2 user user 47 Jul 21 21:53 oa-2-big-CompressionInfo.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-2-big-TOC.txt
-rw-rw-r-- 2 user user 5430 Jul 21 21:53 oa-2-big-Statistics.db
drwxrwxr-x 3 user user 4096 Jul 21 21:53 ..
drwxrwxr-x 2 user user 4096 Jul 21 21:53 .

# Remove data2 manually, but keep data1
$ rm -rf data/data2/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/

$ ls -ltra data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
total 48
-rw-rw-r-- 2 user user 16 Jul 21 21:53 oa-1-big-Filter.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-Summary.db
-rw-rw-r-- 2 user user 20 Jul 21 21:53 oa-1-big-Index.db
-rw-rw-r-- 2 user user 10 Jul 21 21:53 oa-1-big-Digest.crc32
-rw-rw-r-- 2 user user 137 Jul 21 21:53 oa-1-big-Data.db
-rw-rw-r-- 2 user user 92 Jul 21 21:53 oa-1-big-TOC.txt
-rw-rw-r-- 2 user user 5429 Jul 21 21:53 oa-1-big-Statistics.db
-rw-rw-r-- 2 user user 47 Jul 21 21:53 oa-1-big-CompressionInfo.db
drwxrwxr-x 3 user user 4096 Jul 21 21:53 ..
drwxrwxr-x 2 user user 4096 Jul 21 21:53 .
-rw-rw-r-- 1 user user 149 Jul 21 21:53 manifest.json

[after some time]
INFO [SnapshotCleanup:1] 2024-07-21 22:01:46,818 SnapshotManager.java:243 - Removing snapshot TableSnapshot{keyspaceName='system', tableName='compaction_history', tableId=b4dbb7b4-dc49-3fb5-b3bf-ce6e434832ca, tag='test', createdAt=2024-07-22T01:53:47.026Z, expiresAt=null, snapshotDirs=[/tmp/apache-cassandra-5.1-SNAPSHOT/data/data2/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test, /tmp/apache-cassandra-5.1-SNAPSHOT/data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test], ephemeral=false}

$ ls -ltra data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/
ls: cannot access 'data/data1/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/snapshots/test/': No such file or directory  <-- other snapshot subdirectory "data1" was removed
{code}
> Centralize all snapshot operations to SnapshotManager and cache snapshots > - > >
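[Editor's note] The two detection policies discussed above can be contrasted with a small sketch. The helper below is assumed for illustration and is not the actual TableSnapshot API.
{code:java}
import java.io.File;
import java.util.Collection;

final class SnapshotLiveness
{
    /** Policy A: keep tracking the snapshot until all of its subdirectories were removed. */
    static boolean anyDirectoryExists(Collection<File> snapshotDirs)
    {
        return snapshotDirs.stream().anyMatch(File::exists);
    }

    /** Policy B: a snapshot is considered valid only while a manifest.json still exists. */
    static boolean manifestExists(Collection<File> snapshotDirs)
    {
        return snapshotDirs.stream().anyMatch(dir -> new File(dir, "manifest.json").isFile());
    }
}
{code}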
[jira] [Comment Edited] (CASSANDRA-18111) Cache snapshots in memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856088#comment-17856088 ] Paulo Motta edited comment on CASSANDRA-18111 at 6/19/24 12:49 AM: --- I was thinking that since this is just a cache, perhaps we could have a {{snapshot_metadata_cache_size: 100MiB}} setting so the amount of memory used for snapshot metadata would be capped while providing the optimization by default ? Users wishing to disable could just set {{snapshot_metadata_cache_size: 0MiB}}. It would be nice to validate how much this improves select * from system_views.snapshots performance for large snapshot * keyspace * table * sstable counts. was (Author: paulo): I was thinking that since this is just a cache, perhaps we could have a {{snapshot_metadata_cache_size: 100MiB }}setting so the amount of memory used for snapshot metadata would be capped while providing the optimization by default ? Users wishing to disable could just set {{{}snapshot_metadata_cache_size: 0MiB{}}}. It would be nice to validate how much this improves select * from system_views.snapshots performance for large snapshot * keyspace * table * sstable counts. > Cache snapshots in memory > - > > Key: CASSANDRA-18111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18111 > Project: Cassandra > Issue Type: Improvement > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Everytime {{nodetool listsnapshots}} is called, all data directories are > scanned to find snapshots, what is inefficient. > For example, fetching the > {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric > can take half a second (CASSANDRA-13338). > This improvement will also allow snapshots to be efficiently queried via > virtual tables (CASSANDRA-18102). > In order to do this, we should: > a) load all snapshots from disk during initialization > b) keep a collection of snapshots on {{SnapshotManager}} > c) update the snapshots collection anytime a new snapshot is taken or cleared > d) detect when a snapshot is manually removed from disk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
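[Editor's note] A byte-capped metadata cache along the lines proposed above could be sketched with the Caffeine cache library as below. The key/value layout, the weight function, and the 100 MiB / 0 MiB semantics are assumptions taken from the comment, not an existing Cassandra API.
{code:java}
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Bound snapshot metadata by an approximate byte budget rather than by entry count;
// a configured size of 0 would simply skip building the cache.
final class SnapshotMetadataCache
{
    private final Cache<String, String> cache; // key: "keyspace.table:tag", value: serialized manifest

    SnapshotMetadataCache(long maxBytes)
    {
        cache = Caffeine.newBuilder()
                        .maximumWeight(maxBytes)
                        .weigher((String key, String manifestJson) -> key.length() + manifestJson.length())
                        .build();
    }

    void put(String key, String manifestJson) { cache.put(key, manifestJson); }
    String get(String key)                    { return cache.getIfPresent(key); }
}
{code}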
[jira] [Comment Edited] (CASSANDRA-18111) Cache snapshots in memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856088#comment-17856088 ] Paulo Motta edited comment on CASSANDRA-18111 at 6/19/24 12:48 AM: --- I was thinking that since this is just a cache, perhaps we could have a {{snapshot_metadata_cache_size: 100MiB }}setting so the amount of memory used for snapshot metadata would be capped while providing the optimization by default ? Users wishing to disable could just set {{{}snapshot_metadata_cache_size: 0MiB{}}}. It would be nice to validate how much this improves select * from system_views.snapshots performance for large snapshot * keyspace * table * sstable counts. was (Author: paulo): I was thinking that since this is just a cache, perhaps we could have a {{snapshot_metadata_cache_size: 100MiB}} setting so the amount of memory used for snapshot metadata would be capped while providing the optimization by default ? Users wishing to disable the could just set {{snapshot_metadata_cache_size: 0MiB. }} It would be nice to validate how much this improves select * system_views.snapshots performance for large snapshot * keyspace * table * sstable counts. > Cache snapshots in memory > - > > Key: CASSANDRA-18111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18111 > Project: Cassandra > Issue Type: Improvement > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Everytime {{nodetool listsnapshots}} is called, all data directories are > scanned to find snapshots, what is inefficient. > For example, fetching the > {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric > can take half a second (CASSANDRA-13338). > This improvement will also allow snapshots to be efficiently queried via > virtual tables (CASSANDRA-18102). > In order to do this, we should: > a) load all snapshots from disk during initialization > b) keep a collection of snapshots on {{SnapshotManager}} > c) update the snapshots collection anytime a new snapshot is taken or cleared > d) detect when a snapshot is manually removed from disk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18111) Cache snapshots in memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856088#comment-17856088 ] Paulo Motta commented on CASSANDRA-18111: - I was thinking that since this is just a cache, perhaps we could have a {{snapshot_metadata_cache_size: 100MiB}} setting so the amount of memory used for snapshot metadata would be capped while providing the optimization by default ? Users wishing to disable the could just set {{snapshot_metadata_cache_size: 0MiB. }} It would be nice to validate how much this improves select * system_views.snapshots performance for large snapshot * keyspace * table * sstable counts. > Cache snapshots in memory > - > > Key: CASSANDRA-18111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18111 > Project: Cassandra > Issue Type: Improvement > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Everytime {{nodetool listsnapshots}} is called, all data directories are > scanned to find snapshots, what is inefficient. > For example, fetching the > {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric > can take half a second (CASSANDRA-13338). > This improvement will also allow snapshots to be efficiently queried via > virtual tables (CASSANDRA-18102). > In order to do this, we should: > a) load all snapshots from disk during initialization > b) keep a collection of snapshots on {{SnapshotManager}} > c) update the snapshots collection anytime a new snapshot is taken or cleared > d) detect when a snapshot is manually removed from disk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18111) Cache snapshots in memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856039#comment-17856039 ] Paulo Motta commented on CASSANDRA-18111: - {quote}Is there a way to disable this functionality? I briefly took a look at the implementation but didn't see anything that would allow you to disable it. My concern is that we've seen issues with the amount of snapshots on large clusters, so this can be problematic for some clusters by putting additional memory pressure on individual hosts. {quote} I would like to understand what kind of problems you encountered so we could try to address them here if possible. The goal of this ticket is exactly to optimize for a large number of snapshots by avoiding an expensive directory traversal when snapshots are listed, so I think it would be counterproductive to disable this. See CASSANDRA-13338 which is the original motivation for this ticket. We have not considered the memory cost of keeping this snapshot metadata in memory, but perhaps this is something to consider for large amounts of snapshots. Do you have a ballpark number for a very large amount of snapshots per node in your experience ? 10K, 100K, 1M? > Cache snapshots in memory > - > > Key: CASSANDRA-18111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18111 > Project: Cassandra > Issue Type: Improvement > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Everytime {{nodetool listsnapshots}} is called, all data directories are > scanned to find snapshots, what is inefficient. > For example, fetching the > {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric > can take half a second (CASSANDRA-13338). > This improvement will also allow snapshots to be efficiently queried via > virtual tables (CASSANDRA-18102). > In order to do this, we should: > a) load all snapshots from disk during initialization > b) keep a collection of snapshots on {{SnapshotManager}} > c) update the snapshots collection anytime a new snapshot is taken or cleared > d) detect when a snapshot is manually removed from disk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18111) Cache snapshots in memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18111: Reviewers: Paulo Motta > Cache snapshots in memory > - > > Key: CASSANDRA-18111 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18111 > Project: Cassandra > Issue Type: Improvement > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > Everytime {{nodetool listsnapshots}} is called, all data directories are > scanned to find snapshots, what is inefficient. > For example, fetching the > {{org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize}} metric > can take half a second (CASSANDRA-13338). > This improvement will also allow snapshots to be efficiently queried via > virtual tables (CASSANDRA-18102). > In order to do this, we should: > a) load all snapshots from disk during initialization > b) keep a collection of snapshots on {{SnapshotManager}} > c) update the snapshots collection anytime a new snapshot is taken or cleared > d) detect when a snapshot is manually removed from disk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19512) Add startup flag to load a local snapshot from disk
Paulo Motta created CASSANDRA-19512: --- Summary: Add startup flag to load a local snapshot from disk Key: CASSANDRA-19512 URL: https://issues.apache.org/jira/browse/CASSANDRA-19512 Project: Cassandra Issue Type: Improvement Components: Local/Snapshots Reporter: Paulo Motta Assignee: Paulo Motta Add startup flag "cassandra.load_snapshot_unsafe=snapshot_id" that loads a snapshot with the specified ID into the sstable tracker in the initial startup phase. The flag has the {{_unsafe}} prefix because it may cause data consistency issues if this is used incorrectly. For example, if a given snapshot is loaded in a single replica of a replicated keyspace, it may cause replicas to go out of sync. For this reason, this flag should only be accepted if the "allow_load_snapshot_unsafe" guardrail is enabled (it is disabled by default). When the flag is detected during startup, snapshots with the given tag will be located. If no snapshot with the given tag exists, the startup should fail. The snapshot loading mechanism should create a hard link to existing sstables into a staging area to ensure the existing data is secured. After this, it should replace the existing sstables with the snapshot data into the sstable tracker before proceeding normally with the startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
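[Editor's note] The startup check proposed in CASSANDRA-19512 could take roughly the shape below. The property and guardrail names come from the ticket text; the guardrail lookup and everything else in this sketch are stand-ins, not existing Cassandra APIs.
{code:java}
final class SnapshotLoadStartupHook
{
    static final String LOAD_SNAPSHOT_PROPERTY = "cassandra.load_snapshot_unsafe";

    static void maybeLoadSnapshot(boolean allowLoadSnapshotUnsafe)
    {
        String tag = System.getProperty(LOAD_SNAPSHOT_PROPERTY);
        if (tag == null)
            return; // flag not set, normal startup

        if (!allowLoadSnapshotUnsafe)
            throw new IllegalStateException(LOAD_SNAPSHOT_PROPERTY + " requires the " +
                                            "allow_load_snapshot_unsafe guardrail to be enabled");

        // 1. locate snapshots with this tag; fail startup if none exist
        // 2. hard-link the current sstables into a staging area to secure the existing data
        // 3. swap the snapshot's sstables into the sstable tracker before continuing startup
        System.out.println("Would load snapshot '" + tag + "' before startup");
    }
}
{code}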
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821475#comment-17821475 ] Paulo Motta commented on CASSANDRA-17401: - Thanks for the detailed reports and repro steps. I've taken a look and this looks to me to be a legitimate race condition that can cause a re-prepare storm under large concurrency and unlucky timing. My understanding is that [these evict statements|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L735] are not required for the correctness of the upgrade compatibility logic and can be safely removed. Would you have some cycles to confirm this [~ifesdjeen] ? In addition to this, I think there's a pending issue from CASSANDRA-17248 that can leak prepared statements between keyspaces during mixed upgrade mode. Since these issues are in a related area I think it makes sense to address them together (in separate commits) to ensure these changes are tested together. I think the {{PreparedStatementCollisionTest}} suite from [this commit|https://github.com/apache/cassandra/pull/1872/commits/758bc4a89d7ca9d0bfe27e6f41000484724261bc] can help improve the validation coverage of this logic. That change looks correct to me but may need some cleanup. We should probably keep the metric changes out of this to keep the scope of this patch to a minimum. After proper review and validation I think there's value in including these fixes in the final 3.X releases to address these outstanding issues as users will still do upgrade cycles as 5.x release approaches. This will make resolution more laborious as we will need to provide patches for 3.x all the way up to trunk + CI for all branches. What do you think [~brandon.williams] [~stefan.miklosovic] ? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Ivan Senic >Assignee: Jaydeepkumar Chovatia >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. 
> The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
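[Editor's note] The interleaving described in the report can be shown deterministically with a stand-in map in place of QueryProcessor's prepared-statements cache; this illustrates the effect (the freshly prepared statement is gone when the client executes), not the actual Cassandra code.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class PreparedEvictionRace
{
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException
    {
        String md5 = "md5-of-statement";
        CountDownLatch prepared = new CountDownLatch(1);

        Thread t1 = new Thread(() -> {
            cache.remove(md5);            // "evict hashes" step
            cache.put(md5, "prepared");   // prepare + cache; the id is returned to the client
            prepared.countDown();
        });
        Thread t2 = new Thread(() -> {
            try { prepared.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            cache.remove(md5);            // late eviction runs after t1 already cached the statement
        });

        t1.start(); t2.start();
        t1.join(); t2.join();
        // the client now executes with the id it got from t1 and misses the cache
        System.out.println("statement still cached? " + cache.containsKey(md5));
    }
}
{code}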
[jira] [Updated] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17401: Bug Category: Parent values: Correctness(12982)Level 1 values: Transient Incorrect Response(12987) Complexity: Normal Component/s: Messaging/Client Discovered By: User Report Reviewers: Paulo Motta Severity: Normal Assignee: Jaydeepkumar Chovatia Status: Open (was: Triage Needed) > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Ivan Senic >Assignee: Jaydeepkumar Chovatia >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19389) Start UCS docs with examples and use cases
[ https://issues.apache.org/jira/browse/CASSANDRA-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19389: Change Category: Operability Complexity: Low Hanging Fruit Status: Open (was: Triage Needed) > Start UCS docs with examples and use cases > -- > > Key: CASSANDRA-19389 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19389 > Project: Cassandra > Issue Type: Improvement > Components: Documentation >Reporter: Jon Haddad >Priority: Normal > > Users interested in UCS are primarily going to be interested in examples of > how UCS should be used for certain types of workloads. We start the current > docs by saying it can replace every other compaction strategy, but leave it > up to the user to figure out exactly what that means for them. > Before the docs that explain how it works, let's describe how it should be > used. Users interested in the nuts and bolts can scroll down to learn the > details, but that shouldn't be a requirement to switch from an existing > compaction strategy to UCS. > A table showing examples of LCS, STCS, and TWCS converted to UCS would > suffice for 99% of people's needs. > More information in this Slack thread: > https://the-asf.slack.com/archives/CK23JSY2K/p1707700814330359 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19389) Start UCS docs with examples and use cases
[ https://issues.apache.org/jira/browse/CASSANDRA-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19389: Labels: lhf (was: ) > Start UCS docs with examples and use cases > -- > > Key: CASSANDRA-19389 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19389 > Project: Cassandra > Issue Type: Improvement > Components: Documentation >Reporter: Jon Haddad >Priority: Normal > Labels: lhf > > Users interested in UCS are primarily going to be interested in examples of > how UCS should be used for certain types of workloads. We start the current > docs by saying it can replace every other compaction strategy, but leave it > up to the user to figure out exactly what that means for them. > Before the docs that explain how it works, let's describe how it should be > used. Users interested in the nuts and bolts can scroll down to learn the > details, but that shouldn't be a requirement to switch from an existing > compaction strategy to UCS. > A table showing examples of LCS, STCS, and TWCS converted to UCS would > suffice for 99% of people's needs. > More information in this Slack thread: > https://the-asf.slack.com/archives/CK23JSY2K/p1707700814330359 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17798) Flaky org.apache.cassandra.tools TopPartitionsTest testServiceTopPartitionsSingleTable
[ https://issues.apache.org/jira/browse/CASSANDRA-17798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815902#comment-17815902 ] Paulo Motta commented on CASSANDRA-17798: - FYI failed on https://ci-cassandra.apache.org/job/Cassandra-4.1/465/testReport/ > Flaky org.apache.cassandra.tools TopPartitionsTest > testServiceTopPartitionsSingleTable > -- > > Key: CASSANDRA-17798 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17798 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1.x, 5.x > > > h3. > {code:java} > Error Message > If this failed you probably have to raise the beginLocalSampling duration > expected:<1> but was:<0> > Stacktrace > junit.framework.AssertionFailedError: If this failed you probably have to > raise the beginLocalSampling duration expected:<1> but was:<0> at > org.apache.cassandra.tools.TopPartitionsTest.testServiceTopPartitionsSingleTable(TopPartitionsTest.java:83) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Standard Output > INFO [main] 2022-08-02 01:49:49,333 YamlConfigurationLoader.java:104 - > Configuration location: > file:home/cassandra/cassandra/build/test/cassandra.cdc.yaml DEBUG [main] > 2022-08-02 01:49:49,339 YamlConfigurationLoader.java:124 - Loading settings > from file:home/cassandra/cassandra/build/test/cassandra.cdc.yaml INFO > [main] 2022-08-02 01:49:49,642 Config.java:1167 - Node > configuration:[allocate_tokens_for_keyspace=null; > allocate_tokens_for_local_replication_factor=null; allow_extra_insecure > ...[truncated 50809 chars]... lizing counter cache with capacity of 2 MiBs > INFO [MemtableFlushWriter:1] 2022-08-02 01:49:53,519 CacheService.java:163 - > Scheduling counter cache save to every 7200 seconds (going to save all keys). > DEBUG [MemtableFlushWriter:1] 2022-08-02 01:49:53,575 > ColumnFamilyStore.java:1330 - Flushed to > [BigTableReader(path='/home/cassandra/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/nb-1-big-Data.db')] > (1 sstables, 4.915KiB), biggest 4.915KiB, smallest 4.915KiB > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17298) Test Failure: org.apache.cassandra.cql3.MemtableSizeTest
[ https://issues.apache.org/jira/browse/CASSANDRA-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815684#comment-17815684 ] Paulo Motta commented on CASSANDRA-17298: - Looks like this is failing consistently in both 4.0/4.1: * https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.0/cassandra-4.0 * [https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.0/cassandra-4.1] [~e.dimitrova] I wonder if this should've been addressed by CASSANDRA-16684 or if it's a new issue. I am able to reproduce the failure locally on cassandra-4.1, even after increasing rerunsOnFailure from 2 to 4. {noformat} java.lang.AssertionError: Expected heap usage close to 75.335MiB, got 71.163MiB. at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.cassandra.cql3.MemtableSizeTest.testSizeFlaky(MemtableSizeTest.java:149) at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:696) {noformat} > Test Failure: org.apache.cassandra.cql3.MemtableSizeTest > > > Key: CASSANDRA-17298 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17298 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > [https://ci-cassandra.apache.org/job/Cassandra-4.0/313/testReport/org.apache.cassandra.cql3/MemtableSizeTest/testTruncationReleasesLogSpace_2/] > Failed 4 times in the last 30 runs. Flakiness: 27%, Stability: 86% > Error Message > Expected heap usage close to 49.930MiB, got 41.542MiB. > {code} > Stacktrace > junit.framework.AssertionFailedError: Expected heap usage close to 49.930MiB, > got 41.542MiB. > at > org.apache.cassandra.cql3.MemtableSizeTest.testSize(MemtableSizeTest.java:130) > at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:644) > at org.apache.cassandra.Util.flakyTest(Util.java:669) > at > org.apache.cassandra.cql3.MemtableSizeTest.testTruncationReleasesLogSpace(MemtableSizeTest.java:61) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17298) Test Failure: org.apache.cassandra.cql3.MemtableSizeTest
[ https://issues.apache.org/jira/browse/CASSANDRA-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17298: Summary: Test Failure: org.apache.cassandra.cql3.MemtableSizeTest (was: Test Failure: org.apache.cassandra.cql3.MemtableSizeTest.testTruncationReleasesLogSpace) > Test Failure: org.apache.cassandra.cql3.MemtableSizeTest > > > Key: CASSANDRA-17298 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17298 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Josh McKenzie >Priority: Normal > Fix For: 4.0.x > > > [https://ci-cassandra.apache.org/job/Cassandra-4.0/313/testReport/org.apache.cassandra.cql3/MemtableSizeTest/testTruncationReleasesLogSpace_2/] > Failed 4 times in the last 30 runs. Flakiness: 27%, Stability: 86% > Error Message > Expected heap usage close to 49.930MiB, got 41.542MiB. > {code} > Stacktrace > junit.framework.AssertionFailedError: Expected heap usage close to 49.930MiB, > got 41.542MiB. > at > org.apache.cassandra.cql3.MemtableSizeTest.testSize(MemtableSizeTest.java:130) > at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:644) > at org.apache.cassandra.Util.flakyTest(Util.java:669) > at > org.apache.cassandra.cql3.MemtableSizeTest.testTruncationReleasesLogSpace(MemtableSizeTest.java:61) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812033#comment-17812033 ] Paulo Motta commented on CASSANDRA-17401: - [~chovatia.jayd...@gmail.com] I was not able to review this yet, will send an update when I get a chance to review it. If anyone else subscribed wants to review this on the meantime feel free to take it. > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19291) Fix NEWS.txt Compact Storage section
[ https://issues.apache.org/jira/browse/CASSANDRA-19291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811309#comment-17811309 ] Paulo Motta edited comment on CASSANDRA-19291 at 1/26/24 3:00 PM: -- Thanks Ekaterina and apologies for the delay. LGTM, feel free to merge it. was (Author: paulo): Thanks Ekaterina and apologies for the delay. Feel free to merge it. > Fix NEWS.txt Compact Storage section > > > Key: CASSANDRA-19291 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19291 > Project: Cassandra > Issue Type: Task > Components: Documentation >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > In CASSANDRA-16733 we added a note that Compact Storage will no longer be > supported in 5.0. The idea was that drop_compact_storage would be pulled out > of the experimental version. > This did not happen, and compact storage is still around. > I think this will not be handled at least until 6.0 (major breaking changes) > and it is good to be corrected. More and more people are upgrading to 4.0+ > and they are confused. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19291) Fix NEWS.txt Compact Storage section
[ https://issues.apache.org/jira/browse/CASSANDRA-19291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19291: Status: Ready to Commit (was: Review In Progress) Thanks Ekaterina and apologies for the delay. Feel free to merge it. > Fix NEWS.txt Compact Storage section > > > Key: CASSANDRA-19291 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19291 > Project: Cassandra > Issue Type: Task > Components: Documentation >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > In CASSANDRA-16733 we added a note that Compact Storage will no longer be > supported in 5.0. The idea was that drop_compact_storage would be pulled out > of the experimental version. > This did not happen, and compact storage is still around. > I think this will not be handled at least until 6.0 (major breaking changes) > and it is good to be corrected. More and more people are upgrading to 4.0+ > and they are confused. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19291) Fix NEWS.txt Compact Storage section
[ https://issues.apache.org/jira/browse/CASSANDRA-19291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810101#comment-17810101 ] Paulo Motta commented on CASSANDRA-19291: - Is there a ticket to take "DROP COMPACT STORAGE" out of experimental mode? If so it would probably be nice to link the Jira# in the message so people can track it. Otherwise LGTM. > Fix NEWS.txt Compact Storage section > > > Key: CASSANDRA-19291 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19291 > Project: Cassandra > Issue Type: Task > Components: Documentation >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > In CASSANDRA-16733 we added a note that Compact Storage will no longer be > supported in 5.0. The idea was that drop_compact_storage would be pulled out > of the experimental version. > This did not happen, and compact storage is still around. > I think this will not be handled at least until 6.0 (major breaking changes) > and it is good to be corrected. More and more people are upgrading to 4.0+ > and they are confused. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRASC-94) Reduce filesystem calls while streaming SSTables
[ https://issues.apache.org/jira/browse/CASSANDRASC-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809735#comment-17809735 ] Paulo Motta commented on CASSANDRASC-94: Cool, thanks for clarifying! I can create a follow-up sidecar ticket if there's movement on CASSANDRA-18111. > Reduce filesystem calls while streaming SSTables > > > Key: CASSANDRASC-94 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-94 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > > When streaming snapshotted SSTables from Cassandra Sidecar, Sidecar will > perform multiple filesystem calls: > - Traverse the data directories to determine the keyspace / table path > - Once found determine if the SSTable file exists under the snapshots > directory > - Read the filesystem to obtain the file type and file size > - Read the requested range of the file and stream it > The amount of filesystem calls is manageable for streaming a single SSTable, > but when a client(s) read multiple SSTables, for example in the case of > Cassandra Analytics bulk reads, hundred to thousand of requests are performed > requiring every request to perform the above system calls. > In this improvement, it is proposed introducing several caches to reduce the > amount of system calls while streaming SSTables. > - *snapshot list cache*: to maintain a cache of recently listed snapshot > files under a snapshot directory. This cache avoids having to access the > filesystem every time a bulk read client list the snapshot directory. > - *table dir cache*: to maintain a cache of recently streamed table directory > paths. This cache helps avoiding having to traverse the filesystem searching > for the table directory while running bulk reads for example. Since bulk > reads can stream tens to hundreds of SSTable components from a snapshot > directory, this cache helps avoid having to resolve the table directory each > time. > - *snapshot path cache*: to maintain a cache of recently streamed snapshot > SSTable components. This cache avoids having to resolve the snapshot SSTable > component path during bulk reads. Since bulk reads streams sub-ranges of an > SSTable component, the resolution can happen multiple times during bulk reads > for a single SSTable component. > - *file props cache*: to maintain a cache of FileProps of recently streamed > files. This cache avoids having to validate file properties during bulk reads > for example where sub-ranges of an SSTable component are streamed, therefore > reading the file properties can occur multiple times during bulk reads of the > same file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRASC-94) Reduce filesystem calls while streaming SSTables
[ https://issues.apache.org/jira/browse/CASSANDRASC-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809601#comment-17809601 ] Paulo Motta commented on CASSANDRASC-94: I am planning to add support for caching snapshots in memory on the server as part of CASSANDRA-18111 (I have a draft patch but need to clean up/rebase/test; it should take a couple of weeks to wrap up). Do you think caching snapshots in the sidecar will still be relevant with that in place? One issue I see is that that functionality will probably land in 5.x, so sidecar caching is probably still useful for 4.x. > Reduce filesystem calls while streaming SSTables > > > Key: CASSANDRASC-94 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-94 > Project: Sidecar for Apache Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > > When streaming snapshotted SSTables from Cassandra Sidecar, Sidecar will > perform multiple filesystem calls: > - Traverse the data directories to determine the keyspace / table path > - Once found determine if the SSTable file exists under the snapshots > directory > - Read the filesystem to obtain the file type and file size > - Read the requested range of the file and stream it > The amount of filesystem calls is manageable for streaming a single SSTable, > but when a client(s) read multiple SSTables, for example in the case of > Cassandra Analytics bulk reads, hundred to thousand of requests are performed > requiring every request to perform the above system calls. > In this improvement, it is proposed introducing several caches to reduce the > amount of system calls while streaming SSTables. > - *snapshot list cache*: to maintain a cache of recently listed snapshot > files under a snapshot directory. This cache avoids having to access the > filesystem every time a bulk read client list the snapshot directory. > - *table dir cache*: to maintain a cache of recently streamed table directory > paths. This cache helps avoiding having to traverse the filesystem searching > for the table directory while running bulk reads for example. Since bulk > reads can stream tens to hundreds of SSTable components from a snapshot > directory, this cache helps avoid having to resolve the table directory each > time. > - *snapshot path cache*: to maintain a cache of recently streamed snapshot > SSTable components. This cache avoids having to resolve the snapshot SSTable > component path during bulk reads. Since bulk reads streams sub-ranges of an > SSTable component, the resolution can happen multiple times during bulk reads > for a single SSTable component. > - *file props cache*: to maintain a cache of FileProps of recently streamed > files. This cache avoids having to validate file properties during bulk reads > for example where sub-ranges of an SSTable component are streamed, therefore > reading the file properties can occur multiple times during bulk reads of the > same file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
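As a rough illustration of the kind of caching the ticket proposes (for example the *file props cache*), a loader-style cache keyed by path can collapse the repeated stat calls made for sub-range reads of the same component into a single filesystem access. This is only a sketch assuming a Caffeine-style cache; the class name, key/value types and tuning values are illustrative and not the actual Sidecar implementation:

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

public final class FilePropsCacheSketch
{
    // One Files.size() call per entry; later sub-range reads of the same component hit the cache.
    private final LoadingCache<Path, Long> sizeCache =
        Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterAccess(Duration.ofMinutes(5))
                .build(Files::size);

    public long sizeOf(Path sstableComponent)
    {
        return sizeCache.get(sstableComponent);
    }
}
{code}

The snapshot list, table dir and snapshot path caches described above would follow the same pattern, differing only in key/value types and invalidation rules.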
[jira] [Comment Edited] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809211#comment-17809211 ] Paulo Motta edited comment on CASSANDRA-17401 at 1/22/24 1:54 AM: -- Ok thanks [~chovatia.jayd...@gmail.com]! I'm not familiar with this area but will try to look at it if I find cycles in the next few days and nobody beats me to it. :) Btw did you observe a single occurrence of this issue, or is it recurrent? was (Author: paulo): Ok thanks [~chovatia.jayd...@gmail.com]! I'm not familiar with this area but will try to look at it if I find cycles in the next few days and nobody beats me to it. :) Btw did you just observe a single occurrence of this issue, or is it recurrent? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809211#comment-17809211 ] Paulo Motta commented on CASSANDRA-17401: - Ok thanks [~chovatia.jayd...@gmail.com]! I'm not familiar with this area but will try to look at it if I find cycles in the next few days and nobody beats me to it. :) Btw did you just observe a single occurrence of this issue, or is it recurrent? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache
[ https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809139#comment-17809139 ] Paulo Motta commented on CASSANDRA-17401: - Hi [~chovatia.jayd...@gmail.com] can you provide a regression test case reproducing this issue and a patch with a proposed fix ? > Race condition in QueryProcessor causes just prepared statement not to be in > the prepared statements cache > -- > > Key: CASSANDRA-17401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17401 > Project: Cassandra > Issue Type: Bug >Reporter: Ivan Senic >Priority: Normal > > The changes in the > [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638] > method that were introduced in versions *4.0.2* and *3.11.12* can cause a > race condition between two threads trying to concurrently prepare the same > statement. This race condition can cause removing of a prepared statement > from the cache, after one of the threads has received the result of the > prepare and eventually uses MD5Digest to call > [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215]. > The race condition looks like this: > * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 executes eviction of hashes > * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false > * Thread1 prepares the statement and caches it > * Thread1 returns the result of the prepare > * Thread2 executes eviction of hashes > * Thread1 tries to execute the prepared statement with the received > MD5Digest, but statement is not in the cache as it was evicted by Thread2 > I tried to reproduce this by using a Java driver, but hitting this case from > a client side is highly unlikely and I can not simulate the needed race > condition. However, we can easily reproduce this in Stargate (details > [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to > QueryProcessor. > Reproducing this in a unit test is fairly easy. I am happy to showcase this > if needed. > Note that the issue can occur only when safeToReturnCached is resolved as > false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-10821) OOM Killer terminates Cassandra when Compactions use too much memory then won't restart
[ https://issues.apache.org/jira/browse/CASSANDRA-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-10821: Description: We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 per second, replication us-east:2 us-west:2, LeveledCompaction and DeflateCompressor. After about 48 hours some nodes had over 800 pending compactions and a few of them started getting killed for Linux OOM. Priam attempts to restart the nodes, but they fail because of corrupted saved_cahce files. Loading has finished, and the cluster is mostly idle, but 6 of the nodes were killed again last night by OOM. This is the log message where the node won't restart: ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables This is the dmesg where the node is terminated: [360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child [360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB This is what Compaction Stats look like currently: pending tasks: 1096 id compaction type keyspace table completed total unit progress 93eb3200-9b58-11e5-b9f1-ffef1041ec45 Compaction overlordpreprod document 8670748796 839129219651 bytes 1.03% Compaction system hints 30 1921326518 bytes 0.00% Active compaction remaining time : 27h33m47s Only 6 of the 32 nodes have compactions pending, and all on the order of 1000. was: We were writing to the DB from EC2 instances in us-east-1 at a rate of about 3000 per second, replication us-east:2 us-west:2, LeveledCompaction and DeflateCompressor. After about 48 hours some nodes had over 800 pending compactions and a few of them started getting killed for Linux OOM. Priam attempts to restart the nodes, but they fail because of corrupted saved_cahce files. Loading has finished, and the cluster is mostly idle, but 6 of the nodes were killed again last night by OOM. This is the log message where the node won't restart: ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected unreadable sstables /media/ephemeral0/cassandra/saved_caches/KeyCache-ca.db, please check NEWS.txt and ensure that you have upgraded through all required intermediate versions, running upgradesstables This is the dmesg where the node is terminated: [360803.234422] Out of memory: Kill process 10809 (java) score 949 or sacrifice child [360803.237544] Killed process 10809 (java) total-vm:438484092kB, anon-rss:29228012kB, file-rss:107576kB This is what Compaction Stats look like currently: pending tasks: 1096 id compaction type keyspace tablecompleted totalunit progress 93eb3200-9b58-11e5-b9f1-ffef1041ec45Compaction overlordpreprod document 8670748796 839129219651 bytes 1.03% Compactionsystem hints 30 1921326518 bytes 0.00% Active compaction remaining time : 27h33m47s Only 6 of the 32 nodes have compactions pending, and all on the order of 1000. 
> OOM Killer terminates Cassandra when Compactions use too much memory then > won't restart > --- > > Key: CASSANDRA-10821 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10821 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction > Environment: EC2 32 x i2.xlarge split between us-east-1a,c and > us-west 2a,b > Linux 4.1.10-17.31.amzn1.x86_64 #1 SMP Sat Oct 24 01:31:37 UTC 2015 x86_64 > x86_64 x86_64 GNU/Linux > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) > Cassandra version: 2.2.3 >Reporter: tbartold >Priority: Normal > > > We were writing to the DB from EC2 instances in us-east-1 at a rate of about > 3000 per second, replication us-east:2 us-west:2, LeveledCompaction and > DeflateCompressor. > After about 48 hours some nodes had over 800 pending compactions and a few of > them started getting killed for Linux OOM. Priam attempts to restart the > nodes, but they fail because of corrupted saved_cahce files. > Loading has finished, and the cluster is mostly idle, but 6 of the nodes were > killed again last night by OOM. > This is the log message where the node won't restart: > ERROR [main] 2015-12-05 13:59:13,754 CassandraDaemon.java:635 - Detected > unreadable ss
[jira] [Commented] (CASSANDRASC-92) Add restore SSTables from S3 into Cassandra feature to Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808429#comment-17808429 ] Paulo Motta commented on CASSANDRASC-92: Feel free to merge [~frankgh] - I’ll follow up later if needed when I have a chance to test this feature. Thanks! > Add restore SSTables from S3 into Cassandra feature to Cassandra Sidecar > > > Key: CASSANDRASC-92 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-92 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature > Components: Rest API >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Through this proposal we want to add restore capability to Sidecar, for > Sidecar to allow restoring data from S3. As part of this patch we want to add > APIs for creating, updating and getting information about the restore jobs. > We also want to add background tasks for managing these restore jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807855#comment-17807855 ] Paulo Motta commented on CASSANDRA-19001: - {quote}If JAVA_HOME is not defined, Cassandra checks what is in PATH first. Do we expect users to do more modifications to PATH to adhere? It sounds a bit risky to me; I hope I do not overengineer it. WDYT? {quote} As far as I understand there is no reliable way to detect if there's a local JDK other than checking whether javac exists in JAVA_HOME or PATH. So the only way to figure out if the user is running on a JDK is to check if javac exists in JAVA_HOME/bin first and, if not, check on PATH - this does not look like overengineering to me. What do you mean by "expect users to do more modifications to PATH to adhere"? > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
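For reference, a hedged sketch of the detection order discussed above (JAVA_HOME/bin/javac first, then PATH). The real check would live in the startup scripts; this is just an illustration of the idea in Java, and the class/method names (JdkDetector, runningOnJdk) are hypothetical:

{code:java}
import java.io.File;
import java.util.Arrays;

public final class JdkDetector
{
    // True if a javac binary is visible under JAVA_HOME/bin, or on the PATH when JAVA_HOME is unset.
    public static boolean runningOnJdk()
    {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome != null && !javaHome.isEmpty())
            return new File(javaHome, "bin" + File.separator + "javac").canExecute();

        String path = System.getenv("PATH");
        if (path == null)
            return false;

        return Arrays.stream(path.split(File.pathSeparator))
                     .anyMatch(dir -> new File(dir, "javac").canExecute());
    }

    public static void main(String[] args)
    {
        System.out.println(runningOnJdk() ? "JDK detected" : "JRE only");
    }
}
{code}

(On Windows the binary would be javac.exe, which this sketch deliberately ignores.)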
[jira] [Commented] (CASSANDRASC-92) Add restore SSTables from S3 into Cassandra feature to Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807853#comment-17807853 ] Paulo Motta commented on CASSANDRASC-92: Thanks for the context [~frankgh] I plan to test this functionality at some point but please don't block this review on me. I'll add any comments later if needed. > Add restore SSTables from S3 into Cassandra feature to Cassandra Sidecar > > > Key: CASSANDRASC-92 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-92 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature > Components: Rest API >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Through this proposal we want to add restore capability to Sidecar, for > Sidecar to allow restoring data from S3. As part of this patch we want to add > APIs for creating, updating and getting information about the restore jobs. > We also want to add background tasks for managing these restore jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807408#comment-17807408 ] Paulo Motta commented on CASSANDRA-19001: - Thanks Ekaterina! bq. 1) Decide whether we want to be checking for runtime or javac (considering the case I mentioned) I think checking for 'javac' should be fine when JAVA_HOME is not defined. If JAVA_HOME is defined, then we check for the existence of "${JAVA_HOME}/bin/javac" to determine if it's running on a JDK. Would this fix your edge case? bq. 2) IMHO, we should not prevent all sjk commands from running if JRE is detected sounds good to me, the warning from sjk itself {{ERROR 14:04:02,644 Java home points to /Library/Java/JavaVirtualMachines/temurin-17.jre/Contents/Home make sure it is not a JRE path}} should be sufficient > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRASC-92) Add restore SSTables from S3 into Cassandra feature to Cassandra Sidecar
[ https://issues.apache.org/jira/browse/CASSANDRASC-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806164#comment-17806164 ] Paulo Motta commented on CASSANDRASC-92: This looks interesting! I'll take a look at this patch. Are there plans to support sstable export capability to S3, or just restore for the time being? > Add restore SSTables from S3 into Cassandra feature to Cassandra Sidecar > > > Key: CASSANDRASC-92 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-92 > Project: Sidecar for Apache Cassandra > Issue Type: New Feature > Components: Rest API >Reporter: Saranya Krishnakumar >Assignee: Saranya Krishnakumar >Priority: Normal > > Through this proposal we want to add restore capability to Sidecar, for > Sidecar to allow restoring data from S3. As part of this patch we want to add > APIs for creating, updating and getting information about the restore jobs. > We also want to add background tasks for managing these restore jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805218#comment-17805218 ] Paulo Motta commented on CASSANDRA-19259: - Failing tests are: * upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD * upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD * upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD * upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD * upgrade_tests.upgrade_through_versions_test.TestProtoV5Upgrade_AllVersions_EndsAt_Trunk_HEAD * upgrade_tests.upgrade_through_versions_test.TestProtoV5Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD > upgrade_tests.upgrade_through_versions_test consistently failing on circleci > > > Key: CASSANDRA-19259 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19259 > Project: Cassandra > Issue Type: Task > Components: Local/Other >Reporter: Paulo Motta >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0-rc > > > This suite is consistently failing in > [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] > and > [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] > with the following stack trace: > {noformat} > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > > with open(pidfile, 'rb') as f: > E FileNotFoundError: [Errno 2] No such file or directory: > '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError > During handling of the above exception, another exception occurred: > self = > object at 0x7f4c01419438> > def test_parallel_upgrade(self): > """ > Test upgrading cluster all at once (requires cluster downtime). 
> """ > > self.upgrade_scenario() > upgrade_tests/upgrade_through_versions_test.py:387: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario > self.upgrade_to_version(version_meta, internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version > jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true']) # > prevent protocol capping in mixed version clusters > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start > if not self._wait_for_running(process, timeout_s=7): > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running > self._update_pid(process) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > with open(pidfile, 'rb') as f: >
[jira] [Updated] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18999: Source Control Link: https://github.com/apache/cassandra/commit/475c0035e6e04526eaf50805d33156ac9b828ab6 Resolution: Fixed Status: Resolved (was: Ready to Commit) Thanks Brandon and Stefan. I'm confident these failures are unrelated so I optimistically committed this to 4.0+ on [475c0035e6e04526eaf50805d33156ac9b828ab6|https://github.com/apache/cassandra/commit/475c0035e6e04526eaf50805d33156ac9b828ab6] to avoid dragging this for any longer given current CI restrictions. I created CASSANDRA-19259 to address these failures separately. We should attempt a green upgrade CI run before next 4.0/4.1 releases. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18999: Fix Version/s: 4.0.12 4.1.4 5.0-beta2 (was: 4.0.x) (was: 4.1.x) (was: 5.0.x) > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.12, 4.1.4, 5.0-beta2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) 01/01: Merge branch 'cassandra-4.0' into cassandra-4.1
This is an automated email from the ASF dual-hosted git repository. paulo pushed a commit to branch cassandra-4.1 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 3d1b981d8968635660eb015292891e57d1212c2e Merge: 4dd69dc62d 475c0035e6 Author: Paulo Motta AuthorDate: Wed Jan 10 11:17:01 2024 -0500 Merge branch 'cassandra-4.0' into cassandra-4.1 Closes #2968 CHANGES.txt| 1 + src/java/org/apache/cassandra/gms/Gossiper.java| 15 --- .../schema/SystemDistributedKeyspace.java | 2 +- .../apache/cassandra/tracing/TraceKeyspace.java| 4 +- test/unit/org/apache/cassandra/Util.java | 2 +- .../org/apache/cassandra/gms/GossiperTest.java | 50 +- 6 files changed, 61 insertions(+), 13 deletions(-) diff --cc CHANGES.txt index 66144ce1e6,d944415f76..ec0e7c60d7 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,13 -1,5 +1,14 @@@ -4.0.12 +4.1.4 + * Memoize Cassandra verion and add a backoff interval for failed schema pulls (CASSANDRA-18902) + * Fix StackOverflowError on ALTER after many previous schema changes (CASSANDRA-19166) + * Fixed the inconsistency between distributedKeyspaces and distributedAndLocalKeyspaces (CASSANDRA-18747) + * Internode legacy SSL storage port certificate is not hot reloaded on update (CASSANDRA-18681) + * Nodetool paxos-only repair is no longer incremental (CASSANDRA-18466) + * Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock (CASSANDRA-18733) + * Allow empty keystore_password in encryption_options (CASSANDRA-18778) + * Skip ColumnFamilyStore#topPartitions initialization when client or tool mode (CASSANDRA-18697) +Merged from 4.0: + * Fix Gossiper::hasMajorVersion3Nodes to return false during minor upgrade (CASSANDRA-18999) * Revert unnecessary read lock acquisition when reading ring version in TokenMetadata introduced in CASSANDRA-16286 (CASSANDRA-19107) * Support max SSTable size in sorted CQLSSTableWriter (CASSANDRA-18941) * Fix nodetool repair_admin summarize-pending command to not throw exception (CASSANDRA-19014) diff --cc src/java/org/apache/cassandra/gms/Gossiper.java index 0d5db5f81c,22595b299a..018e20542d --- a/src/java/org/apache/cassandra/gms/Gossiper.java +++ b/src/java/org/apache/cassandra/gms/Gossiper.java @@@ -1614,33 -1555,11 +1615,33 @@@ public class Gossiper implements IFailu localState.addApplicationStates(updatedStates); // get rid of legacy fields once the cluster is not in mixed mode - if (!hasMajorVersion3Nodes()) + if (!hasMajorVersion3OrUnknownNodes()) localState.removeMajorVersion3LegacyApplicationStates(); +// need to run STATUS or STATUS_WITH_PORT first to handle BOOT_REPLACE correctly (else won't be a member, so TOKENS won't be processed) +for (Entry updatedEntry : updatedStates) +{ +switch (updatedEntry.getKey()) +{ +default: +continue; +case STATUS: +if (localState.containsApplicationState(ApplicationState.STATUS_WITH_PORT)) +continue; +case STATUS_WITH_PORT: +} +doOnChangeNotifications(addr, updatedEntry.getKey(), updatedEntry.getValue()); +} + for (Entry updatedEntry : updatedStates) { +switch (updatedEntry.getKey()) +{ +// We should have alredy handled these two states above: +case STATUS_WITH_PORT: +case STATUS: +continue; +} // filters out legacy change notifications // only if local state already indicates that the peer has the new fields if ((ApplicationState.INTERNAL_IP == updatedEntry.getKey() && localState.containsApplicationState(ApplicationState.INTERNAL_ADDRESS_AND_PORT)) diff --cc src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java index dc40093d4d,00..d63bbace79 
mode 100644,00..100644 --- a/src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java +++ b/src/java/org/apache/cassandra/schema/SystemDistributedKeyspace.java @@@ -1,409 -1,0 +1,409 @@@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed unde
(cassandra) branch cassandra-5.0 updated (14c773d8bc -> e04a3176ff)
This is an automated email from the ASF dual-hosted git repository. paulo pushed a change to branch cassandra-5.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git from 14c773d8bc Merge branch 'cassandra-4.1' into cassandra-5.0 new 475c0035e6 [CASSANDRA-18999] Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes. new 3d1b981d89 Merge branch 'cassandra-4.0' into cassandra-4.1 new e04a3176ff Merge branch 'cassandra-4.1' into cassandra-5.0 The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + src/java/org/apache/cassandra/gms/Gossiper.java| 14 +++--- .../schema/SystemDistributedKeyspace.java | 2 +- .../apache/cassandra/tracing/TraceKeyspace.java| 4 +- test/unit/org/apache/cassandra/Util.java | 2 +- .../org/apache/cassandra/gms/GossiperTest.java | 50 +- 6 files changed, 61 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) branch cassandra-4.0 updated: [CASSANDRA-18999] Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
This is an automated email from the ASF dual-hosted git repository. paulo pushed a commit to branch cassandra-4.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/cassandra-4.0 by this push: new 475c0035e6 [CASSANDRA-18999] Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes. 475c0035e6 is described below commit 475c0035e6e04526eaf50805d33156ac9b828ab6 Author: Isaac Reath AuthorDate: Fri Jan 5 12:57:21 2024 -0500 [CASSANDRA-18999] Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes. This commit fixes Gossiper::hasMajorVersion3Nodes so that it does not return true when all hosts have a known version, no hosts are on a version earlier than 4.0, and there is a 4.x minor version or patch version upgrade in progress. Additionally, this commit improves the clarity of Gossiper::hasMajorVersion3Nodes's name to indicate that it will return true when the cluster has 3.x nodes or if the cluster state is unknown, matching the description in the in-line comment. patch by Isaac Reath; reviewed by Paulo Motta and Stefan Miklosovic for CASSANDRA-18999 Closes #2967 --- CHANGES.txt| 1 + src/java/org/apache/cassandra/gms/Gossiper.java| 15 --- .../repair/SystemDistributedKeyspace.java | 2 +- .../apache/cassandra/tracing/TraceKeyspace.java| 4 +- test/unit/org/apache/cassandra/Util.java | 2 +- .../org/apache/cassandra/gms/GossiperTest.java | 50 +- 6 files changed, 61 insertions(+), 13 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 0edb216735..d944415f76 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0.12 + * Fix Gossiper::hasMajorVersion3Nodes to return false during minor upgrade (CASSANDRA-18999) * Revert unnecessary read lock acquisition when reading ring version in TokenMetadata introduced in CASSANDRA-16286 (CASSANDRA-19107) * Support max SSTable size in sorted CQLSSTableWriter (CASSANDRA-18941) * Fix nodetool repair_admin summarize-pending command to not throw exception (CASSANDRA-19014) diff --git a/src/java/org/apache/cassandra/gms/Gossiper.java b/src/java/org/apache/cassandra/gms/Gossiper.java index f88ee44edf..22595b299a 100644 --- a/src/java/org/apache/cassandra/gms/Gossiper.java +++ b/src/java/org/apache/cassandra/gms/Gossiper.java @@ -170,6 +170,7 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean * This property and anything that checks it should be removed in 5.0 */ private volatile boolean upgradeInProgressPossible = true; +private volatile boolean hasNodeWithUnknownVersion = false; public void clearUnsafe() { @@ -206,14 +207,14 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean } // Check the release version of all the peers it heard of. Not necessary the peer that it has/had contacted with. 
-boolean allHostsHaveKnownVersion = true; +hasNodeWithUnknownVersion = false; for (InetAddressAndPort host : endpointStateMap.keySet()) { CassandraVersion version = getReleaseVersion(host); //Raced with changes to gossip state, wait until next iteration if (version == null) -allHostsHaveKnownVersion = false; +hasNodeWithUnknownVersion = true; else if (version.compareTo(minVersion) < 0) minVersion = version; } @@ -221,7 +222,7 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean if (minVersion.compareTo(SystemKeyspace.CURRENT_VERSION) < 0) return new ExpiringMemoizingSupplier.Memoized<>(minVersion); -if (!allHostsHaveKnownVersion) +if (hasNodeWithUnknownVersion) return new ExpiringMemoizingSupplier.NotMemoized<>(minVersion); upgradeInProgressPossible = false; @@ -1466,7 +1467,7 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean EndpointState localEpStatePtr = endpointStateMap.get(ep); EndpointState remoteState = entry.getValue(); -if (!hasMajorVersion3Nodes()) +if (!hasMajorVersion3OrUnknownNodes()) remoteState.removeMajorVersion3LegacyApplicationStates(); /* @@ -1554,7 +1555,7 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean localState.addApplicationStates(updatedStates); // get rid of legacy fields once the cluster is not in mixed mode -if (!hasMajorVersion3Nodes()) +if (!hasMajorVersion3OrUnknownNodes()) localState.removeMajorVersion3LegacyApplicationStates();
(cassandra) branch cassandra-4.1 updated (4dd69dc62d -> 3d1b981d89)
This is an automated email from the ASF dual-hosted git repository. paulo pushed a change to branch cassandra-4.1 in repository https://gitbox.apache.org/repos/asf/cassandra.git from 4dd69dc62d Merge branch 'cassandra-4.0' into cassandra-4.1 new 475c0035e6 [CASSANDRA-18999] Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes. new 3d1b981d89 Merge branch 'cassandra-4.0' into cassandra-4.1 The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + src/java/org/apache/cassandra/gms/Gossiper.java| 15 --- .../schema/SystemDistributedKeyspace.java | 2 +- .../apache/cassandra/tracing/TraceKeyspace.java| 4 +- test/unit/org/apache/cassandra/Util.java | 2 +- .../org/apache/cassandra/gms/GossiperTest.java | 50 +- 6 files changed, 61 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19259: Description: This suite is consistently failing in [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] and [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] with the following stack trace: {noformat} self = process = def _update_pid(self, process): """ Reads pid from cassandra.pid file and stores in the self.pid After setting up pid updates status (UP, DOWN, etc) and node.conf """ pidfile = os.path.join(self.get_path(), 'cassandra.pid') start = time.time() while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): if (time.time() - start > 30.0): common.error("Timed out waiting for pidfile to be filled (current time is {})".format(datetime.now())) break else: time.sleep(0.1) try: > with open(pidfile, 'rb') as f: E FileNotFoundError: [Errno 2] No such file or directory: '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError During handling of the above exception, another exception occurred: self = def test_parallel_upgrade(self): """ Test upgrading cluster all at once (requires cluster downtime). """ > self.upgrade_scenario() upgrade_tests/upgrade_through_versions_test.py:387: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario self.upgrade_to_version(version_meta, internode_ssl=internode_ssl) upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true']) # prevent protocol capping in mixed version clusters ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start if not self._wait_for_running(process, timeout_s=7): ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running self._update_pid(process) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = process = def _update_pid(self, process): """ Reads pid from cassandra.pid file and stores in the self.pid After setting up pid updates status (UP, DOWN, etc) and node.conf """ pidfile = os.path.join(self.get_path(), 'cassandra.pid') start = time.time() while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): if (time.time() - start > 30.0): common.error("Timed out waiting for pidfile to be filled (current time is {})".format(datetime.now())) break else: time.sleep(0.1) try: with open(pidfile, 'rb') as f: if common.is_modern_windows_install(self.get_base_cassandra_version()): self.pid = int(f.readline().strip().decode('utf-16').strip()) else: self.pid = int(f.readline().strip()) except IOError as e: > raise NodeError('Problem starting node %s due to %s' % (self.name, > e), process) E ccmlib.node.NodeError: Problem starting node node1 due to [Errno 2] No such file or directory: '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2106: NodeError {noformat} It's not clear whether this reproduces locally or just on circleci. We should address these failures before next 4.0.12 and 4.1.4 releases. 
was: This suite is consistently failing in [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] and [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] with the following stack trace: {noformat} self = process = def _update_pid(self, process): """ Reads pid from cassandra.pid file and stores in the self.pid After setting up pid updates status (UP, DOWN, etc) and node.conf """ pidfile = os.path.join(self.get_path(), 'cassandra.pid') start = time.time() while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0):
(cassandra) branch trunk updated (2e7c0ee5c6 -> 7d6cc31b21)
This is an automated email from the ASF dual-hosted git repository. paulo pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git from 2e7c0ee5c6 Merge branch 'cassandra-5.0' into trunk new 475c0035e6 [CASSANDRA-18999] Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes. new 3d1b981d89 Merge branch 'cassandra-4.0' into cassandra-4.1 new e04a3176ff Merge branch 'cassandra-4.1' into cassandra-5.0 new 7d6cc31b21 Merge branch 'cassandra-5.0' into trunk The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra) 01/01: Merge branch 'cassandra-4.1' into cassandra-5.0
This is an automated email from the ASF dual-hosted git repository. paulo pushed a commit to branch cassandra-5.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit e04a3176ffbcb1192ac9081cd980d1e4592ba3f5 Merge: 14c773d8bc 3d1b981d89 Author: Paulo Motta AuthorDate: Wed Jan 10 11:20:54 2024 -0500 Merge branch 'cassandra-4.1' into cassandra-5.0 Closes #3004 CHANGES.txt| 1 + src/java/org/apache/cassandra/gms/Gossiper.java| 14 +++--- .../schema/SystemDistributedKeyspace.java | 2 +- .../apache/cassandra/tracing/TraceKeyspace.java| 4 +- test/unit/org/apache/cassandra/Util.java | 2 +- .../org/apache/cassandra/gms/GossiperTest.java | 50 +- 6 files changed, 61 insertions(+), 12 deletions(-) diff --cc CHANGES.txt index 0e2306dc68,ec0e7c60d7..95047150c0 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,42 -1,15 +1,43 @@@ -4.1.4 +5.0-beta2 + * Creating a SASI index after creating an SAI index does not break secondary index queries (CASSANDRA-18939) + * Optionally fail when a non-partition-restricted query is issued against an index (CASSANDRA-18796) + * Add a startup check to fail startup when using invalid configuration with certain Kernel and FS type (CASSANDRA-19196) + * UCS min_sstable_size should not be lower than target_sstable_size lower bound (CASSANDRA-19112) + * Fix the correspondingMessagingVersion of SSTable format and improve TTL overflow tests coverage (CASSANDRA-19197) + * Fix resource cleanup after SAI query timeouts (CASSANDRA-19177) + * Suppress CVE-2023-6481 (CASSANDRA-19184) +Merged from 4.1: * Memoize Cassandra verion and add a backoff interval for failed schema pulls (CASSANDRA-18902) * Fix StackOverflowError on ALTER after many previous schema changes (CASSANDRA-19166) - * Fixed the inconsistency between distributedKeyspaces and distributedAndLocalKeyspaces (CASSANDRA-18747) - * Internode legacy SSL storage port certificate is not hot reloaded on update (CASSANDRA-18681) - * Nodetool paxos-only repair is no longer incremental (CASSANDRA-18466) - * Waiting indefinitely on ReceivedMessage response in StreamSession#receive() can cause deadlock (CASSANDRA-18733) - * Allow empty keystore_password in encryption_options (CASSANDRA-18778) - * Skip ColumnFamilyStore#topPartitions initialization when client or tool mode (CASSANDRA-18697) Merged from 4.0: + * Fix Gossiper::hasMajorVersion3Nodes to return false during minor upgrade (CASSANDRA-18999) * Revert unnecessary read lock acquisition when reading ring version in TokenMetadata introduced in CASSANDRA-16286 (CASSANDRA-19107) +Merged from 3.11: +Merged from 3.0: + + +5.0-beta1 + * Fix SAI intersection queries (CASSANDRA-19011) + * Clone EndpointState before sending GossipShutdown message (CASSANDRA-19115) + * SAI indexes are marked queryable during truncation (CASSANDRA-19032) + * Enable Direct-IO feature for CommitLog files using Java native API's. 
(CASSANDRA-18464) + * SAI fixes for composite partitions, and static and non-static rows intersections (CASSANDRA-19034) + * Improve SAI IndexContext handling of indexed and non-indexed columns in queries (CASSANDRA-18166) + * Fixed bug where UnifiedCompactionTask constructor was calling the wrong base constructor of CompactionTask (CASSANDRA-18757) + * Fix SAI unindexed contexts not considering CONTAINS KEY (CASSANDRA-19040) + * Ensure that empty SAI column indexes do not fail on validation after full-SSTable streaming (CASSANDRA-19017) + * SAI in-memory index should check max term size (CASSANDRA-18926) + * Set default disk_access_mode to mmap_index_only (CASSANDRA-19021) + * Exclude net.java.dev.jna:jna dependency from dependencies of org.caffinitas.ohc:ohc-core (CASSANDRA-18992) + * Add UCS sstable_growth and min_sstable_size options (CASSANDRA-18945) + * Make cqlsh's min required Python version 3.7+ instead of 3.6+ (CASSANDRA-18960) + * Fix incorrect seeking through the sstable iterator by IndexState (CASSANDRA-18932) + * Upgrade Python driver to 3.28.0 (CASSANDRA-18960) + * Add retries to IR messages (CASSANDRA-18962) + * Add metrics and logging to repair retries (CASSANDRA-18952) + * Remove deprecated code in Cassandra 1.x and 2.x (CASSANDRA-18959) + * ClientRequestSize metrics should not treat CONTAINS restrictions as being equality-based (CASSANDRA-18896) +Merged from 4.0: * Support max SSTable size in sorted CQLSSTableWriter (CASSANDRA-18941) * Fix nodetool repair_admin summarize-pending command to not throw exception (CASSANDRA-19014) * Fix cassandra-stress in simplenative mode with prepared statements (CASSANDRA-18744) diff --cc src/java/org/apache/cassandra/gms/Gossiper.java index b5b0caec77,018e20542d..5a616a4eae --- a/src/java/org/apache/cassandra/gms/Gossiper.java +++ b/src/java/org/apache/cassandra/gms/Gossiper.java @@@ -229,15 -219,14 +230,15 @@@ p
(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk
This is an automated email from the ASF dual-hosted git repository. paulo pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 7d6cc31b216f464e404527eeb10c6f1ab97ab828 Merge: 2e7c0ee5c6 e04a3176ff Author: Paulo Motta AuthorDate: Wed Jan 10 11:22:03 2024 -0500 Merge branch 'cassandra-5.0' into trunk - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805209#comment-17805209 ] Paulo Motta commented on CASSANDRA-19259: - [~stefan.miklosovic] can you try to reproduce this locally if you have a dtest setup? I can try but still need to setup my environment. I want to check if this reproduces locally or if it's a CI issue. > upgrade_tests.upgrade_through_versions_test consistently failing on circleci > > > Key: CASSANDRA-19259 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19259 > Project: Cassandra > Issue Type: Task > Components: Local/Other >Reporter: Paulo Motta >Priority: Normal > Fix For: 4.0.12, 4.1.4, 5.0-beta2 > > > This suite is consistently failing in > [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] > and > [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] > with the following stack trace: > {noformat} > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > > with open(pidfile, 'rb') as f: > E FileNotFoundError: [Errno 2] No such file or directory: > '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError > During handling of the above exception, another exception occurred: > self = > object at 0x7f4c01419438> > def test_parallel_upgrade(self): > """ > Test upgrading cluster all at once (requires cluster downtime). 
> """ > > self.upgrade_scenario() > upgrade_tests/upgrade_through_versions_test.py:387: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario > self.upgrade_to_version(version_meta, internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version > jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true']) # > prevent protocol capping in mixed version clusters > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start > if not self._wait_for_running(process, timeout_s=7): > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running > self._update_pid(process) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > with open(pidfile, 'rb') as f: > if > common.is_modern_windows_install(self.get_base_cassandra_version()): > self.pid = > int(f.readline().strip().decode('utf-16').strip()) > else: > self.pid = int(f.readline().strip()) > except IOError as e: > > raise NodeError('Problem starting node %
[jira] [Updated] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19259: Change Category: Quality Assurance Complexity: Normal Component/s: Local/Other Fix Version/s: 4.0.12 4.1.4 5.0-beta2 Status: Open (was: Triage Needed) > upgrade_tests.upgrade_through_versions_test consistently failing on circleci > > > Key: CASSANDRA-19259 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19259 > Project: Cassandra > Issue Type: Task > Components: Local/Other >Reporter: Paulo Motta >Priority: Normal > Fix For: 4.0.12, 4.1.4, 5.0-beta2 > > > This suite is consistently failing in > [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] > and > [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] > with the following stack trace: > {noformat} > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > > with open(pidfile, 'rb') as f: > E FileNotFoundError: [Errno 2] No such file or directory: > '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError > During handling of the above exception, another exception occurred: > self = > object at 0x7f4c01419438> > def test_parallel_upgrade(self): > """ > Test upgrading cluster all at once (requires cluster downtime). 
> """ > > self.upgrade_scenario() > upgrade_tests/upgrade_through_versions_test.py:387: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario > self.upgrade_to_version(version_meta, internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version > jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true']) # > prevent protocol capping in mixed version clusters > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start > if not self._wait_for_running(process, timeout_s=7): > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running > self._update_pid(process) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > with open(pidfile, 'rb') as f: > if > common.is_modern_windows_install(self.get_base_cassandra_version()): > self.pid = > int(f.readline().strip().decode('utf-16').strip()) > else: > self.pid = int(f.readline().strip()) > except IOError as e: > > raise NodeError('Problem starting node %s due to %s' % > > (self.name, e), process) > E ccmlib.node
[jira] [Updated] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19259: Workflow: Copy of Cassandra Default Workflow (was: Copy of Cassandra Bug Workflow) Issue Type: Task (was: Bug) > upgrade_tests.upgrade_through_versions_test consistently failing on circleci > > > Key: CASSANDRA-19259 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19259 > Project: Cassandra > Issue Type: Task > Reporter: Paulo Motta >Priority: Normal > > This suite is consistently failing in > [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] > and > [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] > with the following stack trace: > {noformat} > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > > with open(pidfile, 'rb') as f: > E FileNotFoundError: [Errno 2] No such file or directory: > '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError > During handling of the above exception, another exception occurred: > self = > object at 0x7f4c01419438> > def test_parallel_upgrade(self): > """ > Test upgrading cluster all at once (requires cluster downtime). 
> """ > > self.upgrade_scenario() > upgrade_tests/upgrade_through_versions_test.py:387: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario > self.upgrade_to_version(version_meta, internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version > jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true']) # > prevent protocol capping in mixed version clusters > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start > if not self._wait_for_running(process, timeout_s=7): > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running > self._update_pid(process) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > process = > def _update_pid(self, process): > """ > Reads pid from cassandra.pid file and stores in the self.pid > After setting up pid updates status (UP, DOWN, etc) and node.conf > """ > pidfile = os.path.join(self.get_path(), 'cassandra.pid') > > start = time.time() > while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): > if (time.time() - start > 30.0): > common.error("Timed out waiting for pidfile to be filled > (current time is {})".format(datetime.now())) > break > else: > time.sleep(0.1) > > try: > with open(pidfile, 'rb') as f: > if > common.is_modern_windows_install(self.get_base_cassandra_version()): > self.pid = > int(f.readline().strip().decode('utf-16').strip()) > else: > self.pid = int(f.readline().strip()) > except IOError as e: > > raise NodeError('Problem starting node %s due to %s' % > > (self.name, e), process) > E ccmlib.node.NodeError: Problem starting node node1 due to [Errno > 2] No such file or directory: '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' > ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2106: NodeError > {noformat} > It's not clear whether this reproduces locally or just on circleci. > We should address these failures before next 4.0.13 and 4.1.4 releases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci
Paulo Motta created CASSANDRA-19259: --- Summary: upgrade_tests.upgrade_through_versions_test consistently failing on circleci Key: CASSANDRA-19259 URL: https://issues.apache.org/jira/browse/CASSANDRA-19259 Project: Cassandra Issue Type: Bug Reporter: Paulo Motta This suite is consistently failing in [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests] and [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests] with the following stack trace: {noformat} self = process = def _update_pid(self, process): """ Reads pid from cassandra.pid file and stores in the self.pid After setting up pid updates status (UP, DOWN, etc) and node.conf """ pidfile = os.path.join(self.get_path(), 'cassandra.pid') start = time.time() while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): if (time.time() - start > 30.0): common.error("Timed out waiting for pidfile to be filled (current time is {})".format(datetime.now())) break else: time.sleep(0.1) try: > with open(pidfile, 'rb') as f: E FileNotFoundError: [Errno 2] No such file or directory: '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError During handling of the above exception, another exception occurred: self = def test_parallel_upgrade(self): """ Test upgrading cluster all at once (requires cluster downtime). """ > self.upgrade_scenario() upgrade_tests/upgrade_through_versions_test.py:387: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario self.upgrade_to_version(version_meta, internode_ssl=internode_ssl) upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true']) # prevent protocol capping in mixed version clusters ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start if not self._wait_for_running(process, timeout_s=7): ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running self._update_pid(process) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = process = def _update_pid(self, process): """ Reads pid from cassandra.pid file and stores in the self.pid After setting up pid updates status (UP, DOWN, etc) and node.conf """ pidfile = os.path.join(self.get_path(), 'cassandra.pid') start = time.time() while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0): if (time.time() - start > 30.0): common.error("Timed out waiting for pidfile to be filled (current time is {})".format(datetime.now())) break else: time.sleep(0.1) try: with open(pidfile, 'rb') as f: if common.is_modern_windows_install(self.get_base_cassandra_version()): self.pid = int(f.readline().strip().decode('utf-16').strip()) else: self.pid = int(f.readline().strip()) except IOError as e: > raise NodeError('Problem starting node %s due to %s' % (self.name, > e), process) E ccmlib.node.NodeError: Problem starting node node1 due to [Errno 2] No such file or directory: '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid' ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2106: NodeError {noformat} It's not clear whether this reproduces locally or just on circleci. We should address these failures before next 4.0.13 and 4.1.4 releases. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804104#comment-17804104 ] Paulo Motta commented on CASSANDRA-18999: - 5.0 precommit tests are looking good. I can't make a lot of sense from the [upgrade dtests failures|https://app.circleci.com/pipelines/github/driftx/cassandra/1444/workflows/ddfe8a3c-4b36-4b9e-8f01-c85249fd8488/jobs/70142/tests] but they don't seem related to this ticket. It looks like in both runs tests from {{upgrade_through_versions_test}} failed with: {noformat} Problem starting node node1 due to [Errno 2] No such file or directory: '/tmp/dtest-jbrcckw7/test/node1/cassandra.pid' {noformat} This looks like an environmental issue to me as I didn't find any open ticket for this particular issue. While the [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1444/workflows/ddfe8a3c-4b36-4b9e-8f01-c85249fd8488] job completed the [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1445/workflows/d346af10-7b34-41a0-b2b7-c1c3290a6696] seems to have gotten stuck. I'm inclined to commit this to avoid dragging this ticket longer and re-run the upgrade dtest before the next 4.X release to catch any outstanding upgrade issues. WDYT? > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 50m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). 
> From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
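For illustration, a minimal sketch of the endpoint-scan approach suggested in the description above, written as if it lived inside {{Gossiper}} and reusing its existing {{endpointStateMap}} field and {{getReleaseVersion}} method; the helper name and exact placement are assumptions, not the committed fix:
{code:java}
// Hypothetical helper (assumed name): report whether any endpoint known to gossip
// still has no release version, which is the conservative signal the description
// suggests using instead of the current upgradeFromVersion-based check.
private boolean anyEndpointMissingReleaseVersion()
{
    for (InetAddressAndPort endpoint : endpointStateMap.keySet())
    {
        // Gossiper.getReleaseVersion returns null while the endpoint's gossip state is incomplete
        CassandraVersion version = getReleaseVersion(endpoint);
        if (version == null)
            return true;
    }
    return false;
}
{code}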
[jira] [Commented] (CASSANDRA-19097) Test Failure: bootstrap_test.TestBootstrap.*
[ https://issues.apache.org/jira/browse/CASSANDRA-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803682#comment-17803682 ] Paulo Motta commented on CASSANDRA-19097: - Seen {{test_read_from_bootstrapped_node}} failure in [5.0-18999-j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1446/workflows/e0726e79-e517-4a82-828c-c7931fc9d99b/jobs/70130/tests] > Test Failure: bootstrap_test.TestBootstrap.* > > > Key: CASSANDRA-19097 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19097 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Priority: Normal > Fix For: 4.0.x, 5.0-rc > > > test_killed_wiped_node_cannot_join > test_read_from_bootstrapped_node > test_shutdown_wiped_node_cannot_join > Seen in dtests_offheap in CASSANDRA-19034 > https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/258/workflows/cea7d697-ca33-40bb-8914-fb9fc662820a/jobs/21162/parallel-runs/38 > {noformat} > self = > def test_killed_wiped_node_cannot_join(self): > > self._wiped_node_cannot_join_test(gently=False) > bootstrap_test.py:608: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = , gently = False > def _wiped_node_cannot_join_test(self, gently): > """ > @jira_ticket CASSANDRA-9765 > Test that if we stop a node and wipe its data then the node cannot > join > when it is not a seed. Test both a nice shutdown or a forced > shutdown, via > the gently parameter. > """ > cluster = self.cluster > > cluster.set_environment_variable('CASSANDRA_TOKEN_PREGENERATION_DISABLED', > 'True') > cluster.populate(3) > cluster.start() > > stress_table = 'keyspace1.standard1' > > # write some data > node1 = cluster.nodelist()[0] > node1.stress(['write', 'n=10K', 'no-warmup', '-rate', 'threads=8']) > > session = self.patient_cql_connection(node1) > original_rows = list(session.execute("SELECT * FROM > {}".format(stress_table,))) > > # Add a new node, bootstrap=True ensures that it is not a seed > node4 = new_node(cluster, bootstrap=True) > node4.start(wait_for_binary_proto=True) > > session = self.patient_cql_connection(node4) > > assert original_rows == list(session.execute("SELECT * FROM > > {}".format(stress_table,))) > E assert [Row(key=b'PP...e9\xbb'), ...] == [Row(key=b'PP...e9\xbb'), > ...] 
> E At index 587 diff: Row(key=b'OP2656L630', > C0=b"E02\xd2\x8clBv\tr\n\xe3\x01\xdd\xf2\x8a\x91\x7f-\x9dm'\xa5\xe7PH\xef\xc1xlO\xab+d", > > C1=b"\xb2\xc0j\xff\xcb'\xe3\xcc\x0b\x93?\x18@\xc4\xc7tV\xb7q\xeeF\x82\xa4\xd3\xdcFl\xd9\x87 > \x9a\xde\xdc\xa3", > C2=b'\xed\xf8\x8d%\xa4\xa6LPs;\x98f\xdb\xca\x913\xba{M\x8d6XW\x01\xea-\xb5 > C3=b'\x9ec\xcf\xc7\xec\xa5\x85Z]\xa6\x19\xeb\xc4W\x1d%lyZj\xb9\x94I\x90\xebZ\xdba\xdd\xdc\x9e\x82\x95\x1c', > > C4=b'\xab\x9e\x13\x8b\xc6\x15D\x9b\xccl\xdcX\xb23\xd0\x8b\xa3\xba7\xc1c\xf7F\x1d\xf8e\xbd\x89\xcb\xd8\xd1)f\xdd') > != Row(key=b'4LN78NONP0', > C0=b"\xdf\x90\xb3/u\xc9/C\xcdOYG3\x070@#\xc3k\xaa$M'\x19\xfb\xab\xc0\x10]\xa6\xac\x1d\x81\xad", > > C1=b'\x8a\xb7j\x95\xf9\xbd?&\x11\xaaH\xcd\x87\xaa\xd2\x85\x08X\xea9\x94\xae8U\x92\xad\xb0\x1b9\xff\x87Z\xe81', > > C2=b'6\x1d\xa1-\xf77\xc7\xde+`\xb7\x89\xaa\xcd\xb5_\xe5\xb3\x04\xc7\xb1\x95e\x81s\t1\x8b\x16sc\x0eMm', > > C3=b'\xfbi\x08;\xc9\x94\x15}r\xfe\x1b\xae5\xf6v\x83\xae\xff\x82\x9b`J\xc2D\xa6k+\xf3\xd3\xff{C\xd0;', > > C4=b'\x8f\x87\x18\x0f\xfa\xadK"\x9e\x96\x87:tiu\xa5\x99\xe1_Ax\xa3\x12\xb4Z\xc9v\xa5\xad\xb8{\xc0\xa3\x93') > E Left contains 2830 more items, first extra item: > Row(key=b'5N7N172K30', > C0=b'Y\x81\xa6\x02\x89\xa0hyp\x00O\xe9kFp$\x86u\xea\n\x7fK\x99\xe1\xf6G\xf77\xf7\xd7\xe1\xc7L\x...0\x87a\x03\xee', > > C4=b'\xe8\xd8\x17\xf3\x14\x16Q\x9d\\jb\xde=\x81\xc1B\x9c;T\xb1\xa2O-\x87zF=\x04`\x04\xbd\xc9\x95\xad') > E Full diff: > E [ > … > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803676#comment-17803676 ] Paulo Motta commented on CASSANDRA-18999: - Thanks Brandon! Looks like {{test_read_from_bootstrapped_node}} already failed in [5.0-j17|https://app.circleci.com/pipelines/github/driftx/cassandra/1446/workflows/e0726e79-e517-4a82-828c-c7931fc9d99b] but this is being tracked in CASSANDRA-19097. I will check back when CI finishes and commit if it looks good. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 50m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803651#comment-17803651 ] Paulo Motta commented on CASSANDRA-18999: - Please find updated patches prepared for commit: * [4.0-18999|https://github.com/pauloricardomg/cassandra/tree/4.0-18999] * [4.1-18999|https://github.com/pauloricardomg/cassandra/tree/4.1-18999] * [5.0-18999|https://github.com/pauloricardomg/cassandra/tree/5.0-18999] > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 50m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19248) "nodetool bootstrap resume" starts unnecessary streaming session on joining node
[ https://issues.apache.org/jira/browse/CASSANDRA-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19248: Description: Calling {{nodetool boostrap resume}} triggers a new bootstrap streaming session on a joining node, even if there's a bootstrap streaming session currently running. Each time this command is called, a new bootstrap streaming session is started, causing the same data to be needlessly streamed from peers. It should only be possible to call {{nodetool bootstrap resume}} if a previous bootstrap attempt has failed. An example of multiple invocations of {{nodetool bootstrap resume}} in a joining node is shown below: {noformat} $ nodetool netstats Mode: JOINING Bootstrap a1cf3bf0-ab3a-11ee-9fcf-5746a7aee9ca /A.B.C.D Receiving 13 files, 14302312660 bytes total. Already received 2 files, 52389676 bytes total ks1/tbl1 80/80 bytes(100%) received from idx:0/A.B.C.D ks2/tbl2 471/471 bytes(100%) received from idx:0/A.B.C.D /E.F.G.H /I.J.K.L Bootstrap 7f1e7000-ab3d-11ee-9fcf-5746a7aee9ca /A.B.C.D Receiving 13 files, 14302312660 bytes total. Already received 0 files, 0 bytes total /E.F.G.H /I.J.K.L Bootstrap 9ca42500-ab3a-11ee-9fcf-5746a7aee9ca /A.B.C.D /E.F.G.H Receiving 13 files, 14302312660 bytes total. Already received 2 files, 104838752 bytes total ks1/tbl1 80/80 bytes(100%) received from idx:0/E.F.G.H ks2/tbl2 471/471 bytes(100%) received from idx:0/E.F.G.H /I.J.K.L {noformat} was: Calling {{nodetool boostrap resume}} triggers a new bootstrap streaming session on a joining node, even if there's a bootstrap streaming session currently running. Each time this command is called, a new bootstrap streaming session is started, causing the same data to be needlessly streamed from peers. It should only be possible to call {{nodetool bootstrap resume}} if a previous bootstrap attempt has failed. An example of multiple invocations of {{nodetool bootstrap resume}} in a joining node is shown below: {noformat} $ nodetool netstats Mode: JOINING Bootstrap a1cf3bf0-ab3a-11ee-9fcf-5746a7aee9ca /A.B.C.D Receiving 13 files, 14302312660 bytes total. Already received 2 files, 52389676 bytes total ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 /E.F.G.H /I.J.K.L Bootstrap 7f1e7000-ab3d-11ee-9fcf-5746a7aee9ca /A.B.C.D Receiving 13 files, 14302312660 bytes total. Already received 0 files, 0 bytes total /E.F.G.H /I.J.K.L Bootstrap 9ca42500-ab3a-11ee-9fcf-5746a7aee9ca /A.B.C.D /E.F.G.H Receiving 13 files, 14302312660 bytes total. Already received 2 files, 104838752 bytes total ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 /I.J.K.L {noformat} > "nodetool bootstrap resume" starts unnecessary streaming session on joining > node > > > Key: CASSANDRA-19248 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19248 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Paulo Motta >Priority: Normal > Labels: lhf > > Calling {{nodetool boostrap resume}} triggers a new bootstrap streaming > session on a joining node, even if there's a bootstrap streaming session > currently running. > Each time this command is called, a new bootstrap streaming session is > started, causing the same data to be needlessly streamed from peers. > It should only be possible to call {{nodetool bootstrap resume}} if a > previous bootstrap attempt has failed. 
> An example of multiple invocations of {{nodetool bootstrap resume}} in a > joining node is shown below: > {noformat} > $ nodetool netstats > Mode: JOINING > Bootstrap a1cf3bf0-ab3a-11ee-9fcf-5746a7aee9ca > /A.B.C.D > Receiving 13 files, 14302312660 bytes total. Already received 2 > files, 52389676 bytes total > ks1/tbl1 80/80 bytes(100%) received from idx:0/A.B.C.D > ks2/tbl2 471/471 bytes(100%) received from idx:0/A.B.C.D > /E.F.G.H > /I.J.K.L > Bootstrap 7f1e7000-ab3d-11ee-9fcf-5746a7aee9ca > /A.B.C.D > Receiving 13 files, 14302312660 bytes total. Already received 0 > files, 0 bytes total > /E.F.G.H > /I.J.K.L > Bootstrap 9ca42500-ab3a-11ee-9fcf-5746a7aee9ca > /A.B.C.D > /E.F.G.H > Receiving 13 files, 14302312660 bytes total. Already received 2 > fil
[jira] [Updated] (CASSANDRA-19248) "nodetool bootstrap resume" starts unnecessary streaming session on joining node
[ https://issues.apache.org/jira/browse/CASSANDRA-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19248: Since Version: 2.2.0 beta 1 Labels: lhf (was: ) > "nodetool bootstrap resume" starts unnecessary streaming session on joining > node > > > Key: CASSANDRA-19248 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19248 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Paulo Motta >Priority: Normal > Labels: lhf > > Calling {{nodetool boostrap resume}} triggers a new bootstrap streaming > session on a joining node, even if there's a bootstrap streaming session > currently running. > Each time this command is called, a new bootstrap streaming session is > started, causing the same data to be needlessly streamed from peers. > It should only be possible to call {{nodetool bootstrap resume}} if a > previous bootstrap attempt has failed. > An example of multiple invocations of {{nodetool bootstrap resume}} in a > joining node is shown below: > {noformat} > $ nodetool netstats > Mode: JOINING > Bootstrap a1cf3bf0-ab3a-11ee-9fcf-5746a7aee9ca > /A.B.C.D > Receiving 13 files, 14302312660 bytes total. Already received 2 > files, 52389676 bytes total > ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 > ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 > /E.F.G.H > /I.J.K.L > Bootstrap 7f1e7000-ab3d-11ee-9fcf-5746a7aee9ca > /A.B.C.D > Receiving 13 files, 14302312660 bytes total. Already received 0 > files, 0 bytes total > /E.F.G.H > /I.J.K.L > Bootstrap 9ca42500-ab3a-11ee-9fcf-5746a7aee9ca > /A.B.C.D > /E.F.G.H > Receiving 13 files, 14302312660 bytes total. Already received 2 > files, 104838752 bytes total > ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 > ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 > /I.J.K.L {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19248) "nodetool bootstrap resume" starts unnecessary streaming session on joining node
[ https://issues.apache.org/jira/browse/CASSANDRA-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19248: Bug Category: Parent values: Correctness(12982)Level 1 values: API / Semantic Implementation(12988) Complexity: Low Hanging Fruit Component/s: Cluster/Membership Discovered By: User Report Severity: Low Status: Open (was: Triage Needed) > "nodetool bootstrap resume" starts unnecessary streaming session on joining > node > > > Key: CASSANDRA-19248 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19248 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Paulo Motta >Priority: Normal > > Calling {{nodetool boostrap resume}} triggers a new bootstrap streaming > session on a joining node, even if there's a bootstrap streaming session > currently running. > Each time this command is called, a new bootstrap streaming session is > started, causing the same data to be needlessly streamed from peers. > It should only be possible to call {{nodetool bootstrap resume}} if a > previous bootstrap attempt has failed. > An example of multiple invocations of {{nodetool bootstrap resume}} in a > joining node is shown below: > {noformat} > $ nodetool netstats > Mode: JOINING > Bootstrap a1cf3bf0-ab3a-11ee-9fcf-5746a7aee9ca > /A.B.C.D > Receiving 13 files, 14302312660 bytes total. Already received 2 > files, 52389676 bytes total > ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 > ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 > /E.F.G.H > /I.J.K.L > Bootstrap 7f1e7000-ab3d-11ee-9fcf-5746a7aee9ca > /A.B.C.D > Receiving 13 files, 14302312660 bytes total. Already received 0 > files, 0 bytes total > /E.F.G.H > /I.J.K.L > Bootstrap 9ca42500-ab3a-11ee-9fcf-5746a7aee9ca > /A.B.C.D > /E.F.G.H > Receiving 13 files, 14302312660 bytes total. Already received 2 > files, 104838752 bytes total > ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 > ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 > /I.J.K.L {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19248) "nodetool bootstrap resume" starts unnecessary streaming session on joining node
Paulo Motta created CASSANDRA-19248: --- Summary: "nodetool bootstrap resume" starts unnecessary streaming session on joining node Key: CASSANDRA-19248 URL: https://issues.apache.org/jira/browse/CASSANDRA-19248 Project: Cassandra Issue Type: Bug Reporter: Paulo Motta Calling {{nodetool bootstrap resume}} triggers a new bootstrap streaming session on a joining node, even if there's a bootstrap streaming session currently running. Each time this command is called, a new bootstrap streaming session is started, causing the same data to be needlessly streamed from peers. It should only be possible to call {{nodetool bootstrap resume}} if a previous bootstrap attempt has failed. An example of multiple invocations of {{nodetool bootstrap resume}} in a joining node is shown below: {noformat} $ nodetool netstats Mode: JOINING Bootstrap a1cf3bf0-ab3a-11ee-9fcf-5746a7aee9ca /A.B.C.D Receiving 13 files, 14302312660 bytes total. Already received 2 files, 52389676 bytes total ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 /E.F.G.H /I.J.K.L Bootstrap 7f1e7000-ab3d-11ee-9fcf-5746a7aee9ca /A.B.C.D Receiving 13 files, 14302312660 bytes total. Already received 0 files, 0 bytes total /E.F.G.H /I.J.K.L Bootstrap 9ca42500-ab3a-11ee-9fcf-5746a7aee9ca /A.B.C.D /E.F.G.H Receiving 13 files, 14302312660 bytes total. Already received 2 files, 104838752 bytes total ks1/tbl1 80/80 bytes(100%) received from idx:0/10.34.194.220 ks2/tbl2 471/471 bytes(100%) received from idx:0/10.34.194.220 /I.J.K.L {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
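For illustration, a rough sketch of the guard the description calls for; the method and flag names here are assumptions made up for the example, not existing Cassandra code:
{code:java}
// Hypothetical validation (assumed names): only allow "nodetool bootstrap resume" when a
// previous bootstrap attempt actually failed, and never while a streaming session is active.
static void validateResumeRequest(boolean bootstrapInProgress, boolean previousAttemptFailed)
{
    if (bootstrapInProgress)
        throw new IllegalStateException("Bootstrap streaming is already in progress; refusing to start another session");
    if (!previousAttemptFailed)
        throw new IllegalStateException("No failed bootstrap attempt to resume");
}
{code}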
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801825#comment-17801825 ] Paulo Motta commented on CASSANDRA-18999: - {quote}I have tested 5.0 already, links are in my last comments above. Have you done some changes to it? I think 5.0 is fully covered, the only thing we need is upgrade dtests for 4.0 and 4.1. {quote} The 5.0 version you submitted was based on the [CASSANDRA-18999-5.0-hasMajVer3removal|https://github.com/apache/cassandra/compare/trunk...instaclustr:cassandra:CASSANDRA-18999-5.0-hasMajVer3removal] branch which removes {{hasMajorVersion3Nodes}} from 5.0. We need to submit CI for [this branch|https://github.com/pauloricardomg/cassandra/tree/cassandra-5.0] where {{hasMajorVersion3Nodes}} is fixed but not removed (the removal will be done on CASSANDRA-19243). We also need to submit upgrade tests for 4.0/4.1/5.0. Can you do this in circle? If not I guess we'll have to wait until asf ci is bac. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 50m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). 
> From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18999: Status: Ready to Commit (was: Changes Suggested) > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 50m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801031#comment-17801031 ] Paulo Motta commented on CASSANDRA-18999: - I created CASSANDRA-19243 to have a wider review on removal of pre-4.0 compatibility code. For this ticket, let's just merge the original 4.0/4.1/5.0 PRs fixing {{Gossiper::hasMajorVersion3Nodes}} without removing pre-4.0 compatibility code from 5.0: I have prepared the patches for commit: * [cassandra-4.0|https://github.com/pauloricardomg/cassandra/tree/cassandra-4.0] * [cassandra-4.1|https://github.com/pauloricardomg/cassandra/tree/cassandra-4.1] * [cassandra-5.0|https://github.com/pauloricardomg/cassandra/tree/cassandra-5.0] I've submitted a [devbranch job|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2649/] for the cassandra-5.0 branch but it seems ci-cassandra.a.o is unavailable. I don't have circle environment setup, so I will wait until jenkins is back or someone submits a circle job for cassandra-5.0 before committing this. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 50m > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). 
> From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19243) Remove pre-4.0 compatibility code for 5.0
[ https://issues.apache.org/jira/browse/CASSANDRA-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801027#comment-17801027 ] Paulo Motta edited comment on CASSANDRA-19243 at 12/28/23 4:14 PM: --- It was identified on CASSANDRA-18999 that {{Gossiper::hasMajorVersion3Nodes }}was removed from trunk, effectively removing pre-4.0 compatibility from trunk. This [PR|https://github.com/apache/cassandra/pull/3004] removes the method {{Gossiper::hasMajorVersion3Nodes}} from cassandra-5.0 branch, which removes pre-4.0 compatibility from 5.0. In addition to reviewing the changes above, we need to ensure that no more pre-4.0 compatibility code remains in 5.0+ Since the backward compatibility code will be removed, I propose adding a new StartupCheck to prevent upgrade from version < 4.0 and a flag to override (if this is not already there). was (Author: paulo): It was identified on CASSANDRA-18999 that {{Gossiper::hasMajorVersion3Nodes }}was removed from trunk, effectively removing pre-4.0 compatibility from trunk. This [PR|https://github.com/apache/cassandra/pull/3004] removes the method {{Gossiper::hasMajorVersion3Nodes}} from cassandra-5.0 branch, which removes pre-4.0 compatibility from 5.0. In addition to reviewing the changes above, we need to ensure that no more pre-4.0 compatibility code remains in 5.0+ Since the backward compatibility code will be removed, I propose adding a new StartupCheck to prevent upgrade from version < 4.0 and a flag to override (if this is not already there). > Remove pre-4.0 compatibility code for 5.0 > - > > Key: CASSANDRA-19243 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19243 > Project: Cassandra > Issue Type: Improvement >Reporter: Paulo Motta >Priority: Normal > > This is an umbrella ticket to discuss removing pre-4.0 compatibility code > from 5.0, similar to CASSANDRA-12716 for 4.x. > A few considerations: > - Discuss/ratify removal of pre-compatibility code on dev mailing list > - What compatibility features are being removed? > - What upgrade tests are being removed ? Are they still relevant and can be > reused? > - Should upgrade from 3.x to 5.X fail on startup with an override flag? > - Can/should we make it easier to deprecate/remove compatibility code for > future major releases? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19243) Remove pre-4.0 compatibility code for 5.0
[ https://issues.apache.org/jira/browse/CASSANDRA-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801027#comment-17801027 ] Paulo Motta commented on CASSANDRA-19243: - It was identified on CASSANDRA-18999 that {{Gossiper::hasMajorVersion3Nodes }}was removed from trunk, effectively removing pre-4.0 compatibility from trunk. This [PR|https://github.com/apache/cassandra/pull/3004] removes the method {{Gossiper::hasMajorVersion3Nodes}} from cassandra-5.0 branch, which removes pre-4.0 compatibility from 5.0. In addition to reviewing the changes above, we need to ensure that no more pre-4.0 compatibility code remains in 5.0+ Since the backward compatibility code will be removed, I propose adding a new StartupCheck to prevent upgrade from version < 4.0 and a flag to override (if this is not already there). > Remove pre-4.0 compatibility code for 5.0 > - > > Key: CASSANDRA-19243 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19243 > Project: Cassandra > Issue Type: Improvement >Reporter: Paulo Motta >Priority: Normal > > This is an umbrella ticket to discuss removing pre-4.0 compatibility code > from 5.0, similar to CASSANDRA-12716 for 4.x. > A few considerations: > - Discuss/ratify removal of pre-compatibility code on dev mailing list > - What compatibility features are being removed? > - What upgrade tests are being removed ? Are they still relevant and can be > reused? > - Should upgrade from 3.x to 5.X fail on startup with an override flag? > - Can/should we make it easier to deprecate/remove compatibility code for > future major releases? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
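A minimal, self-contained sketch of the StartupCheck idea proposed in the comment above — refuse to start when the previously installed version is below 4.0 unless an override flag is set. The class name, the override property name and the version parsing are assumptions made for this illustration only; they do not reflect the actual Cassandra StartupCheck API or any committed patch.
{code:java}
// Illustrative only: models "fail startup when upgrading from < 4.0, with an override flag".
public final class MinimumUpgradeVersionGuardSketch
{
    private static final int MIN_SUPPORTED_MAJOR = 4;

    // previousVersion would come from the release version recorded by the previously running node
    static void check(String previousVersion)
    {
        // hypothetical override flag, e.g. -Dcassandra.allow_unsupported_upgrade=true
        if (Boolean.getBoolean("cassandra.allow_unsupported_upgrade"))
            return;

        int major = Integer.parseInt(previousVersion.split("\\.")[0]);
        if (major < MIN_SUPPORTED_MAJOR)
            throw new IllegalStateException("Upgrade from " + previousVersion +
                                            " is not supported; upgrade to 4.x first or override the check.");
    }

    public static void main(String[] args)
    {
        check("4.1.3");   // passes
        check("3.11.16"); // throws IllegalStateException
    }
}
{code}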
[jira] [Updated] (CASSANDRA-19243) Remove pre-4.0 compatibility code for 5.0
[ https://issues.apache.org/jira/browse/CASSANDRA-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19243: Workflow: Copy of Cassandra Default Workflow (was: Copy of Cassandra Bug Workflow) Issue Type: Improvement (was: Bug) > Remove pre-4.0 compatibility code for 5.0 > - > > Key: CASSANDRA-19243 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19243 > Project: Cassandra > Issue Type: Improvement > Reporter: Paulo Motta >Priority: Normal > > This is an umbrella ticket to discuss removing pre-4.0 compatibility code > from 5.0, similar to CASSANDRA-12716 for 4.x. > A few considerations: > - Discuss/ratify removal of pre-compatibility code on dev mailing list > - What compatibility features are being removed? > - What upgrade tests are being removed ? Are they still relevant and can be > reused? > - Should upgrade from 3.x to 5.X fail on startup with an override flag? > - Can/should we make it easier to deprecate/remove compatibility code for > future major releases? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19243) Remove pre-4.0 compatibility code for 5.0
[ https://issues.apache.org/jira/browse/CASSANDRA-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19243: Description: This is an umbrella ticket to discuss removing pre-4.0 compatibility code from 5.0, similar to CASSANDRA-12716 for 4.x. A few considerations: - Discuss/ratify removal of pre-compatibility code on dev mailing list - What compatibility features are being removed? - What upgrade tests are being removed ? Are they still relevant and can be reused? - Should upgrade from 3.x to 5.X fail on startup with an override flag? - Can/should we make it easier to deprecate/remove compatibility code for future major releases? was:TBD > Remove pre-4.0 compatibility code for 5.0 > - > > Key: CASSANDRA-19243 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19243 > Project: Cassandra > Issue Type: Bug > Reporter: Paulo Motta >Priority: Normal > > This is an umbrella ticket to discuss removing pre-4.0 compatibility code > from 5.0, similar to CASSANDRA-12716 for 4.x. > A few considerations: > - Discuss/ratify removal of pre-compatibility code on dev mailing list > - What compatibility features are being removed? > - What upgrade tests are being removed ? Are they still relevant and can be > reused? > - Should upgrade from 3.x to 5.X fail on startup with an override flag? > - Can/should we make it easier to deprecate/remove compatibility code for > future major releases? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19243) Remove pre-4.0 compatibility code for 5.0
Paulo Motta created CASSANDRA-19243: --- Summary: Remove pre-4.0 compatibility code for 5.0 Key: CASSANDRA-19243 URL: https://issues.apache.org/jira/browse/CASSANDRA-19243 Project: Cassandra Issue Type: Bug Reporter: Paulo Motta TBD -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800879#comment-17800879 ] Paulo Motta commented on CASSANDRA-19001: - Added [this commit|https://github.com/pauloricardomg/cassandra/commit/cdc4124873f2b29c4d42e3265a9c7f408bcd98c4] to [pauloricardomg/19001-5.0-patch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:cassandra:19001-5.0-patch] to fail "nodetool sjk" with a nicer message when a JDK is not found: "nodetool sjk jps" output with JDK17: {noformat} $ bin/nodetool sjk jps 28270 org.apache.cassandra.tools.NodeTool -p 7199 sjk jps {noformat} "nodetool sjk jps" output with JRE17: {noformat} $ docker run --rm -it cassandra-test:5.0-19001 nodetool sjk jps | cat | head -n10 ERROR: JDK not detected and nodetool sjk requires JDK to work. {noformat} > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19001: Reviewers: Paulo Motta, Paulo Motta Paulo Motta, Paulo Motta (was: Paulo Motta) Status: Review In Progress (was: Patch Available) > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19001: Status: Changes Suggested (was: Review In Progress) > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19001: Status: Patch Available (was: In Progress) > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19001: Status: Open (was: Patch Available) > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800870#comment-17800870 ] Paulo Motta commented on CASSANDRA-19001: - I finally got a chance to take a look at this, apologies for the delay. It looks like the [JDK detection check|https://github.com/ekaterinadimitrova2/cassandra/blob/613bb6d2cbc40924479eac044f78e0c4e584521b/bin/cassandra.in.sh#L153] does not work when the JRE is on {{/opt/java/openjdk/bin/java}} which is the case for the official docker image. I updated the check [on this commit|https://github.com/pauloricardomg/cassandra/commit/97472afcc4f63291ebbbcc6aab476b0ccf12ce06] to check for the presence of the {{javac}} executable on the {{$PATH}} or {{$JAVA_HOME}} to detect whether a JDK is present. Let me know what do you think. I checked that no more warnings "Unknown module: jdk.attach specified to --add-exports" are logged during server initialization, nor when calling nodetool commands when using JRE17: *BEFORE:* {noformat} $ docker run --rm -it cassandra:5 nodetool help | cat | head -n10 WARNING: Unknown module: jdk.attach specified to --add-exports WARNING: Unknown module: jdk.compiler specified to --add-exports WARNING: Unknown module: jdk.compiler specified to --add-opens usage: nodetool [(-p | --port )] [(-u | --username )] [(-pw | --password )] [(-pwf | --password-file )] [(-pp | --print-port)] [(-h | --host )] [] {noformat} *AFTER:* {noformat} $ docker run --rm -it cassandra-test:5.0-19001 nodetool help | cat | head -n10 usage: nodetool [(-pw | --password )] [(-p | --port )] [(-pwf | --password-file )] [(-pp | --print-port)] [(-h | --host )] [(-u | --username )] [] {noformat} I also checked that nodetool sjk fails with this message on JRE17: {noformat} $ docker run --rm -it cassandra-test:5.0-19001 nodetool sjk jps | head -n10 ERROR 17:22:29,631 Java home points to /opt/java/openjdk make sure it is not a JRE path ERROR 17:22:29,632 Failed to add tools.jar to classpath java.lang.ClassNotFoundException: com.sun.tools.attach.VirtualMachine at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source) at java.base/java.lang.ClassLoader.loadClass(Unknown Source) at org.gridkit.lab.jvm.attach.AttachAPI.(AttachAPI.java:52) {noformat} But works when a JDK17 is present: {noformat} $ bin/nodetool sjk jps 22825 org.apache.cassandra.tools.NodeTool -p 7199 sjk jps {noformat} I checked that all commands above have the same output on JRE11. I briefly tested the full query logger on a JRE17 with the patch above and it seems to be working: {noformat} root@6c9f22a89594:/# nodetool enablefullquerylog --path /tmp/bla root@6c9f22a89594:/# cqlsh Connected to Test Cluster at 127.0.0.1:9042 [cqlsh 6.2.0 | Cassandra 5.0-beta1-SNAPSHOT | CQL spec 3.4.7 | Native protocol v5] Use HELP for help. 
cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> exit root@6c9f22a89594:/# /opt/cassandra/tools/bin/fqltool dump /tmp/bla INFO [main] 2023-12-27 16:56:34,673 DatabaseDescriptor.java:1557 - Supported sstable formats are: big -> org.apache.cassandra.io.sstable.format.big.BigFormat with singleton components: [Data.db, Index.db, Statistics.db, CompressionInfo.db, Filter.db, Summary.db, Digest.crc32, CRC.db, TOC.txt], bti -> org.apache.cassandra.io.sstable.format.bti.BtiFormat with singleton components: [Data.db, Partitions.db, Rows.db, Statistics.db, CompressionInfo.db, Filter.db, Digest.crc32, CRC.db, TOC.txt] INFO [main] 2023-12-27 16:56:34,723 Jvm.java:174 - Chronicle core loaded from file:/opt/cassandra/lib/chronicle-core-2.23.36.jar INFO [main] 2023-12-27 16:56:34,817 Slf4jExceptionHandler.java:44 - Took 6 ms to add mapping for /tmp/bla/metadata.cq4t INFO [main] 2023-12-27 16:56:34,859 Slf4jExceptionHandler.java:44 - Running under OpenJDK Runtime Environment 17.0.9+9 with 16 processors reported. INFO [main] 2023-12-27 16:56:34,860 Slf4jExceptionHandler.java:44 - Leave your e-mail to get information about the latest releases and patches at https://chronicle.software/release-notes/ INFO [main] 2023-12-27 16:56:34,861 Slf4jExceptionHandler.java:44 - Process id: 1015 :: Chronicle Queue (5.23.37) Type: single-query Query start time: 1703696157539 Protocol version: 5 Generated timestamp:-9223372036854775808 Generated nowInSeconds:1703696157 Query: SELECT * FROM system.peers_v2 Values: Type: single-query Query start time: 1703696157544 Protocol version: 5 Generated timestamp:-9223372036854775808 Generated nowInSeconds:1703696157 Query: SELECT * FROM system.local WHERE key='local' Values: {noformat} I inspecte
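A minimal, self-contained sketch of the javac-based JDK detection described in the comment above: a JDK is assumed to be present when a javac executable exists under $JAVA_HOME/bin or on the $PATH. The real change lives in the cassandra.in.sh shell script; this Java version only models the same idea, and the class and method names are made up for the example.
{code:java}
import java.io.File;

// Illustrative only: detect a JDK by looking for a javac executable (Unix-style paths assumed).
public final class JdkDetectionSketch
{
    static boolean jdkPresent()
    {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome != null && new File(javaHome, "bin" + File.separator + "javac").canExecute())
            return true;

        String path = System.getenv("PATH");
        if (path != null)
        {
            for (String dir : path.split(File.pathSeparator))
                if (new File(dir, "javac").canExecute())
                    return true;
        }
        return false;
    }

    public static void main(String[] args)
    {
        // On a JRE-only image (e.g. eclipse-temurin:17-jre-focal) this prints the JRE-only message.
        System.out.println(jdkPresent() ? "JDK detected" : "JRE only: javac not found");
    }
}
{code}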
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799057#comment-17799057 ] Paulo Motta commented on CASSANDRA-19001: - I'll take a look at this today, will get back soon. > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18999: Status: Changes Suggested (was: Review In Progress) > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798298#comment-17798298 ] Paulo Motta commented on CASSANDRA-18999: - {quote}I think we should keep some version of hasMajorVersion3Nodes still around, something like this: {quote} Where will this method ever be used if we're removing {{Gossiper::hasMajorVersion3Nodes}} and all references to it? There is no place in the code that requires checking if there are unknown nodes in gossip, except inside [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/gms/Gossiper.java#L219], where the check will be kept. {quote}I just dont understand that when the original version was dealing with unknown versions and it could evaluate that method as returning true, then us removing the unknown check will change behavior in 5.0 as well. {quote} As far as I understand the objective of the hasMajorVersion3Nodes method is to *not* do things when a cluster node is identified to be in version 3.x. It was not possible to know if a node with unknown version was on 3.x or not, so hasMajorVersion3Nodes returned true if a node version was not known (since it could potentially be a 3.x node). On 5.x we no longer need to identify if a node is on version 3.x since direct upgrade from 3.x is not supported, so there is no reason to keep hasMajorVersion3Nodes or hasUnknownNodes around. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' 
+ (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798298#comment-17798298 ] Paulo Motta edited comment on CASSANDRA-18999 at 12/18/23 5:50 PM: --- {quote}I think we should keep some version of hasMajorVersion3Nodes still around, something like this: {quote} Where will this method be ever used if we're removing {{Gossiper::hasMajorVersion3Nodes}} and all references to it? There is no place in the code that requires checking if there are unknown version nodes in gossip, except inside [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/gms/Gossiper.java#L219], where the check will be kept. {quote}I just dont understand that when the original version was dealing with unknown versions and it could evaluate that method as returning true, then us removing the unknown check will change behavior in 5.0 as well. {quote} As far as I understand the objective of hasMajorVersion3Nodes methods is to *not* do things when a cluster node is identified to be in version 3.x. It was not possible to know if a node with unknown version was on 3.x or not, so hasMajorVersion3Nodes returned true if a node version was not known (since it could potentially be a 3.x nodes). On 5.x we no longer need to identify if a node is on version 3.x since direct upgrade from 3.x is not supported, so there is no reason to keep hasMajorVersion3Nodes or hasUnknownNodes around. was (Author: paulo): {quote}I think we should keep some version of hasMajorVersion3Nodes still around, something like this: {quote} Where will this method be ever used if we're removing {{Gossiper::hasMajorVersion3Nodes}} and all references to it? There is no place in the code that requires checking if there are unknown nodes in gossip, except inside [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/gms/Gossiper.java#L219], where the check will be kept. {quote}I just dont understand that when the original version was dealing with unknown versions and it could evaluate that method as returning true, then us removing the unknown check will change behavior in 5.0 as well. {quote} As far as I understand the objective of hasMajorVersion3Nodes methods is to *not* do things when a cluster node is identified to be in version 3.x. It was not possible to know if a node with unknown version was on 3.x or not, so hasMajorVersion3Nodes returned true if a node version was not known (since it could potentially be a 3.x nodes). On 5.x we no longer need to identify if a node is on version 3.x since direct upgrade from 3.x is not supported, so there is no reason to keep hasMajorVersion3Nodes or hasUnknownNodes around. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. 
> This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.'
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798294#comment-17798294 ] Paulo Motta commented on CASSANDRA-18999: - {quote}So, if this is removed in 5.0, that also means that the places where that method is called are not relevant anymore - as you showed its usage in your first comment to this ticket. That means that we would need a little bit more refactoring in 5.0 around that. {quote} Yes, [~isaacreath] I think we need to update the 5.0 patch to remove {{Gossiper::hasMajorVersion3Nodes}} and any references to it. {quote}check this, that comment in particular (1). It seems to me that unknown version can happen in 4.0+ as well. {quote} We shouldn't remove this variable from {{upgradeFromVersionSupplier}} since it will still be needed there. We just don't need the method {{hasMajorVersion3OrUnknownNodes}} nor {{{}hasNodeWithUnknownVersion{}}}, since these will no longer be required anywhere in 5.x. For the 5.x patch after removing {{Gossiper::hasMajorVersion3Nodes}} we can keep the [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/gms/Gossiper.java#L219] in the original configuration, where {{allHostsHaveKnownVersion}} is a local variable within that method used [here|https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/gms/Gossiper.java#L253]. In summary: - 4.0/4.1 patches: LGTM - 5.0 patch: only remove Gossiper::hasMajorVersion3Nodes and any references to it. - Trunk (no change) WDYT? > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' 
+ (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apach
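A self-contained model of the check discussed above and in the ticket description — stay pessimistic while any live endpoint has not yet advertised a release version via gossip. This is not Cassandra code: the map below stands in for the Gossiper endpoint state, and the class and method names are only illustrative.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: "return true if any known endpoint has no release version in gossip".
public final class UnknownVersionCheckSketch
{
    // endpoint address -> advertised release version; null means the version is not yet known
    static boolean hasNodeWithUnknownVersion(Map<String, String> releaseVersionByEndpoint)
    {
        for (String version : releaseVersionByEndpoint.values())
            if (version == null)
                return true;
        return false;
    }

    public static void main(String[] args)
    {
        Map<String, String> cluster = new HashMap<>();
        cluster.put("127.0.0.1", "4.1.2");
        cluster.put("127.0.0.2", null); // joined gossip but version not yet advertised
        System.out.println(hasNodeWithUnknownVersion(cluster)); // true -> stay pessimistic
    }
}
{code}
In Gossiper itself this kind of check would only need to live inside upgradeFromVersionSupplier once hasMajorVersion3Nodes is removed, as noted in the comment above.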
[jira] [Updated] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18999: Reviewers: Paulo Motta, Stefan Miklosovic (was: Stefan Miklosovic) > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798276#comment-17798276 ] Paulo Motta commented on CASSANDRA-18999: - Fwiw I'm +1 on the patch, but let's wait a bit to see if Mick/Brandon have any input. If you're good can you trigger ci [~smiklosovic] ? I need to setup my circleci stuff to be able to submit, will setup this soon. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798274#comment-17798274 ] Paulo Motta edited comment on CASSANDRA-18999 at 12/18/23 4:27 PM: --- {quote}So I can see an argument for completely removing this in 5.0, but on the other hand, there is also this "or unknown nodes" and that is still valid question to ask. Hence, would not it be more appropriate to remove "isUpgradingFromVersionLowerThan" and base this method just on "hasNodeWithUnknownVersion" ? {quote} Upgrade from 3.x to 5.x is not supported, so this method should be removed. The unknown version check is a pessimistic guard against a 3.x node possibly not having its version propagated via gossip. Since upgrade from 3.x is no longer supported on 5.x, the unknown version check should no longer exist. was (Author: paulo): {quote}So I can see an argument for completely removing this in 5.0, but on the other hand, there is also this "or unknown nodes" and that is still valid question to ask. Hence, would not it be more appropriate to remove "isUpgradingFromVersionLowerThan" and base this method just on "hasNodeWithUnknownVersion" ? {quote} Upgrade from 3.x to 5.x is not supported, so this method should be removed. The unknown version check is a pessimistic guard against a 3.x node possibly not having its version propagated via gossip. Since upgrade from 3.x is no longer supported on 3.x, this should no longer be guarded against. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' 
+ (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18999) Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading patch version without Cassandra 3 nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798274#comment-17798274 ] Paulo Motta commented on CASSANDRA-18999: - {quote}So I can see an argument for completely removing this in 5.0, but on the other hand, there is also this "or unknown nodes" and that is still valid question to ask. Hence, would not it be more appropriate to remove "isUpgradingFromVersionLowerThan" and base this method just on "hasNodeWithUnknownVersion" ? {quote} Upgrade from 3.x to 5.x is not supported, so this method should be removed. The unknown version check is a pessimistic guard against a 3.x node possibly not having its version propagated via gossip. Since upgrade from 3.x is no longer supported on 3.x, this should no longer be guarded against. > Gossiper::hasMajorVersion3Nodes returns true when a cluster is upgrading > patch version without Cassandra 3 nodes. > - > > Key: CASSANDRA-18999 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18999 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Distributed Metadata >Reporter: Isaac Reath >Assignee: Isaac Reath >Priority: Low > Labels: lhf > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > When working on https://issues.apache.org/jira/browse/CASSANDRA-18968 we > found that {{Gossiper::hasMajorVersion3Nodes}} will return true when the > cluster is undergoing an upgrade from a patch version even if the cluster has > no Cassandra 3 nodes in it. > This can be reproduced by running this Gossiper test: > {code:java} > @Test > public void > testHasVersion3NodesShouldReturnFalseWhenNoVersion3NodesDetectedAndCassandra4UpgradeInProgress() > throws Exception > { > Gossiper.instance.start(0); > Gossiper.instance.expireUpgradeFromVersion(); > VersionedValue.VersionedValueFactory factory = new > VersionedValue.VersionedValueFactory(null); > EndpointState es = new EndpointState((HeartBeatState) null); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(CURRENT_VERSION.toString())); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.1"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.1")); > es = new EndpointState((HeartBeatState) null); > String previousPatchVersion = String.valueOf(CURRENT_VERSION.major) + > '.' + (CURRENT_VERSION.minor) + '.' + (CURRENT_VERSION.patch - 1); > es.addApplicationState(ApplicationState.RELEASE_VERSION, > factory.releaseVersion(previousPatchVersion)); > > Gossiper.instance.endpointStateMap.put(InetAddressAndPort.getByName("127.0.0.2"), > es); > > Gossiper.instance.liveEndpoints.add(InetAddressAndPort.getByName("127.0.0.2")); > assertFalse(Gossiper.instance.hasMajorVersion3Nodes()); > } > {code} > This seems to be because of > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L2360], > where an upgrade in progress is possible but we are not upgrading from a > lower family version (i.e from 4.1.1 to 4.1.2). > From the comment in this function, it seems instead of the existing check, we > would want to iterate over all known endpoints in gossip and return true if > any of them do not have a version (similar to > [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236) > > |https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/gms/Gossiper.java#L227-L236).] 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
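A minimal sketch of the fix direction described in the issue above for the 4.0.x/4.1.x branches: iterate over all known endpoints in gossip and report true if any of them has no release version recorded. The helper name and standalone class are assumptions for illustration; only the gossip types and the RELEASE_VERSION application state used in the reproduction test are taken from the codebase.

{code:java}
import java.util.Map;

import org.apache.cassandra.gms.ApplicationState;
import org.apache.cassandra.gms.EndpointState;
import org.apache.cassandra.locator.InetAddressAndPort;

public final class UnknownVersionCheck
{
    /**
     * True if any known endpoint has no RELEASE_VERSION in its gossip state,
     * i.e. we cannot rule out that it is still a 3.x node.
     */
    public static boolean hasNodeWithUnknownVersion(Map<InetAddressAndPort, EndpointState> endpointStates)
    {
        for (EndpointState state : endpointStates.values())
        {
            if (state == null || state.getApplicationState(ApplicationState.RELEASE_VERSION) == null)
                return true;
        }
        return false;
    }
}
{code}

Called with Gossiper.instance.endpointStateMap (as in the test above), this checks each peer directly instead of relying on the upgrade-in-progress heuristic, so a 4.1.1 to 4.1.2 upgrade would no longer trip it.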
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793739#comment-17793739 ] Paulo Motta commented on CASSANDRA-16418: - bq. However, from the API pov CompactionManager.performCleanup can be now called anytime - I think it was important precondition for that method - wouldn't be good to keep it there, just changing the condition to check pending ranges rather than joining status? Good point, this was overlooked during review - I suggested removing that just to cleanup but looking back I think there is value in keeping it for safety if this API is used elsewhere. Feel free to create a new ticket to add it back or piggyback in some other ticket, I'd be glad to review. To me it'd be nice that CompactionManager API is a dumb local API unaware of token ranges/membership status since it's just a local operation, but practically these concerns are mixed across the codebase so developers expect that any local API is safe from a distributed standpoint. > Unsafe to run nodetool cleanup during bootstrap or decommission > --- > > Key: CASSANDRA-16418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16418 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: James Baker >Assignee: Lindsey Zurovchak >Priority: Normal > Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > What we expected: Running a cleanup is a safe operation; the result of > running a query after a cleanup should be the same as the result of running a > query before a cleanup. > What actually happened: We ran a cleanup during a decommission. All the > streamed data was silently deleted, the bootstrap did not fail, the cluster's > data after the decommission was very different to the state before. > Why: Cleanups do not take into account pending ranges and so the cleanup > thought that all the data that had just been streamed was redundant and so > deleted it. We think that this is symmetric with bootstraps, though have not > verified. > Not sure if this is technically a bug but it was very surprising (and > seemingly undocumented) behaviour. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16418) Unsafe to run nodetool cleanup during bootstrap or decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-16418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17792847#comment-17792847 ] Paulo Motta commented on CASSANDRA-16418: - {quote}Why that check in CompactionManager was removed? Was it needed for tests to make them run? I'm afraid that the check could have been legit for production use. {quote} I think that check was deemed unnecessary after a new check was added to [StorageService.forceKeyspaceCleanup|https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StorageService.java#L3907] to prevent starting cleanup when there are pending ranges (ie. when a node is joining). It's not clear to me why this latter check is not present in [trunk|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L2524] (while it's present in 4.0/4.1). > Unsafe to run nodetool cleanup during bootstrap or decommission > --- > > Key: CASSANDRA-16418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16418 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Bootstrap and Decommission >Reporter: James Baker >Assignee: Lindsey Zurovchak >Priority: Normal > Fix For: 4.0.8, 4.1.1, 5.0-alpha1, 5.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > What we expected: Running a cleanup is a safe operation; the result of > running a query after a cleanup should be the same as the result of running a > query before a cleanup. > What actually happened: We ran a cleanup during a decommission. All the > streamed data was silently deleted, the bootstrap did not fail, the cluster's > data after the decommission was very different to the state before. > Why: Cleanups do not take into account pending ranges and so the cleanup > thought that all the data that had just been streamed was redundant and so > deleted it. We think that this is symmetric with bootstraps, though have not > verified. > Not sure if this is technically a bug but it was very surprising (and > seemingly undocumented) behaviour. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
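To make the suggestion concrete, a sketch of the pending-ranges guard being discussed, assuming it would be re-added as a precondition next to CompactionManager.performCleanup or its callers. The accessors used here (getTokenMetadata().getPendingRanges(keyspace, broadcast address)) are an assumption for illustration, not a committed API choice.

{code:java}
import org.apache.cassandra.service.StorageService;
import org.apache.cassandra.utils.FBUtilities;

public final class CleanupGuard
{
    /**
     * Refuse to start cleanup while this node has pending ranges for the keyspace
     * (e.g. a bootstrap or decommission is streaming data), instead of only
     * checking the joining status.
     */
    public static void assertSafeToCleanup(String keyspaceName)
    {
        boolean hasPendingRanges = !StorageService.instance.getTokenMetadata()
                                                           .getPendingRanges(keyspaceName, FBUtilities.getBroadcastAddressAndPort())
                                                           .isEmpty();
        if (hasPendingRanges)
            throw new IllegalStateException("Cannot run cleanup on " + keyspaceName +
                                            " while this node has pending ranges; retry after ring movements complete");
    }
}
{code}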
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791828#comment-17791828 ] Paulo Motta commented on CASSANDRA-19001: - [~e.dimitrova] thanks for the patch! I'll take a look ASAP, hopefully tomorrow. > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19033) Add virtual table with GC pause history
[ https://issues.apache.org/jira/browse/CASSANDRA-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786927#comment-17786927 ] Paulo Motta commented on CASSANDRA-19033: - Seems like it could be useful to expose formatted gc info via a vtable for troubleshooting/tuning. If GC logging is not enabled I think it's fine to error out or perhaps not even load the virtual table. Would a specific GC logging format be required? Would this support just gc.log.current or compressed rolled over files? Do you have an idea on what the table schema would look like and possible queries? > Add virtual table with GC pause history > --- > > Key: CASSANDRA-19033 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19033 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Virtual Tables >Reporter: Jon Haddad >Priority: Normal > > We should be able to view GC pause history in a virtual table. > I think the best approach here is to read from the GC logs. The format was > unified in Java 9, and we've dropped older JVM support so I think this is > reasonable. The benefits of using logs are that we can preserve it across > restarts and we enable GC logs by default. > The downside is people might not have GC logs configured and it seems weird > that a feature would just stop working because logs aren't enabled. Maybe > that's OK if we call it out, or error if people try to read from it and the > logs aren't enabled. I think if someone disables -Xlog:gc then an error > might be fine as I don't expect it to happen often. I think I lean towards > this from a usability perspective, and Microsoft has a > [project|https://github.com/microsoft/gctoolkit] to parse them, but I haven't > used it so I'm not sure if it's suitable for us. > At a minimum, pause time should be it's own field so we can query for pauses > over a specific threshold, but there may be other data we want to explicitly > split out as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
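To ground the schema question, one possible shape sketched with the existing AbstractVirtualTable pattern used by other system_views tables; the table name, columns and keyspace wiring are made up for discussion, and populating rows from parsed gc.log entries is elided.

{code:java}
import org.apache.cassandra.db.marshal.LongType;
import org.apache.cassandra.db.marshal.TimestampType;
import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.db.virtual.AbstractVirtualTable;
import org.apache.cassandra.db.virtual.SimpleDataSet;
import org.apache.cassandra.dht.LocalPartitioner;
import org.apache.cassandra.schema.TableMetadata;

final class GcPausesTable extends AbstractVirtualTable
{
    GcPausesTable(String keyspace)
    {
        super(TableMetadata.builder(keyspace, "gc_pauses")
                           .comment("recent GC pause history parsed from the GC log")
                           .kind(TableMetadata.Kind.VIRTUAL)
                           .partitioner(new LocalPartitioner(UTF8Type.instance))
                           .addPartitionKeyColumn("collector", UTF8Type.instance)
                           .addClusteringColumn("occurred_at", TimestampType.instance)
                           .addRegularColumn("duration_ms", LongType.instance)
                           .addRegularColumn("cause", UTF8Type.instance)
                           .build());
    }

    @Override
    public DataSet data()
    {
        SimpleDataSet result = new SimpleDataSet(metadata());
        // rows parsed from gc.log would be added here, e.g.:
        // result.row("G1 Young Generation", pauseTimestamp)
        //       .column("duration_ms", pauseMillis)
        //       .column("cause", cause);
        return result;
    }
}
{code}

With a duration_ms column, the threshold query from the description becomes something like SELECT * FROM system_views.gc_pauses WHERE duration_ms > 200 ALLOW FILTERING.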
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786885#comment-17786885 ] Paulo Motta commented on CASSANDRA-19001: - {quote}A warning would not "break" it. It would inform users of the docker image of known limitations. This buys us time to then deal with the issue properly as we wish. (And the docker image maintainers may notice and change to JDK anyway…) {quote} It would deprecate JRE support which was previously supported (ie. JRE_HOME is mentioned [here|https://cassandra.apache.org/doc/latest/cassandra/reference/java17.html] and other places). If there are no hard dependencies on the JDK for core features, I would prefer to just require it for optional features like SJK and audit log. WDYT? One question that arises is whether we want to continue JRE support to core features. The benefits I can think of are smaller image size and fewer runtime dependencies. > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0-rc, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18762: Resolution: (was: Fixed) Status: Open (was: Resolved) > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+Parallel
[jira] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762 ] Paulo Motta deleted comment on CASSANDRA-18762: - was (Author: paulo): Thanks for the follow-up. I will close this for now, please re-open if you observe the issue after 4.0.10. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDu
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18762: Resolution: Cannot Reproduce Status: Resolved (was: Open) > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+ExitOnOutOfMemoryError > -XX:+HeapDumpOnOutOfMemoryError > -XX:+Parallel
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18762: Fix Version/s: (was: 5.x) (was: 4.0.x) (was: 4.1.x) (was: 5.0.x) > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat} > > -XX:+AlwaysPreTouch > -XX:+CrashOnOutOfMemoryError > -XX:+Exi
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786596#comment-17786596 ] Paulo Motta commented on CASSANDRA-18762: - Thanks for the follow-up. I will close this for now, please re-open if you observe the issue after 4.0.10. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat}
[jira] [Updated] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18762: Resolution: Fixed Status: Resolved (was: Open) Thanks for the follow-up. I will close this for now, please re-open if you observe the issue after 4.0.10. > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > don
[jira] [Commented] (CASSANDRA-18762) Repair triggers OOM with direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-18762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786588#comment-17786588 ] Paulo Motta commented on CASSANDRA-18762: - [~bschoeni] Did you confirm CASSANDRA-16681 fixes this issue? > Repair triggers OOM with direct buffer memory > - > > Key: CASSANDRA-18762 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18762 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Brad Schoening >Priority: Normal > Labels: OutOfMemoryError > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > Attachments: Cluster-dm-metrics-1.PNG > > > We are seeing repeated failures of nodes with 16GB of heap and the same size > (16GB) for direct memory (derived from -Xms). This seems to be related to > CASSANDRA-15202 which moved merkel trees off-heap in 4.0. Using Cassandra > 4.0.6. > {noformat} > 2023-08-09 04:30:57,470 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e55a3b0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_a from > /169.102.200.241:7000 > 2023-08-09 04:30:57,567 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e0d2900-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.93.192.29:7000 > 2023-08-09 04:30:57,568 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e1dcad0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_c from > /169.104.171.134:7000 > 2023-08-09 04:30:57,591 [INFO ] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 RepairSession.java:202 - [repair > #5e69a0e0-366d-11ee-a644-d91df26add5e] Received merkle tree for table_b from > /169.79.232.67:7000 > 2023-08-09 04:30:57,876 [INFO ] [Service Thread] cluster_id=101 > ip_address=169.0.0.1 GCInspector.java:294 - G1 Old Generation GC in 282ms. 
> Compressed Class Space: 8444560 -> 8372152; G1 Eden Space: 7809794048 -> 0; > G1 Old Gen: 1453478400 -> 820942800; G1 Survivor Space: 419430400 -> 0; > Metaspace: 80411136 -> 80176528 > 2023-08-09 04:30:58,387 [ERROR] [AntiEntropyStage:1] cluster_id=101 > ip_address=169.0.0.1 JVMStabilityInspector.java:102 - OutOfMemory error > letting the JVM handle the error: > java.lang.OutOfMemoryError: Direct buffer memory > at java.base/java.nio.Bits.reserveMemory(Bits.java:175) > at java.base/java.nio.DirectByteBuffer.(DirectByteBuffer.java:118) > at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:318) > at org.apache.cassandra.utils.MerkleTree.allocate(MerkleTree.java:742) > at > org.apache.cassandra.utils.MerkleTree.deserializeOffHeap(MerkleTree.java:780) > at org.apache.cassandra.utils.MerkleTree.deserializeTree(MerkleTree.java:751) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:720) > at org.apache.cassandra.utils.MerkleTree.deserialize(MerkleTree.java:698) > at > org.apache.cassandra.utils.MerkleTrees$MerkleTreesSerializer.deserialize(MerkleTrees.java:416) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:100) > at > org.apache.cassandra.repair.messages.ValidationResponse$1.deserialize(ValidationResponse.java:84) > at > org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:782) > at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:642) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.deserialize(InboundMessageHandler.java:364) > at > org.apache.cassandra.net.InboundMessageHandler$LargeMessage.access$1100(InboundMessageHandler.java:317) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessLargeMessage.provideMessage(InboundMessageHandler.java:504) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:429) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834)no* further _formatting_ is > done here{noformat} > > -XX:+AlwaysP
[jira] [Updated] (CASSANDRA-18661) Update to cassandra-stress to use Apache Commons CLI
[ https://issues.apache.org/jira/browse/CASSANDRA-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-18661: Labels: lhf (was: ) > Update to cassandra-stress to use Apache Commons CLI > > > Key: CASSANDRA-18661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18661 > Project: Cassandra > Issue Type: Improvement > Components: Tool/stress >Reporter: Brad Schoening >Priority: Normal > Labels: lhf > > The Apache Commons CLI library provides an API for parsing command line > options with the package org.apache.commons.cli and this is already used by a > dozen of existing Cassandra utilities including: > {quote}SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter, > SSTableExport, BulkLoader, and others. > {quote} > However, cassandra-stress is an outlier which uses its own custom classes to > parse command line options with classes such as OptionsSimple. In addition, > the options syntax for username, password, and others are not aligned with > the format used by CQLSH. > Currently, there are > 5K lines of code in 'settings' which appears to just > process command line args. > This suggestion is to: > > a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies > are required as this library is already used by the project) > > b) Align the cassandra-stress CLI options with those in CQLSH, > > {quote}For example, using the new syntax like CQLSH: > {quote} > > cassandra-stress -username foo -password bar > {quote}and replacing the old syntax: > {quote} > cassandra-stress -mode username=foo and password=bar > > This will simplify and unify the code base, eliminate code and reduce the > confusion between similar named classes such as > org.apache.cassandra.stress.settings.\{Option, OptionsMulti, OptionsSimple} > and org.apache.commons.cli.{Option, OptionGroup, Options) > > Note: documentation will need to be updated as well -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
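As a rough illustration of scope, the proposed CQLSH-style flags parsed with Apache Commons CLI could look like the snippet below; option names follow the example in the description and are not a final interface.

{code:java}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public final class StressCliSketch
{
    public static void main(String[] args)
    {
        Options options = new Options();
        options.addOption(Option.builder("username").hasArg().argName("name").desc("remote username").build());
        options.addOption(Option.builder("password").hasArg().argName("pass").desc("remote password").build());

        try
        {
            // parses e.g. "-username foo -password bar"
            CommandLine cmd = new DefaultParser().parse(options, args);
            String username = cmd.getOptionValue("username", "cassandra");
            String password = cmd.getOptionValue("password");
            System.out.println("user=" + username + ", password " + (password == null ? "not set" : "set"));
        }
        catch (ParseException e)
        {
            new HelpFormatter().printHelp("cassandra-stress", options);
        }
    }
}
{code}

Each of the existing settings.* option groups would map onto an Options instance along these lines, which is where most of the custom OptionsSimple-style parsing could be retired.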
[jira] [Commented] (CASSANDRA-19021) Update default disk_access_mode to mmap_index_only
[ https://issues.apache.org/jira/browse/CASSANDRA-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786483#comment-17786483 ] Paulo Motta commented on CASSANDRA-19021: - LGTM, feel free to merge if tests look good and nobody objects in the ML thread by tomorrow. Please include this [NEWS.txt entry|https://github.com/pauloricardomg/cassandra/commit/f8d08719712c895ee0684fd5e9aa4a911dd33ed3] on commit. > Update default disk_access_mode to mmap_index_only > -- > > Key: CASSANDRA-19021 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19021 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.0-beta, 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > https://lists.apache.org/thread/nhp6vftc4kc3dxskngxy5rpo1lp19drw -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785848#comment-17785848 ] Paulo Motta commented on CASSANDRA-19001: - bq. Our quickest fix is to just fail or warn that using JRE is not recommended (and that some features like audit logging and sjk may not work) This would break the official Cassandra [docker image| https://github.com/docker-library/cassandra/blob/master/5.0/Dockerfile] that is built on top of JRE. Do we want to drop the unintended JRE support that has been proven to work over the years on this image ? I see the following options: a) Make the JDK dependency optional, failing-fast if features that strictly require it are enabled. Add testing with JRE, preferably with the official docker image. b) Make the JDK dependency strictly required, properly document this and work with the official docker image maintainers to update the image to use JDK instead. Wdyt? > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0-beta, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19001) Check whether the startup warnings for unknown modules represent a legit problem or cosmetic issue
[ https://issues.apache.org/jira/browse/CASSANDRA-19001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785691#comment-17785691 ] Paulo Motta commented on CASSANDRA-19001: - bq. As I explained, the exports/opens for Chronicle have unclear impact on audit ligging. Audit logging is an optional functionality as far as I understand. We can prevent startup if audit logging is enabled and a JDK is not detected. > Check whether the startup warnings for unknown modules represent a legit > problem or cosmetic issue > -- > > Key: CASSANDRA-19001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19001 > Project: Cassandra > Issue Type: Bug > Components: Local/Other >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0-beta, 5.0.x, 5.x > > > During the 5.0 alpha 2 release > [vote|https://lists.apache.org/thread/lt3x0obr5cpbcydf5490pj6b2q0mz5zr], > [~paulo] raised the following concerns: > {code:java} > Launched a tarball-based 5.0-alpha2 container on top of > "eclipse-temurin:17-jre-focal" and the server starts up fine, can run > nodetool and cqlsh. > I got these seemingly harmless JDK17 warnings during startup and when > running nodetool (no warnings on JDK11): > WARNING: Unknown module: jdk.attach specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-exports > WARNING: Unknown module: jdk.compiler specified to --add-opens > WARNING: A terminally deprecated method in java.lang.System has been called > WARNING: System::setSecurityManager has been called by > org.apache.cassandra.security.ThreadAwareSecurityManager > (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar) > WARNING: Please consider reporting this to the maintainers of > org.apache.cassandra.security.ThreadAwareSecurityManager > WARNING: System::setSecurityManager will be removed in a future release > Anybody knows if these warnings are legit/expected ? We can create > follow-up tickets if needed. > $ java --version > openjdk 17.0.9 2023-10-17 > OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9) > OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, > sharing) > {code} > {code:java} > Clarification: - When running nodetool only the "Unknown module" warnings > show up. All warnings show up during startup.{code} > We need to verify whether this presents a real problem in the features where > those modules are expected to be used, or if it is a false alarm. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
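A sketch of the fail-fast guard suggested above, assuming a hook early in startup; only the javax.tools.ToolProvider probe is standard JDK API, while the class and the way the audit-logging flag is passed in are illustrative.

{code:java}
import javax.tools.ToolProvider;

public final class JdkRequirementCheck
{
    /** True when a full JDK (not just a JRE) is running this process, detected via the system compiler. */
    public static boolean isJdk()
    {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    /** Fail fast at startup when a JDK-only feature is enabled on a JRE. */
    public static void checkAuditLoggingRequirements(boolean auditLoggingEnabled)
    {
        if (auditLoggingEnabled && !isJdk())
            throw new IllegalStateException("Audit logging is enabled but no JDK was detected; " +
                                            "this feature requires a full JDK, not a JRE");
    }
}
{code}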
[jira] [Updated] (CASSANDRA-19021) Update default disk_access_mode to mmap_index_only
[ https://issues.apache.org/jira/browse/CASSANDRA-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-19021: Workflow: Copy of Cassandra Default Workflow (was: Copy of Cassandra Bug Workflow) Issue Type: Improvement (was: Bug) > Update default disk_access_mode to mmap_index_only > -- > > Key: CASSANDRA-19021 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19021 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Paulo Motta >Priority: Normal > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19021) Update default disk_access_mode to mmap_index_only
Paulo Motta created CASSANDRA-19021: --- Summary: Update default disk_access_mode to mmap_index_only Key: CASSANDRA-19021 URL: https://issues.apache.org/jira/browse/CASSANDRA-19021 Project: Cassandra Issue Type: Bug Components: Local/Config Reporter: Paulo Motta -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19020) cqlsh should allow failure to import cqlshlib.serverversion
[ https://issues.apache.org/jira/browse/CASSANDRA-19020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785530#comment-17785530 ] Paulo Motta commented on CASSANDRA-19020: - +1 after CI looks good. > cqlsh should allow failure to import cqlshlib.serverversion > --- > > Key: CASSANDRA-19020 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19020 > Project: Cassandra > Issue Type: Bug > Components: CQL/Interpreter >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > > cqlshlib.serverversion is created by ant, recording the server's version so > that python can see if it matches cqlsh later. This can make work for other > things that need to be aware of it like CASSANDRA-18594, so we should relax > it a bit since this really has no value outside of warning humans they have a > mismatch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-18968) StartupClusterConnectivityChecker fails on upgrade from 3.X
[ https://issues.apache.org/jira/browse/CASSANDRA-18968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784225#comment-17784225 ] Paulo Motta edited comment on CASSANDRA-18968 at 11/8/23 10:26 PM: --- bq. Whole "waiting for gossip to settle" machinery is ... not ideal. Yes, it works in most of the situations but there are edge cases when it does not, e.g. when there are large clusters, it may happen that it may evaluate that gossip is "settled" falsely because it took so much time to detect any changes that it was thinking it is settled. I'm aware waitToSettle is not reliable. Nevertheless I think having a "best-effort" skipping of this check when 3.X nodes are detected in gossip is valuable. This will mostly work as long as gossip with a single node was successful, since it will get the latest known versions of the other nodes. In the case where the gossip information is absent and there are 3.X nodes present in the cluster, it's not a big deal - the check will just be executed and the timeout warning above will be unnecessarily emitted. We just don't want to skip this check when *all nodes are upgraded to 4.x* but I don't think this would happen if waitToSettle fails. bq. I think it would make a lot of sense to run the upgrade tests here. Good call! Thanks > StartupClusterConnectivityChecker fails on upgrade from 3.X > --- > > Key: CASSANDRA-18968 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18968 > Project: Cassandra > Issue Type: Bug > Components: Local/Startup and Shutdown >Reporter: Paulo Motta >Assignee: Isaac Reath >Priority: Normal > Labels: lhf > Fix For: 4.0.x, 4.1.x > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Starting up a new 4.X node on a 3.x cluster throws the following warning: > {noformat} > WARN [main] 2023-10-27 15:58:22,234 > StartupClusterConnectivityChecker.java:183 - Timed out after 10002 > milliseconds, was waiting for remaining peers to connect: {dc1=[X.Y.Z.W, > A.B.C.D]} > {noformat} > I think this is because the PING messages used by the startup check are not > available on 3.X. > To provide a smoother upgrade experience we should probably disable this > check on mixed-version clusters, or skip peers on versions < 4.x when doing > the connectivity check. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
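For illustration, a rough sketch of the second option from the ticket description (skip peers on versions < 4.x when doing the connectivity check). This is not the actual StartupClusterConnectivityChecker code or the patch under review: the class, method, and the String-keyed peer map are made up for the example, and peers with an unknown gossiped version are kept, matching the best-effort behaviour described in the comment above.

{code:java}
// Illustrative only; not the actual StartupClusterConnectivityChecker logic.
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public final class ConnectivityCheckPeerFilter
{
    // Extracts the major version from a release string such as "3.11.16" or
    // "4.1.3"; returns -1 when the version is missing or unparseable.
    private static int majorVersion(String releaseVersion)
    {
        if (releaseVersion == null || releaseVersion.isEmpty())
            return -1;
        try
        {
            return Integer.parseInt(releaseVersion.split("\\.")[0]);
        }
        catch (NumberFormatException e)
        {
            return -1;
        }
    }

    // Drops peers known to be on a pre-4.0 release, since they cannot answer
    // the PING messages used by the startup connectivity check. Peers whose
    // version is unknown are kept, so at worst the check runs and times out
    // with the warning quoted in the ticket description.
    public static Set<String> peersToWaitFor(Map<String, String> releaseVersionByPeer)
    {
        return releaseVersionByPeer.entrySet()
                                   .stream()
                                   .filter(e -> { int major = majorVersion(e.getValue()); return major < 0 || major >= 4; })
                                   .map(Map.Entry::getKey)
                                   .collect(Collectors.toSet());
    }
}
{code}

Either option avoids waiting the full ~10 seconds and emitting the timeout warning on an otherwise healthy mixed-version upgrade.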