[jira] [Comment Edited] (CASSANDRA-16983) Separating CQLSH credentials from the cqlshrc file
[ https://issues.apache.org/jira/browse/CASSANDRA-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526184#comment-17526184 ] Brian Houser edited comment on CASSANDRA-16983 at 4/22/22 3:37 AM: --- I think we agreed to make some minor changes to how this (plain_text_auth) works in credentials with the new custom loading system; see https://issues.apache.org/jira/browse/CASSANDRA-16456 was (Author: bhouser): I think we agreed to make some minor changes to how this (plain_text_auth) works in credentials with the new custom loading system > Separating CQLSH credentials from the cqlshrc file > -- > > Key: CASSANDRA-16983 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16983 > Project: Cassandra > Issue Type: Improvement > Components: Tool/cqlsh >Reporter: Bowen Song >Assignee: Bowen Song >Priority: Normal > Labels: lhf > Fix For: 4.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, the CQLSH tool accepts credentials (username & password) from the > following 3 places: > 1. the command line parameter "-p" > 2. the cqlshrc file > 3. an interactive prompt > This is not ideal. > Credentials on the command line are a security risk, because they could be seen > by other users on a shared system. > The cqlshrc file is better, but still not good enough, because the cqlshrc > file is a config file: it's often considered acceptable to have it world readable > and to share it with other users. It also prevents users from having > multiple sets of credentials, either for the same Cassandra cluster or for > different clusters. 
> To improve the security of CQLSH and make it secure by design, I propose the > following changes: > * Warn the user if a password is given on the command line, and recommend > using a credentials file instead > * Warn the user if credentials are present in the cqlshrc file and the > cqlshrc file is not secure (e.g. world readable or owned by a different user) > * Deprecate credentials in the cqlshrc file, and recommend the user move them > to a separate credentials file. The aim is not to break anything at the > moment, but to eventually stop accepting credentials from the cqlshrc file. > * Reject the credentials file if it's not secure, and tell the user how to > secure it. Optionally, prompt the user for a password if it's an interactive > session. (Consider how OpenSSH handles insecure credential files.) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
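The "reject the credentials file if it's not secure" proposal above could look like the following sketch. This is illustrative Python, not the actual cqlsh patch; the function name and error messages are assumptions. It mirrors the OpenSSH behavior the ticket mentions: refuse a key/credentials file that other users can access.

```python
import os
import stat

def check_credentials_file(path):
    # Reject a credentials file that is accessible by group or others,
    # or owned by a different user -- similar to how OpenSSH refuses
    # insecure private-key files. Illustrative sketch, not cqlsh code.
    st = os.stat(path)
    if st.st_uid != os.getuid():
        raise PermissionError(f"{path} is owned by a different user")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} is accessible by group/others; fix with: chmod 600 {path}")
    return True
```

A caller in an interactive session could catch the `PermissionError` and fall back to prompting for a password, as the last bullet suggests.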
[cassandra-website] branch asf-staging updated (015350e5 -> 4cb38fd9)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git discard 015350e5 generate docs for 8fd077a6 new 4cb38fd9 generate docs for 8fd077a6 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (015350e5) \ N -- N -- N refs/heads/asf-staging (4cb38fd9) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../cassandra/configuration/cass_yaml_file.html| 84 +++-- .../cassandra/configuration/cass_yaml_file.html| 84 +++-- .../cassandra/configuration/cass_yaml_file.html| 84 +++-- content/search-index.js| 2 +- site-ui/build/ui-bundle.zip| Bin 4740078 -> 4740078 bytes 5 files changed, 187 insertions(+), 67 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526150#comment-17526150 ] Ekaterina Dimitrova edited comment on CASSANDRA-17571 at 4/22/22 1:51 AM: -- Prototype in this [commit|https://github.com/ekaterinadimitrova2/cassandra/commit/1ab9f32ef34402a0f74036d768a22449170052b6] - only a few parameters were migrated, for test purposes and to see how it will look. Also, I will split the migrated parameters into groups in separate commits, with tests attached and CI runs, so that nothing is missed along the way, but I want to confirm that the approach is still what we want. CC [~adelapena] in case he has time to provide input. Currently, if people provide the config in the new format, we handle the former int parameters by returning a cast value from their getters; but on startup the user might set a bigger long value and wrongly think that value will be used, when in practice Integer.MAX_VALUE will be used. We just need to fail, telling the user they can't set that big a value, mimicking the behavior when they provide an old-format value bigger than an int. With these classes we also ensure that people cannot set anything that will overflow during conversion to the smallest allowed unit, instead of silently setting MAX_VALUE. was (Author: e.dimitrova): Prototype in this [commit|https://github.com/ekaterinadimitrova2/cassandra/commit/1ab9f32ef34402a0f74036d768a22449170052b6] - only a few parameters were migrated, for test purposes and to see how it will look. Also, I will split the migrated parameters into groups in separate commits, with tests attached and CI runs, so that nothing is missed along the way, but I want to confirm that the approach is still what we want. CC [~adelapena] in case he has time to provide input. 
Currently, if people provide the config in the new format, we handle the former int parameters by returning a cast value from their getters; but on startup the user might set a bigger long value and wrongly think that value will be used, when in practice Integer.MAX_VALUE will be used. We just need to fail, telling the user they can't set that big a value, mimicking the behavior when they provide an old-format value bigger than an int. With these classes we also ensure that people cannot set anything that will overflow during conversion to the smallest allowed unit, instead of silently setting MAX_VALUE. > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
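The fail-fast upper-bound check described in the comment above might look like the following sketch, with Python standing in for the Java config classes. The function name, parameter names, and unit table are illustrative assumptions; only the idea is from the ticket: a value that would exceed Integer.MAX_VALUE after conversion to the smallest allowed unit is rejected at config-load time instead of being silently clamped.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

# Conversion factors to the smallest allowed unit (milliseconds here).
UNIT_TO_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}

def parse_duration_ms(name, value, unit):
    # Fail fast on startup instead of silently using Integer.MAX_VALUE.
    ms = value * UNIT_TO_MS[unit]
    if ms > INT_MAX:
        raise ValueError(
            f"{name}: {value}{unit} exceeds the maximum of {INT_MAX} ms; "
            "reduce the value or use a smaller unit")
    return ms
```

This mimics the old behavior for int-typed parameters: an out-of-range value is an error, not a silent cap.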
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526146#comment-17526146 ] Brian Houser commented on CASSANDRA-16456: -- Ok... cool. I think we've finally cracked the desired behavior. I'm going to go ahead and write out the spec here. Implementing this should be quick. * In the cqlshrc file you can list an auth_provider section and specify a module and class name. If you do, then we will dynamically load that class using the remaining properties in the auth_provider section, as well as the properties found in credentials under that class name. * If you don't provide an auth_provider module and class name, we will assume you specified the PlainTextAuthProvider. * You can provide a username and a password on the command line. If you do, these two properties will be passed to whatever auth provider is specified, and will override any other username and password provided in the credentials or other files. * You can provide a username and password under the Authentication section. If you do, those properties will be passed to whatever auth_provider is specified, and will override any other specification of username and password in the credentials or cqlshrc file. * Any properties in the credentials file will override the properties in the auth_provider section of the cqlshrc file. * If you are using the PlainTextAuthProvider and only provide a username, you will be prompted for a password. I'll implement the above and add tests for the behavior. Please let me know if this spec isn't accurate. 
> Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
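The loading and precedence rules in the spec above can be sketched as follows. Function and argument names are assumptions for illustration, not the final cqlsh code; the precedence order, lowest to highest, is: the cqlshrc [auth_provider] section, the credentials file entries under the class name, the [authentication] section, and finally the command line.

```python
import importlib

def load_auth_provider(cqlshrc, credentials, cli_user=None, cli_password=None):
    # cqlshrc and credentials are dicts of config sections, as a config
    # parser might produce them. Later updates override earlier ones.
    section = dict(cqlshrc.get("auth_provider", {}))
    module = section.pop("module", "cassandra.auth")
    classname = section.pop("classname", "PlainTextAuthProvider")
    opts = dict(section)                              # lowest precedence
    opts.update(credentials.get(classname, {}))       # credentials file
    opts.update(cqlshrc.get("authentication", {}))    # [authentication]
    if cli_user is not None:
        opts["username"] = cli_user                   # command line wins
    if cli_password is not None:
        opts["password"] = cli_password
    # Dynamically load the provider class and hand it all options.
    cls = getattr(importlib.import_module(module), classname)
    return cls(**opts)
```

Per the spec, a missing module/class name falls back to the driver's PlainTextAuthProvider, and a PlainTextAuthProvider with a username but no password would then trigger an interactive prompt (not shown here).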
[jira] [Updated] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17180: Status: Open (was: Patch Available) > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that it is up and running. > Then, on startup, we would read this file and determine whether there > is some table whose gc grace period is behind this time, and we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526115#comment-17526115 ] Paulo Motta commented on CASSANDRA-17180: - bq. After spending more time on this, I identified an issue Nice catch! bq. I have not detected this by my unit tests because I was, more or less, mocking it but once I actually tried it on the running node, to my surprise it was not detecting the tables which should be causing violations. Can we create an (in-jvm or python) dtest to ensure this is being properly tested and any future regressions are caught? bq. I think it is viable to do via "SchemaKeyspace.fetchNonSystemKeyspaces()". Sounds good to me. bq. I am not sure I can make this method publicly visible without any consequences yet. I think this should be fine. bq. On the other hand, it will check tables in "system_distributed" as well as "system_auth". These tables do not have gc = 0 and they are not excluded from the fetchNonSystemKeyspaces call. That's ok, it's probably a good idea to check these tables anyway. > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that it is up and running. > Then, on startup, we would read this file and determine whether there > is some table whose gc grace period is behind this time, and we would fail the start > to prevent zombie data from being spread around the cluster. 
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526109#comment-17526109 ] Stefan Miklosovic commented on CASSANDRA-17568: --- That's exactly right. The server should process it all when its file system is involved, indeed. I made a mistake here; not detecting it does not make a lot of sense, as I was thinking too much "locally". > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h 10m > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within the data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17537) nodetool compact should support using a key string to find the range to avoid operators having to manually do this
[ https://issues.apache.org/jira/browse/CASSANDRA-17537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526107#comment-17526107 ] David Capwell commented on CASSANDRA-17537: --- ok so that new assert breaks things, so canceling the commit https://app.circleci.com/pipelines/github/dcapwell/cassandra/1389/workflows/b987634e-a680-4b3f-bee7-e7e11e8e4b29/jobs/11372 {code} Attempted to force compact [BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-226-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-216-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-214-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-228-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-218-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-224-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-222-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-220-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-210-big-Data.db'), 
BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-212-big-Data.db')], but predicate does not include {code} > nodetool compact should support using a key string to find the range to avoid > operators having to manually do this > -- > > Key: CASSANDRA-17537 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17537 > Project: Cassandra > Issue Type: New Feature > Components: Local/Compaction, Tool/nodetool >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Time Spent: 1h > Remaining Estimate: 0h > > It's common that a single key needs to be compacted, and operators need to do > the following: > 1) go from key -> token > 2) generate the range > 3) call nodetool compact with this range > We can simplify this workflow by adding this to compact: > nodetool compact ks.tbl -k "key1" -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
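The key -> token -> range workflow the ticket describes can be sketched as follows. The token itself comes from the partitioner (Murmur3 in the default setup) and is taken as given here; the sketch only shows the range construction and the resulting command. Assumptions: `-st`/`-et` stand for nodetool compact's start/end token options, and the helper names are illustrative.

```python
MIN_TOKEN = -(2**63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1

def single_token_range(token):
    # A Cassandra token range is (start, end] with an exclusive start,
    # so the range covering exactly one token starts just before it,
    # wrapping around the ring at the minimum token.
    start = MAX_TOKEN if token == MIN_TOKEN else token - 1
    return start, token

def compact_command(keyspace, table, token):
    # The manual step the new -k flag would automate.
    start, end = single_token_range(token)
    return f"nodetool compact {keyspace} {table} -st {start} -et {end}"
```

With the proposed flag, `nodetool compact ks.tbl -k "key1"` would perform this token lookup and range construction server-side.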
[jira] [Comment Edited] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526105#comment-17526105 ] Paulo Motta edited comment on CASSANDRA-17180 at 4/21/22 10:01 PM: --- Thanks for addressing initial comments. Finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we make this startup check enabled by default? We could improve the error message when the check fails to mention the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like [check-specific logic|https://github.com/apache/cassandra/pull/1351/files#diff-957f2fa6365cb92f19b74347fee7a9f310a07e32c3112f35196dc17462ec7269R511] on CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we make the default heartbeat file be stored in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file - since this is error-prone and we're just interested in the timestamp value, not the file format. 
Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead? was (Author: paulo): Thanks for addressing initial comments. Finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we make this startup check enabled by default? We could improve the error message when the check fails to mention the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like check-specific logic on CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we make the default heartbeat file be stored in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file - since this is error-prone and we're just interested in the timestamp value, not the file format. Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead? 
> Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that it is up and running. > Then, on startup, we would read this file and determine whether there > is some table whose gc grace period is behind this time, and we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw -- This message was sent by Atlassian Jira (v8.20.7#820007)
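The heartbeat-via-file-mtime suggestion in the review comments above can be sketched as follows, with Python's `os.utime`/`os.path.getmtime` standing in for the Java `File.setLastModified`/`File.lastModified` calls. The function names and message wording are illustrative, not the patch's API.

```python
import os
import time

def touch_heartbeat(path):
    # Record liveness as the file's modification time -- no custom
    # file format to parse, only a timestamp. Illustrative sketch.
    open(path, "a").close()
    os.utime(path, None)

def check_data_resurrection(path, min_gc_grace_seconds, now=None):
    # Fail startup if the node was down longer than the smallest
    # gc_grace_seconds among its tables: tombstones may already have
    # been collected elsewhere, so starting risks spreading zombie data.
    now = time.time() if now is None else now
    if not os.path.exists(path):
        return  # first start: nothing to compare against
    downtime = now - os.path.getmtime(path)
    if downtime > min_gc_grace_seconds:
        raise RuntimeError(
            f"node was down ~{int(downtime)}s, exceeding "
            f"gc_grace_seconds={min_gc_grace_seconds}; refusing to start")
```

A periodic task would call `touch_heartbeat` while the node runs, and the startup check would call `check_data_resurrection` before joining the ring.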
[jira] [Comment Edited] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526095#comment-17526095 ] Stefan Miklosovic edited comment on CASSANDRA-16456 at 4/21/22 9:58 PM: The answer to your very last question is yes, because you could have an auth_provider implementation which is still "username and password-based" but differs internally. We should still pass username / password to it, and it is up to the implementation whether it uses these flags or just ignores them. The implementation may, for example, detect that username / password does not make any sense to it and act on that (throwing an exception or logging), but what it does with them is solely up to it. Username and password just happen to be the most commonly used options, but they are "just options" like any other, and they should be passed to that impl. EDIT: what I need to check is that for SASL / GSSAPI, we can indeed instantiate that provider with all the options the user wants, even if they are useless for that provider. Some providers might be strict and throw errors if you set them up with a property they do not recognize, but I doubt the sasl/gssapi impl is done like that. Even if that is true, a user can simply stop configuring it like that. One detail I would mention is that we should ask for a password only when the auth provider is the plain text one, because then we are totally sure we need it if it is not specified anywhere. For other providers, I would not ask for it. was (Author: smiklosovic): The answer to your very last question is yes, because you could have an auth_provider implementation which is still "username and password-based" but differs internally. We should still pass username / password to it, and it is up to the implementation whether it uses these flags or just ignores them. 
The implementation may, for example, detect that username / password does not make any sense to it and act on that (throwing an exception or logging), but what it does with them is solely up to it. Username and password just happen to be the most commonly used options, but they are "just options" like any other and they should be passed to that implementation. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of the > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing].
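The "pass every option through and let the provider decide" behaviour described in the comment can be illustrated with a small sketch. `load_auth_provider` and `TokenAuthProvider` are hypothetical names, not the cqlsh plugin API; the point is only that username / password travel with the rest of the options, and a provider that is not password-based may simply ignore them.

```python
import importlib


def load_auth_provider(dotted_path, **options):
    """Load an auth provider class from a dotted path and forward every
    configured option -- including username / password -- unchanged.
    It is the provider's job to use, ignore, or reject them."""
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**options)


class TokenAuthProvider:
    """Hypothetical provider that is not username/password based: it accepts
    those options and silently ignores them instead of failing on them."""

    def __init__(self, token=None, username=None, password=None, **_ignored):
        self.token = token
```

A stricter provider would instead raise on unknown options; as the comment notes, that is solely the provider's choice, and a user hitting such an error would simply stop passing the offending option.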
[jira] [Commented] (CASSANDRA-17537) nodetool compact should support using a key string to find the range to avoid operators having to manually do this
[ https://issues.apache.org/jira/browse/CASSANDRA-17537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526100#comment-17526100 ] David Capwell commented on CASSANDRA-17537: --- Starting commit CI Results (pending): ||Branch||Source||Circle CI||Jenkins|| |trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-17537-trunk-A80909E5-5C23-42D0-A279-AFF09B8E92A0]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-17537-trunk-A80909E5-5C23-42D0-A279-AFF09B8E92A0]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/1625/]| > nodetool compact should support using a key string to find the range to avoid > operators having to manually do this > -- > > Key: CASSANDRA-17537 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17537 > Project: Cassandra > Issue Type: New Feature > Components: Local/Compaction, Tool/nodetool >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Time Spent: 1h > Remaining Estimate: 0h > > It's common that a single key needs to be compacted, and operators need to do > the following: > 1) go from key -> token > 2) generate the range > 3) call nodetool compact with this range > We can simplify this workflow by adding this to compact: > nodetool compact ks.tbl -k "key1"
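The three manual steps (key -> token, token -> range, compact over the range) can be sketched as below. The hash is a stand-in: Cassandra's default partitioner is Murmur3, not md5, so the printed tokens are illustrative only; `-st`/`-et` are nodetool compact's existing start/end-token options.

```python
import hashlib


def key_to_token(partition_key: bytes) -> int:
    """Stand-in for the partitioner hash. Cassandra's default
    Murmur3Partitioner yields a signed 64-bit token; md5 here is
    purely illustrative."""
    digest = hashlib.md5(partition_key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def single_key_token_range(partition_key: bytes):
    """Token ranges are start-exclusive / end-inclusive, so the range
    covering exactly token T is (T - 1, T]."""
    token = key_to_token(partition_key)
    return token - 1, token


start, end = single_key_token_range(b"key1")
print(f"nodetool compact ks.tbl -st {start} -et {end}")
```

The proposed `-k` flag would fold all of this into the server side, using the cluster's actual partitioner instead of making the operator compute the token.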
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526095#comment-17526095 ] Stefan Miklosovic commented on CASSANDRA-16456: --- The answer to your very last question is yes, because you could have an auth_provider implementation which is still "username and password-based" but differs internally. We should still pass username / password to it, and it is up to the implementation whether it uses these options or just ignores them. The implementation may, for example, detect that username / password does not make any sense to it and act on that (throwing an exception or logging), but what it does with them is solely up to it. Username and password just happen to be the most commonly used options, but they are "just options" like any other and they should be passed to that implementation. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of the > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. 
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526093#comment-17526093 ] David Capwell commented on CASSANDRA-17560: --- The jvm upgrade test failed due to CASSANDRA-16238 (a race condition with fat client removal), which was fixed later on. > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > The track_warnings config is currently nested, which is discouraged at the moment. It > also predates the config standards patch that moved storage-typed longs to > the new DataStorageSpec type; we should migrate these configs accordingly.
[jira] [Updated] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-17560: -- Source Control Link: https://github.com/apache/cassandra/commit/7db3285e7b745e591dc4c405ae9af6c1cddb0c79 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > The track_warnings config is currently nested, which is discouraged at the moment. It > also predates the config standards patch that moved storage-typed longs to > the new DataStorageSpec type; we should migrate these configs accordingly.
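The rename this ticket performs can be pictured as a mapping from old nested keys to flattened, typed names. Only `track_warnings.enabled` -> `read_thresholds_enabled` is taken from the commit's NEWS.txt entry; the second mapping and the `migrate_config` helper are invented for illustration and are not the Java implementation.

```python
import warnings

# Illustrative old-name -> new-name mapping; the authoritative set lives in
# the Java Config class. The second entry is a made-up example of a nested
# storage-typed long moving to a flattened DataStorageSpec-style name.
RENAMES = {
    "track_warnings.enabled": "read_thresholds_enabled",
    "track_warnings.coordinator_read_size.warn_threshold_kb":
        "coordinator_read_size_warn_threshold",
}


def migrate_config(old):
    """Map deprecated nested keys to their flattened replacements,
    warning on each use of an old name; unknown keys pass through."""
    migrated = {}
    for key, value in old.items():
        if key in RENAMES:
            warnings.warn(f"{key} is deprecated; use {RENAMES[key]}",
                          DeprecationWarning)
            key = RENAMES[key]
        migrated[key] = value
    return migrated
```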
[cassandra] branch trunk updated: Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
This is an automated email from the ASF dual-hosted git repository. dcapwell pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/trunk by this push: new 7db3285e7b Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long 7db3285e7b is described below commit 7db3285e7b745e591dc4c405ae9af6c1cddb0c79 Author: David Capwell AuthorDate: Wed Apr 20 15:15:34 2022 -0700 Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long patch by David Capwell; reviewed by Andres de la Peña, Caleb Rackliffe for CASSANDRA-17560 --- CHANGES.txt| 1 + NEWS.txt | 18 ++-- build.xml | 4 +- conf/cassandra.yaml| 35 +++ ide/idea/workspace.xml | 6 +- src/java/org/apache/cassandra/config/Config.java | 8 +- .../cassandra/config/DatabaseDescriptor.java | 82 ++-- .../org/apache/cassandra/config/TrackWarnings.java | 108 .../org/apache/cassandra/cql3/QueryOptions.java| 75 +++--- .../cassandra/cql3/selection/ResultSetBuilder.java | 9 +- .../cassandra/cql3/statements/SelectStatement.java | 14 +-- src/java/org/apache/cassandra/db/ReadCommand.java | 24 ++--- .../org/apache/cassandra/db/RowIndexEntry.java | 31 +++--- ...=> RowIndexEntryReadSizeTooLargeException.java} | 4 +- .../exceptions/TombstoneAbortException.java| 2 +- src/java/org/apache/cassandra/net/ParamType.java | 8 +- .../apache/cassandra/service/StorageService.java | 83 +++- .../cassandra/service/StorageServiceMBean.java | 34 +++ .../cassandra/service/reads/ReadCallback.java | 6 +- .../CoordinatorWarnings.java | 9 +- .../WarnAbortCounter.java | 2 +- .../WarningContext.java| 22 ++--- .../WarningsSnapshot.java | 30 +++--- .../org/apache/cassandra/transport/Dispatcher.java | 2 +- test/conf/cassandra.yaml | 19 ++-- .../cassandra/distributed/impl/Coordinator.java| 2 +- .../cassandra/distributed/impl/Instance.java | 2 +- 
.../distributed/test/NativeMixedVersionTest.java | 7 +- .../AbstractClientSizeWarning.java | 6 +- .../CoordinatorReadSizeWarningTest.java| 7 +- .../LocalReadSizeWarningTest.java | 15 +-- .../RowIndexSizeWarningTest.java | 11 ++- .../TombstoneCountWarningTest.java | 6 +- .../cassandra/config/DatabaseDescriptorTest.java | 109 - .../config/YamlConfigurationLoaderTest.java| 54 +- .../WarningsSnapshotTest.java | 4 +- 36 files changed, 364 insertions(+), 495 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 0f60d18244..5ab33a229b 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.1 + * Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long (CASSANDRA-17560) * Add support for CONTAINS and CONTAINS KEY in conditional UPDATE and DELETE statement (CASSANDRA-10537) * Migrate advanced config parameters to the new Config types (CASSANDRA-17431) * Make null to be meaning disabled and leave 0 as a valid value for permissions_update_interval, roles_update_interval, credentials_update_interval (CASSANDRA-17431) diff --git a/NEWS.txt b/NEWS.txt index 992c291115..7afac5e105 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -89,16 +89,16 @@ New features paxos_state_purging: repaired. Once this has been set across the cluster, users are encouraged to set their applications to supply a Commit consistency level of ANY with their LWT write operations, saving one additional WAN round-trip. See upgrade notes below. -- Warn/abort thresholds added to read queries notifying clients when these thresholds trigger (by - emitting a client warning or aborting the query). This feature is disabled by default, scheduled - to be enabled in 4.2; it is controlled with the configuration track_warnings.enabled, - setting to true will enable this feature. 
Each check has its own warn/abort thresholds, currently +- Warn/fail thresholds added to read queries notifying clients when these thresholds trigger (by + emitting a client warning or failing the query). This feature is disabled by default, scheduled + to be enabled in 4.2; it is controlled with the configuration read_thresholds_enabled, + setting to true will enable this feature. Each check has its own warn/fail thresholds, currently tombstones (tombston
[cassandra-website] branch asf-staging updated (924e389d -> 015350e5)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git omit 924e389d generate docs for 8fd077a6 new 015350e5 generate docs for 8fd077a6 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (924e389d) \ N -- N -- N refs/heads/asf-staging (015350e5) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: content/doc/4.1/cassandra/cql/cql_singlefile.html | 2 ++ .../doc/latest/cassandra/cql/cql_singlefile.html | 2 ++ .../doc/trunk/cassandra/cql/cql_singlefile.html| 2 ++ content/search-index.js| 2 +- site-ui/build/ui-bundle.zip| Bin 4740078 -> 4740078 bytes 5 files changed, 7 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525961#comment-17525961 ] Maulin Vasavada edited comment on CASSANDRA-17513 at 4/21/22 8:17 PM: -- Thank you [~djoshi] for considering the suggestion for the ticket title. I've thought about it (and experimented a little as well) and talked to some more 'security' experts, and I agree with the approach of having a separate keystore for client vs server certs for internode connections in case we need client auth enabled. While Java keystores provide the ability to store multiple keys in one store, for a variety of reasons (some of which you already mentioned in your latest comment) it makes sense to keep client vs server keys separate. Given that we would need a different keystore for client TLS auth for the internode connection, what if somebody wants to use the same certs for client as well as server auth? -Would they be required to copy it to a separate keystore, OR would the code changes have a fallback when the 'outbound keystore' (as the current PR refers to it) is not configured?- I realized that they can configure the same path for the 'outbound keystore' in that case. was (Author: maulin.vasavada): Thank you [~djoshi] for considering the suggestion for the ticket title. I've thought about it (and experimented a little as well) and talked to some of the more 'security' experts, and I agree with the approach of having a separate keystore for client vs server certs for internode connections in case we need client auth enabled. While Java keystores provide the ability to store multiple keys in one store, for a variety of reasons (some of which you already mentioned in your latest comment) it makes sense to keep client vs server keys separate. Given that we would need a different keystore for client TLS auth for the internode connection, what if somebody wants to use the same certs for client as well as server auth? 
-Would they be required to copy it to a separate keystore, OR would the code changes have a fallback when the 'outbound keystore' (as the current PR refers to it) is not configured?- I realized that they can configure the same path for the 'outbound keystore' in that case. > Adding support for TLS client authentication for internode communication > > > Key: CASSANDRA-17513 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17513 > Project: Cassandra > Issue Type: Bug >Reporter: Jyothsna Konisa >Assignee: Jyothsna Konisa >Priority: Normal > Time Spent: 1.5h > Remaining Estimate: 0h > > The same keystore is currently set for both inbound and outbound connections, but we > should use a keystore with the server certificate for inbound connections and a > keystore with client certificates for outbound connections. We should therefore add > a new property in cassandra.yaml to pass the outbound keystore and use it in > SSLContextFactory for creating the outbound SSL context.
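The inbound/outbound split the ticket asks for can be sketched with Python's `ssl` module. This is not Cassandra's SSLContextFactory: Cassandra configures JKS/PKCS12 keystores in cassandra.yaml, while this sketch uses optional PEM paths purely to show the two directions loading different key material.

```python
import ssl


def build_internode_contexts(server_keystore=None, client_keystore=None):
    """Separate TLS contexts for inbound (server cert) and outbound
    (client cert) internode connections. Paths are illustrative; omitted
    here so the sketch runs without real certificate files."""
    inbound = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if server_keystore:
        inbound.load_cert_chain(certfile=server_keystore)   # server identity
    inbound.verify_mode = ssl.CERT_REQUIRED  # demand a peer certificate (mTLS)

    outbound = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if client_keystore:
        outbound.load_cert_chain(certfile=client_keystore)  # client identity
    return inbound, outbound
```

An operator who wants to reuse one certificate for both roles can simply pass the same path for both keystores, matching the observation in the comment above.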
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526048#comment-17526048 ] Ekaterina Dimitrova commented on CASSANDRA-17563: - Thanks, so now everything will be the same in the setup; [~dcapwell] only adds a tool to help create new patches with minimum effort. If someone prefers to do it manually, that is fine. In all cases we then need to verify that the generated MIDRES and HIGHRES files have exactly what we want for "happy" CI :D The only thing to be mentioned is that the tool might rearrange the attributes, so if someone finds those rearrangements to be just additional noise, they can simply use the old way. > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During CircleCI addition of a new job to the config, the midres file got > messy. Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism than the original in-jvm job > * j8_dtests_with_vnodes should get 50 large executors from midres, but currently > midres makes it run with 25 and medium, which fails around 100 tests
[jira] [Commented] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526046#comment-17526046 ] Savni Nagarkar commented on CASSANDRA-17500: [~adelapena] I like the proposed approach better than using a thread local. I added your changes to the current branch, and the pull request is [here|https://github.com/apache/cassandra/pull/1582]. I am working on replicating the changes for minimum_keyspace_rf right now. > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted by a user setting the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
[jira] [Updated] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Savni Nagarkar updated CASSANDRA-17500: --- Test and Documentation Plan: https://github.com/apache/cassandra/pull/1582 was: [https://github.com/apache/cassandra/pull/1582|https://github.com/apache/cassandra/pull/1534] > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted by a user setting the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
[jira] [Updated] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Savni Nagarkar updated CASSANDRA-17500: --- Test and Documentation Plan: [https://github.com/apache/cassandra/pull/1582|https://github.com/apache/cassandra/pull/1534] was:https://github.com/apache/cassandra/pull/1534 > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted by a user setting the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
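A per-datacenter maximum-RF check of the kind this ticket describes can be sketched as follows; the threshold constant and function name are illustrative, not the actual guardrail API.

```python
MAX_KEYSPACE_RF = 3  # illustrative; the real guardrail threshold is configurable


def check_replication_factors(replication, max_rf=MAX_KEYSPACE_RF):
    """replication maps datacenter -> RF, e.g. {'dc1': 3, 'dc2': 5}.
    The guardrail is applied per data center, as the ticket describes;
    any DC over the limit fails the CREATE/ALTER KEYSPACE."""
    violations = sorted(dc for dc, rf in replication.items() if rf > max_rf)
    if violations:
        raise ValueError(f"replication factor above maximum ({max_rf}) "
                         f"in datacenters: {violations}")
```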
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526016#comment-17526016 ] David Capwell commented on CASSANDRA-17563: --- Speaking with [~e.dimitrova] in slack, I moved the patch to not touch generate.sh; instead, scripts are used to create the patches (I created a script to create the patches and updated the docs to show how). > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During CircleCI addition of a new job to the config, the midres file got > messy. Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism than the original in-jvm job > * j8_dtests_with_vnodes should get 50 large executors from midres, but currently > midres makes it run with 25 and medium, which fails around 100 tests
[jira] [Updated] (CASSANDRA-17556) jackson-databind 2.13.2 is vulnerable to CVE-2020-36518
[ https://issues.apache.org/jira/browse/CASSANDRA-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-17556: - Fix Version/s: 3.11.13 4.1 4.0.4 (was: 4.x) (was: 3.11.x) (was: 4.0.x) > jackson-databind 2.13.2 is vulnerable to CVE-2020-36518 > --- > > Key: CASSANDRA-17556 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17556 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.11.13, 4.1, 4.0.4 > > > Seems like it's technically possible to cause a DoS with nested json. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
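CVE-2020-36518 is a StackOverflowError in jackson-databind triggered by deeply nested JSON; Cassandra's fix here is simply the dependency upgrade above. To illustrate the class of problem, the sketch below bounds nesting depth before parsing; it is a generic mitigation idea in Python, not anything from the Cassandra codebase.

```python
import json


def max_nesting_depth(text):
    """Rough pre-parse scan of JSON nesting depth; string contents are
    skipped so brackets inside string values do not count."""
    depth = peak = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            peak = max(peak, depth)
        elif ch in "]}":
            depth -= 1
    return peak


def safe_loads(text, max_depth=1000):
    """Reject pathologically nested documents before handing them to the parser."""
    if max_nesting_depth(text) > max_depth:
        raise ValueError("JSON nesting too deep")
    return json.loads(text)
```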
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525998#comment-17525998 ] Tibor Repasi commented on CASSANDRA-17568: -- Thanks [~brandon.williams]. I've reverted it. I'm afraid the current state is how far this improvement can go for now. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h 10m > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand allowing to list the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525991#comment-17525991 ] Brandon Williams commented on CASSANDRA-17568: -- bq. nodetool is a tool which is intended to interact with a Cassandra process via JMX Indeed, and that is why this approach won't work: nodetool won't necessarily be run from the same machine. The server needs to do the work and return the result via JMX. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand allowing to list the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
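Following [~brandon.williams]'s point that the node itself must do the work, the server-side lookup could resemble the sketch below: given the live tables (which the server knows from its schema), map each to its on-disk directory. The `<table>-<id-without-dashes>` naming follows the ticket's example output; the function itself is hypothetical.

```python
from pathlib import Path


def table_data_paths(data_dir, keyspace, live_tables):
    """live_tables maps table name -> table id (as exposed over CQL/JMX);
    data directories follow the <table>-<id-without-dashes> convention seen
    in the ticket's example output. Directories in data_dir not matched by
    any live table are candidates for cleanup."""
    paths = {}
    for name, table_id in live_tables.items():
        d = Path(data_dir) / keyspace / f"{name}-{table_id.replace('-', '')}"
        if d.is_dir():
            paths[name] = [str(d)]
    return paths
```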
[jira] [Commented] (CASSANDRA-17062) Expose Auth Caches metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525983#comment-17525983 ] Sam Tunnicliffe commented on CASSANDRA-17062: - Apologies for the long delay [~azotcsit]... {quote}I find "conditional MBean attributes" (meaning they can be populated conditionally or treated differently) to be very confusing. So I think having different entities for different MBeans (CacheMetrics and UnweightedCacheMetrics) is smth clearer to the end user. WDYT? {quote} Fair enough, I take that point. Thinking about it a bit more, I think my main issue is this: I know it's perfectly legal, but I find the hiding of methods by overloading in the {{UnweightedCacheSize/CacheSize}} and {{UnweightedCacheMetrics/CacheMetrics}} hierarchies somewhat unintuitive. I've tried an alternative approach of adding an abstract base class for cache metrics. This way, the two classes of caches can track and expose the particular metrics that are relevant to them, capacity & size in bytes for weighted and max entries & entries for unweighted, without any overloading or hiding. I've pushed that [here|https://github.com/beobal/cassandra/tree/samt/17062-trunk-rebase] on top of a rebase on trunk. A few things had changed in the course of the docs migration to the {{adoc}} format, plus CASSANDRA-16958. Let me know what you think. > Expose Auth Caches metrics > -- > > Key: CASSANDRA-17062 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17062 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Virtual Tables, Observability/Metrics, > Tool/nodetool >Reporter: Aleksei Zotov >Assignee: Aleksei Zotov >Priority: Normal > Fix For: 4.x > > > Unlike to other caches (row, key, counter), Auth Caches lack some monitoring > capabilities. 
Here are a few particular changes to get this inequity fixed: > # Add auth caches to _system_views.caches_ VT > # Expose auth caches metrics via JMX > # Add auth caches details to _nodetool info_ > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
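The abstract-base approach described in the comment can be pictured with a small class sketch. The names mirror the classes mentioned above (`CacheMetrics`, `UnweightedCacheMetrics`), but the fields and the Python rendering are illustrative only; the real code is Java and tracks many more metrics.

```python
from abc import ABC


class AbstractCacheMetrics(ABC):
    """Metrics shared by every cache; no overloading/hiding needed."""

    def __init__(self, name):
        self.name = name
        self.hits = 0
        self.requests = 0

    def hit_rate(self):
        return self.hits / self.requests if self.requests else float("nan")


class CacheMetrics(AbstractCacheMetrics):
    """Weighted caches (row/key/counter): capacity and size in bytes."""

    def __init__(self, name, capacity_bytes):
        super().__init__(name)
        self.capacity_bytes = capacity_bytes
        self.size_bytes = 0


class UnweightedCacheMetrics(AbstractCacheMetrics):
    """Unweighted caches (auth): maximum entries and current entries."""

    def __init__(self, name, max_entries):
        super().__init__(name)
        self.max_entries = max_entries
        self.entries = 0
```

Each subclass exposes only the fields relevant to its cache class, which is the point of the base-class refactor over the earlier overloading approach.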
[jira] [Comment Edited] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525974#comment-17525974 ] Maulin Vasavada edited comment on CASSANDRA-17513 at 4/21/22 6:50 PM: -- {code:java} I am open to considering implementing this idea if we don't force operators to explicitly a single store file i.e. maintain backward compatibility with what we have. However, it feels like this should be out of scope here and we can create a separate ticket to address it across both native and internode configurations {code} On the above quote, if I understand you correctly- you are suggesting that somebody can work on a separate ticket to support having client/server keys in the same keystore (in case anybody needs it)? If my understanding is correct- then yes I agree that it should be a separate concern out of the scope of this ticket. was (Author: maulin.vasavada): {code:java} I am open to considering implementing this idea if we don't force operators to explicitly a single store file i.e. maintain backward compatibility with what we have. However, it feels like this should be out of scope here and we can create a separate ticket to address it across both native and internode configurations {code} On the above quote, if I understand you correctly- you are suggesting that somebody can work on a separate ticket to support having client/server keys in the same keystore (in case anybody needs it)? If my understand is correct- then yes I agree that it should be a separate concern out of the scope of this ticket. 
> Adding support for TLS client authentication for internode communication > > > Key: CASSANDRA-17513 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17513 > Project: Cassandra > Issue Type: Bug >Reporter: Jyothsna Konisa >Assignee: Jyothsna Konisa >Priority: Normal > Time Spent: 1h 20m > Remaining Estimate: 0h > > The same keystore is being set for both inbound and outbound connections, but we > should use a keystore with the server certificate for inbound connections and a > keystore with client certificates for outbound connections. So we should add > a new property in cassandra.yaml to pass the outbound keystore and use it in > SSLContextFactory for creating the outbound SSL context.
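For illustration, a server_encryption_options fragment along the lines the ticket proposes might look as follows; the {{outbound_keystore}} property name and all paths are assumptions taken from the PR discussion, not the final committed configuration:

```yaml
# Hypothetical cassandra.yaml sketch -- property names for the outbound
# keystore are assumptions from the PR discussion, not a released config.
server_encryption_options:
  internode_encryption: all
  require_client_auth: true
  keystore: /path/to/server-keystore.jks            # server cert, inbound connections
  keystore_password: cassandra
  outbound_keystore: /path/to/client-keystore.jks   # client cert, outbound connections
  outbound_keystore_password: cassandra
  truststore: /path/to/truststore.jks
  truststore_password: cassandra
```

As discussed in the comments, an operator who uses the same cert for both roles could simply point both keystore properties at the same file.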
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525975#comment-17525975 ] Tibor Repasi commented on CASSANDRA-17568: -- With this [commit|https://github.com/apache/cassandra/pull/1580/commits/a759f9cb65bbd0a4620bcc7c6442a14e41507dd8] I've added a raw implementation of a {{--list-orphans}} option which traverses all {{data_file_directories}} recursively to a depth of 2 and lists all paths which are not known to be used for tables. While it does correctly list empty keyspace directories and dropped tables, I have some objections: # there is a {{system/_paxos_repair_state}} directory (which I'm not familiar with) that is always listed; we would probably need a static exclude list # nodetool is a tool intended to interact with a Cassandra process via JMX, whereas this feature interacts primarily with the filesystem. Therefore I don't really like this feature; it feels wrong, and I would not be unhappy to revert it. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand allowing to list data paths of all > existing tables. 
> {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
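The {{--list-orphans}} traversal described in the comment above (walk each data file directory two levels deep and report paths not backed by a known table) can be sketched roughly as follows. This is an illustrative Python sketch, not the actual patch; the function name, the known_table_dirs set, and the exclude list are assumptions.

```python
import os

# Static exclude list idea from the comment above (illustrative).
EXCLUDES = {"system/_paxos_repair_state"}

def list_orphans(data_dirs, known_table_dirs):
    """Walk each data directory two levels deep (keyspace/table-dir) and
    return paths that do not belong to any known table.
    known_table_dirs holds relative 'keyspace/table-dir' strings."""
    orphans = []
    for root in data_dirs:
        for keyspace in sorted(os.listdir(root)):
            ks_path = os.path.join(root, keyspace)
            if not os.path.isdir(ks_path):
                continue
            tables = os.listdir(ks_path)
            if not tables:
                orphans.append(ks_path)  # empty keyspace directory
                continue
            for table in sorted(tables):
                rel = f"{keyspace}/{table}"
                if rel in EXCLUDES or rel in known_table_dirs:
                    continue
                orphans.append(os.path.join(ks_path, table))
    return orphans
```

The depth-2 assumption matches Cassandra's on-disk layout of `<data_dir>/<keyspace>/<table>-<id>` directories.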
[jira] [Updated] (CASSANDRA-17573) Fix test org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate
[ https://issues.apache.org/jira/browse/CASSANDRA-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-17573: -- Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: Unit Test Fix Version/s: 4.1 Severity: Normal Status: Open (was: Triage Needed) marking 4.1 to be addressed before we release 4.1 (can't release with flaky tests) > Fix test > org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate > - > > Key: CASSANDRA-17573 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17573 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Feature/Lightweight Transactions, > Test/dtest/java >Reporter: David Capwell >Priority: Normal > Fix For: 4.1 > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/1381/workflows/4a4e6100-6582-43fd-8f39-6b3cbb5a94b6/jobs/11282/tests#failed-test-0 > {code} > junit.framework.AssertionFailedError: Repair failed with errors: [Repair > session aa00ae00-c192-11ec-89f5-d521036fedec for range [(00c8,012c], > (0064,00c8], (012c,0064]] failed with error Paxos cleanup > session a1fe1fea-7522-47ec-879a-7f2e6cc592ad failed on /127.0.0.3:7012 with > message: Unsupported peer versions for a6404aa0-c192-11ec-89f5-d521036fedec > [(00c8,012c], (0064,00c8], (012c,0064]], Repair > command #3 finished with error] > at > org.apache.cassandra.distributed.test.PaxosRepairTest.lambda$repair$54f7d7c2$1(PaxosRepairTest.java:189) > at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47) > at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > {code}
[jira] [Created] (CASSANDRA-17573) Fix test org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate
David Capwell created CASSANDRA-17573: - Summary: Fix test org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate Key: CASSANDRA-17573 URL: https://issues.apache.org/jira/browse/CASSANDRA-17573 Project: Cassandra Issue Type: Bug Components: Consistency/Repair, Feature/Lightweight Transactions, Test/dtest/java Reporter: David Capwell https://app.circleci.com/pipelines/github/dcapwell/cassandra/1381/workflows/4a4e6100-6582-43fd-8f39-6b3cbb5a94b6/jobs/11282/tests#failed-test-0 {code} junit.framework.AssertionFailedError: Repair failed with errors: [Repair session aa00ae00-c192-11ec-89f5-d521036fedec for range [(00c8,012c], (0064,00c8], (012c,0064]] failed with error Paxos cleanup session a1fe1fea-7522-47ec-879a-7f2e6cc592ad failed on /127.0.0.3:7012 with message: Unsupported peer versions for a6404aa0-c192-11ec-89f5-d521036fedec [(00c8,012c], (0064,00c8], (012c,0064]], Repair command #3 finished with error] at org.apache.cassandra.distributed.test.PaxosRepairTest.lambda$repair$54f7d7c2$1(PaxosRepairTest.java:189) at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81) at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47) at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:834) {code}
[jira] [Commented] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525969#comment-17525969 ] Maulin Vasavada commented on CASSANDRA-17513: - +1 from my side. Thanks for your patience.
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525967#comment-17525967 ] David Capwell commented on CASSANDRA-17560: --- Found the cause of the python upgrade test failures: CASSANDRA-10537 made a change to both python-dtest and trunk, and when this CI run was kicked off that patch wasn't in trunk, but it picked up the python-dtest change, which caused this error. Rebased again and trying one more time. > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Track warnings is currently nested, which is discouraged at the moment. It > also predates the config standards patch, which moved storage-typed longs to > a new DataStorageSpec type; we should migrate the configs there.
[jira] [Comment Edited] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525961#comment-17525961 ] Maulin Vasavada edited comment on CASSANDRA-17513 at 4/21/22 6:38 PM: -- Thank you [~djoshi] for considering the suggestion for the ticket title. I've thought about it (and experimented a little), talked to some of the more 'security' experts, and I agree with the approach of having a separate keystore for client vs server certs for internode connections in case we need client auth enabled. While Java keystores provide the ability to store multiple keys, for a variety of reasons (some of which you already mentioned in your latest comment) it makes sense to keep client and server keys separate. Given that we would need a different keystore for client TLS auth for the internode connection, what if somebody wants to use the same certs for client as well as server auth? -Would they be required to copy it to a separate keystore OR would the code changes have a fallback when the 'outbound keystore' (as the current PR refers to it) is not configured?- I realized that they can configure the same path for the 'outbound keystore' in that case. 
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525959#comment-17525959 ] David Capwell commented on CASSANDRA-17560: --- CI Results: j8_unit: * org.apache.cassandra.db.commitlog.CommitLogSegmentManagerCDCTest and org.apache.cassandra.dht.tokenallocator.OfflineTokenAllocatorTest both timeout in the JVM and didn't show up in test results... j11_jvm_dtest: org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate failed for the first time according to butler; we ran this test 4 times (j8/j11 w/ and w/o vnodes) and it failed only once, so it feels flaky https://app.circleci.com/pipelines/github/dcapwell/cassandra/1381/workflows/4a4e6100-6582-43fd-8f39-6b3cbb5a94b6/jobs/11282/tests#failed-test-0 . Creating a ticket for this. The python upgrade tests look to be failing due to CASSANDRA-17451 and a CQL parser issue... need to look into this, as [~brandon.williams] was saying the timeout is 17451, but the CQL parser issue is new... so holding off the merge.
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525958#comment-17525958 ] Brian Houser commented on CASSANDRA-16456: -- > Sorry, I am not getting this. I am not sure how it is done exactly on the > code level right at the moment but I would say that this should be pretty > transparent? Whatever properties there are specified in auth_provider, they > are taken into account and then they are eventually replaced by whatever is > in credentials. If there is a username property both in auth_provider section > in cqlshrc and in the related section in credentials, the property in > credentials overwrites / has precedence / shadows the one in cqlshrc. Basically, right now, if you have an auth_provider specified (other than PlainTextAuthProvider) but specify a username or password on the command line, cqlsh will override the custom loading and return a PlainTextAuthProvider with the given username and password. This seemed to fit the original use case best and to be what the documentation was guaranteeing, particularly as there was no way to override the auth provider from the command line. Would you rather I just pass the username and password to whatever auth_provider is indicated, and if it's not indicated, default to PlainTextAuthProvider? > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. 
Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing].
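The current behaviour Brian describes in the comment above (command-line credentials override any custom provider and force PlainTextAuthProvider) can be sketched as follows. The function and class names are illustrative assumptions, not cqlsh's actual code:

```python
# Illustrative sketch of the precedence described above, not cqlsh's
# actual implementation.
class PlainTextAuthProvider:
    def __init__(self, username=None, password=None):
        self.username = username
        self.password = password

def resolve_auth_provider(configured_provider, cli_username=None, cli_password=None):
    # Credentials given on the command line override the custom loading
    # and return a PlainTextAuthProvider with those credentials.
    if cli_username or cli_password:
        return PlainTextAuthProvider(cli_username, cli_password)
    # Otherwise use whatever provider was configured (or the implicit
    # PlainTextAuthProvider default upstream of this call).
    return configured_provider
```

Brian's open question is whether the first branch should instead forward the credentials to the configured provider rather than replacing it.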
[jira] [Comment Edited] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525925#comment-17525925 ] Stefan Miklosovic edited comment on CASSANDRA-16456 at 4/21/22 6:19 PM: _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is the _default._ So even if I do not have anything in auth_provider in cqlshrc, imagine there still is one, the plaintext one. Hence it will see the stuff in the credentials file based on the [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please. _If a provider happens to use a property called 'username' with the fix you propose, I'll end up loading the plaintextauth provider instead of the one specified, which would be pretty confusing._ Sorry, I am not getting this. I am not sure how it is done exactly on the code level right at the moment, but I would say that this should be pretty transparent? Whatever properties are specified in auth_provider are taken into account and then eventually replaced by whatever is in credentials. If there is a username property both in the auth_provider section in cqlshrc and in the related section in credentials, the property in credentials overwrites / has precedence / shadows the one in cqlshrc.
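The shadowing rule Stefan describes (credentials-file properties overriding same-named properties from the cqlshrc auth_provider section) amounts to a simple dict merge. This is an illustrative sketch, not cqlsh's actual code:

```python
def merge_auth_properties(cqlshrc_props, credentials_props):
    """Combine provider properties from the two files: properties from the
    credentials file overwrite / have precedence over / shadow any
    same-named properties from the cqlshrc [auth_provider] section.
    (Illustrative sketch, not cqlsh's actual implementation.)"""
    merged = dict(cqlshrc_props)
    merged.update(credentials_props)
    return merged
```

For example, a `username` present in both files resolves to the credentials-file value, while cqlshrc-only properties (such as the provider module/classname) are kept.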
[jira] [Comment Edited] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525925#comment-17525925 ] Stefan Miklosovic edited comment on CASSANDRA-16456 at 4/21/22 6:12 PM: _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is _default._ So even I do not have anything in cqlshrc in auth_provider, imagine there still is one, the plaintext one. Hence it will see the stuff in credentials file based in [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please. was (Author: smiklosovic): _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is _default._ So even I do not have anything in cqlshrc in auth_provider, imagine there still is one, the plaintext one. Hence it will see the stuff in credentials file based in [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please. 
> Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525925#comment-17525925 ] Stefan Miklosovic commented on CASSANDRA-16456: --- _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is the _default._ So even if I do not have anything in cqlshrc under auth_provider, imagine there still is one, the plaintext one. Hence it will see the stuff in the credentials file under the [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please.
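The implicit-default behaviour Stefan asks for can be sketched as follows. This is an illustrative Python sketch, not the actual cqlsh implementation; the helper name and file contents are invented for the example:

```python
# Illustrative sketch (not the actual cqlsh code) of the fallback Stefan
# describes: when cqlshrc names no auth_provider, treat PlainTextAuthProvider
# as the implicit default and still read its section from the credentials file.
from configparser import ConfigParser

def resolve_credentials(cqlshrc_text, credentials_text):
    cqlshrc = ConfigParser()
    cqlshrc.read_string(cqlshrc_text)
    creds = ConfigParser()
    creds.read_string(credentials_text)

    # Implicit default when cqlshrc has no [auth_provider] section.
    provider = cqlshrc.get('auth_provider', 'classname',
                           fallback='PlainTextAuthProvider')
    # e.g. 'foo.FooAuthProvider' -> credentials section '[FooAuthProvider]'
    section = provider.rsplit('.', 1)[-1]
    if creds.has_section(section):
        return dict(creds.items(section))
    return {}

creds = resolve_credentials(
    "",  # empty cqlshrc: no auth_provider configured at all
    "[PlainTextAuthProvider]\nusername = cassandra\npassword = cassandra\n")
# creds == {'username': 'cassandra', 'password': 'cassandra'}
```

With this shape, a credentials file keeps working for users who never touch cqlshrc, while a configured `auth_provider` section redirects the lookup to that provider's own credentials section.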
[jira] [Commented] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525913#comment-17525913 ] Andres de la Peña commented on CASSANDRA-17500: --- [~savni_nagarkar] [~dcapwell] regarding passing the client state, I guess we could do something more or less [like this|https://github.com/adelapena/cassandra/commit/d1bddfa54cf430b4f836bcdcdbd5e4b3e9b33b4e], trying to keep the compatibility of 3rd party implementations of {{AbstractReplicationStrategy}}, if any. Nevertheless, I think we should start by migrating the min RF to guardrails (CASSANDRA-17212) before adding the max RF, so we don't have two separate approaches and config formats for min and max. Also, {{minimum_keyspace_rf}} is only on trunk, so if we are going to migrate it to guardrails it would be ideal to do it as soon as possible so we don't have to deprecate it later. wdyt? > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted as a user set the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
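The per-datacenter check such a guardrail would apply can be sketched roughly like this. This is hypothetical Python, not Cassandra's actual guardrail API; the function name and thresholds are illustrative only:

```python
# Hypothetical sketch of a per-datacenter maximum-RF guardrail. Names and
# thresholds are illustrative, not Cassandra's actual guardrail API; the
# warn/fail split mirrors how guardrails typically distinguish a soft
# warning threshold from a hard rejection threshold.
def check_max_replication_factor(dc_rfs, warn_threshold, fail_threshold):
    """dc_rfs: mapping of datacenter name -> requested replication factor.
    Returns (warnings, failures); a non-empty failures list would abort
    the CREATE/ALTER KEYSPACE statement."""
    warnings, failures = [], []
    for dc, rf in dc_rfs.items():
        if rf > fail_threshold:
            failures.append(f"RF {rf} for {dc} exceeds maximum {fail_threshold}")
        elif rf > warn_threshold:
            warnings.append(f"RF {rf} for {dc} exceeds warn threshold {warn_threshold}")
    return warnings, failures

warns, fails = check_max_replication_factor(
    {'dc1': 3, 'dc2': 7}, warn_threshold=4, fail_threshold=6)
# dc2's RF of 7 trips the failure threshold; dc1 passes both checks.
```

Because the check runs per datacenter, a keyspace using NetworkTopologyStrategy with a sane RF in one DC still fails fast if another DC's RF is set to the cluster size by mistake, which is the scenario that prompted the ticket.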
[jira] [Comment Edited] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525871#comment-17525871 ] Benedict Elliott Smith edited comment on CASSANDRA-17519 at 4/21/22 5:53 PM: - I suspect two things: 1) When originally written, this code depended on the assumption that there was mutual exclusion when creating one of these tidy objects, or that they were only created once, and that assumption was later broken (or perhaps was always false); 2) A variant of this race condition was encountered by the simulator when validating Paxos, and I “fixed” it without paying much attention to get things moving (perhaps without even intending to properly fix it at the time, as there was too much to do), and then forgot about it. I'll try to find time to perform a proper analysis of your report and the wider problems. was (Author: benedict): I suspect two things: 1) When originally written, this code depended on the assumption that there was mutual exclusion when creating one of these tidy objects, and that assumption was later broken (or perhaps was always false); 2) A variant of this race condition was encountered by the simulator when validating Paxos, and I fixed it without paying much attention to get things moving (perhaps without even intending to properly fix it at the time, as there was too much to do), and then forgot about it. I'll try to find time to perform a proper analysis of your report and the wider problems. 
> races/leaks in SSTableReader::GlobalTidy > > > Key: CASSANDRA-17519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17519 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Attachments: CASSANDRA-17519-4.0.txt, CASSANDRA-17519-4.1-fix.txt, > CASSANDRA-17519-4.1-test-exposing-the-problem.txt > > > In Cassandra 4.0/3.11 there are at least two races in > SSTableReader::GlobalTidy > One is a get/get race, explicitly handled as an assertion in: > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204] > and it looks like "ok, it's a problem, but let's just not fix it" > The other one is get/tidy race between > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196] > and > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175] > > The second one can be easily hit by adding a small delay at the beginning of > `tidy()` method (say, 20ms) and running `LongStreamingTest` (and actually > such failure is what prompted the investigation of GlobalTidy correctness) > There was an attempt on `trunk` to fix these two races. > The details are not clear to me, and it all looks quite weird. I might be > mistaken, but as far as I can see the relevant changes were introduced in: > [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490] > that is piggybacked on a huge change in CASSANDRA-17008, without a separate > ticket or any sort of qa. 
> As far as I can see this attempt changes the first race into a leak, and the > second race to another race, this time allowing to have multiple GlobalTidy > objects for the same sstable (and, as a result, a premature running of > obsoletion code). > I'll follow up with PRs for relevant branches etc etc
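The get/tidy race and the "multiple GlobalTidy objects for the same sstable" failure mode can be modelled in a few lines. This is an illustrative Python model, not the Cassandra code; the fix shown, one lock spanning lookup, creation, and removal, is just one way to close the window:

```python
# Minimal model (illustrative, not the Cassandra implementation) of a
# per-sstable shared "tidy" registry. Without mutual exclusion, a tidy()
# racing a get() can evict a freshly created object, leaving two live tidy
# objects for one sstable and letting obsoletion code run prematurely.
import threading

class GlobalTidyModel:
    _lock = threading.Lock()
    _instances = {}   # sstable descriptor -> shared tidy object

    @classmethod
    def get(cls, desc):
        # Holding one lock across lookup and creation closes the get/get
        # window in which two callers both miss and both insert.
        with cls._lock:
            obj = cls._instances.get(desc)
            if obj is None:
                obj = object()
                cls._instances[desc] = obj
            return obj

    @classmethod
    def tidy(cls, desc, obj):
        with cls._lock:
            # Only remove the entry if it is still ours; a racing get() may
            # have already installed a replacement, and evicting that
            # newcomer would recreate the duplicate-GlobalTidy problem.
            if cls._instances.get(desc) is obj:
                del cls._instances[desc]

a = GlobalTidyModel.get("sstable-1")
b = GlobalTidyModel.get("sstable-1")   # same shared object as `a`
GlobalTidyModel.tidy("sstable-1", a)   # registry entry cleanly removed
```

The identity check in `tidy()` is the key detail: removing unconditionally is exactly the get/tidy race described above, where a dying instance tears down state that a newly resurrected instance still depends on.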
[jira] [Updated] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-17572: - Fix Version/s: (was: 3.0.x) (was: 3.11.x) (was: 4.0.x) > Race condition when IP address changes for a node can cause reads/writes to > route to the wrong node > --- > > Key: CASSANDRA-17572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17572 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Sam Kramer >Priority: Normal > Fix For: 4.x > > > Hi, > We noticed that there is a race condition present in the trunk of 3.x code, > and confirmed that it’s there in 4.x as well, which will result in incorrect > reads, and missed writes, for a very short period of time. > What brought the race condition to our attention was due to the fact we > started noticing a couple of missed writes for our Cassandra clusters in > Kubernetes. We found the Kubernetes piece interesting, as IP changes are very > frequent as opposed to a traditional setup. > More concretely: > # When a Cassandra node is turned off, and then starts with a new IP address > Z (former IP address X), it announces to the cluster (via gossip) it has IP Z > for Host ID Y > # If there are no conflicts, each node will decide to remove the old IP > address associated with Host ID Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > from the storage ring. This also causes us to invalidate our token ring > cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488] > ). 
> # At this time, a new request could come in (read or write), and will > re-calculate which endpoints to send the request to, as we’ve invalidated our > token ring cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]). > # However, at this time we’ve only removed the IP address X (former IP > address), and have not re-added IP address Z. > # As a result, we will choose a new host to route our request to. In our > case, our keyspaces all run with NetworkTopologyStrategy, and so we simply > choose the node with the next closest token in the same rack as host Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]). > # Thus, the request is routed to a _different_ host, rather than the host > that has come back online. > # However, shortly after, we re-add the host (via its _new_ endpoint) to > the token ring > [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549] > # This will result in us invalidating our cache, and then again re-routing > requests appropriately. > Couple of additional thoughts: > - This doesn’t affect clusters where nodes <= RF with network topology > strategy. > - During this very brief period of time, CL for all user queries is > violated, but they are ACK’d as successful. > - It’s easy to reproduce this race condition by simply adding a sleep here > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > - If a cleanup is not run before any range movement, it’s possible for rows > that were temporarily written to the wrong node to re-appear. > - We tested that the race condition exists in our Cassandra 2.x fork (we're > not on 3.x or 4.x).
So, there is a possibility here that it's only for > Cassandra 2.x, though that seems unlikely from reading the code.
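The remove-then-re-add window in the steps above can be modelled like this. It is a toy Python sketch; the host id, IPs, and routing function are invented for the example:

```python
# Toy model (illustrative, not the Cassandra code) of the race described
# above: the old endpoint is removed and the cache invalidated before the
# new endpoint is added, so a request arriving inside that window routes
# to some other replica instead of the host that came back online.
class TokenRing:
    def __init__(self):
        self.endpoint_for_host = {}  # host id -> ip

    def change_ip_racy(self, host_id, new_ip, request_in_window):
        # Step 1: drop the old IP (this is where the token ring cache is
        # invalidated in the sequence above).
        self.endpoint_for_host.pop(host_id, None)
        routed = request_in_window(self)   # request recomputes endpoints now
        # Step 2: only later is the new IP added back.
        self.endpoint_for_host[host_id] = new_ip
        return routed

    def change_ip_atomic(self, host_id, new_ip, request_in_window):
        # Replacing the mapping in one step leaves no window in which the
        # host is absent from the ring.
        self.endpoint_for_host[host_id] = new_ip
        return request_in_window(self)

def route(ring):
    # Falls back to some other replica when the target host is absent.
    return ring.endpoint_for_host.get('host-Y', 'other-node')

ring = TokenRing()
ring.endpoint_for_host['host-Y'] = '10.0.0.1'
misrouted = ring.change_ip_racy('host-Y', '10.0.0.2', route)    # 'other-node'

ring2 = TokenRing()
ring2.endpoint_for_host['host-Y'] = '10.0.0.1'
ok = ring2.change_ip_atomic('host-Y', '10.0.0.2', route)        # '10.0.0.2'
```

The sketch makes the fix direction concrete: if the endpoint swap (or the cache invalidation around it) were atomic with respect to endpoint recalculation, the misrouting window would not exist.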
[jira] [Commented] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525905#comment-17525905 ] Brandon Williams commented on CASSANDRA-17572: -- Actually, the window here should already be very small, it's all done in the same path.
[jira] [Updated] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-11871: -- Reviewers: Andres de la Peña, Yifan Cai (was: Andres de la Peña) Status: Review In Progress (was: Patch Available) > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same order. As we can have to process a large amount of data, we want to try to limit ourselves to the cases where we can build the groups on the fly (which is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. The problem is that if the user inverts the functions by mistake, like this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results.
> The only way to prevent that would be to check the function order and make sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), second(time)}}). > In my opinion a function like {{floor(<column>, <duration>)}} would be much better, as it does not allow for this type of mistake and is much more flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, 5m), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY floor(time, 5m);}} > An important aspect to keep in mind with a function like {{floor}} is the starting point. For a query like: {{SELECT floor(time, 2h), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}.
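The bucketing semantics proposed above, flooring each timestamp into fixed-width buckets anchored at the start of the queried range, can be sketched like this (illustrative Python, not the CQL implementation):

```python
# Sketch of the bucketing semantics described above: floor a timestamp into
# fixed-width buckets anchored at the start of the queried range, so a 2h
# bucket width over 01:30:00..07:30:00 yields groups starting at 01:30,
# 03:30 and 05:30 rather than at clock-aligned even hours.
from datetime import timedelta

def parse(t):
    h, m, s = map(int, t.split(':'))
    return timedelta(hours=h, minutes=m, seconds=s)

def floor_time(t, bucket, range_start):
    # Bucket index is how many whole bucket widths fit between the range
    # start and the timestamp; the group key is the bucket's start time.
    offset = parse(t) - parse(range_start)
    bucket_index = int(offset.total_seconds() // bucket.total_seconds())
    return parse(range_start) + bucket_index * bucket

groups = {str(floor_time(t, timedelta(hours=2), '01:30:00'))
          for t in ['01:45:00', '03:30:00', '06:00:00']}
# three rows -> three groups: 1:30:00, 3:30:00, 5:30:00
```

Anchoring at the range start is what distinguishes this from a naive `hour(time)`-style grouping: the same rows grouped with clock-aligned 2h buckets would instead land in 00:00, 02:00, and 06:00 buckets.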
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525889#comment-17525889 ] Yifan Cai commented on CASSANDRA-11871: --- +1 on the patch! CI looks good too.
[jira] [Comment Edited] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525865#comment-17525865 ] Jakub Zytka edited comment on CASSANDRA-17519 at 4/21/22 5:36 PM: -- I believe that the get/tidy race condition on 4.1 may end up unexpectedly running the obsoletion code before it is due, potentially leading to some local data loss. Admittedly, I don't have a real-life scenario for that to happen. The fact that a failure of the assertion that we had on 4.0 and earlier has not been seen in the wild suggests that the occurrence probability is very low. Still, I preferred to err on the safe side, and thus the bug has been categorized as a recoverable loss. was (Author: jakubzytka): I believe that the get/tidy race condition may end up unexpectedly running the obsoletion code before it is due, potentially leading to some local data loss. Admittedly, I don't have a real-life scenario for that to happen. The fact that a failure of the assertion that we had on 4.0 and earlier has not been seen in the wild suggests that the occurrence probability is very low. Still, I preferred to err on the safe side, and thus the bug has been categorized as a recoverable loss.
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525883#comment-17525883 ] David Capwell commented on CASSANDRA-17563: ---
bq. has only an addition of a new job in_jvm which match as resource usage the old single in_jvm job and no other changes applied.
Currently this is a massive amount of manual work to confirm, and there isn't a good source of truth to compare against; this patch moves the source of truth into a map so we know what happens (if you don't update you get the default for the level, else you add an override). We talked about this a lot in slack, and my personal feeling is every step in a manual process is another chance for error, so the more steps the higher the risk of doing it incorrectly; the current process, as far as I can find, is the following:
1) create config-2_1.yml.MIDRES and config-2_1.yml.HIGHER using the current patches
2) update config-2_1.yml with your change
3) update config-2_1.yml.MIDRES with your change, and figure out how to apply the updated resources
4) update config-2_1.yml.HIGHER with your change and figure out how to apply the updated resources (the method does not match step 3, so the "how" is different here)
5) generate diff for MIDRES and update patch
6) generate diff for HIGHER and update patch
7) test LOWER - success is defined as "what failed before is the only thing failing now"
8) test MIDRES - success is defined as "what failed before is the only thing failing now"
9) test HIGHER - success is defined as "what failed before is the only thing failing now"
did I miss anything?
bq. There are changes to the other generate.sh script I haven't looked at but any change there need to be tested
given there are 0 tests for the script, 2 different people "testing" could yield different results, so we would need to have some way to define success that is agreed upon.
For example, I fixed what I saw as a bug: when you ask it to generate LOWER, MIDRES, or HIGHER it doesn't actually update those files and instead only updates config.yml; the help page says this, but to me this is unexpected behavior (-a updates config.yml.LOWER, config.yml.MIDRES, and config.yml.HIGHER; -h updates config.yml only!). I am totally cool with -h updating config.yml as well, but it feels like a bug that config.yml.HIGHER isn't updated... so my patch changed that... which one of us is the bug? Now, if we want to define it as "they did the same thing regardless of personal feelings about correct behavior" then I do know for a fact my patch is different; I am 100% ok reverting that difference.
bq. My concern is we don't know who was using what and how and it was working fine for quite some time
I feel like a politician... can you define the word "what"?
bq. Do we want to rewrite the whole approach one week before freeze when people highly utilize CI to push their latest work?
the core change is moving away from patches to modifying the YAML tree; there are other changes but those are personal preference and 100% fine to drop... To me I ask the following question: "if you yaml diff the old and new files, is there a difference?" If the answer is no, then there isn't much of a risk other than the script not working on an unknown laptop (which impacts generate.sh only, not CI configs). Now, if you want to de-risk that, we could use this script to generate the patches, but we don't solve the real problem of patches applying when they shouldn't (which is how I broke MIDRES). If we want to do that to lower risk before the 4.1 freeze I am cool with that, but I do not think that is a valid long-term solution.
bq. now we will have a mix of python and shell scripts, are we sure the community will accept that?
that is something anyone who touches this needs to answer, which is why I tried to pull in anyone who touched this logic to get their feedback.
I do know that many in the community basically do this already (can tell by looking at CircleCI, as my private scripts rename things and clean up our DAG), so it's just moving part of that private logic into OSS to help maintain these files.
bq. I really like and appreciate how you added diff but I am confused from the output what I am seeing actually. I see the new name and resource change.
do you mean the output of the dump of what each job's resource is?
{code}
$ diff midres.resources midres.resources.new
9a10
> j11_jvm_dtests_vnode medium 10
22a24
> j8_jvm_dtests_vnode large 10
{code}
so this is diff output: ">" means that the right-hand side has the following line but there is no matching line on the left-hand side... aka "new job". You see that MIDRES j8 and j11 do not have matching resources! This is because they don't now (as defined before the vnode patch), so I am pointing out that j11 and j8 run with different resources on MIDRES and that I am not changing that behavior.
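The "source of truth in a map" idea from the comment can be sketched like this. This is hypothetical Python; the job names, fields, and tier default are illustrative, not the real CircleCI config or generation script:

```python
# Hedged sketch of keeping per-tier resource overrides in one dictionary and
# applying them to the generated job tree, instead of maintaining separate
# patch files. An unlisted job keeps the tier default, so there is no stale
# patch to mis-apply. Job names and fields are illustrative only.
MIDRES_OVERRIDES = {
    'j8_jvm_dtests_vnode': {'resource_class': 'large', 'parallelism': 10},
    'j11_jvm_dtests_vnode': {'resource_class': 'medium', 'parallelism': 10},
}

def apply_overrides(jobs, overrides, tier_default='medium'):
    """jobs: mapping of job name -> job config dict (as parsed from YAML).
    Returns a new mapping with tier defaults filled in and explicit
    overrides applied on top."""
    out = {}
    for name, job in jobs.items():
        merged = dict(job)
        merged.setdefault('resource_class', tier_default)
        merged.update(overrides.get(name, {}))  # explicit override wins
        out[name] = merged
    return out

jobs = {'j8_jvm_dtests_vnode': {'parallelism': 4},
        'j8_unit_tests': {'parallelism': 25}}
midres = apply_overrides(jobs, MIDRES_OVERRIDES)
# j8_jvm_dtests_vnode picks up the override; j8_unit_tests keeps the default.
```

Compared with the patch-file workflow, the override map makes the review question concrete: diffing the map answers "what differs between tiers" directly, rather than inferring it from whether a textual patch still applies cleanly.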
[jira] [Commented] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525871#comment-17525871 ] Benedict Elliott Smith commented on CASSANDRA-17519: I suspect two things: 1) When originally written, this code depended on the assumption that there was mutual exclusion when creating one of these tidy objects, and that assumption was later broken (or perhaps was always false); 2) A variant of this race condition was encountered by the simulator when validating Paxos, and I fixed it without paying much attention to get things moving (perhaps without even intending to properly fix it at the time, as there was too much to do), and then forgot about it. I'll try to find time to perform a proper analysis of your report and the wider problems. > races/leaks in SSTableReader::GlobalTidy > > > Key: CASSANDRA-17519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17519 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Attachments: CASSANDRA-17519-4.0.txt, CASSANDRA-17519-4.1-fix.txt, > CASSANDRA-17519-4.1-test-exposing-the-problem.txt > > > In Cassandra 4.0/3.11 there are at least two races in > SSTableReader::GlobalTidy > One is a get/get race, explicitly handled as an assertion in: > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204] > and it looks like "ok, it's a problem, but let's just not fix it" > The other one is get/tidy race between > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196] > and > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175] > > The second one can be easily hit by 
adding a small delay (say, 20ms) at the beginning of the > `tidy()` method and running `LongStreamingTest` (such a failure is in fact > what prompted the investigation of GlobalTidy correctness). > There was an attempt on `trunk` to fix these two races. > The details are not clear to me, and it all looks quite weird. I might be > mistaken, but as far as I can see the relevant changes were introduced in > [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490], > which is piggybacked on a huge change in CASSANDRA-17008, without a separate > ticket or any sort of QA. > As far as I can see this attempt turns the first race into a leak, and the > second race into another race, this time allowing multiple GlobalTidy > objects for the same sstable (and, as a result, premature running of the > obsoletion code). > I'll follow up with PRs for relevant branches, etc. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
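The get/tidy window described in the report can be modeled with a toy lookup table (the names `Tidy`, `TidyRegistry`, and `GlobalTidyRaceSketch` below are illustrative, not Cassandra's actual `GlobalTidy` code). The simulation interleaves the steps by hand: the reference count hits zero and the "obsoletion" code runs, but before the lookup entry is removed, a concurrent `get()` hands out the already-tidied instance:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of a global tidy registry; names are illustrative only.
class Tidy {
    final String desc;
    int refs = 1;
    boolean tidied = false;   // set once the "obsoletion" code has run
    Tidy(String desc) { this.desc = desc; }
}

class TidyRegistry {
    static final ConcurrentHashMap<String, Tidy> lookup = new ConcurrentHashMap<>();

    // get(): return the existing instance for a descriptor, or create one.
    // Nothing here checks whether the existing instance is already dying.
    static Tidy get(String desc) {
        return lookup.compute(desc, (d, cur) -> {
            if (cur == null) return new Tidy(d);
            cur.refs++;
            return cur;
        });
    }
}

public class GlobalTidyRaceSketch {
    public static void main(String[] args) {
        Tidy a = TidyRegistry.get("sstable-1");

        // tidy() begins: refs reached zero and obsoletion runs...
        a.refs = 0;
        a.tidied = true;
        // ...but the lookup entry has not been removed yet. A concurrent
        // get() landing in this window receives the dead instance:
        Tidy b = TidyRegistry.get("sstable-1");
        System.out.println(b == a && b.tidied);   // prints: true

        // tidy() finishes only now:
        TidyRegistry.lookup.remove("sstable-1");
    }
}
```

With the opposite ordering (entry removed before obsoletion runs), `get()` instead creates a second instance while the first one's obsoletion is still pending: the "multiple GlobalTidy objects for the same sstable" variant the report describes.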
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Description: In Cassandra 4.0/3.11 there are at least two races in SSTableReader::GlobalTidy. One is a get/get race, explicitly handled as an assertion in: [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204] and it looks like "ok, it's a problem, but let's just not fix it". The other is a get/tidy race between [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196] and [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175] The second one can easily be hit by adding a small delay (say, 20ms) at the beginning of the `tidy()` method and running `LongStreamingTest` (such a failure is in fact what prompted the investigation of GlobalTidy correctness). There was an attempt on `trunk` to fix these two races. The details are not clear to me, and it all looks quite weird. I might be mistaken, but as far as I can see the relevant changes were introduced in [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490], which is piggybacked on a huge change in CASSANDRA-17008, without a separate ticket or any sort of QA. As far as I can see this attempt turns the first race into a leak, and the second race into another race, this time allowing multiple GlobalTidy objects for the same sstable (and, as a result, premature running of the obsoletion code). 
I'll follow up with PRs for relevant branches, etc.
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Test and Documentation Plan: a simple concurrency unit test is included Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Bug Category: Parent values: Correctness(12982) Level 1 values: Recoverable Corruption / Loss(12986) Complexity: Normal Component/s: Legacy/Core Discovered By: Unit Test Severity: Normal Status: Open (was: Triage Needed) I believe the get/tidy race condition may result in the obsoletion code unexpectedly running before it is due, potentially leading to some local data loss. Admittedly, I don't have a real-life scenario in which that would happen. The fact that a failure of the assertion we had on 4.0 and earlier has not been seen in the wild suggests the probability of occurrence is very low. Still, I preferred to err on the safe side, so the bug has been categorized as a recoverable loss.
[jira] [Commented] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525860#comment-17525860 ] Jakub Zytka commented on CASSANDRA-17519: - Hi, I've never submitted a patch to Cassandra before, so please bear with me. I attached 3 files: # test that exposes the described problem for `trunk`: [^CASSANDRA-17519-4.1-test-exposing-the-problem.txt] # the actual fix for `trunk`: [^CASSANDRA-17519-4.1-fix.txt] # the test and fix squashed, for cassandra-4.0 (there are slight differences due to resource leak handling): [^CASSANDRA-17519-4.0.txt] I took the liberty of putting comments liberally around the changed code. I think it's a good idea especially due to previous unsuccessful attempts to fix the code. One thing that I did not do, but I think is worth considering is to run the obsoletion code *before* removing the relevant entry from the lookup table. It looks more natural and removes the potential for yet another race condition (currently the obsoletion code must not assume that no other obsoletion code for the same descriptor is running). I understand that this is hardly possible, but I think that in general, it is safer to use the postulated order of execution - first obsoletion, and only then the removal from lookup. [~benedict] you might be interested in doing the review, as you changed the GlobalTidy code recently. (also, [~samt] , who was the reviewer). 
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Attachment: CASSANDRA-17519-4.0.txt
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Attachment: CASSANDRA-17519-4.1-fix.txt CASSANDRA-17519-4.1-test-exposing-the-problem.txt
[jira] [Commented] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525847#comment-17525847 ] Brandon Williams commented on CASSANDRA-17572: -- It seems like the simplest thing to do would be to move the tokenMetadata.removeEndpoint call to updateTokenMetadata, much like is being done with endpointsToRemove; that way we aren't invalidating the cache until the new IP has ownership. > Race condition when IP address changes for a node can cause reads/writes to > route to the wrong node > --- > > Key: CASSANDRA-17572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17572 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Sam Kramer >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > Hi, > We noticed a race condition present in the trunk of the 3.x code, and > confirmed that it's present in 4.x as well, which can result in incorrect > reads and missed writes for a very short period of time. > The race condition came to our attention when we started noticing a few > missed writes in our Cassandra clusters in Kubernetes. The Kubernetes piece > is interesting because IP changes are far more frequent there than in a > traditional setup. > More concretely: > # When a Cassandra node is turned off and then starts with a new IP address > Z (former IP address X), it announces to the cluster (via gossip) that it has > IP Z for Host ID Y > # If there are no conflicts, each node will decide to remove the old IP > address associated with Host ID Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > from the storage ring. This also invalidates the token ring cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488]). 
> # At this time, a new request could come in (read or write) and will > recalculate which endpoints to send the request to, as we've invalidated the > token ring cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]). > # However, at this point we've only removed IP address X (the former IP > address) and have not yet re-added IP address Z. > # As a result, we will choose a new host to route the request to. In our > case, our keyspaces all run with NetworkTopologyStrategy, so we simply > choose the node with the next closest token in the same rack as host Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]). > # Thus, the request is routed to a _different_ host, rather than the host > that has come back online. > # Shortly afterwards, however, we re-add the host (via its _new_ endpoint) to > the token ring > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549]) > # This invalidates the cache again, after which requests are routed > correctly. > A couple of additional thoughts: > - This doesn't affect clusters where the number of nodes is <= RF with > NetworkTopologyStrategy. > - During this very brief window, the CL of all user queries is violated, yet > the queries are ACK'd as successful. > - It's easy to reproduce this race condition by simply adding a sleep here > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > - If a cleanup is not run before any range movement, it's possible for rows > that were temporarily written to the wrong node to re-appear. > - We tested that the race condition exists in our Cassandra 2.x fork (we're > not on 3.x or 4.x). 
> So there is a possibility that this only affects Cassandra 2.x, though from > reading the code that seems unlikely.
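The remove-before-add ordering in steps 2-6 above can be sketched with a toy token ring (the `Ring` class and its methods are hypothetical, not Cassandra's actual TokenMetadata/replication API). During the window between removing the old endpoint and adding the new one, replica calculation simply skips the restarting host:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy token ring with a cached replica calculation; names are illustrative.
class Ring {
    final NavigableMap<Long, String> tokenToEndpoint = new TreeMap<>();
    List<String> cachedReplicas;   // invalidated whenever the ring changes

    void add(long token, String endpoint) {
        tokenToEndpoint.put(token, endpoint);
        cachedReplicas = null;     // cache invalidation, as in step 2
    }

    void remove(String endpoint) {
        tokenToEndpoint.values().removeIf(endpoint::equals);
        cachedReplicas = null;     // cache invalidation, as in step 2
    }

    // Simplistic replica lookup: all endpoints at or after the token.
    List<String> replicasFor(long token) {
        if (cachedReplicas == null)
            cachedReplicas = new ArrayList<>(tokenToEndpoint.tailMap(token).values());
        return cachedReplicas;
    }
}

public class IpChangeRaceSketch {
    public static void main(String[] args) {
        Ring ring = new Ring();
        ring.add(100L, "10.0.0.1");   // host Y at its old IP X
        ring.add(200L, "10.0.0.2");

        // Step 2: the old IP is removed and the cache invalidated...
        ring.remove("10.0.0.1");
        // Step 3-5: a request recalculating replicas in this window
        // no longer sees host Y, so token 100 routes to the next node:
        System.out.println(ring.replicasFor(100L));   // prints: [10.0.0.2]

        // Step 6: only now is the new IP Z added back to the ring.
        ring.add(100L, "10.0.0.9");
        System.out.println(ring.replicasFor(100L));   // prints: [10.0.0.9, 10.0.0.2]
    }
}
```

Deferring the `remove` until the same update that performs the `add` (the fix direction suggested in the comment above) closes the window, since the old endpoint keeps ownership until the new one takes over.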
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525843#comment-17525843 ] David Capwell commented on CASSANDRA-17560: --- thanks! Pushed to the source branch linked above, and watching CI > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > The track_warnings config is currently nested, which is discouraged at the > moment. It also predates the config standards patch, which moved > storage-typed longs to the new DataStorageSpec type; we should migrate these > configs accordingly.
[jira] [Created] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
Sam Kramer created CASSANDRA-17572: -- Summary: Race condition when IP address changes for a node can cause reads/writes to route to the wrong node Key: CASSANDRA-17572 URL: https://issues.apache.org/jira/browse/CASSANDRA-17572 Project: Cassandra Issue Type: Bug Reporter: Sam Kramer
[jira] [Updated] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-17572: - Bug Category: Parent values: Correctness (12982), Level 1 values: Recoverable Corruption / Loss (12986) Complexity: Normal Component/s: Cluster/Membership Discovered By: User Report Fix Version/s: 3.0.x, 3.11.x, 4.0.x, 4.x Severity: Normal Status: Open (was: Triage Needed)
> Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
> ---
>
> Key: CASSANDRA-17572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17572
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Sam Kramer
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
> Hi,
> We noticed a race condition in the trunk of the 3.x code, and confirmed it is present in 4.x as well, which results in incorrect reads and missed writes for a very short period of time. It came to our attention when we started noticing occasional missed writes in our Cassandra clusters in Kubernetes. The Kubernetes angle is interesting because IP changes there are far more frequent than in a traditional setup.
> More concretely:
> # When a Cassandra node is turned off and then starts with a new IP address Z (former IP address X), it announces to the cluster (via gossip) that it has IP Z for Host ID Y.
> # If there are no conflicts, each node decides to remove the old IP address associated with Host ID Y ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) from the storage ring. This also invalidates the token ring cache ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488]).
> # At this point, a new request (read or write) could come in, and it will re-calculate which endpoints to send the request to, since the token ring cache was invalidated ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]).
> # However, at this time only IP address X (the former address) has been removed; IP address Z has not yet been re-added.
> # As a result, a new host is chosen to route the request to. In our case, our keyspaces all run with NetworkTopologyStrategy, so we simply choose the node with the next closest token in the same rack as host Y ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]).
> # Thus, the request is routed to a _different_ host, rather than the host that has come back online.
> # Shortly afterwards, the host is re-added (via its _new_ endpoint) to the token ring ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549]), which invalidates the cache again, and requests are once more routed appropriately.
> A couple of additional thoughts:
> - This doesn't affect clusters where the number of nodes is <= RF with NetworkTopologyStrategy.
> - During this very brief window, the consistency level of user queries is violated, yet the queries are ACK'd as successful.
> - The race condition is easy to reproduce by simply adding a sleep here: [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]
> - If a cleanup is not run before any range movement, rows that were temporarily written to the wrong node can re-appear.
> - We tested that the race condition exists in our Cassandra 2.x fork (we're not on 3.x or 4.x), so there is a possibility it is specific to 2.x, though from reading the code that seems unlikely.
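The remove-then-re-add window described in the numbered steps above can be sketched with a toy token ring. This is an illustration of the race only, with made-up names; it is not Cassandra's actual TokenMetadata or replication-strategy code:

```python
# Toy model of the race: removing a restarted node's old endpoint and
# re-adding its new one are two separate steps, each invalidating the
# routing cache, so a request arriving in between routes to the wrong node.

class TokenRing:
    def __init__(self, token_to_endpoint):
        self.token_to_endpoint = dict(token_to_endpoint)
        self.cached_replicas = None  # cleared on any membership change

    def remove_endpoint(self, endpoint):
        self.token_to_endpoint = {
            t: e for t, e in self.token_to_endpoint.items() if e != endpoint
        }
        self.cached_replicas = None  # cache invalidated (step 2)

    def add_endpoint(self, token, endpoint):
        self.token_to_endpoint[token] = endpoint
        self.cached_replicas = None  # cache invalidated again (step 7)

    def replica_for(self, key_token):
        # the owner is the node with the next token >= key_token (wrapping)
        tokens = sorted(self.token_to_endpoint)
        for t in tokens:
            if t >= key_token:
                return self.token_to_endpoint[t]
        return self.token_to_endpoint[tokens[0]]

ring = TokenRing({10: "A", 20: "B", 30: "C"})
assert ring.replica_for(15) == "B"   # B owns token 20

ring.remove_endpoint("B")            # B restarts with a new IP
assert ring.replica_for(15) == "C"   # race window: wrong node gets the request

ring.add_endpoint(20, "B2")          # new endpoint re-added shortly after
assert ring.replica_for(15) == "B2"  # routing is correct again
```

Widening the gap between `remove_endpoint` and `add_endpoint` (the sleep the reporter suggests) lengthens the window in which the middle assertion holds.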
[jira] [Commented] (CASSANDRA-17166) Enhance SnakeYAML properties to be reusable outside of YAML parsing, support camel case conversion to snake case, and add support to ignore properties
[ https://issues.apache.org/jira/browse/CASSANDRA-17166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525840#comment-17525840 ] Caleb Rackliffe commented on CASSANDRA-17166: - Rebase and additional cleanups LGTM > Enhance SnakeYAML properties to be reusable outside of YAML parsing, support > camel case conversion to snake case, and add support to ignore properties > -- > > Key: CASSANDRA-17166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17166 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.x > > Time Spent: 15h 10m > Remaining Estimate: 0h > > SnakeYaml is rather limited in the “object mapping” layer, which forces our > internal code to match specific patterns (all fields public and camel case); > we can remove this restriction by leveraging Jackson for property lookup, and > leaving the YAML handling to SnakeYAML
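The camel-case-to-snake-case conversion mentioned in the ticket title can be sketched in a few lines. This is a hedged illustration in Python rather than the Jackson-based Java code the ticket describes, and the property names are made up:

```python
import re

def camel_to_snake(name: str) -> str:
    # Insert an underscore at each word boundary, then lowercase everything.
    # Two passes handle runs of capitals (e.g. acronyms) reasonably.
    s = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', s).lower()

assert camel_to_snake("commitlogSyncPeriod") == "commitlog_sync_period"
assert camel_to_snake("maxHintWindow") == "max_hint_window"
```

This lets config classes keep idiomatic field names while still matching snake_case keys in the YAML, which is the restriction the ticket aims to remove.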
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525830#comment-17525830 ] Brian Houser commented on CASSANDRA-16456: -- Thanks for the notes, I'll update the code. Hmm, ok, let me explain my thinking. Cqlsh.py is in charge of parsing at the command-line level and processing the legacy authentication section. From this it gets a username and password. At this point my thinking was that it should work exactly as it did before:
* If there is a username but no password, it should prompt for a password.
* If there is no username, no password, and no auth_provider, it should just use None for the auth provider.
* If there is a username and a password, it should use them directly.
If you are specifying a new AuthProvider (that is, something that isn't PlainTextAuthProvider), then the convention is very simple:
* Get the module and class name from the [auth_provider] section of the cqlshrc file.
* Get additional properties from any properties left in the [auth_provider] section of the cqlshrc file.
* Get additional properties from everything in the credentials section labeled with the auth provider class name.
For example, if I am using the FooAuthProvider, my cqlshrc file would look like this:
```
[auth_provider]
module = foo.foo
classname = FooAuthProvider
prop1 = value1
```
My credentials file might look like this:
```
[FooAuthProvider]
prop2 = value2
```
FooAuthProvider would get called with the properties prop1 and prop2. Notice that if there is no auth_provider section in the cqlshrc file specifying what you want to load, the credentials file won't contribute any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file. The whole intent of naming the auth provider in the credentials file seemed to be to allow different credentials to live in one place, depending on the auth provider specified.
In keeping with Python convention, I was trying to force you to be specific if you were going to use the new way of loading things, since this is meant for custom loading of auth providers. There's already a legacy case for the authentication section and for specifying the username on the command line. It seems you want it to default to PlainTextAuthProvider in all cases when an auth provider isn't specified; I can do that pretty easily in the auth-handling bit. In that case, if you don't specify any provider in the cqlshrc file, I'll assume you meant PlainTextAuthProvider and pull it from the credentials file if that file exists and no other auth_provider is specified. I appreciate that you provided a fix for your concern, but unfortunately it's easy to see this creating a clash with newer providers. If a provider happens to use a property called 'username', then with the fix you propose I'll end up loading PlainTextAuthProvider instead of the one specified, which would be pretty confusing. I'd rather put any new logic into the auth-handling piece, where it can be unit tested more easily. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. 
> Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17565) Fix test_parallel_upgrade_with_internode_ssl
[ https://issues.apache.org/jira/browse/CASSANDRA-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525815#comment-17525815 ] Brandon Williams commented on CASSANDRA-17565: -- Here are the branches and precommit CI on circle: ||Branch||Precommit CI|| |[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-17565-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/447/workflows/d512f2ec-d340-4075-9c3c-22099d23c73c], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/447/workflows/5a5204f5-fa28-40b0-9f56-64765601999f]| |[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-17565-trunk]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/446/workflows/6796d66c-c1db-4a32-80cf-d069100fe19a], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/446/workflows/cf5c2d25-274e-4821-91dc-aabf9c5ad986]| > Fix test_parallel_upgrade_with_internode_ssl > > > Key: CASSANDRA-17565 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17565 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.x > > > While working on CASSANDRA-17341 I hit this flaky test, very rarely failing > but it is failing on trunk. > More info in this CI run: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/1563/workflows/61bda0b7-f699-4897-877f-c7d523a03127/jobs/10318 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525782#comment-17525782 ] Ekaterina Dimitrova commented on CASSANDRA-17571: - Marking as a 4.1 blocker, as there was a discussion to add extended classes for Int to handle the upper bound of old int parameters, and changing those in Config will be considered a breaking change after a release. CC [~dcapwell], [~maedhroz] and [~mck]. I will push the suggested classes in the next few hours for approval before moving any config to them. > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-17571: Fix Version/s: 4.1 (was: 4.x) > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-17571: Fix Version/s: 4.x > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.x > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-17571: Bug Category: Parent values: Code(13163) Complexity: Low Hanging Fruit Component/s: Local/Config Discovered By: User Report Severity: Low Assignee: Ekaterina Dimitrova Status: Open (was: Triage Needed) > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-17571) Config upper bound should be handled earlier
Ekaterina Dimitrova created CASSANDRA-17571: --- Summary: Config upper bound should be handled earlier Key: CASSANDRA-17571 URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 Project: Cassandra Issue Type: Bug Reporter: Ekaterina Dimitrova Config upper bound should be handled on startup/config setup and not during conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525764#comment-17525764 ] Ekaterina Dimitrova commented on CASSANDRA-17563: - My previous point still stands: we need to be sure everything on all branches works as before. IMHO what we really need to see is that every new highres and midres file on every branch contains only the addition of a new in_jvm job that matches the resource usage of the old single in_jvm job, with no other changes applied. Thanks for all the updates and for adding the docs, etc. I understand and really appreciate your good intentions for improvement and ease of maintenance. Unfortunately, I have a few immediate concerns which make me think we need an immediate fix for midres, with rewrites of the scripts after the release: * There are changes to the other generate.sh script that I haven't looked at, but any change there needs to be tested to confirm it didn't break any of the options added and tested one by one by [~adelapena] * My concern is that we don't know who was using what and how, and it was working fine for quite some time. Do we want to rewrite the whole approach one week before freeze, when people heavily utilize CI to push their latest work? What do others think? * Also, we will now have a mix of Python and shell scripts; are we sure the community will accept that? I really like and appreciate how you added a diff, but I am confused by the output about what I am actually seeing. I see the new name and the resource change. > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During CircleCI addition of a new job to the config, the midres file got > messy. 
Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism as the original in-jvm job > * j8_dtests_with_vnodes should get from midres 50 large but currently > midres makes it run with 25 and medium which fails around 100 tests -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525749#comment-17525749 ] Tibor Repasi commented on CASSANDRA-17568: -- I've addressed all review comments for now and am looking forward to adding an option that lists the orphaned directories. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 7h 40m > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within the data paths. Operators may find it challenging to > work out which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17564) Add synchronization to wait for outstanding tasks in the compaction executor and nonPeriodicTasks during CassandraDaemon setup
[ https://issues.apache.org/jira/browse/CASSANDRA-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525712#comment-17525712 ] Stefan Miklosovic commented on CASSANDRA-17564: --- units passed https://app.circleci.com/pipelines/github/instaclustr/cassandra/929/workflows/a33ef23d-0ae1-4a5d-9a36-a55f914f484f > Add synchronization to wait for outstanding tasks in the compaction executor > and nonPeriodicTasks during CassandraDaemon setup > -- > > Key: CASSANDRA-17564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17564 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Haoze Wu >Priority: Normal > Fix For: 3.11.x, 4.0.x > > Time Spent: 10m > Remaining Estimate: 0h > > We have been testing Cassandra 3.11.10 for a while. During a node start, we > found that a synchronization guarantee implied by the code comments is not > enforced. Specifically, in the `invalidate` method called in this call stack > (in version 3.11.10): > {code:java} > org.apache.cassandra.service.CassandraDaemon#main:786 > org.apache.cassandra.service.CassandraDaemon#activate:633 > org.apache.cassandra.service.CassandraDaemon#setup:261 > org.apache.cassandra.schema.LegacySchemaMigrator#migrate:83 > org.apache.cassandra.schema.LegacySchemaMigrator#unloadLegacySchemaTables:137 > java.lang.Iterable#forEach:75 > org.apache.cassandra.schema.LegacySchemaMigrator#lambda$unloadLegacySchemaTables$1:137 > org.apache.cassandra.db.ColumnFamilyStore#invalidate:542 {code} > In line 564~570 within `public void invalidate(boolean expectMBean)`: > {code:java} > latencyCalculator.cancel(false); > compactionStrategyManager.shutdown(); > SystemKeyspace.removeTruncationRecord(metadata.cfId); // line 566 > data.dropSSTables(); // line 568 > LifecycleTransaction.waitForDeletions(); // line 569 > indexManager.invalidateAllIndexesBlocking(); > {code} > According to the code and the comments, we suppose `data.dropSSTables()` in > line 568 
will submit some tidier tasks to the `nonPeriodicTasks` thread pool. > Call stack in version 3.11.10: > {code:java} > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:233 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:238 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:267 > org.apache.cassandra.utils.concurrent.Refs#release:241 > org.apache.cassandra.utils.concurrent.Ref#release:119 > org.apache.cassandra.utils.concurrent.Ref#release:225 > org.apache.cassandra.utils.concurrent.Ref#release:326 > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#tidy:2205 > {code} > Then, `LifecycleTransaction.waitForDeletions()` in line 569 is > {code:java} > /** > * Deletions run on the nonPeriodicTasks executor, (both failedDeletions > or global tidiers in SSTableReader) > * so by scheduling a new empty task and waiting for it we ensure any > prior deletion has completed. > */ > public static void waitForDeletions() > { > LogTransaction.waitForDeletions(); > } > {code} > And then call `waitForDeletions` in `LogTransaction`: > {code:java} > static void waitForDeletions() > { > > FBUtilities.waitOnFuture(ScheduledExecutors.nonPeriodicTasks.schedule(Runnables.doNothing(), > 0, TimeUnit.MILLISECONDS)); > } > {code} > From the comments, we think it ensures that all existing tasks in > `nonPeriodicTasks` are drained. However, we found some tidier tasks are still > running in `nonPeriodicTasks` thread pool. > We suspect that those tidier tasks should be guaranteed to finish during > server setup, because of its exception handling. In version 3.11.10, these > tidier tasks are submitted to `nonPeriodicTasks` in > `SSTableReader$InstanceTidier#tidy:2205`, and have the exception handling > `FileUtils.handleFSErrorAndPropagate(new FSWriteError(e, file))` (within the > call stack `SSTableReader$InstanceTidier$1#run:2223` => > `LogTransaction$SSTableTidier#run:386` => `LogTransaction#delete:261`). 
> The `FileUtils.handleFSErrorAndPropagate` handles this `FSWriteError`. We > found that it checks the `CassandraDaemon.setupCompleted` flag in call stack > within (`FileUtils#handleFSErrorAndPropagate:507` => > `JVMStabilityInspector#inspectThrowable:60` => > `JVMStabilityInspector#inspectThrowable:106` => > `JVMStabilityInspector#inspectDiskError:73` => `FileUtils#handleFSError:494` > => `DefaultFSErrorHandler:handleFSError:58`) > {code:java} > if (!StorageService.instance.isDaemonSetupCompleted()) // line 58 > handleStartupFSError(e);
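The "schedule an empty task and wait for it" idiom behind `waitForDeletions()` is easy to illustrate outside Cassandra. The following Python sketch (executor name and timings are illustrative, not Cassandra code) shows both the guarantee the comment describes and its limitation: the barrier only covers tasks submitted *before* it, not tidier tasks submitted afterwards, which is the window this ticket is about.

```python
# Sketch of the empty-task barrier idiom used by waitForDeletions(),
# using a single-worker executor so tasks run strictly in FIFO order.
from concurrent.futures import ThreadPoolExecutor
import time

non_periodic_tasks = ThreadPoolExecutor(max_workers=1)
log = []

def tidier(name):
    time.sleep(0.05)      # simulate file-deletion work
    log.append(name)

non_periodic_tasks.submit(tidier, 'tidy-1')

# The barrier: schedule a no-op and block on its result. Because the single
# worker runs tasks in submission order, 'tidy-1' must be done when this
# returns -- that is the guarantee the code comment promises.
non_periodic_tasks.submit(lambda: None).result()
assert log == ['tidy-1']

# A tidier submitted AFTER the barrier is not covered by it; it may still be
# running while the caller believes all deletions have completed.
non_periodic_tasks.submit(tidier, 'tidy-2')
non_periodic_tasks.shutdown(wait=True)
```

The same reasoning applies to `nonPeriodicTasks`: scheduling `Runnables.doNothing()` drains prior submissions only, so tidiers enqueued later can outlive the wait.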
[jira] [Commented] (CASSANDRA-17564) Add synchronization to wait for outstanding tasks in the compaction executor and nonPeriodicTasks during CassandraDaemon setup
[ https://issues.apache.org/jira/browse/CASSANDRA-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525672#comment-17525672 ] Stefan Miklosovic commented on CASSANDRA-17564: --- Hi [~functioner] Would you please try if this is any better for you? (1) I think the issue is that if, hypothetically, that runnable submitted in InstanceTidier throws, the global ref will never be released. cc [~benedict] (1) https://github.com/instaclustr/cassandra/tree/CASSANDRA-17564 > Add synchronization to wait for outstanding tasks in the compaction executor > and nonPeriodicTasks during CassandraDaemon setup > -- > > Key: CASSANDRA-17564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17564 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Haoze Wu >Priority: Normal > Fix For: 3.11.x, 4.0.x > > Time Spent: 10m > Remaining Estimate: 0h > > We have been testing Cassandra 3.11.10 for a while. During a node start, we > found that a synchronization guarantee implied by the code comments is not > enforced. 
Specifically, in the `invalidate` method called in this call stack > (in version 3.11.10): > {code:java} > org.apache.cassandra.service.CassandraDaemon#main:786 > org.apache.cassandra.service.CassandraDaemon#activate:633 > org.apache.cassandra.service.CassandraDaemon#setup:261 > org.apache.cassandra.schema.LegacySchemaMigrator#migrate:83 > org.apache.cassandra.schema.LegacySchemaMigrator#unloadLegacySchemaTables:137 > java.lang.Iterable#forEach:75 > org.apache.cassandra.schema.LegacySchemaMigrator#lambda$unloadLegacySchemaTables$1:137 > org.apache.cassandra.db.ColumnFamilyStore#invalidate:542 {code} > In line 564~570 within `public void invalidate(boolean expectMBean)`: > {code:java} > latencyCalculator.cancel(false); > compactionStrategyManager.shutdown(); > SystemKeyspace.removeTruncationRecord(metadata.cfId); // line 566 > data.dropSSTables(); // line 568 > LifecycleTransaction.waitForDeletions(); // line 569 > indexManager.invalidateAllIndexesBlocking(); > {code} > According to the code and the comments, we suppose `data.dropSSTables()` in > line 568 will submit some tidier tasks to the `nonPeriodicTasks` thread pool. > Call stack in version 3.11.10: > {code:java} > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:233 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:238 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:267 > org.apache.cassandra.utils.concurrent.Refs#release:241 > org.apache.cassandra.utils.concurrent.Ref#release:119 > org.apache.cassandra.utils.concurrent.Ref#release:225 > org.apache.cassandra.utils.concurrent.Ref#release:326 > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#tidy:2205 > {code} > Then, `LifecycleTransaction.waitForDeletions()` in line 569 is > {code:java} > /** > * Deletions run on the nonPeriodicTasks executor, (both failedDeletions > or global tidiers in SSTableReader) > * so by scheduling a new empty task and waiting for it we ensure any > prior deletion has completed. 
> */ > public static void waitForDeletions() > { > LogTransaction.waitForDeletions(); > } > {code} > And then call `waitForDeletions` in `LogTransaction`: > {code:java} > static void waitForDeletions() > { > > FBUtilities.waitOnFuture(ScheduledExecutors.nonPeriodicTasks.schedule(Runnables.doNothing(), > 0, TimeUnit.MILLISECONDS)); > } > {code} > From the comments, we think it ensures that all existing tasks in > `nonPeriodicTasks` are drained. However, we found some tidier tasks are still > running in `nonPeriodicTasks` thread pool. > We suspect that those tidier tasks should be guaranteed to finish during > server setup, because of its exception handling. In version 3.11.10, these > tidier tasks are submitted to `nonPeriodicTasks` in > `SSTableReader$InstanceTidier#tidy:2205`, and have the exception handling > `FileUtils.handleFSErrorAndPropagate(new FSWriteError(e, file))` (within the > call stack `SSTableReader$InstanceTidier$1#run:2223` => > `LogTransaction$SSTableTidier#run:386` => `LogTransaction#delete:261`). > The `FileUtils.handleFSErrorAndPropagate` handles this `FSWriteError`. We > found that it checks the `CassandraDaemon.setupCompleted` flag in call stack > within (`FileUtils#handleFSErrorAndPropagate:507` => > `JVMStabilityInspector#inspectThrowable:60` => > `JVMStabilityInspector#inspectThrowable:106` => > `JVMStabilityInspector#inspectDiskError:73` => `FileUtils#handleFSError:494` > => `DefaultFSErrorHandler:hand
[jira] [Commented] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525668#comment-17525668 ] Brandon Williams commented on CASSANDRA-17570: -- bq. we should probably consider removing the CQL.textile file. This is also probably a good idea because it's the only file in that format in the repo (it's 10 years old) > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525667#comment-17525667 ] Andres de la Peña commented on CASSANDRA-11871: --- One last detail: we should probably add an entry in {{NEWS.txt}}. > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP > BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the > SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same > order. As we can have to process a large amount of data, we want to try to > limit ourselves to the cases where we can build the groups on the fly (which > is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we > will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. The problem is that if the user inverts the functions by > mistake, like this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results. 
> The only way to prevent that would be to check the function order and make > sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), > second(time)}}). > In my opinion a function like {{floor(<column>, <duration>)}} will be > much better, as it does not allow for this type of mistake and is much more > flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, 5m), count() FROM Trades WHERE symbol = 'AAPL' AND date = > '2016-01-11' GROUP BY floor(time, 5m);}} > An important aspect to keep in mind with a function like {{floor}} is the > starting point. For a query like: {{SELECT floor(time, 2h), count() FROM > Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' > AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the > result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
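The starting-point behaviour discussed in the ticket can be sketched outside CQL. This Python fragment is purely illustrative (the function names and the 10-minute sampling are assumptions, not Cassandra code): it anchors 2-hour buckets at the lower bound of the queried range, which is what yields groups at 01:30:00, 03:30:00 and 05:30:00 rather than at even clock hours.

```python
# Illustrative floor-based time bucketing with an explicit starting point.
def floor_time(seconds, size, start=0):
    """Return the bucket start (in seconds) for a timestamp, anchored at `start`."""
    return start + ((seconds - start) // size) * size

def hms(s):
    """Format seconds-since-midnight as HH:MM:SS."""
    return '%02d:%02d:%02d' % (s // 3600, (s % 3600) // 60, s % 60)

lower = 1 * 3600 + 30 * 60           # 01:30:00
upper = 7 * 3600 + 30 * 60           # 07:30:00
two_hours = 2 * 3600

# Sample timestamps every 10 minutes in [01:30:00, 07:30:00) and bucket them.
buckets = sorted({floor_time(t, two_hours, start=lower)
                  for t in range(lower, upper, 600)})
print([hms(b) for b in buckets])     # ['01:30:00', '03:30:00', '05:30:00']
```

Anchoring at `start=0` (midnight) instead would produce buckets at 00:00:00, 02:00:00, 04:00:00 and 06:00:00, which is the subtlety the comment raises.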
[jira] [Commented] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525665#comment-17525665 ] Benjamin Lerer commented on CASSANDRA-17570: Thanks [~brandon.williams]. It seems that we now have two sources for CQL, the {{CQL.textile}} file and the documentation, and neither of them is accurate. The doc nevertheless seems better, so we should probably consider removing the {{CQL.textile}} file. The CQL version needs to be set to 3.4.6 and the changes for that version will need to be mentioned. > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17212) Migrate threshold for minimum keyspace replication factor to guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-17212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525662#comment-17525662 ] Andres de la Peña commented on CASSANDRA-17212: --- [~jmckenzie] I'm not sure we want to always exclude system keyspaces. For example, we probably want to apply the guardrails for restrictions on IN queries even when querying system tables, since those queries can be quite harmful (see CASSANDRA-17187 and CASSANDRA-17186). I guess that our main reason to exclude system keyspaces in cases such as the guardrail for disabling {{ALLOW FILTERING}} is that drivers might internally use the guarded queries for doing their thing. > Migrate threshold for minimum keyspace replication factor to guardrails > --- > > Key: CASSANDRA-17212 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17212 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Guardrails >Reporter: Andres de la Peña >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > > The config property > [{{minimum_keyspace_rf}}|https://github.com/apache/cassandra/blob/5fdadb25f95099b8945d9d9ee11d3e380d3867f4/conf/cassandra.yaml] > that was added by CASSANDRA-14557 can be migrated to guardrails, for example: > {code} > guardrails: > ... > replication_factor: > warn_threshold: 2 > abort_threshold: 3 > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17370) Add flag enabling operators to restrict use of ALLOW FILTERING in queries
[ https://issues.apache.org/jira/browse/CASSANDRA-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525651#comment-17525651 ] Andres de la Peña commented on CASSANDRA-17370: --- Looks good to me, +1 > Add flag enabling operators to restrict use of ALLOW FILTERING in queries > - > > Key: CASSANDRA-17370 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17370 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics, Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > This ticket adds the ability for operators to disallow use of ALLOW FILTERING > predicates in CQL SELECT statements. As queries that ALLOW FILTERING can > place additional load on the database, the flag enables operators to provide > tighter bounds on performance guarantees. The patch includes a new yaml > property, as well as a hot property enabling the value to be modified via JMX > at runtime. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525650#comment-17525650 ] Brandon Williams commented on CASSANDRA-17570: -- Just fyi, cqlsh was bumped for this in CASSANDRA-17432. > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming
[ https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525645#comment-17525645 ] Benjamin Lerer commented on CASSANDRA-15510: Sorry, I was focussing on porting CASSANDRA-15511. I will merge the changes and run CI. If I remember correctly I think that the patch broke some tests. We also need to run CI on Jenkins as I do not think that CircleCi can run the burn tests. > BTree: Improve Building, Inserting and Transforming > --- > > Key: CASSANDRA-15510 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15510 > Project: Cassandra > Issue Type: Improvement > Components: Local/Other >Reporter: Benedict Elliott Smith >Assignee: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 10h > Remaining Estimate: 0h > > This work was originally undertaken as a follow-up to CASSANDRA-15367 to > ensure performance is strictly improved, but it may no longer be needed for > that purpose. It’s still hugely impactful, however. It remains to be > decided where this should land. > The current {{BTree}} implementation is suboptimal in a number of ways, with > very little focus having been given to its performance besides its > memory-occupancy. This patch aims to address that, specifically improving > the performance and allocations involved in: building, transforming and > inserting into a tree. > To facilitate this work, the {{BTree}} definition is modified slightly, so > that we can perform some simple arithmetic on tree sizes. Specifically, > trees of depth n are defined to have a maximum capacity of {{branchFactor^n - > 1}}, which translates into capping the number of leaf children at > {{branchFactor-1}}, as opposed to {{branchFactor}}. Since {{branchFactor}} > is a power of 2, this permits fast tree size arithmetic, enabling some of > these changes. > h2. 
Building > The static build method has been modified to utilise dedicated > {{buildPerfect}} methods that build either perfectly dense or perfectly > sparse sub-trees. These perfect trees all share their {{sizeMap}} with each > other, and can be built more efficiently than trees of arbitrary size. The > specifics are described in detail in the comments, but this building block > can be used to construct trees of any size, using at most one child at each > level that is not either perfectly sparse or perfectly dense. Bulk methods > are used where possible. > For large trees this can produce up to 30x throughput improvement and 30% > allocation reduction vs 3.0 (TBC, and to be tested vs 4.0). > {{FastBuilder}} is introduced for building a tree in-order (or in reverse) > without duplicate elements to resolve, without necessarily knowing the size > upfront. This meets the needs of most use cases. Data is built directly > into nodes, with up to one already-constructed node, and one partially > constructed node, on each level, being mutated to share their contents in the > event of insufficient data to populate the tree. These builders are > thread-locally shared. This leads to minimal copying, the same sharing of > {{sizeMap}} as above, zero wasted allocations, and results in minimal > difference in performance between utilising the less-ergonomic static build > and builder approach. > For large trees this leads to ~4.5x throughput improvement, and 70% reduction > in allocations vs a normal Builder. For small trees performance is > comparable, but allocations similarly reduced. > h2. Inserting > It turns out that we only ever insert another tree into a tree, so we exploit > this to implement an efficient union of two trees, operating on them directly > via stacks in the transformer, instead of via a collection interface. 
A > builder-like object is introduced that shares functionality with > {{FastBuilder}}, and permits us to build the result of the union directly > into the final nodes, reusing as much of the original trees as possible. > Bulk methods are used where possible. > The result is not _uniformly_ faster, but is _significantly_ faster on > average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean > improvement of 10x. Worst reduction is 30%, and it may be that we can > isolate and alleviate that. Allocations are also reduced significantly, with > a median of 30% and mean of 42% for the tested workloads. As the trees get > larger the improvement drops, but remains uniformly lower. > h2. Transforming > Transformations garbage overhead is minimal, i.e. the main allocations are > those necessary to represent the new tree. It is significantly faster and > particularly more efficient when removing elements, utilising the shared > functionality of th
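The size arithmetic that the redefined capacity enables can be sketched in isolation. The snippet below is a hedged illustration, not the patch's code: the class and method names and the branching factor of 32 are assumptions. It shows why capping a depth-n tree at branchFactor^n - 1 elements, with a power-of-two branching factor, reduces capacity and depth calculations to shifts.

```java
// Hypothetical sketch, not Cassandra's actual BTree code: names and the
// branching factor are invented for illustration. With a power-of-two
// branching factor, defining a depth-n tree's maximum capacity as
// branchFactor^n - 1 lets the size arithmetic reduce to shifts.
public final class BTreeSizeMath {
    static final int BRANCH_SHIFT = 5;                  // assumed branchFactor = 32
    static final int BRANCH_FACTOR = 1 << BRANCH_SHIFT;

    // Maximum element count of a tree of the given depth: branchFactor^depth - 1.
    static long maxCapacity(int depth) {
        return (1L << (BRANCH_SHIFT * depth)) - 1;
    }

    // Smallest depth whose capacity can hold `size` elements.
    static int depthFor(long size) {
        int depth = 1;
        while (maxCapacity(depth) < size)
            depth++;
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(maxCapacity(1)); // 31: a leaf holds at most branchFactor - 1 elements
        System.out.println(maxCapacity(2)); // 1023
        System.out.println(depthFor(1000)); // 2
    }
}
```

Note how capping leaves at branchFactor - 1 (rather than branchFactor) is exactly what makes the capacities land on the shift-friendly values 31, 1023, and so on.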
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525644#comment-17525644 ] Andres de la Peña commented on CASSANDRA-17560: --- Looks good to me. I have just added a comment about how we log the updating of properties; it can be addressed on commit. > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Track warnings is currently nested, which is discouraged at the moment. It > was also added before the config standards patch, which moved storage-typed > longs to the new DataStorageSpec type; we should migrate the configs there.
[jira] [Commented] (CASSANDRA-17565) Fix test_parallel_upgrade_with_internode_ssl
[ https://issues.apache.org/jira/browse/CASSANDRA-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525641#comment-17525641 ] Brandon Williams commented on CASSANDRA-17565: -- bq. but that didn't work, so ignore this. To clarify, there are 7 failures in that run, but 6 were git errors and one was legit. Not trusting the results, I did another [4000 runs|https://app.circleci.com/pipelines/github/driftx/cassandra/443/workflows/8e7a307a-4a13-4c00-ab45-ca65b48ac602/jobs/5184] and got one failure again...however, examining the line number, that has to be from the 4.0 side, and indeed it needs the same patch. But now the question is, can the upgrade test be run with both a custom 4.0 and trunk branch? If not, perhaps this is enough to commit the trunk side, and then we can run 4k with a custom 4.0 branch against it, which should prove out the whole thing. > Fix test_parallel_upgrade_with_internode_ssl > > > Key: CASSANDRA-17565 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17565 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.x > > > While working on CASSANDRA-17341 I hit this flaky test, very rarely failing > but it is failing on trunk. > More info in this CI run: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/1563/workflows/61bda0b7-f699-4897-877f-c7d523a03127/jobs/10318
[jira] [Updated] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Summary: Implement nodetool command to list data directories of existing tables (was: Tool to list data directories) > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Updated] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Status: Changes Suggested (was: Review In Progress) I have done the first pass and am waiting for the author's feedback. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Updated] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Test and Documentation Plan: unit tests Status: Patch Available (was: In Progress) https://github.com/apache/cassandra/pull/1580 > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Updated] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Reviewers: Stefan Miklosovic Status: Review In Progress (was: Patch Available) > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525606#comment-17525606 ] Berenguer Blasi commented on CASSANDRA-17563: - [~dcapwell] I am in the middle of something but I will try to look into this asap > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During the addition of a new job to the CircleCI config, the midres file got > messy. Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism as the original in-jvm job > * j8_dtests_with_vnodes should get from midres 50 large but currently > midres makes it run with 25 and medium which fails around 100 tests
[jira] [Commented] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525598#comment-17525598 ] Stefan Miklosovic commented on CASSANDRA-17568: --- I did a first, more thorough pass on the PR. I would love to have all the issues addressed, and then we can consider more seriously what to do with it next. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 4h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525593#comment-17525593 ] Benjamin Lerer commented on CASSANDRA-11871: [~yifanc] Your comment about the CQL.textile file made me realize that we need to upgrade the CQL version for the next release and make sure that all the CQL changes are mentioned in the CQL version change. I opened CASSANDRA-17570 for that. I addressed your comments :-) > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP > BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the > SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same > order. As we can have to process a large amount of data, we want to try to > limit ourselves to the cases where we can build the groups on the fly (which > is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we > will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. 
The problem is that if the user inverts the functions by mistake, like > this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results. > The only way to prevent that would be to check the function order and make > sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), > second(time)}}). > In my opinion a function like {{floor(, )}} would be > much better, as it does not allow this type of mistake and is much more > flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, m), count() FROM Trades WHERE symbol = 'AAPL' AND date = > '2016-01-11' GROUP BY floor(time, m);}} > An important aspect to keep in mind with a function like {{floor}} is the > starting point. For a query like: {{SELECT floor(time, m), count() FROM > Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' > AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the > result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}. >
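The bucket-anchoring behaviour argued for above can be sketched in plain Java. This is a hedged illustration under invented names (`TimeBuckets`, `floorTo`); it is not the proposed CQL function, only the flooring semantics it describes: times are floored into fixed-duration buckets anchored at a starting point, so a 2h bucket over a range starting at 01:30:00 produces the groups 01:30:00, 03:30:00 and 05:30:00.

```java
// Hypothetical sketch of the bucketing the proposed floor function would
// perform; names are invented and this is not the CQL implementation.
import java.time.Duration;
import java.time.LocalTime;

public final class TimeBuckets {
    // Floor `time` to the start of its fixed-size bucket, with buckets
    // anchored at `origin` rather than at midnight.
    static LocalTime floorTo(LocalTime time, Duration bucket, LocalTime origin) {
        long nanosFromOrigin = Duration.between(origin, time).toNanos();
        long floored = nanosFromOrigin - Math.floorMod(nanosFromOrigin, bucket.toNanos());
        return origin.plusNanos(floored);
    }

    public static void main(String[] args) {
        LocalTime origin = LocalTime.parse("01:30:00");
        Duration twoHours = Duration.ofHours(2);
        System.out.println(floorTo(LocalTime.parse("04:15:00"), twoHours, origin)); // 03:30
        System.out.println(floorTo(LocalTime.parse("07:29:59"), twoHours, origin)); // 05:30
    }
}
```

Anchoring at the range start rather than at a fixed epoch is exactly the "starting point" question raised in the comment.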
[jira] [Updated] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-17570: --- Change Category: Semantic Complexity: Low Hanging Fruit Status: Open (was: Triage Needed) > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version.
[jira] [Created] (CASSANDRA-17570) Update the CQL version for the 4.1 release
Benjamin Lerer created CASSANDRA-17570: -- Summary: Update the CQL version for the 4.1 release Key: CASSANDRA-17570 URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 Project: Cassandra Issue Type: Improvement Components: CQL/Syntax Reporter: Benjamin Lerer We made several changes to CQL during that version. We need to document those changes in the {{CQL.textile}} file and update the version.
[jira] [Updated] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-17570: --- Fix Version/s: 4.1 > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version.