[jira] [Comment Edited] (CASSANDRA-16983) Separating CQLSH credentials from the cqlshrc file
[ https://issues.apache.org/jira/browse/CASSANDRA-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526184#comment-17526184 ] Brian Houser edited comment on CASSANDRA-16983 at 4/22/22 3:37 AM: --- I think we agreed to make some minor changes to how this (plain_text_auth) works in credentials with the new custom loading system; see https://issues.apache.org/jira/browse/CASSANDRA-16456 was (Author: bhouser): I think we agreed to make some minor changes to how this (plain_text_auth) works in credentials with the new custom loading system > Separating CQLSH credentials from the cqlshrc file > -- > > Key: CASSANDRA-16983 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16983 > Project: Cassandra > Issue Type: Improvement > Components: Tool/cqlsh >Reporter: Bowen Song >Assignee: Bowen Song >Priority: Normal > Labels: lhf > Fix For: 4.1 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, the CQLSH tool accepts credentials (username & password) from the > following 3 places: > 1. the command line parameter "-p" > 2. the cqlshrc file > 3. an interactive prompt > This is not ideal. > Credentials on the command line are a security risk, because they could be seen > by other users on a shared system. > The cqlshrc file is better, but still not good enough, because the cqlshrc > file is a config file: it's often considered acceptable to have it world readable > and to share it with other users. It also prevents users from having > multiple sets of credentials, either for the same Cassandra cluster or for > different clusters. 
> To improve the security of CQLSH and make it secure by design, I propose the > following changes: > * Warn the user if a password is given on the command line, and recommend > using a credentials file instead > * Warn the user if credentials are present in the cqlshrc file and the > cqlshrc file is not secure (e.g. world readable or owned by a different user) > * Deprecate credentials in the cqlshrc file, and recommend the user move them > to a separate credentials file. The aim is not to break anything at the > moment, but to eventually stop accepting credentials from the cqlshrc file. > * Reject the credentials file if it's not secure, and tell the user how to > secure it. Optionally, prompt the user for a password if it's an interactive > session. (Consider how OpenSSH handles insecure credential files.) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
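The "reject the credentials file if it's not secure" proposal above could look like the following sketch. This is illustrative Python, not the actual cqlsh patch; the function name and error messages are assumptions. It mirrors the OpenSSH behavior the ticket mentions: refuse a key/credentials file that other users can access.

```python
import os
import stat

def check_credentials_file(path):
    # Reject a credentials file that is accessible by group or others,
    # or owned by a different user -- similar to how OpenSSH refuses
    # insecure private-key files. Illustrative sketch, not cqlsh code.
    st = os.stat(path)
    if st.st_uid != os.getuid():
        raise PermissionError(f"{path} is owned by a different user")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} is accessible by group/others; fix with: chmod 600 {path}")
    return True
```

A caller in an interactive session could catch the `PermissionError` and fall back to prompting for a password, as the last bullet suggests.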
[cassandra-website] branch asf-staging updated (015350e5 -> 4cb38fd9)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git discard 015350e5 generate docs for 8fd077a6 new 4cb38fd9 generate docs for 8fd077a6 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (015350e5) \ N -- N -- N refs/heads/asf-staging (4cb38fd9) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../cassandra/configuration/cass_yaml_file.html| 84 +++-- .../cassandra/configuration/cass_yaml_file.html| 84 +++-- .../cassandra/configuration/cass_yaml_file.html| 84 +++-- content/search-index.js| 2 +- site-ui/build/ui-bundle.zip| Bin 4740078 -> 4740078 bytes 5 files changed, 187 insertions(+), 67 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526150#comment-17526150 ] Ekaterina Dimitrova edited comment on CASSANDRA-17571 at 4/22/22 1:51 AM: -- Prototype in this [commit|https://github.com/ekaterinadimitrova2/cassandra/commit/1ab9f32ef34402a0f74036d768a22449170052b6] - only a few parameters were migrated, for test purposes and to see how it will look. Also, I will split the migrated parameters into groups in separate commits, with tests attached and CI runs, so that nothing is missed along the way, but I want to confirm that the approach is still what we want. CC [~adelapena] in case he has time to provide input. Currently, if people provide the config in the new format, we handle the former int parameters by returning a cast value from their getters; but on startup the user might set a bigger long value and wrongly think that value will be used, when in practice Integer.MAX_VALUE will be used. We just need to fail, telling the user they can't set that big a value, mimicking the behavior when they provide an old-format value bigger than an int. With these classes we also ensure that people cannot set anything that will overflow during conversion to the smallest allowed unit, instead of silently setting MAX_VALUE. was (Author: e.dimitrova): Prototype in this [commit|https://github.com/ekaterinadimitrova2/cassandra/commit/1ab9f32ef34402a0f74036d768a22449170052b6] - only a few parameters were migrated, for test purposes and to see how it will look. Also, I will split the migrated parameters into groups in separate commits, with tests attached and CI runs, so that nothing is missed along the way, but I want to confirm that the approach is still what we want. CC [~adelapena] in case he has time to provide input. 
Currently, if people provide the config in the new format, we handle the former int parameters by returning a cast value from their getters; but on startup the user might set a bigger long value and wrongly think that value will be used, when in practice Integer.MAX_VALUE will be used. We just need to fail, telling the user they can't set that big a value, mimicking the behavior when they provide an old-format value bigger than an int. With these classes we also ensure that people cannot set anything that will overflow during conversion to the smallest allowed unit, instead of silently setting MAX_VALUE. > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
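The fail-fast upper-bound check described in the comment above might look like the following sketch, with Python standing in for the Java config classes. The function name, parameter names, and unit table are illustrative assumptions; only the idea is from the ticket: a value that would exceed Integer.MAX_VALUE after conversion to the smallest allowed unit is rejected at config-load time instead of being silently clamped.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

# Conversion factors to the smallest allowed unit (milliseconds here).
UNIT_TO_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}

def parse_duration_ms(name, value, unit):
    # Fail fast on startup instead of silently using Integer.MAX_VALUE.
    ms = value * UNIT_TO_MS[unit]
    if ms > INT_MAX:
        raise ValueError(
            f"{name}: {value}{unit} exceeds the maximum of {INT_MAX} ms; "
            "reduce the value or use a smaller unit")
    return ms
```

This mimics the old behavior for int-typed parameters: an out-of-range value is an error, not a silent cap.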
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526146#comment-17526146 ] Brian Houser commented on CASSANDRA-16456: -- Ok... cool. I think we've finally cracked the desired behavior. I'm going to go ahead and write out the spec here. Implementing this should be quick. * In the cqlshrc file you can list an auth_provider section and specify a module and class name. If you do, then we will dynamically load that class using the remaining properties in the auth_provider section, as well as the properties found in credentials under that class name. * If you don't provide an auth_provider module and class name, we will assume you specified the PlainTextAuthProvider. * You can provide a username and a password on the command line. If you do, these two properties will be passed to whatever auth provider is specified, and will override any other username and password provided in the credentials or other files. * You can provide a username and password under the Authentication section. If you do, those properties will be passed to whatever auth_provider is specified, and will override any other specification of username and password in the credentials or cqlshrc file. * Any properties in the credentials file will override the properties in the auth_provider section of the cqlshrc file. * If you are using the PlainTextAuthProvider and only provide a username, you will be prompted for a password. I'll implement the above and add tests for the behavior. Please let me know if this spec isn't accurate. 
> Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
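The loading and precedence rules in the spec above can be sketched as follows. Function and argument names are assumptions for illustration, not the final cqlsh code; the precedence order, lowest to highest, is: the cqlshrc [auth_provider] section, the credentials file entries under the class name, the [authentication] section, and finally the command line.

```python
import importlib

def load_auth_provider(cqlshrc, credentials, cli_user=None, cli_password=None):
    # cqlshrc and credentials are dicts of config sections, as a config
    # parser might produce them. Later updates override earlier ones.
    section = dict(cqlshrc.get("auth_provider", {}))
    module = section.pop("module", "cassandra.auth")
    classname = section.pop("classname", "PlainTextAuthProvider")
    opts = dict(section)                              # lowest precedence
    opts.update(credentials.get(classname, {}))       # credentials file
    opts.update(cqlshrc.get("authentication", {}))    # [authentication]
    if cli_user is not None:
        opts["username"] = cli_user                   # command line wins
    if cli_password is not None:
        opts["password"] = cli_password
    # Dynamically load the provider class and hand it all options.
    cls = getattr(importlib.import_module(module), classname)
    return cls(**opts)
```

Per the spec, a missing module/class name falls back to the driver's PlainTextAuthProvider, and a PlainTextAuthProvider with a username but no password would then trigger an interactive prompt (not shown here).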
[jira] [Updated] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17180: Status: Open (was: Patch Available) > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that it is up and running. > Then, on startup, we would read this file and determine whether there > is some table whose gc grace period is behind this time, and we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526115#comment-17526115 ] Paulo Motta commented on CASSANDRA-17180: - bq. After spending more time on this, I identified an issue Nice catch! bq. I have not detected this by my unit tests because I was, more or less, mocking it but once I actually tried it on the running node, to my surprise it was not detecting the tables which should be causing violations. Can we create an (in-jvm or python) dtest to ensure this is being properly tested and any future regressions are caught? bq. I think it is viable to do via "SchemaKeyspace.fetchNonSystemKeyspaces()". Sounds good to me. bq. I am not sure I can make this method publicly visible without any consequences yet. I think this should be fine. bq. On the other hand, it will check tables in "system_distributed" as well as "system_auth". These tables do not have gc = 0 and they are not excluded from the fetchNonSystemKeyspaces call. That's ok, it's probably a good idea to check these tables anyway. > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that it is up and running. > Then, on startup, we would read this file and determine whether there > is some table whose gc grace period is behind this time, and we would fail the start > to prevent zombie data from being spread around the cluster. 
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526109#comment-17526109 ] Stefan Miklosovic commented on CASSANDRA-17568: --- That's exactly right. The server should process it all when its file system is involved, indeed. I made a mistake here; not detecting it does not make a lot of sense, as I was thinking too much "locally". > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h 10m > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within the data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17537) nodetool compact should support using a key string to find the range to avoid operators having to manually do this
[ https://issues.apache.org/jira/browse/CASSANDRA-17537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526107#comment-17526107 ] David Capwell commented on CASSANDRA-17537: --- ok so that new assert breaks things, so canceling the commit https://app.circleci.com/pipelines/github/dcapwell/cassandra/1389/workflows/b987634e-a680-4b3f-bee7-e7e11e8e4b29/jobs/11372 {code} Attempted to force compact [BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-226-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-216-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-214-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-228-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-218-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-224-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-222-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-220-big-Data.db'), BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-210-big-Data.db'), 
BigTableReader(path='/tmp/cassandra/build/test/cassandra/data/LeveledCompactionStrategyTest/StandardLeveled-4892b7b0c1bc11ecbcb76d3ce7d88979/nb-212-big-Data.db')], but predicate does not include {code} > nodetool compact should support using a key string to find the range to avoid > operators having to manually do this > -- > > Key: CASSANDRA-17537 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17537 > Project: Cassandra > Issue Type: New Feature > Components: Local/Compaction, Tool/nodetool >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Time Spent: 1h > Remaining Estimate: 0h > > It's common that a single key needs to be compacted, and operators need to do > the following: > 1) go from key -> token > 2) generate the range > 3) call nodetool compact with this range > We can simplify this workflow by adding this to compact: > nodetool compact ks.tbl -k "key1" -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
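The key -> token -> range workflow the ticket describes can be sketched as follows. The token itself comes from the partitioner (Murmur3 in the default setup) and is taken as given here; the sketch only shows the range construction and the resulting command. Assumptions: `-st`/`-et` stand for nodetool compact's start/end token options, and the helper names are illustrative.

```python
MIN_TOKEN = -(2**63)      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1

def single_token_range(token):
    # A Cassandra token range is (start, end] with an exclusive start,
    # so the range covering exactly one token starts just before it,
    # wrapping around the ring at the minimum token.
    start = MAX_TOKEN if token == MIN_TOKEN else token - 1
    return start, token

def compact_command(keyspace, table, token):
    # The manual step the new -k flag would automate.
    start, end = single_token_range(token)
    return f"nodetool compact {keyspace} {table} -st {start} -et {end}"
```

With the proposed flag, `nodetool compact ks.tbl -k "key1"` would perform this token lookup and range construction server-side.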
[jira] [Comment Edited] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526105#comment-17526105 ] Paulo Motta edited comment on CASSANDRA-17180 at 4/21/22 10:01 PM: --- Thanks for addressing initial comments. Finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we make this startup check enabled by default? We could improve the error message when the check fails to mention the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like [check-specific logic|https://github.com/apache/cassandra/pull/1351/files#diff-957f2fa6365cb92f19b74347fee7a9f310a07e32c3112f35196dc17462ec7269R511] on CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we make the default heartbeat file be stored in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file - since this is error-prone and we're just interested in the timestamp value, not the file format. 
Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead? was (Author: paulo): Thanks for addressing initial comments. Finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we make this startup check enabled by default? We could improve the error message when the check fails to mention the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like check-specific logic on CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we make the default heartbeat file be stored in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file - since this is error-prone and we're just interested in the timestamp value, not the file format. Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead? 
> Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that it is up and running. > Then, on startup, we would read this file and determine whether there > is some table whose gc grace period is behind this time, and we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw -- This message was sent by Atlassian Jira (v8.20.7#820007)
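The heartbeat-via-file-mtime suggestion in the review comments above can be sketched as follows, with Python's `os.utime`/`os.path.getmtime` standing in for the Java `File.setLastModified`/`File.lastModified` calls. The function names and message wording are illustrative, not the patch's API.

```python
import os
import time

def touch_heartbeat(path):
    # Record liveness as the file's modification time -- no custom
    # file format to parse, only a timestamp. Illustrative sketch.
    open(path, "a").close()
    os.utime(path, None)

def check_data_resurrection(path, min_gc_grace_seconds, now=None):
    # Fail startup if the node was down longer than the smallest
    # gc_grace_seconds among its tables: tombstones may already have
    # been collected elsewhere, so starting risks spreading zombie data.
    now = time.time() if now is None else now
    if not os.path.exists(path):
        return  # first start: nothing to compare against
    downtime = now - os.path.getmtime(path)
    if downtime > min_gc_grace_seconds:
        raise RuntimeError(
            f"node was down ~{int(downtime)}s, exceeding "
            f"gc_grace_seconds={min_gc_grace_seconds}; refusing to start")
```

A periodic task would call `touch_heartbeat` while the node runs, and the startup check would call `check_data_resurrection` before joining the ring.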
[jira] [Comment Edited] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526095#comment-17526095 ] Stefan Miklosovic edited comment on CASSANDRA-16456 at 4/21/22 9:58 PM: The answer to your very last question is yes, because you could have an auth_provider implementation which is still "username and password-based" but differs internally. We should still pass username / password to it, and it is up to the implementation whether it uses these flags or just ignores them. The implementation may, for example, detect that username / password does not make any sense to it and act on that (throwing an exception or logging), but what it does with them is solely up to it. Username and password just happen to be the most commonly used options, but they are "just options" like any other, and they should be passed to that impl. EDIT: what I need to check is that for SASL / GSSAPI, we can indeed instantiate that provider with all the options the user wants, even if they are useless for that provider. Some providers might be strict and throw errors if you set them up with a property they do not recognize, but I doubt the sasl/gssapi impl is done like that. Even if that is true, a user can simply stop configuring it like that. One detail I would mention is that we should ask for a password only when the auth provider is the plain text one, because then we are totally sure we need it if it is not specified anywhere. For other providers, I would not ask for it. was (Author: smiklosovic): The answer to your very last question is yes, because you could have an auth_provider implementation which is still "username and password-based" but differs internally. We should still pass username / password to it, and it is up to the implementation whether it uses these flags or just ignores them. 
The implementation may, for example, detect that username / password does not make any sense to it and act on that (throwing an exception or logging), but what it does with them is solely up to it. Username and password just happen to be the most commonly used options, but they are "just options" like any other and they should be passed to that implementation. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of the > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing].
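The "pass every option through and let the provider decide" behaviour described in the comment can be illustrated with a small sketch. `load_auth_provider` and `TokenAuthProvider` are hypothetical names, not the cqlsh plugin API; the point is only that username / password travel with the rest of the options, and a provider that is not password-based may simply ignore them.

```python
import importlib


def load_auth_provider(dotted_path, **options):
    """Load an auth provider class from a dotted path and forward every
    configured option -- including username / password -- unchanged.
    It is the provider's job to use, ignore, or reject them."""
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**options)


class TokenAuthProvider:
    """Hypothetical provider that is not username/password based: it accepts
    those options and silently ignores them instead of failing on them."""

    def __init__(self, token=None, username=None, password=None, **_ignored):
        self.token = token
```

A stricter provider would instead raise on unknown options; as the comment notes, that is solely the provider's choice, and a user hitting such an error would simply stop passing the offending option.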
[jira] [Commented] (CASSANDRA-17537) nodetool compact should support using a key string to find the range to avoid operators having to manually do this
[ https://issues.apache.org/jira/browse/CASSANDRA-17537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526100#comment-17526100 ] David Capwell commented on CASSANDRA-17537: --- Starting commit CI Results (pending): ||Branch||Source||Circle CI||Jenkins|| |trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-17537-trunk-A80909E5-5C23-42D0-A279-AFF09B8E92A0]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-17537-trunk-A80909E5-5C23-42D0-A279-AFF09B8E92A0]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/1625/]| > nodetool compact should support using a key string to find the range to avoid > operators having to manually do this > -- > > Key: CASSANDRA-17537 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17537 > Project: Cassandra > Issue Type: New Feature > Components: Local/Compaction, Tool/nodetool >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Time Spent: 1h > Remaining Estimate: 0h > > It's common that a single key needs to be compacted, and operators need to do > the following: > 1) go from key -> token > 2) generate the range > 3) call nodetool compact with this range > We can simplify this workflow by adding this to compact: > nodetool compact ks.tbl -k "key1"
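The three manual steps (key -> token, token -> range, compact over the range) can be sketched as below. The hash is a stand-in: Cassandra's default partitioner is Murmur3, not md5, so the printed tokens are illustrative only; `-st`/`-et` are nodetool compact's existing start/end-token options.

```python
import hashlib


def key_to_token(partition_key: bytes) -> int:
    """Stand-in for the partitioner hash. Cassandra's default
    Murmur3Partitioner yields a signed 64-bit token; md5 here is
    purely illustrative."""
    digest = hashlib.md5(partition_key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def single_key_token_range(partition_key: bytes):
    """Token ranges are start-exclusive / end-inclusive, so the range
    covering exactly token T is (T - 1, T]."""
    token = key_to_token(partition_key)
    return token - 1, token


start, end = single_key_token_range(b"key1")
print(f"nodetool compact ks.tbl -st {start} -et {end}")
```

The proposed `-k` flag would fold all of this into the server side, using the cluster's actual partitioner instead of making the operator compute the token.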
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526095#comment-17526095 ] Stefan Miklosovic commented on CASSANDRA-16456: --- The answer to your very last question is yes, because you could have an auth_provider implementation which is still "username and password-based" but differs internally. We should still pass username / password to it, and it is up to the implementation whether it uses these options or just ignores them. The implementation may, for example, detect that username / password does not make any sense to it and act on that (throwing an exception or logging), but what it does with them is solely up to it. Username and password just happen to be the most commonly used options, but they are "just options" like any other and they should be passed to that implementation. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of the > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. 
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526093#comment-17526093 ] David Capwell commented on CASSANDRA-17560: --- The jvm upgrade test failed due to CASSANDRA-16238 (a race condition with fat client removal), which was fixed later on. > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > The track_warnings config is currently nested, which is discouraged at the moment. It > also predates the config standards patch that moved storage-typed longs to > the new DataStorageSpec type; we should migrate these configs accordingly.
[jira] [Updated] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-17560: -- Source Control Link: https://github.com/apache/cassandra/commit/7db3285e7b745e591dc4c405ae9af6c1cddb0c79 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > The track_warnings config is currently nested, which is discouraged at the moment. It > also predates the config standards patch that moved storage-typed longs to > the new DataStorageSpec type; we should migrate these configs accordingly.
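The rename this ticket performs can be pictured as a mapping from old nested keys to flattened, typed names. Only `track_warnings.enabled` -> `read_thresholds_enabled` is taken from the commit's NEWS.txt entry; the second mapping and the `migrate_config` helper are invented for illustration and are not the Java implementation.

```python
import warnings

# Illustrative old-name -> new-name mapping; the authoritative set lives in
# the Java Config class. The second entry is a made-up example of a nested
# storage-typed long moving to a flattened DataStorageSpec-style name.
RENAMES = {
    "track_warnings.enabled": "read_thresholds_enabled",
    "track_warnings.coordinator_read_size.warn_threshold_kb":
        "coordinator_read_size_warn_threshold",
}


def migrate_config(old):
    """Map deprecated nested keys to their flattened replacements,
    warning on each use of an old name; unknown keys pass through."""
    migrated = {}
    for key, value in old.items():
        if key in RENAMES:
            warnings.warn(f"{key} is deprecated; use {RENAMES[key]}",
                          DeprecationWarning)
            key = RENAMES[key]
        migrated[key] = value
    return migrated
```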
[cassandra] branch trunk updated: Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
This is an automated email from the ASF dual-hosted git repository. dcapwell pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/trunk by this push: new 7db3285e7b Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long 7db3285e7b is described below commit 7db3285e7b745e591dc4c405ae9af6c1cddb0c79 Author: David Capwell AuthorDate: Wed Apr 20 15:15:34 2022 -0700 Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long patch by David Capwell; reviewed by Andres de la Peña, Caleb Rackliffe for CASSANDRA-17560 --- CHANGES.txt| 1 + NEWS.txt | 18 ++-- build.xml | 4 +- conf/cassandra.yaml| 35 +++ ide/idea/workspace.xml | 6 +- src/java/org/apache/cassandra/config/Config.java | 8 +- .../cassandra/config/DatabaseDescriptor.java | 82 ++-- .../org/apache/cassandra/config/TrackWarnings.java | 108 .../org/apache/cassandra/cql3/QueryOptions.java| 75 +++--- .../cassandra/cql3/selection/ResultSetBuilder.java | 9 +- .../cassandra/cql3/statements/SelectStatement.java | 14 +-- src/java/org/apache/cassandra/db/ReadCommand.java | 24 ++--- .../org/apache/cassandra/db/RowIndexEntry.java | 31 +++--- ...=> RowIndexEntryReadSizeTooLargeException.java} | 4 +- .../exceptions/TombstoneAbortException.java| 2 +- src/java/org/apache/cassandra/net/ParamType.java | 8 +- .../apache/cassandra/service/StorageService.java | 83 +++- .../cassandra/service/StorageServiceMBean.java | 34 +++ .../cassandra/service/reads/ReadCallback.java | 6 +- .../CoordinatorWarnings.java | 9 +- .../WarnAbortCounter.java | 2 +- .../WarningContext.java| 22 ++--- .../WarningsSnapshot.java | 30 +++--- .../org/apache/cassandra/transport/Dispatcher.java | 2 +- test/conf/cassandra.yaml | 19 ++-- .../cassandra/distributed/impl/Coordinator.java| 2 +- .../cassandra/distributed/impl/Instance.java | 2 +- 
.../distributed/test/NativeMixedVersionTest.java | 7 +- .../AbstractClientSizeWarning.java | 6 +- .../CoordinatorReadSizeWarningTest.java| 7 +- .../LocalReadSizeWarningTest.java | 15 +-- .../RowIndexSizeWarningTest.java | 11 ++- .../TombstoneCountWarningTest.java | 6 +- .../cassandra/config/DatabaseDescriptorTest.java | 109 - .../config/YamlConfigurationLoaderTest.java| 54 +- .../WarningsSnapshotTest.java | 4 +- 36 files changed, 364 insertions(+), 495 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 0f60d18244..5ab33a229b 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.1 + * Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long (CASSANDRA-17560) * Add support for CONTAINS and CONTAINS KEY in conditional UPDATE and DELETE statement (CASSANDRA-10537) * Migrate advanced config parameters to the new Config types (CASSANDRA-17431) * Make null to be meaning disabled and leave 0 as a valid value for permissions_update_interval, roles_update_interval, credentials_update_interval (CASSANDRA-17431) diff --git a/NEWS.txt b/NEWS.txt index 992c291115..7afac5e105 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -89,16 +89,16 @@ New features paxos_state_purging: repaired. Once this has been set across the cluster, users are encouraged to set their applications to supply a Commit consistency level of ANY with their LWT write operations, saving one additional WAN round-trip. See upgrade notes below. -- Warn/abort thresholds added to read queries notifying clients when these thresholds trigger (by - emitting a client warning or aborting the query). This feature is disabled by default, scheduled - to be enabled in 4.2; it is controlled with the configuration track_warnings.enabled, - setting to true will enable this feature. 
Each check has its own warn/abort thresholds, currently +- Warn/fail thresholds added to read queries notifying clients when these thresholds trigger (by + emitting a client warning or failing the query). This feature is disabled by default, scheduled + to be enabled in 4.2; it is controlled with the configuration read_thresholds_enabled, + setting to true will enable this feature. Each check has its own warn/fail thresholds, currently tombstones (tombston
[cassandra-website] branch asf-staging updated (924e389d -> 015350e5)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git omit 924e389d generate docs for 8fd077a6 new 015350e5 generate docs for 8fd077a6 This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (924e389d) \ N -- N -- N refs/heads/asf-staging (015350e5) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: content/doc/4.1/cassandra/cql/cql_singlefile.html | 2 ++ .../doc/latest/cassandra/cql/cql_singlefile.html | 2 ++ .../doc/trunk/cassandra/cql/cql_singlefile.html| 2 ++ content/search-index.js| 2 +- site-ui/build/ui-bundle.zip| Bin 4740078 -> 4740078 bytes 5 files changed, 7 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525961#comment-17525961 ] Maulin Vasavada edited comment on CASSANDRA-17513 at 4/21/22 8:17 PM: -- Thank you [~djoshi] for considering the suggestion for the ticket title. I've thought about it (and experimented a little as well) and talked to some more 'security' experts, and I agree with the approach of having a separate keystore for client vs server certs for internode connections in case we need client auth enabled. While Java keystores provide the ability to store multiple keys in one store, for a variety of reasons (some of which you already mentioned in your latest comment) it makes sense to keep client vs server keys separate. Given that we would need a different keystore for client TLS auth for the internode connection, what if somebody wants to use the same certs for client as well as server auth? -Would they be required to copy it to a separate keystore, OR would the code changes have a fallback when the 'outbound keystore' (as the current PR refers to it) is not configured?- I realized that they can configure the same path for the 'outbound keystore' in that case. was (Author: maulin.vasavada): Thank you [~djoshi] for considering the suggestion for the ticket title. I've thought about it (and experimented a little as well) and talked to some of the more 'security' experts, and I agree with the approach of having a separate keystore for client vs server certs for internode connections in case we need client auth enabled. While Java keystores provide the ability to store multiple keys in one store, for a variety of reasons (some of which you already mentioned in your latest comment) it makes sense to keep client vs server keys separate. Given that we would need a different keystore for client TLS auth for the internode connection, what if somebody wants to use the same certs for client as well as server auth? 
-Would they be required to copy it to a separate keystore, OR would the code changes have a fallback when the 'outbound keystore' (as the current PR refers to it) is not configured?- I realized that they can configure the same path for the 'outbound keystore' in that case. > Adding support for TLS client authentication for internode communication > > > Key: CASSANDRA-17513 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17513 > Project: Cassandra > Issue Type: Bug >Reporter: Jyothsna Konisa >Assignee: Jyothsna Konisa >Priority: Normal > Time Spent: 1.5h > Remaining Estimate: 0h > > The same keystore is currently set for both inbound and outbound connections, but we > should use a keystore with the server certificate for inbound connections and a > keystore with client certificates for outbound connections. We should therefore add > a new property in cassandra.yaml to pass the outbound keystore and use it in > SSLContextFactory for creating the outbound SSL context.
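The inbound/outbound split the ticket asks for can be sketched with Python's `ssl` module. This is not Cassandra's SSLContextFactory: Cassandra configures JKS/PKCS12 keystores in cassandra.yaml, while this sketch uses optional PEM paths purely to show the two directions loading different key material.

```python
import ssl


def build_internode_contexts(server_keystore=None, client_keystore=None):
    """Separate TLS contexts for inbound (server cert) and outbound
    (client cert) internode connections. Paths are illustrative; omitted
    here so the sketch runs without real certificate files."""
    inbound = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if server_keystore:
        inbound.load_cert_chain(certfile=server_keystore)   # server identity
    inbound.verify_mode = ssl.CERT_REQUIRED  # demand a peer certificate (mTLS)

    outbound = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if client_keystore:
        outbound.load_cert_chain(certfile=client_keystore)  # client identity
    return inbound, outbound
```

An operator who wants to reuse one certificate for both roles can simply pass the same path for both keystores, matching the observation in the comment above.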
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526048#comment-17526048 ] Ekaterina Dimitrova commented on CASSANDRA-17563: - Thanks, so now everything will be the same in the setup; [~dcapwell] only adds a tool to help create new patches with minimum effort. If someone prefers to do it manually, that is fine. In all cases we then need to verify that the generated MIDRES and HIGHRES files have exactly what we want for "happy" CI :D The only thing to be mentioned is that the tool might rearrange the attributes, so if someone finds those rearrangements to be just additional noise, they can simply use the old way. > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During CircleCI addition of a new job to the config, the midres file got > messy. Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism than the original in-jvm job > * j8_dtests_with_vnodes should get 50 large executors from midres, but currently > midres makes it run with 25 and medium, which fails around 100 tests
[jira] [Commented] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526046#comment-17526046 ] Savni Nagarkar commented on CASSANDRA-17500: [~adelapena] I like the proposed approach better than using a thread local. I added your changes to the current branch, and the pull request is [here|https://github.com/apache/cassandra/pull/1582]. I am working on replicating the changes for minimum_keyspace_rf right now. > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted by a user setting the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
[jira] [Updated] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Savni Nagarkar updated CASSANDRA-17500: --- Test and Documentation Plan: https://github.com/apache/cassandra/pull/1582 was: [https://github.com/apache/cassandra/pull/1582|https://github.com/apache/cassandra/pull/1534] > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted by a user setting the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
[jira] [Updated] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Savni Nagarkar updated CASSANDRA-17500: --- Test and Documentation Plan: [https://github.com/apache/cassandra/pull/1582|https://github.com/apache/cassandra/pull/1534] was:https://github.com/apache/cassandra/pull/1534 > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted by a user setting the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
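A per-datacenter maximum-RF check of the kind this ticket describes can be sketched as follows; the threshold constant and function name are illustrative, not the actual guardrail API.

```python
MAX_KEYSPACE_RF = 3  # illustrative; the real guardrail threshold is configurable


def check_replication_factors(replication, max_rf=MAX_KEYSPACE_RF):
    """replication maps datacenter -> RF, e.g. {'dc1': 3, 'dc2': 5}.
    The guardrail is applied per data center, as the ticket describes;
    any DC over the limit fails the CREATE/ALTER KEYSPACE."""
    violations = sorted(dc for dc, rf in replication.items() if rf > max_rf)
    if violations:
        raise ValueError(f"replication factor above maximum ({max_rf}) "
                         f"in datacenters: {violations}")
```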
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526016#comment-17526016 ] David Capwell commented on CASSANDRA-17563: --- Speaking with [~e.dimitrova] in slack, I moved the patch to not touch generate.sh; instead, scripts are used to create the patches (I created a script to create the patches and updated the docs to show how). > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During CircleCI addition of a new job to the config, the midres file got > messy. Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism than the original in-jvm job > * j8_dtests_with_vnodes should get 50 large executors from midres, but currently > midres makes it run with 25 and medium, which fails around 100 tests
[jira] [Updated] (CASSANDRA-17556) jackson-databind 2.13.2 is vulnerable to CVE-2020-36518
[ https://issues.apache.org/jira/browse/CASSANDRA-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-17556: - Fix Version/s: 3.11.13 4.1 4.0.4 (was: 4.x) (was: 3.11.x) (was: 4.0.x) > jackson-databind 2.13.2 is vulnerable to CVE-2020-36518 > --- > > Key: CASSANDRA-17556 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17556 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.11.13, 4.1, 4.0.4 > > > Seems like it's technically possible to cause a DoS with nested json. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
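CVE-2020-36518 is a StackOverflowError in jackson-databind triggered by deeply nested JSON; Cassandra's fix here is simply the dependency upgrade above. To illustrate the class of problem, the sketch below bounds nesting depth before parsing; it is a generic mitigation idea in Python, not anything from the Cassandra codebase.

```python
import json


def max_nesting_depth(text):
    """Rough pre-parse scan of JSON nesting depth; string contents are
    skipped so brackets inside string values do not count."""
    depth = peak = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            peak = max(peak, depth)
        elif ch in "]}":
            depth -= 1
    return peak


def safe_loads(text, max_depth=1000):
    """Reject pathologically nested documents before handing them to the parser."""
    if max_nesting_depth(text) > max_depth:
        raise ValueError("JSON nesting too deep")
    return json.loads(text)
```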
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525998#comment-17525998 ] Tibor Repasi commented on CASSANDRA-17568: -- Thanks [~brandon.williams]. I've reverted it. I'm afraid the current state is how far this improvement can go for now. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h 10m > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand allowing to list the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525991#comment-17525991 ] Brandon Williams commented on CASSANDRA-17568: -- bq. nodetool is a tool which is intended to interact with a Cassandra process via JMX Indeed, and that is why this approach won't work: nodetool won't necessarily be run from the same machine. The server needs to do the work and return the result via JMX. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand allowing to list the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
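Following [~brandon.williams]'s point that the node itself must do the work, the server-side lookup could resemble the sketch below: given the live tables (which the server knows from its schema), map each to its on-disk directory. The `<table>-<id-without-dashes>` naming follows the ticket's example output; the function itself is hypothetical.

```python
from pathlib import Path


def table_data_paths(data_dir, keyspace, live_tables):
    """live_tables maps table name -> table id (as exposed over CQL/JMX);
    data directories follow the <table>-<id-without-dashes> convention seen
    in the ticket's example output. Directories in data_dir not matched by
    any live table are candidates for cleanup."""
    paths = {}
    for name, table_id in live_tables.items():
        d = Path(data_dir) / keyspace / f"{name}-{table_id.replace('-', '')}"
        if d.is_dir():
            paths[name] = [str(d)]
    return paths
```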
[jira] [Commented] (CASSANDRA-17062) Expose Auth Caches metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-17062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525983#comment-17525983 ] Sam Tunnicliffe commented on CASSANDRA-17062: - Apologies for the long delay [~azotcsit]... {quote}I find "conditional MBean attributes" (meaning they can be populated conditionally or treated differently) to be very confusing. So I think having different entities for different MBeans (CacheMetrics and UnweightedCacheMetrics) is smth clearer to the end user. WDYT? {quote} Fair enough, I take that point. Thinking about it a bit more, I think my main issue is this: I know it's perfectly legal, but I find the hiding of methods by overloading in the {{UnweightedCacheSize/CacheSize}} and {{UnweightedCacheMetrics/CacheMetrics}} hierarchies somewhat unintuitive. I've tried an alternative approach of adding an abstract base class for cache metrics. This way, the two classes of caches can track and expose the particular metrics that are relevant to them, capacity & size in bytes for weighted and max entries & entries for unweighted, without any overloading or hiding. I've pushed that [here|https://github.com/beobal/cassandra/tree/samt/17062-trunk-rebase] on top of a rebase on trunk. A few things had changed in the course of the docs migration to the {{adoc}} format, plus CASSANDRA-16958. Let me know what you think. > Expose Auth Caches metrics > -- > > Key: CASSANDRA-17062 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17062 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Virtual Tables, Observability/Metrics, > Tool/nodetool >Reporter: Aleksei Zotov >Assignee: Aleksei Zotov >Priority: Normal > Fix For: 4.x > > > Unlike to other caches (row, key, counter), Auth Caches lack some monitoring > capabilities. 
Here are a few particular changes to get this inequity fixed: > # Add auth caches to _system_views.caches_ VT > # Expose auth caches metrics via JMX > # Add auth caches details to _nodetool info_ > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
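The abstract-base approach described in the comment can be pictured with a small class sketch. The names mirror the classes mentioned above (`CacheMetrics`, `UnweightedCacheMetrics`), but the fields and the Python rendering are illustrative only; the real code is Java and tracks many more metrics.

```python
from abc import ABC


class AbstractCacheMetrics(ABC):
    """Metrics shared by every cache; no overloading/hiding needed."""

    def __init__(self, name):
        self.name = name
        self.hits = 0
        self.requests = 0

    def hit_rate(self):
        return self.hits / self.requests if self.requests else float("nan")


class CacheMetrics(AbstractCacheMetrics):
    """Weighted caches (row/key/counter): capacity and size in bytes."""

    def __init__(self, name, capacity_bytes):
        super().__init__(name)
        self.capacity_bytes = capacity_bytes
        self.size_bytes = 0


class UnweightedCacheMetrics(AbstractCacheMetrics):
    """Unweighted caches (auth): maximum entries and current entries."""

    def __init__(self, name, max_entries):
        super().__init__(name)
        self.max_entries = max_entries
        self.entries = 0
```

Each subclass exposes only the fields relevant to its cache class, which is the point of the base-class refactor over the earlier overloading approach.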
[jira] [Comment Edited] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525974#comment-17525974 ] Maulin Vasavada edited comment on CASSANDRA-17513 at 4/21/22 6:50 PM: -- {code:java} I am open to considering implementing this idea if we don't force operators to explicitly a single store file i.e. maintain backward compatibility with what we have. However, it feels like this should be out of scope here and we can create a separate ticket to address it across both native and internode configurations {code} On the above quote, if I understand you correctly- you are suggesting that somebody can work on a separate ticket to support having client/server keys in the same keystore (in case anybody needs it)? If my understanding is correct- then yes I agree that it should be a separate concern out of the scope of this ticket. was (Author: maulin.vasavada): {code:java} I am open to considering implementing this idea if we don't force operators to explicitly a single store file i.e. maintain backward compatibility with what we have. However, it feels like this should be out of scope here and we can create a separate ticket to address it across both native and internode configurations {code} On the above quote, if I understand you correctly- you are suggesting that somebody can work on a separate ticket to support having client/server keys in the same keystore (in case anybody needs it)? If my understand is correct- then yes I agree that it should be a separate concern out of the scope of this ticket. 
> Adding support for TLS client authentication for internode communication > > > Key: CASSANDRA-17513 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17513 > Project: Cassandra > Issue Type: Bug >Reporter: Jyothsna Konisa >Assignee: Jyothsna Konisa >Priority: Normal > Time Spent: 1h 20m > Remaining Estimate: 0h > > The same keystore is being set for both inbound and outbound connections, but we > should use a keystore with the server certificate for inbound connections and a > keystore with client certificates for outbound connections. So we should add > a new property in cassandra.yaml to pass the outbound keystore and use it in > SSLContextFactory for creating the outbound SSL context.
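For illustration, a server_encryption_options fragment along the lines the ticket proposes might look as follows; the {{outbound_keystore}} property name and all paths are assumptions taken from the PR discussion, not the final committed configuration:

```yaml
# Hypothetical cassandra.yaml sketch -- property names for the outbound
# keystore are assumptions from the PR discussion, not a released config.
server_encryption_options:
  internode_encryption: all
  require_client_auth: true
  keystore: /path/to/server-keystore.jks            # server cert, inbound connections
  keystore_password: cassandra
  outbound_keystore: /path/to/client-keystore.jks   # client cert, outbound connections
  outbound_keystore_password: cassandra
  truststore: /path/to/truststore.jks
  truststore_password: cassandra
```

As discussed in the comments, an operator who uses the same cert for both roles could simply point both keystore properties at the same file.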
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525975#comment-17525975 ] Tibor Repasi commented on CASSANDRA-17568: -- With this [commit|https://github.com/apache/cassandra/pull/1580/commits/a759f9cb65bbd0a4620bcc7c6442a14e41507dd8] I've added a raw implementation of a {{--list-orphans}} option which traverses all {{data_file_directories}} recursively to a depth of 2 and lists all paths which are not known to be used for tables. While it does correctly list empty keyspace directories and dropped tables, I have some objections: # there is a {{system/_paxos_repair_state}} directory (which I'm not familiar with) that is always listed; we would probably need a static exclude list # nodetool is a tool intended to interact with a Cassandra process via JMX, whereas this feature interacts primarily with the filesystem. Therefore I don't really like this feature; it feels wrong, and I would not be unhappy to revert it. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 9h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged to find out > which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand allowing to list data paths of all > existing tables. 
> {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
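The {{--list-orphans}} traversal described in the comment above (walk each data file directory two levels deep and report paths not backed by a known table) can be sketched roughly as follows. This is an illustrative Python sketch, not the actual patch; the function name, the known_table_dirs set, and the exclude list are assumptions.

```python
import os

# Static exclude list idea from the comment above (illustrative).
EXCLUDES = {"system/_paxos_repair_state"}

def list_orphans(data_dirs, known_table_dirs):
    """Walk each data directory two levels deep (keyspace/table-dir) and
    return paths that do not belong to any known table.
    known_table_dirs holds relative 'keyspace/table-dir' strings."""
    orphans = []
    for root in data_dirs:
        for keyspace in sorted(os.listdir(root)):
            ks_path = os.path.join(root, keyspace)
            if not os.path.isdir(ks_path):
                continue
            tables = os.listdir(ks_path)
            if not tables:
                orphans.append(ks_path)  # empty keyspace directory
                continue
            for table in sorted(tables):
                rel = f"{keyspace}/{table}"
                if rel in EXCLUDES or rel in known_table_dirs:
                    continue
                orphans.append(os.path.join(ks_path, table))
    return orphans
```

The depth-2 assumption matches Cassandra's on-disk layout of `<data_dir>/<keyspace>/<table>-<id>` directories.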
[jira] [Updated] (CASSANDRA-17573) Fix test org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate
[ https://issues.apache.org/jira/browse/CASSANDRA-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-17573: -- Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: Unit Test Fix Version/s: 4.1 Severity: Normal Status: Open (was: Triage Needed) marking 4.1 to be addressed before we release 4.1 (can't release with flaky tests) > Fix test > org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate > - > > Key: CASSANDRA-17573 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17573 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Feature/Lightweight Transactions, > Test/dtest/java >Reporter: David Capwell >Priority: Normal > Fix For: 4.1 > > > https://app.circleci.com/pipelines/github/dcapwell/cassandra/1381/workflows/4a4e6100-6582-43fd-8f39-6b3cbb5a94b6/jobs/11282/tests#failed-test-0 > {code} > junit.framework.AssertionFailedError: Repair failed with errors: [Repair > session aa00ae00-c192-11ec-89f5-d521036fedec for range [(00c8,012c], > (0064,00c8], (012c,0064]] failed with error Paxos cleanup > session a1fe1fea-7522-47ec-879a-7f2e6cc592ad failed on /127.0.0.3:7012 with > message: Unsupported peer versions for a6404aa0-c192-11ec-89f5-d521036fedec > [(00c8,012c], (0064,00c8], (012c,0064]], Repair > command #3 finished with error] > at > org.apache.cassandra.distributed.test.PaxosRepairTest.lambda$repair$54f7d7c2$1(PaxosRepairTest.java:189) > at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47) > at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > {code}
[jira] [Created] (CASSANDRA-17573) Fix test org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate
David Capwell created CASSANDRA-17573: - Summary: Fix test org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate Key: CASSANDRA-17573 URL: https://issues.apache.org/jira/browse/CASSANDRA-17573 Project: Cassandra Issue Type: Bug Components: Consistency/Repair, Feature/Lightweight Transactions, Test/dtest/java Reporter: David Capwell https://app.circleci.com/pipelines/github/dcapwell/cassandra/1381/workflows/4a4e6100-6582-43fd-8f39-6b3cbb5a94b6/jobs/11282/tests#failed-test-0 {code} junit.framework.AssertionFailedError: Repair failed with errors: [Repair session aa00ae00-c192-11ec-89f5-d521036fedec for range [(00c8,012c], (0064,00c8], (012c,0064]] failed with error Paxos cleanup session a1fe1fea-7522-47ec-879a-7f2e6cc592ad failed on /127.0.0.3:7012 with message: Unsupported peer versions for a6404aa0-c192-11ec-89f5-d521036fedec [(00c8,012c], (0064,00c8], (012c,0064]], Repair command #3 finished with error] at org.apache.cassandra.distributed.test.PaxosRepairTest.lambda$repair$54f7d7c2$1(PaxosRepairTest.java:189) at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81) at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47) at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:834) {code}
[jira] [Commented] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525969#comment-17525969 ] Maulin Vasavada commented on CASSANDRA-17513: - +1 from my side. Thanks for your patience.
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525967#comment-17525967 ] David Capwell commented on CASSANDRA-17560: --- Found the cause of the python upgrade test failures: CASSANDRA-10537 made a change to both python-dtest and trunk, and when this CI run was kicked off that patch wasn't in trunk, but it picked up the python-dtest change, which caused this error. Rebased again and trying one more time. > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Track warnings is currently nested, which is discouraged at the moment. It > also predates the config standards patch, which moved storage-typed longs to > a new DataStorageSpec type; we should migrate the configs there.
[jira] [Comment Edited] (CASSANDRA-17513) Adding support for TLS client authentication for internode communication
[ https://issues.apache.org/jira/browse/CASSANDRA-17513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525961#comment-17525961 ] Maulin Vasavada edited comment on CASSANDRA-17513 at 4/21/22 6:38 PM: -- Thank you [~djoshi] for considering the suggestion for the ticket title. I've thought about it (and experimented a little), talked to some of the more 'security' experts, and I agree with the approach of having a separate keystore for client vs server certs for internode connections in case we need client auth enabled. While Java keystores provide the ability to store multiple keys, for a variety of reasons (some of which you already mentioned in your latest comment) it makes sense to keep client and server keys separate. Given that we would need a different keystore for client TLS auth for the internode connection, what if somebody wants to use the same certs for client as well as server auth? -Would they be required to copy it to a separate keystore OR would the code changes have a fallback when the 'outbound keystore' (as the current PR refers to it) is not configured?- I realized that they can configure the same path for the 'outbound keystore' in that case. 
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525959#comment-17525959 ] David Capwell commented on CASSANDRA-17560: --- CI Results: j8_unit: * org.apache.cassandra.db.commitlog.CommitLogSegmentManagerCDCTest and org.apache.cassandra.dht.tokenallocator.OfflineTokenAllocatorTest both timeout in the JVM and didn't show up in test results... j11_jvm_dtest: org.apache.cassandra.distributed.test.PaxosRepairTest#paxosRepairVersionGate failed for the first time according to butler; we ran this test 4 times (j8/j11 w/ and w/o vnodes) and it failed only once, so it feels flaky https://app.circleci.com/pipelines/github/dcapwell/cassandra/1381/workflows/4a4e6100-6582-43fd-8f39-6b3cbb5a94b6/jobs/11282/tests#failed-test-0 . Creating a ticket for this. The python upgrade tests look to be failing due to CASSANDRA-17451 and a CQL parser issue... need to look into this, as [~brandon.williams] was saying the timeout is 17451, but the CQL parser issue is new... so holding off the merge.
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525958#comment-17525958 ] Brian Houser commented on CASSANDRA-16456: -- > Sorry, I am not getting this. I am not sure how it is done exactly on the > code level right at the moment but I would say that this should be pretty > transparent? Whatever properties there are specified in auth_provider, they > are taken into account and then they are eventually replaced by whatever is > in credentials. If there is a username property both in auth_provider section > in cqlshrc and in the related section in credentials, the property in > credentials overwrites / has precedence / shadows the one in cqlshrc. Basically, right now, if you have an auth_provider specified (other than PlainTextAuthProvider) but specify a username or password on the command line, cqlsh will override the custom loading and return a PlainTextAuthProvider with the given username and password. This seemed to fit the original use case best and to be what the documentation was guaranteeing, particularly as there was no way to override the auth provider from the command line. Would you rather I just pass the username and password to whatever auth_provider is indicated, and if it's not indicated, default to PlainTextAuthProvider? > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. 
Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing].
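The current behaviour Brian describes in the comment above (command-line credentials override any custom provider and force PlainTextAuthProvider) can be sketched as follows. The function and class names are illustrative assumptions, not cqlsh's actual code:

```python
# Illustrative sketch of the precedence described above, not cqlsh's
# actual implementation.
class PlainTextAuthProvider:
    def __init__(self, username=None, password=None):
        self.username = username
        self.password = password

def resolve_auth_provider(configured_provider, cli_username=None, cli_password=None):
    # Credentials given on the command line override the custom loading
    # and return a PlainTextAuthProvider with those credentials.
    if cli_username or cli_password:
        return PlainTextAuthProvider(cli_username, cli_password)
    # Otherwise use whatever provider was configured (or the implicit
    # PlainTextAuthProvider default upstream of this call).
    return configured_provider
```

Brian's open question is whether the first branch should instead forward the credentials to the configured provider rather than replacing it.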
[jira] [Comment Edited] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525925#comment-17525925 ] Stefan Miklosovic edited comment on CASSANDRA-16456 at 4/21/22 6:19 PM: _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is the _default._ So even if I do not have anything in auth_provider in cqlshrc, imagine there still is one, the plaintext one. Hence it will see the stuff in the credentials file based on the [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please. _If a provider happens to use a property called 'username' with the fix you propose, I'll end up loading the plaintextauth provider instead of the one specified, which would be pretty confusing._ Sorry, I am not getting this. I am not sure how it is done exactly on the code level right at the moment, but I would say that this should be pretty transparent? Whatever properties are specified in auth_provider are taken into account and then eventually replaced by whatever is in credentials. If there is a username property both in the auth_provider section in cqlshrc and in the related section in credentials, the property in credentials overwrites / has precedence / shadows the one in cqlshrc.
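The shadowing rule Stefan describes (credentials-file properties overriding same-named properties from the cqlshrc auth_provider section) amounts to a simple dict merge. This is an illustrative sketch, not cqlsh's actual code:

```python
def merge_auth_properties(cqlshrc_props, credentials_props):
    """Combine provider properties from the two files: properties from the
    credentials file overwrite / have precedence over / shadow any
    same-named properties from the cqlshrc [auth_provider] section.
    (Illustrative sketch, not cqlsh's actual implementation.)"""
    merged = dict(cqlshrc_props)
    merged.update(credentials_props)
    return merged
```

For example, a `username` present in both files resolves to the credentials-file value, while cqlshrc-only properties (such as the provider module/classname) are kept.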
[jira] [Comment Edited] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525925#comment-17525925 ] Stefan Miklosovic edited comment on CASSANDRA-16456 at 4/21/22 6:12 PM: _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is _default._ So even I do not have anything in cqlshrc in auth_provider, imagine there still is one, the plaintext one. Hence it will see the stuff in credentials file based in [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please. was (Author: smiklosovic): _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is _default._ So even I do not have anything in cqlshrc in auth_provider, imagine there still is one, the plaintext one. Hence it will see the stuff in credentials file based in [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please. 
> Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525925#comment-17525925 ] Stefan Miklosovic commented on CASSANDRA-16456: --- _FooAuthProvider would get called with the name prop1, prop2. Notice that if there is no auth_provider section in cqlshrc file specifying what you want to load... the credentials file won't find any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file._ This in general makes sense, but as I look at it, when there is no auth_provider, there is still PlainTextAuthProvider implicitly. That provider is the _default._ So even if I do not have anything in cqlshrc under auth_provider, imagine there still is one, the plaintext one. Hence it will see the stuff in the credentials file under the [PlainTextAuthProvider] section. _It seems you want it to default to PlainTextAuthProvider in all cases when auth provider isn't specified ..._ Exactly, yes, please.
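The implicit-default behaviour Stefan asks for can be sketched as follows. This is an illustrative Python sketch, not the actual cqlsh implementation; the helper name and file contents are invented for the example:

```python
# Illustrative sketch (not the actual cqlsh code) of the fallback Stefan
# describes: when cqlshrc names no auth_provider, treat PlainTextAuthProvider
# as the implicit default and still read its section from the credentials file.
from configparser import ConfigParser

def resolve_credentials(cqlshrc_text, credentials_text):
    cqlshrc = ConfigParser()
    cqlshrc.read_string(cqlshrc_text)
    creds = ConfigParser()
    creds.read_string(credentials_text)

    # Implicit default when cqlshrc has no [auth_provider] section.
    provider = cqlshrc.get('auth_provider', 'classname',
                           fallback='PlainTextAuthProvider')
    # e.g. 'foo.FooAuthProvider' -> credentials section '[FooAuthProvider]'
    section = provider.rsplit('.', 1)[-1]
    if creds.has_section(section):
        return dict(creds.items(section))
    return {}

creds = resolve_credentials(
    "",  # empty cqlshrc: no auth_provider configured at all
    "[PlainTextAuthProvider]\nusername = cassandra\npassword = cassandra\n")
# creds == {'username': 'cassandra', 'password': 'cassandra'}
```

With this shape, a credentials file keeps working for users who never touch cqlshrc, while a configured `auth_provider` section redirects the lookup to that provider's own credentials section.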
[jira] [Commented] (CASSANDRA-17500) Create Maximum Keyspace Replication Factor Guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525913#comment-17525913 ] Andres de la Peña commented on CASSANDRA-17500: --- [~savni_nagarkar] [~dcapwell] regarding passing the client state, I guess we could do something more or less [like this|https://github.com/adelapena/cassandra/commit/d1bddfa54cf430b4f836bcdcdbd5e4b3e9b33b4e], trying to keep the compatibility of 3rd party implementations of {{AbstractReplicationStrategy}}, if any. Nevertheless, I think we should start by migrating the min RF to guardrails (CASSANDRA-17212) before adding the max RF, so we don't have two separate approaches and config formats for min and max. Also, {{minimum_keyspace_rf}} is only on trunk, so if we are going to migrate it to guardrails it would be ideal to do it as soon as possible so we don't have to deprecate it later. wdyt? > Create Maximum Keyspace Replication Factor Guardrail > - > > Key: CASSANDRA-17500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17500 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > > This ticket adds a maximum replication factor guardrail to ensure safety when > creating or altering keyspaces. The replication factor will be applied per > data center. The ticket was prompted as a user set the replication factor > equal to the number of nodes in the cluster. The property will be added to > guardrails to ensure consistency.
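The per-datacenter check such a guardrail would apply can be sketched roughly like this. This is hypothetical Python, not Cassandra's actual guardrail API; the function name and thresholds are illustrative only:

```python
# Hypothetical sketch of a per-datacenter maximum-RF guardrail. Names and
# thresholds are illustrative, not Cassandra's actual guardrail API; the
# warn/fail split mirrors how guardrails typically distinguish a soft
# warning threshold from a hard rejection threshold.
def check_max_replication_factor(dc_rfs, warn_threshold, fail_threshold):
    """dc_rfs: mapping of datacenter name -> requested replication factor.
    Returns (warnings, failures); a non-empty failures list would abort
    the CREATE/ALTER KEYSPACE statement."""
    warnings, failures = [], []
    for dc, rf in dc_rfs.items():
        if rf > fail_threshold:
            failures.append(f"RF {rf} for {dc} exceeds maximum {fail_threshold}")
        elif rf > warn_threshold:
            warnings.append(f"RF {rf} for {dc} exceeds warn threshold {warn_threshold}")
    return warnings, failures

warns, fails = check_max_replication_factor(
    {'dc1': 3, 'dc2': 7}, warn_threshold=4, fail_threshold=6)
# dc2's RF of 7 trips the failure threshold; dc1 passes both checks.
```

Because the check runs per datacenter, a keyspace using NetworkTopologyStrategy with a sane RF in one DC still fails fast if another DC's RF is set to the cluster size by mistake, which is the scenario that prompted the ticket.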
[jira] [Comment Edited] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525871#comment-17525871 ] Benedict Elliott Smith edited comment on CASSANDRA-17519 at 4/21/22 5:53 PM: - I suspect two things: 1) When originally written, this code depended on the assumption that there was mutual exclusion when creating one of these tidy objects, or that they were only created once, and that assumption was later broken (or perhaps was always false); 2) A variant of this race condition was encountered by the simulator when validating Paxos, and I “fixed” it without paying much attention to get things moving (perhaps without even intending to properly fix it at the time, as there was too much to do), and then forgot about it. I'll try to find time to perform a proper analysis of your report and the wider problems. was (Author: benedict): I suspect two things: 1) When originally written, this code depended on the assumption that there was mutual exclusion when creating one of these tidy objects, and that assumption was later broken (or perhaps was always false); 2) A variant of this race condition was encountered by the simulator when validating Paxos, and I fixed it without paying much attention to get things moving (perhaps without even intending to properly fix it at the time, as there was too much to do), and then forgot about it. I'll try to find time to perform a proper analysis of your report and the wider problems. 
> races/leaks in SSTableReader::GlobalTidy > > > Key: CASSANDRA-17519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17519 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Attachments: CASSANDRA-17519-4.0.txt, CASSANDRA-17519-4.1-fix.txt, > CASSANDRA-17519-4.1-test-exposing-the-problem.txt > > > In Cassandra 4.0/3.11 there are at least two races in > SSTableReader::GlobalTidy > One is a get/get race, explicitly handled as an assertion in: > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204] > and it looks like "ok, it's a problem, but let's just not fix it" > The other one is get/tidy race between > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196] > and > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175] > > The second one can be easily hit by adding a small delay at the beginning of > `tidy()` method (say, 20ms) and running `LongStreamingTest` (and actually > such failure is what prompted the investigation of GlobalTidy correctness) > There was an attempt on `trunk` to fix these two races. > The details are not clear to me, and it all looks quite weird. I might be > mistaken, but as far as I can see the relevant changes were introduced in: > [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490] > that is piggybacked on a huge change in CASSANDRA-17008, without a separate > ticket or any sort of qa. 
> As far as I can see this attempt changes the first race into a leak, and the > second race to another race, this time allowing to have multiple GlobalTidy > objects for the same sstable (and, as a result, a premature running of > obsoletion code). > I'll follow up with PRs for relevant branches etc etc
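The get/tidy race and the "multiple GlobalTidy objects for the same sstable" failure mode can be modelled in a few lines. This is an illustrative Python model, not the Cassandra code; the fix shown, one lock spanning lookup, creation, and removal, is just one way to close the window:

```python
# Minimal model (illustrative, not the Cassandra implementation) of a
# per-sstable shared "tidy" registry. Without mutual exclusion, a tidy()
# racing a get() can evict a freshly created object, leaving two live tidy
# objects for one sstable and letting obsoletion code run prematurely.
import threading

class GlobalTidyModel:
    _lock = threading.Lock()
    _instances = {}   # sstable descriptor -> shared tidy object

    @classmethod
    def get(cls, desc):
        # Holding one lock across lookup and creation closes the get/get
        # window in which two callers both miss and both insert.
        with cls._lock:
            obj = cls._instances.get(desc)
            if obj is None:
                obj = object()
                cls._instances[desc] = obj
            return obj

    @classmethod
    def tidy(cls, desc, obj):
        with cls._lock:
            # Only remove the entry if it is still ours; a racing get() may
            # have already installed a replacement, and evicting that
            # newcomer would recreate the duplicate-GlobalTidy problem.
            if cls._instances.get(desc) is obj:
                del cls._instances[desc]

a = GlobalTidyModel.get("sstable-1")
b = GlobalTidyModel.get("sstable-1")   # same shared object as `a`
GlobalTidyModel.tidy("sstable-1", a)   # registry entry cleanly removed
```

The identity check in `tidy()` is the key detail: removing unconditionally is exactly the get/tidy race described above, where a dying instance tears down state that a newly resurrected instance still depends on.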
[jira] [Updated] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-17572: - Fix Version/s: (was: 3.0.x) (was: 3.11.x) (was: 4.0.x) > Race condition when IP address changes for a node can cause reads/writes to > route to the wrong node > --- > > Key: CASSANDRA-17572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17572 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Sam Kramer >Priority: Normal > Fix For: 4.x > > > Hi, > We noticed that there is a race condition present in the trunk of 3.x code, > and confirmed that it’s there in 4.x as well, which will result in incorrect > reads, and missed writes, for a very short period of time. > What brought the race condition to our attention was due to the fact we > started noticing a couple of missed writes for our Cassandra clusters in > Kubernetes. We found the Kubernetes piece interesting, as IP changes are very > frequent as opposed to a traditional setup. > More concretely: > # When a Cassandra node is turned off, and then starts with a new IP address > Z (former IP address X), it announces to the cluster (via gossip) it has IP Z > for Host ID Y > # If there are no conflicts, each node will decide to remove the old IP > address associated with Host ID Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > from the storage ring. This also causes us to invalidate our token ring > cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488] > ). 
> # At this time, a new request could come in (read or write), and will > re-calculate which endpoints to send the request to, as we’ve invalidated our > token ring cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]). > # However, at this time we’ve only removed the IP address X (former IP > address), and have not re-added IP address Z. > # As a result, we will choose a new host to route our request to. In our > case, our keyspaces all run with NetworkTopologyStrategy, and so we simply > choose the node with the next closest token in the same rack as host Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]). > # Thus, the request is routed to a _different_ host, rather than the host > that has come back online. > # However, shortly after, we re-add the host (via its _new_ endpoint) to > the token ring > [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549] > # This will result in us invalidating our cache, and then again re-routing > requests appropriately. > Couple of additional thoughts: > - This doesn’t affect clusters where nodes <= RF with network topology > strategy. > - During this very brief period of time, CL for all user queries is > violated, but they are ACK’d as successful. > - It’s easy to reproduce this race condition by simply adding a sleep here > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > - If a cleanup is not run before any range movement, it’s possible for rows > that were temporarily written to the wrong node to re-appear. > - We tested that the race condition exists in our Cassandra 2.x fork (we're > not on 3.x or 4.x).
So, there is a possibility here that it's only for > Cassandra 2.x, though that seems unlikely from reading the code.
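The remove-then-re-add window in the steps above can be modelled like this. It is a toy Python sketch; the host id, IPs, and routing function are invented for the example:

```python
# Toy model (illustrative, not the Cassandra code) of the race described
# above: the old endpoint is removed and the cache invalidated before the
# new endpoint is added, so a request arriving inside that window routes
# to some other replica instead of the host that came back online.
class TokenRing:
    def __init__(self):
        self.endpoint_for_host = {}  # host id -> ip

    def change_ip_racy(self, host_id, new_ip, request_in_window):
        # Step 1: drop the old IP (this is where the token ring cache is
        # invalidated in the sequence above).
        self.endpoint_for_host.pop(host_id, None)
        routed = request_in_window(self)   # request recomputes endpoints now
        # Step 2: only later is the new IP added back.
        self.endpoint_for_host[host_id] = new_ip
        return routed

    def change_ip_atomic(self, host_id, new_ip, request_in_window):
        # Replacing the mapping in one step leaves no window in which the
        # host is absent from the ring.
        self.endpoint_for_host[host_id] = new_ip
        return request_in_window(self)

def route(ring):
    # Falls back to some other replica when the target host is absent.
    return ring.endpoint_for_host.get('host-Y', 'other-node')

ring = TokenRing()
ring.endpoint_for_host['host-Y'] = '10.0.0.1'
misrouted = ring.change_ip_racy('host-Y', '10.0.0.2', route)    # 'other-node'

ring2 = TokenRing()
ring2.endpoint_for_host['host-Y'] = '10.0.0.1'
ok = ring2.change_ip_atomic('host-Y', '10.0.0.2', route)        # '10.0.0.2'
```

The sketch makes the fix direction concrete: if the endpoint swap (or the cache invalidation around it) were atomic with respect to endpoint recalculation, the misrouting window would not exist.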
[jira] [Commented] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525905#comment-17525905 ] Brandon Williams commented on CASSANDRA-17572: -- Actually, the window here should already be very small, it's all done in the same path.
[jira] [Updated] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-11871: -- Reviewers: Andres de la Peña, Yifan Cai (was: Andres de la Peña) Status: Review In Progress (was: Patch Available) > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same order. As we can have to process a large amount of data, we want to try to limit ourselves to the cases where we can build the groups on the fly (which is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. The problem is that if the user inverts the functions by mistake, like this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results.
> The only way to prevent that would be to check the function order and make sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), second(time)}}). > In my opinion a function like {{floor(<column>, <duration>)}} would be much better, as it does not allow for this type of mistake and is much more flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, 5m), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' GROUP BY floor(time, 5m);}} > An important aspect to keep in mind with a function like {{floor}} is the starting point. For a query like: {{SELECT floor(time, 2h), count() FROM Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}.
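The bucketing semantics proposed above, flooring each timestamp into fixed-width buckets anchored at the start of the queried range, can be sketched like this (illustrative Python, not the CQL implementation):

```python
# Sketch of the bucketing semantics described above: floor a timestamp into
# fixed-width buckets anchored at the start of the queried range, so a 2h
# bucket width over 01:30:00..07:30:00 yields groups starting at 01:30,
# 03:30 and 05:30 rather than at clock-aligned even hours.
from datetime import timedelta

def parse(t):
    h, m, s = map(int, t.split(':'))
    return timedelta(hours=h, minutes=m, seconds=s)

def floor_time(t, bucket, range_start):
    # Bucket index is how many whole bucket widths fit between the range
    # start and the timestamp; the group key is the bucket's start time.
    offset = parse(t) - parse(range_start)
    bucket_index = int(offset.total_seconds() // bucket.total_seconds())
    return parse(range_start) + bucket_index * bucket

groups = {str(floor_time(t, timedelta(hours=2), '01:30:00'))
          for t in ['01:45:00', '03:30:00', '06:00:00']}
# three rows -> three groups: 1:30:00, 3:30:00, 5:30:00
```

Anchoring at the range start is what distinguishes this from a naive `hour(time)`-style grouping: the same rows grouped with clock-aligned 2h buckets would instead land in 00:00, 02:00, and 06:00 buckets.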
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525889#comment-17525889 ] Yifan Cai commented on CASSANDRA-11871: --- +1 on the patch! CI looks good too.
[jira] [Comment Edited] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525865#comment-17525865 ] Jakub Zytka edited comment on CASSANDRA-17519 at 4/21/22 5:36 PM: -- I believe that the get/tidy race condition on 4.1 may end up unexpectedly running the obsoletion code before it is due, potentially leading to some local data loss. Admittedly, I don't have a real-life scenario for that to happen. The fact that a failure of the assertion that we had on 4.0 and earlier has not been seen in the wild suggests that the occurrence probability is very low. Still, I preferred to err on the safe side, and thus the bug has been categorized as a recoverable loss. was (Author: jakubzytka): I believe that the get/tidy race condition may end up unexpectedly running the obsoletion code before it is due, potentially leading to some local data loss. Admittedly, I don't have a real-life scenario for that to happen. The fact that a failure of the assertion that we had on 4.0 and earlier has not been seen in the wild suggests that the occurrence probability is very low. Still, I preferred to err on the safe side, and thus the bug has been categorized as a recoverable loss.
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525883#comment-17525883 ] David Capwell commented on CASSANDRA-17563: ---
bq. has only an addition of a new job in_jvm which match as resource usage the old single in_jvm job and no other changes applied.
Currently this is a massive amount of manual work to confirm, and there isn't a good source of truth to compare against; this patch moves the source of truth into a map so we know what happens (if you don't update you get the default for the level, else you add an override). We talked about this a lot in slack, and my personal feeling is every step in a manual process is another chance for error, so the more steps the higher the risk of doing it incorrectly; the current process, as far as I can find, is the following:
1) create config-2_1.yml.MIDRES and config-2_1.yml.HIGHER using the current patches
2) update config-2_1.yml with your change
3) update config-2_1.yml.MIDRES with your change, and figure out how to apply the updated resources
4) update config-2_1.yml.HIGHER with your change and figure out how to apply the updated resources (the method does not match step 3, so the "how" is different here)
5) generate diff for MIDRES and update patch
6) generate diff for HIGHER and update patch
7) test LOWER - success is defined as "what failed before is the only thing failing now"
8) test MIDRES - success is defined as "what failed before is the only thing failing now"
9) test HIGHER - success is defined as "what failed before is the only thing failing now"
did I miss anything?
bq. There are changes to the other generate.sh script I haven't looked at but any change there need to be tested
given there are 0 tests for the script, 2 different people "testing" could yield different results, so we would need to have some way to define success that is agreed upon.
For example, I fixed what I saw as a bug: when you ask it to generate LOWER, MIDRES, or HIGHER it doesn't actually update those files and instead only updates config.yml; the help page says this, but to me this is unexpected behavior (-a updates config.yml.LOWER, config.yml.MIDRES, and config.yml.HIGHER; -h updates config.yml only!). I am totally cool with -h updating config.yml as well, but it feels like a bug that config.yml.HIGHER isn't updated... so my patch changed that... which one of us is the bug? Now, if we want to define it as "they did the same thing regardless of personal feelings about correct behavior" then I do know for a fact my patch is different; I am 100% ok reverting that difference.
bq. My concern is we don't know who was using what and how and it was working fine for quite some time
I feel like a politician... can you define the word "what"?
bq. Do we want to rewrite the whole approach one week before freeze when people highly utilize CI to push their latest work?
the core change is moving away from patches to modifying the YAML tree; there are other changes but those are personal preference and 100% fine to drop... To me I ask the following question: "if you yaml diff the old and new files, is there a difference?" If the answer is no, then there isn't much of a risk other than the script not working on an unknown laptop (which impacts generate.sh only, not CI configs). Now, if you want to de-risk that, we could use this script to generate the patches, but we don't solve the real problem of patches applying when they shouldn't (which is how I broke MIDRES). If we want to do that to lower risk before the 4.1 freeze I am cool with that, but I do not think that is a valid long-term solution.
bq. now we will have a mix of python and shell scripts, are we sure the community will accept that?
that is something anyone who touches this needs to answer, which is why I tried to pull in anyone who touched this logic to get their feedback.
I do know that many in the community basically do this already (can tell by looking at CircleCI, as my private scripts rename things and clean up our DAG), so it's just moving part of that private logic into OSS to help maintain these files.
bq. I really like and appreciate how you added diff but I am confused from the output what I am seeing actually. I see the new name and resource change.
do you mean the output of the dump of what each job's resource is?
{code}
$ diff midres.resources midres.resources.new
9a10
> j11_jvm_dtests_vnode medium 10
22a24
> j8_jvm_dtests_vnode large 10
{code}
so this is diff output: ">" means that the right-hand side has the following line but there is no matching line on the left-hand side... aka "new job". You see that MIDRES j8 and j11 do not have matching resources! This is because they don't now (as defined before the vnode patch), so I am pointing out that j11 and j8 run with different resources on MIDRES and that I am not changing that behavior.
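The "source of truth in a map" idea from the comment can be sketched like this. This is hypothetical Python; the job names, fields, and tier default are illustrative, not the real CircleCI config or generation script:

```python
# Hedged sketch of keeping per-tier resource overrides in one dictionary and
# applying them to the generated job tree, instead of maintaining separate
# patch files. An unlisted job keeps the tier default, so there is no stale
# patch to mis-apply. Job names and fields are illustrative only.
MIDRES_OVERRIDES = {
    'j8_jvm_dtests_vnode': {'resource_class': 'large', 'parallelism': 10},
    'j11_jvm_dtests_vnode': {'resource_class': 'medium', 'parallelism': 10},
}

def apply_overrides(jobs, overrides, tier_default='medium'):
    """jobs: mapping of job name -> job config dict (as parsed from YAML).
    Returns a new mapping with tier defaults filled in and explicit
    overrides applied on top."""
    out = {}
    for name, job in jobs.items():
        merged = dict(job)
        merged.setdefault('resource_class', tier_default)
        merged.update(overrides.get(name, {}))  # explicit override wins
        out[name] = merged
    return out

jobs = {'j8_jvm_dtests_vnode': {'parallelism': 4},
        'j8_unit_tests': {'parallelism': 25}}
midres = apply_overrides(jobs, MIDRES_OVERRIDES)
# j8_jvm_dtests_vnode picks up the override; j8_unit_tests keeps the default.
```

Compared with the patch-file workflow, the override map makes the review question concrete: diffing the map answers "what differs between tiers" directly, rather than inferring it from whether a textual patch still applies cleanly.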
[jira] [Commented] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525871#comment-17525871 ] Benedict Elliott Smith commented on CASSANDRA-17519: I suspect two things: 1) When originally written, this code depended on the assumption that there was mutual exclusion when creating one of these tidy objects, and that assumption was later broken (or perhaps was always false); 2) A variant of this race condition was encountered by the simulator when validating Paxos, and I fixed it without paying much attention to get things moving (perhaps without even intending to properly fix it at the time, as there was too much to do), and then forgot about it. I'll try to find time to perform a proper analysis of your report and the wider problems. > races/leaks in SSTableReader::GlobalTidy > > > Key: CASSANDRA-17519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17519 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Jakub Zytka >Assignee: Jakub Zytka >Priority: Normal > Attachments: CASSANDRA-17519-4.0.txt, CASSANDRA-17519-4.1-fix.txt, > CASSANDRA-17519-4.1-test-exposing-the-problem.txt > > > In Cassandra 4.0/3.11 there are at least two races in > SSTableReader::GlobalTidy > One is a get/get race, explicitly handled as an assertion in: > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204] > and it looks like "ok, it's a problem, but let's just not fix it" > The other one is get/tidy race between > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196] > and > [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175] > > The second one can be easily hit by 
adding a small delay (say, 20ms) at the beginning of the > `tidy()` method and running `LongStreamingTest` (such a failure is in fact > what prompted the investigation of GlobalTidy correctness). > There was an attempt on `trunk` to fix these two races. > The details are not clear to me, and it all looks quite weird. I might be > mistaken, but as far as I can see the relevant changes were introduced in > [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490], > which is piggybacked on a huge change in CASSANDRA-17008, without a separate > ticket or any sort of QA. > As far as I can see this attempt turns the first race into a leak, and the > second race into another race, this time allowing multiple GlobalTidy > objects for the same sstable (and, as a result, premature running of the > obsoletion code). > I'll follow up with PRs for relevant branches, etc. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
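The get/tidy window described in the report can be modeled with a toy lookup table (the names `Tidy`, `TidyRegistry`, and `GlobalTidyRaceSketch` below are illustrative, not Cassandra's actual `GlobalTidy` code). The simulation interleaves the steps by hand: the reference count hits zero and the "obsoletion" code runs, but before the lookup entry is removed, a concurrent `get()` hands out the already-tidied instance:

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of a global tidy registry; names are illustrative only.
class Tidy {
    final String desc;
    int refs = 1;
    boolean tidied = false;   // set once the "obsoletion" code has run
    Tidy(String desc) { this.desc = desc; }
}

class TidyRegistry {
    static final ConcurrentHashMap<String, Tidy> lookup = new ConcurrentHashMap<>();

    // get(): return the existing instance for a descriptor, or create one.
    // Nothing here checks whether the existing instance is already dying.
    static Tidy get(String desc) {
        return lookup.compute(desc, (d, cur) -> {
            if (cur == null) return new Tidy(d);
            cur.refs++;
            return cur;
        });
    }
}

public class GlobalTidyRaceSketch {
    public static void main(String[] args) {
        Tidy a = TidyRegistry.get("sstable-1");

        // tidy() begins: refs reached zero and obsoletion runs...
        a.refs = 0;
        a.tidied = true;
        // ...but the lookup entry has not been removed yet. A concurrent
        // get() landing in this window receives the dead instance:
        Tidy b = TidyRegistry.get("sstable-1");
        System.out.println(b == a && b.tidied);   // prints: true

        // tidy() finishes only now:
        TidyRegistry.lookup.remove("sstable-1");
    }
}
```

With the opposite ordering (entry removed before obsoletion runs), `get()` instead creates a second instance while the first one's obsoletion is still pending: the "multiple GlobalTidy objects for the same sstable" variant the report describes.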
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Description: In Cassandra 4.0/3.11 there are at least two races in SSTableReader::GlobalTidy. One is a get/get race, explicitly handled as an assertion in: [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204] and it looks like "ok, it's a problem, but let's just not fix it". The other is a get/tidy race between [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196] and [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175] The second one can easily be hit by adding a small delay (say, 20ms) at the beginning of the `tidy()` method and running `LongStreamingTest` (such a failure is in fact what prompted the investigation of GlobalTidy correctness). There was an attempt on `trunk` to fix these two races. The details are not clear to me, and it all looks quite weird. I might be mistaken, but as far as I can see the relevant changes were introduced in [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490], which is piggybacked on a huge change in CASSANDRA-17008, without a separate ticket or any sort of QA. As far as I can see this attempt turns the first race into a leak, and the second race into another race, this time allowing multiple GlobalTidy objects for the same sstable (and, as a result, premature running of the obsoletion code). 
I'll follow up with PRs for relevant branches, etc.
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Test and Documentation Plan: a simple concurrency unit test is included Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Bug Category: Parent values: Correctness(12982) Level 1 values: Recoverable Corruption / Loss(12986) Complexity: Normal Component/s: Legacy/Core Discovered By: Unit Test Severity: Normal Status: Open (was: Triage Needed) I believe the get/tidy race condition may result in the obsoletion code unexpectedly running before it is due, potentially leading to some local data loss. Admittedly, I don't have a real-life scenario in which that would happen. The fact that a failure of the assertion we had on 4.0 and earlier has not been seen in the wild suggests the probability of occurrence is very low. Still, I preferred to err on the safe side, so the bug has been categorized as a recoverable loss.
[jira] [Commented] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525860#comment-17525860 ] Jakub Zytka commented on CASSANDRA-17519: - Hi, I've never submitted a patch to Cassandra before, so please bear with me. I attached 3 files: # test that exposes the described problem for `trunk`: [^CASSANDRA-17519-4.1-test-exposing-the-problem.txt] # the actual fix for `trunk`: [^CASSANDRA-17519-4.1-fix.txt] # the test and fix squashed, for cassandra-4.0 (there are slight differences due to resource leak handling): [^CASSANDRA-17519-4.0.txt] I took the liberty of putting comments liberally around the changed code. I think it's a good idea especially due to previous unsuccessful attempts to fix the code. One thing that I did not do, but I think is worth considering is to run the obsoletion code *before* removing the relevant entry from the lookup table. It looks more natural and removes the potential for yet another race condition (currently the obsoletion code must not assume that no other obsoletion code for the same descriptor is running). I understand that this is hardly possible, but I think that in general, it is safer to use the postulated order of execution - first obsoletion, and only then the removal from lookup. [~benedict] you might be interested in doing the review, as you changed the GlobalTidy code recently. (also, [~samt] , who was the reviewer). 
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Attachment: CASSANDRA-17519-4.0.txt
[jira] [Updated] (CASSANDRA-17519) races/leaks in SSTableReader::GlobalTidy
[ https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Zytka updated CASSANDRA-17519: Attachment: CASSANDRA-17519-4.1-fix.txt CASSANDRA-17519-4.1-test-exposing-the-problem.txt
[jira] [Commented] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525847#comment-17525847 ] Brandon Williams commented on CASSANDRA-17572: -- It seems like the simplest thing to do would be to move the tokenMetadata.removeEndpoint call to updateTokenMetadata, much like is being done with endpointsToRemove; that way we aren't invalidating the cache until the new IP has ownership. > Race condition when IP address changes for a node can cause reads/writes to > route to the wrong node > --- > > Key: CASSANDRA-17572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17572 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Sam Kramer >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > Hi, > We noticed a race condition present in the trunk of the 3.x code, and > confirmed that it's present in 4.x as well, which can result in incorrect > reads and missed writes for a very short period of time. > The race condition came to our attention when we started noticing a few > missed writes in our Cassandra clusters in Kubernetes. The Kubernetes piece > is interesting because IP changes are far more frequent there than in a > traditional setup. > More concretely: > # When a Cassandra node is turned off and then starts with a new IP address > Z (former IP address X), it announces to the cluster (via gossip) that it has > IP Z for Host ID Y > # If there are no conflicts, each node will decide to remove the old IP > address associated with Host ID Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > from the storage ring. This also invalidates the token ring cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488]). 
> # At this time, a new request could come in (read or write) and will > recalculate which endpoints to send the request to, as we've invalidated the > token ring cache > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]). > # However, at this point we've only removed IP address X (the former IP > address) and have not yet re-added IP address Z. > # As a result, we will choose a new host to route the request to. In our > case, our keyspaces all run with NetworkTopologyStrategy, so we simply > choose the node with the next closest token in the same rack as host Y > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]). > # Thus, the request is routed to a _different_ host, rather than the host > that has come back online. > # Shortly afterwards, however, we re-add the host (via its _new_ endpoint) to > the token ring > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549]) > # This invalidates the cache again, after which requests are routed > correctly. > A couple of additional thoughts: > - This doesn't affect clusters where the number of nodes is <= RF with > NetworkTopologyStrategy. > - During this very brief window, the CL of all user queries is violated, yet > the queries are ACK'd as successful. > - It's easy to reproduce this race condition by simply adding a sleep here > ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) > - If a cleanup is not run before any range movement, it's possible for rows > that were temporarily written to the wrong node to re-appear. > - We tested that the race condition exists in our Cassandra 2.x fork (we're > not on 3.x or 4.x). 
> So there is a possibility that this only affects Cassandra 2.x, though from > reading the code that seems unlikely.
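The remove-before-add ordering in steps 2-6 above can be sketched with a toy token ring (the `Ring` class and its methods are hypothetical, not Cassandra's actual TokenMetadata/replication API). During the window between removing the old endpoint and adding the new one, replica calculation simply skips the restarting host:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy token ring with a cached replica calculation; names are illustrative.
class Ring {
    final NavigableMap<Long, String> tokenToEndpoint = new TreeMap<>();
    List<String> cachedReplicas;   // invalidated whenever the ring changes

    void add(long token, String endpoint) {
        tokenToEndpoint.put(token, endpoint);
        cachedReplicas = null;     // cache invalidation, as in step 2
    }

    void remove(String endpoint) {
        tokenToEndpoint.values().removeIf(endpoint::equals);
        cachedReplicas = null;     // cache invalidation, as in step 2
    }

    // Simplistic replica lookup: all endpoints at or after the token.
    List<String> replicasFor(long token) {
        if (cachedReplicas == null)
            cachedReplicas = new ArrayList<>(tokenToEndpoint.tailMap(token).values());
        return cachedReplicas;
    }
}

public class IpChangeRaceSketch {
    public static void main(String[] args) {
        Ring ring = new Ring();
        ring.add(100L, "10.0.0.1");   // host Y at its old IP X
        ring.add(200L, "10.0.0.2");

        // Step 2: the old IP is removed and the cache invalidated...
        ring.remove("10.0.0.1");
        // Step 3-5: a request recalculating replicas in this window
        // no longer sees host Y, so token 100 routes to the next node:
        System.out.println(ring.replicasFor(100L));   // prints: [10.0.0.2]

        // Step 6: only now is the new IP Z added back to the ring.
        ring.add(100L, "10.0.0.9");
        System.out.println(ring.replicasFor(100L));   // prints: [10.0.0.9, 10.0.0.2]
    }
}
```

Deferring the `remove` until the same update that performs the `add` (the fix direction suggested in the comment above) closes the window, since the old endpoint keeps ownership until the new one takes over.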
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525843#comment-17525843 ] David Capwell commented on CASSANDRA-17560: --- thanks! Pushed to the source branch linked above, and watching CI > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > The track_warnings config is currently nested, which is discouraged at the > moment. It also predates the config standards patch, which moved > storage-typed longs to the new DataStorageSpec type; we should migrate these > configs accordingly.
[jira] [Created] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
Sam Kramer created CASSANDRA-17572: -- Summary: Race condition when IP address changes for a node can cause reads/writes to route to the wrong node Key: CASSANDRA-17572 URL: https://issues.apache.org/jira/browse/CASSANDRA-17572 Project: Cassandra Issue Type: Bug Reporter: Sam Kramer
[jira] [Updated] (CASSANDRA-17572) Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-17572: - Bug Category: Parent values: Correctness (12982), Level 1 values: Recoverable Corruption / Loss (12986) Complexity: Normal Component/s: Cluster/Membership Discovered By: User Report Fix Version/s: 3.0.x, 3.11.x, 4.0.x, 4.x Severity: Normal Status: Open (was: Triage Needed)
> Race condition when IP address changes for a node can cause reads/writes to route to the wrong node
> ---
>
> Key: CASSANDRA-17572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17572
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Sam Kramer
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
> Hi,
> We noticed a race condition in the trunk of the 3.x code, and confirmed it is present in 4.x as well, which results in incorrect reads and missed writes for a very short period of time. It came to our attention when we started noticing occasional missed writes in our Cassandra clusters in Kubernetes. The Kubernetes angle is interesting because IP changes there are far more frequent than in a traditional setup.
> More concretely:
> # When a Cassandra node is turned off and then starts with a new IP address Z (former IP address X), it announces to the cluster (via gossip) that it has IP Z for Host ID Y.
> # If there are no conflicts, each node decides to remove the old IP address associated with Host ID Y ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]) from the storage ring. This also invalidates the token ring cache ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488]).
> # At this point, a new request (read or write) could come in, and it will re-calculate which endpoints to send the request to, since the token ring cache was invalidated ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]).
> # However, at this time only IP address X (the former address) has been removed; IP address Z has not yet been re-added.
> # As a result, a new host is chosen to route the request to. In our case, our keyspaces all run with NetworkTopologyStrategy, so we simply choose the node with the next closest token in the same rack as host Y ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]).
> # Thus, the request is routed to a _different_ host, rather than the host that has come back online.
> # Shortly afterwards, the host is re-added (via its _new_ endpoint) to the token ring ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549]), which invalidates the cache again, and requests are once more routed appropriately.
> A couple of additional thoughts:
> - This doesn't affect clusters where the number of nodes is <= RF with NetworkTopologyStrategy.
> - During this very brief window, the consistency level of user queries is violated, yet the queries are ACK'd as successful.
> - The race condition is easy to reproduce by simply adding a sleep here: [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]
> - If a cleanup is not run before any range movement, rows that were temporarily written to the wrong node can re-appear.
> - We tested that the race condition exists in our Cassandra 2.x fork (we're not on 3.x or 4.x), so there is a possibility it is specific to 2.x, though from reading the code that seems unlikely.
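The remove-then-re-add window described in the numbered steps above can be sketched with a toy token ring. This is an illustration of the race only, with made-up names; it is not Cassandra's actual TokenMetadata or replication-strategy code:

```python
# Toy model of the race: removing a restarted node's old endpoint and
# re-adding its new one are two separate steps, each invalidating the
# routing cache, so a request arriving in between routes to the wrong node.

class TokenRing:
    def __init__(self, token_to_endpoint):
        self.token_to_endpoint = dict(token_to_endpoint)
        self.cached_replicas = None  # cleared on any membership change

    def remove_endpoint(self, endpoint):
        self.token_to_endpoint = {
            t: e for t, e in self.token_to_endpoint.items() if e != endpoint
        }
        self.cached_replicas = None  # cache invalidated (step 2)

    def add_endpoint(self, token, endpoint):
        self.token_to_endpoint[token] = endpoint
        self.cached_replicas = None  # cache invalidated again (step 7)

    def replica_for(self, key_token):
        # the owner is the node with the next token >= key_token (wrapping)
        tokens = sorted(self.token_to_endpoint)
        for t in tokens:
            if t >= key_token:
                return self.token_to_endpoint[t]
        return self.token_to_endpoint[tokens[0]]

ring = TokenRing({10: "A", 20: "B", 30: "C"})
assert ring.replica_for(15) == "B"   # B owns token 20

ring.remove_endpoint("B")            # B restarts with a new IP
assert ring.replica_for(15) == "C"   # race window: wrong node gets the request

ring.add_endpoint(20, "B2")          # new endpoint re-added shortly after
assert ring.replica_for(15) == "B2"  # routing is correct again
```

Widening the gap between `remove_endpoint` and `add_endpoint` (the sleep the reporter suggests) lengthens the window in which the middle assertion holds.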
[jira] [Commented] (CASSANDRA-17166) Enhance SnakeYAML properties to be reusable outside of YAML parsing, support camel case conversion to snake case, and add support to ignore properties
[ https://issues.apache.org/jira/browse/CASSANDRA-17166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525840#comment-17525840 ] Caleb Rackliffe commented on CASSANDRA-17166: - Rebase and additional cleanups LGTM > Enhance SnakeYAML properties to be reusable outside of YAML parsing, support > camel case conversion to snake case, and add support to ignore properties > -- > > Key: CASSANDRA-17166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17166 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.x > > Time Spent: 15h 10m > Remaining Estimate: 0h > > SnakeYaml is rather limited in the “object mapping” layer, which forces our > internal code to match specific patterns (all fields public and camel case); > we can remove this restriction by leveraging Jackson for property lookup, and > leaving the YAML handling to SnakeYAML
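The camel-case-to-snake-case conversion mentioned in the ticket title can be sketched in a few lines. This is a hedged illustration in Python rather than the Jackson-based Java code the ticket describes, and the property names are made up:

```python
import re

def camel_to_snake(name: str) -> str:
    # Insert an underscore at each word boundary, then lowercase everything.
    # Two passes handle runs of capitals (e.g. acronyms) reasonably.
    s = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', s).lower()

assert camel_to_snake("commitlogSyncPeriod") == "commitlog_sync_period"
assert camel_to_snake("maxHintWindow") == "max_hint_window"
```

This lets config classes keep idiomatic field names while still matching snake_case keys in the YAML, which is the restriction the ticket aims to remove.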
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525830#comment-17525830 ] Brian Houser commented on CASSANDRA-16456: -- Thanks for the notes, I'll update the code. Hmm, ok, let me explain my thinking. Cqlsh.py is in charge of parsing at the command-line level and processing the legacy authentication section. From this it gets a username and password. At this point my thinking was that it should work exactly as it did before:
* If there is a username but no password, it should prompt for a password.
* If there is no username, no password, and no auth_provider, it should just use None for the auth provider.
* If there is a username and a password, it should use them directly.
If you are specifying a new AuthProvider (that is, something that isn't PlainTextAuthProvider), then the convention is very simple:
* Get the module and class name from the [auth_provider] section of the cqlshrc file.
* Get additional properties from any properties left in the [auth_provider] section of the cqlshrc file.
* Get additional properties from everything in the credentials section labeled with the auth provider class name.
For example, if I am using the FooAuthProvider, my cqlshrc file would look like this:
```
[auth_provider]
module = foo.foo
classname = FooAuthProvider
prop1 = value1
```
My credentials file might look like this:
```
[FooAuthProvider]
prop2 = value2
```
FooAuthProvider would get called with the properties prop1 and prop2. Notice that if there is no auth_provider section in the cqlshrc file specifying what you want to load, the credentials file won't contribute any properties. You need to specify an auth_provider to use the "new school" way of loading the credentials file. The whole intent of naming the auth provider in the credentials file seemed to be to allow different credentials to live in one place, depending on the auth provider specified.
In keeping with Python convention, I was trying to force you to be specific if you were going to use the new way of loading things, since this is meant for custom loading of auth providers. There's already a legacy case for the authentication section and for specifying the username on the command line. It seems you want it to default to PlainTextAuthProvider in all cases when an auth provider isn't specified; I can do that pretty easily in the auth-handling bit. In that case, if you don't specify any provider in the cqlshrc file, I'll assume you meant PlainTextAuthProvider and pull it from the credentials file if that file exists and no other auth_provider is specified. I appreciate that you provided a fix for your concern, but unfortunately it's easy to see this creating a clash with newer providers. If a provider happens to use a property called 'username', then with the fix you propose I'll end up loading PlainTextAuthProvider instead of the one specified, which would be pretty confusing. I'd rather put any new logic into the auth-handling piece, where it can be unit tested more easily. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. 
> Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17565) Fix test_parallel_upgrade_with_internode_ssl
[ https://issues.apache.org/jira/browse/CASSANDRA-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525815#comment-17525815 ] Brandon Williams commented on CASSANDRA-17565: -- Here are the branches and precommit CI on circle: ||Branch||Precommit CI|| |[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-17565-4.0]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/447/workflows/d512f2ec-d340-4075-9c3c-22099d23c73c], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/447/workflows/5a5204f5-fa28-40b0-9f56-64765601999f]| |[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-17565-trunk]|[j8|https://app.circleci.com/pipelines/github/driftx/cassandra/446/workflows/6796d66c-c1db-4a32-80cf-d069100fe19a], [j11|https://app.circleci.com/pipelines/github/driftx/cassandra/446/workflows/cf5c2d25-274e-4821-91dc-aabf9c5ad986]| > Fix test_parallel_upgrade_with_internode_ssl > > > Key: CASSANDRA-17565 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17565 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.x > > > While working on CASSANDRA-17341 I hit this flaky test, very rarely failing > but it is failing on trunk. > More info in this CI run: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/1563/workflows/61bda0b7-f699-4897-877f-c7d523a03127/jobs/10318 -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525782#comment-17525782 ] Ekaterina Dimitrova commented on CASSANDRA-17571: - Marking as a 4.1 blocker, as there was a discussion to add extended classes for Int to handle the upper bound of old int parameters, and changing those in Config will be considered a breaking change after a release. CC [~dcapwell], [~maedhroz] and [~mck]. I will push the suggested classes in the next few hours for approval before moving any config to them. > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-17571: Fix Version/s: 4.1 (was: 4.x) > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-17571: Fix Version/s: 4.x > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.x > > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17571) Config upper bound should be handled earlier
[ https://issues.apache.org/jira/browse/CASSANDRA-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-17571: Bug Category: Parent values: Code(13163) Complexity: Low Hanging Fruit Component/s: Local/Config Discovered By: User Report Severity: Low Assignee: Ekaterina Dimitrova Status: Open (was: Triage Needed) > Config upper bound should be handled earlier > > > Key: CASSANDRA-17571 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > > Config upper bound should be handled on startup/config setup and not during > conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-17571) Config upper bound should be handled earlier
Ekaterina Dimitrova created CASSANDRA-17571: --- Summary: Config upper bound should be handled earlier Key: CASSANDRA-17571 URL: https://issues.apache.org/jira/browse/CASSANDRA-17571 Project: Cassandra Issue Type: Bug Reporter: Ekaterina Dimitrova Config upper bound should be handled on startup/config setup and not during conversion -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525764#comment-17525764 ] Ekaterina Dimitrova commented on CASSANDRA-17563: - My previous point still stands: we need to be sure everything on all branches works as before. IMHO what we really need to see is that every new highres and midres file on every branch contains only the addition of a new in_jvm job that matches the resource usage of the old single in_jvm job, with no other changes applied. Thanks for all the updates and for adding the docs, etc. I understand and really appreciate your good intentions for improvement and ease of maintenance. Unfortunately, I have a few immediate concerns which make me think we need an immediate fix for midres, with rewrites of the scripts after the release: * There are changes to the other generate.sh script that I haven't looked at, but any change there needs to be tested to confirm it didn't break any of the options added and tested one by one by [~adelapena] * My concern is that we don't know who was using what and how, and it was working fine for quite some time. Do we want to rewrite the whole approach one week before freeze, when people heavily utilize CI to push their latest work? What do others think? * Also, we will now have a mix of Python and shell scripts; are we sure the community will accept that? I really like and appreciate how you added a diff, but I am confused by the output about what I am actually seeing. I see the new name and the resource change. > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During CircleCI addition of a new job to the config, the midres file got > messy. 
Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism as the original in-jvm job > * j8_dtests_with_vnodes should get from midres 50 large but currently > midres makes it run with 25 and medium which fails around 100 tests -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525749#comment-17525749 ] Tibor Repasi commented on CASSANDRA-17568: -- I've addressed all review comments for now and am looking forward to adding an option that lists the orphaned directories. > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 7h 40m > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within the data paths. Operators may find it challenging to > work out which directories belong to existing tables and which may be subject to > removal. While the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17564) Add synchronization to wait for outstanding tasks in the compaction executor and nonPeriodicTasks during CassandraDaemon setup
[ https://issues.apache.org/jira/browse/CASSANDRA-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525712#comment-17525712 ] Stefan Miklosovic commented on CASSANDRA-17564: --- units passed https://app.circleci.com/pipelines/github/instaclustr/cassandra/929/workflows/a33ef23d-0ae1-4a5d-9a36-a55f914f484f > Add synchronization to wait for outstanding tasks in the compaction executor > and nonPeriodicTasks during CassandraDaemon setup > -- > > Key: CASSANDRA-17564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17564 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Haoze Wu >Priority: Normal > Fix For: 3.11.x, 4.0.x > > Time Spent: 10m > Remaining Estimate: 0h > > We have been testing Cassandra 3.11.10 for a while. During a node start, we > found that a synchronization guarantee implied by the code comments is not > enforced. Specifically, in the `invalidate` method called in this call stack > (in version 3.11.10): > {code:java} > org.apache.cassandra.service.CassandraDaemon#main:786 > org.apache.cassandra.service.CassandraDaemon#activate:633 > org.apache.cassandra.service.CassandraDaemon#setup:261 > org.apache.cassandra.schema.LegacySchemaMigrator#migrate:83 > org.apache.cassandra.schema.LegacySchemaMigrator#unloadLegacySchemaTables:137 > java.lang.Iterable#forEach:75 > org.apache.cassandra.schema.LegacySchemaMigrator#lambda$unloadLegacySchemaTables$1:137 > org.apache.cassandra.db.ColumnFamilyStore#invalidate:542 {code} > In line 564~570 within `public void invalidate(boolean expectMBean)`: > {code:java} > latencyCalculator.cancel(false); > compactionStrategyManager.shutdown(); > SystemKeyspace.removeTruncationRecord(metadata.cfId); // line 566 > data.dropSSTables(); // line 568 > LifecycleTransaction.waitForDeletions(); // line 569 > indexManager.invalidateAllIndexesBlocking(); > {code} > According to the code and the comments, we suppose `data.dropSSTables()` in > line 568 
will submit some tidier tasks to the `nonPeriodicTasks` thread pool. > Call stack in version 3.11.10: > {code:java} > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:233 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:238 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:267 > org.apache.cassandra.utils.concurrent.Refs#release:241 > org.apache.cassandra.utils.concurrent.Ref#release:119 > org.apache.cassandra.utils.concurrent.Ref#release:225 > org.apache.cassandra.utils.concurrent.Ref#release:326 > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#tidy:2205 > {code} > Then, `LifecycleTransaction.waitForDeletions()` in line 569 is > {code:java} > /** > * Deletions run on the nonPeriodicTasks executor, (both failedDeletions > or global tidiers in SSTableReader) > * so by scheduling a new empty task and waiting for it we ensure any > prior deletion has completed. > */ > public static void waitForDeletions() > { > LogTransaction.waitForDeletions(); > } > {code} > And then call `waitForDeletions` in `LogTransaction`: > {code:java} > static void waitForDeletions() > { > > FBUtilities.waitOnFuture(ScheduledExecutors.nonPeriodicTasks.schedule(Runnables.doNothing(), > 0, TimeUnit.MILLISECONDS)); > } > {code} > From the comments, we think it ensures that all existing tasks in > `nonPeriodicTasks` are drained. However, we found some tidier tasks are still > running in `nonPeriodicTasks` thread pool. > We suspect that those tidier tasks should be guaranteed to finish during > server setup, because of its exception handling. In version 3.11.10, these > tidier tasks are submitted to `nonPeriodicTasks` in > `SSTableReader$InstanceTidier#tidy:2205`, and have the exception handling > `FileUtils.handleFSErrorAndPropagate(new FSWriteError(e, file))` (within the > call stack `SSTableReader$InstanceTidier$1#run:2223` => > `LogTransaction$SSTableTidier#run:386` => `LogTransaction#delete:261`). 
> The `FileUtils.handleFSErrorAndPropagate` handles this `FSWriteError`. We > found that it checks the `CassandraDaemon.setupCompleted` flag in call stack > within (`FileUtils#handleFSErrorAndPropagate:507` => > `JVMStabilityInspector#inspectThrowable:60` => > `JVMStabilityInspector#inspectThrowable:106` => > `JVMStabilityInspector#inspectDiskError:73` => `FileUtils#handleFSError:494` > => `DefaultFSErrorHandler:handleFSError:58`) > {code:java} > if (!StorageService.instance.isDaemonSetupCompleted()) // line 58 > handleStartupFSError(e);
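The "schedule an empty task and wait for it" idiom behind `waitForDeletions()` is easy to illustrate outside Cassandra. The following Python sketch (executor name and timings are illustrative, not Cassandra code) shows both the guarantee the comment describes and its limitation: the barrier only covers tasks submitted *before* it, not tidier tasks submitted afterwards, which is the window this ticket is about.

```python
# Sketch of the empty-task barrier idiom used by waitForDeletions(),
# using a single-worker executor so tasks run strictly in FIFO order.
from concurrent.futures import ThreadPoolExecutor
import time

non_periodic_tasks = ThreadPoolExecutor(max_workers=1)
log = []

def tidier(name):
    time.sleep(0.05)      # simulate file-deletion work
    log.append(name)

non_periodic_tasks.submit(tidier, 'tidy-1')

# The barrier: schedule a no-op and block on its result. Because the single
# worker runs tasks in submission order, 'tidy-1' must be done when this
# returns -- that is the guarantee the code comment promises.
non_periodic_tasks.submit(lambda: None).result()
assert log == ['tidy-1']

# A tidier submitted AFTER the barrier is not covered by it; it may still be
# running while the caller believes all deletions have completed.
non_periodic_tasks.submit(tidier, 'tidy-2')
non_periodic_tasks.shutdown(wait=True)
```

The same reasoning applies to `nonPeriodicTasks`: scheduling `Runnables.doNothing()` drains prior submissions only, so tidiers enqueued later can outlive the wait.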
[jira] [Commented] (CASSANDRA-17564) Add synchronization to wait for outstanding tasks in the compaction executor and nonPeriodicTasks during CassandraDaemon setup
[ https://issues.apache.org/jira/browse/CASSANDRA-17564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525672#comment-17525672 ] Stefan Miklosovic commented on CASSANDRA-17564: --- Hi [~functioner] Would you please try if this is any better for you? (1) I think the issue is that if, hypothetically, that runnable submitted in InstanceTidier throws, the global ref will never be released. cc [~benedict] (1) https://github.com/instaclustr/cassandra/tree/CASSANDRA-17564 > Add synchronization to wait for outstanding tasks in the compaction executor > and nonPeriodicTasks during CassandraDaemon setup > -- > > Key: CASSANDRA-17564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17564 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Haoze Wu >Priority: Normal > Fix For: 3.11.x, 4.0.x > > Time Spent: 10m > Remaining Estimate: 0h > > We have been testing Cassandra 3.11.10 for a while. During a node start, we > found that a synchronization guarantee implied by the code comments is not > enforced. 
Specifically, in the `invalidate` method called in this call stack > (in version 3.11.10): > {code:java} > org.apache.cassandra.service.CassandraDaemon#main:786 > org.apache.cassandra.service.CassandraDaemon#activate:633 > org.apache.cassandra.service.CassandraDaemon#setup:261 > org.apache.cassandra.schema.LegacySchemaMigrator#migrate:83 > org.apache.cassandra.schema.LegacySchemaMigrator#unloadLegacySchemaTables:137 > java.lang.Iterable#forEach:75 > org.apache.cassandra.schema.LegacySchemaMigrator#lambda$unloadLegacySchemaTables$1:137 > org.apache.cassandra.db.ColumnFamilyStore#invalidate:542 {code} > In line 564~570 within `public void invalidate(boolean expectMBean)`: > {code:java} > latencyCalculator.cancel(false); > compactionStrategyManager.shutdown(); > SystemKeyspace.removeTruncationRecord(metadata.cfId); // line 566 > data.dropSSTables(); // line 568 > LifecycleTransaction.waitForDeletions(); // line 569 > indexManager.invalidateAllIndexesBlocking(); > {code} > According to the code and the comments, we suppose `data.dropSSTables()` in > line 568 will submit some tidier tasks to the `nonPeriodicTasks` thread pool. > Call stack in version 3.11.10: > {code:java} > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:233 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:238 > org.apache.cassandra.db.lifecycle.Tracker#dropSSTables:267 > org.apache.cassandra.utils.concurrent.Refs#release:241 > org.apache.cassandra.utils.concurrent.Ref#release:119 > org.apache.cassandra.utils.concurrent.Ref#release:225 > org.apache.cassandra.utils.concurrent.Ref#release:326 > org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier#tidy:2205 > {code} > Then, `LifecycleTransaction.waitForDeletions()` in line 569 is > {code:java} > /** > * Deletions run on the nonPeriodicTasks executor, (both failedDeletions > or global tidiers in SSTableReader) > * so by scheduling a new empty task and waiting for it we ensure any > prior deletion has completed. 
> */ > public static void waitForDeletions() > { > LogTransaction.waitForDeletions(); > } > {code} > And then call `waitForDeletions` in `LogTransaction`: > {code:java} > static void waitForDeletions() > { > > FBUtilities.waitOnFuture(ScheduledExecutors.nonPeriodicTasks.schedule(Runnables.doNothing(), > 0, TimeUnit.MILLISECONDS)); > } > {code} > From the comments, we think it ensures that all existing tasks in > `nonPeriodicTasks` are drained. However, we found some tidier tasks are still > running in `nonPeriodicTasks` thread pool. > We suspect that those tidier tasks should be guaranteed to finish during > server setup, because of its exception handling. In version 3.11.10, these > tidier tasks are submitted to `nonPeriodicTasks` in > `SSTableReader$InstanceTidier#tidy:2205`, and have the exception handling > `FileUtils.handleFSErrorAndPropagate(new FSWriteError(e, file))` (within the > call stack `SSTableReader$InstanceTidier$1#run:2223` => > `LogTransaction$SSTableTidier#run:386` => `LogTransaction#delete:261`). > The `FileUtils.handleFSErrorAndPropagate` handles this `FSWriteError`. We > found that it checks the `CassandraDaemon.setupCompleted` flag in call stack > within (`FileUtils#handleFSErrorAndPropagate:507` => > `JVMStabilityInspector#inspectThrowable:60` => > `JVMStabilityInspector#inspectThrowable:106` => > `JVMStabilityInspector#inspectDiskError:73` => `FileUtils#handleFSError:494` > => `DefaultFSErrorHandler:hand
[jira] [Commented] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525668#comment-17525668 ] Brandon Williams commented on CASSANDRA-17570: -- bq. we should probably consider removing the CQL.textile file. This is also probably a good idea because it's the only file in that format in the repo (it's 10 years old) > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525667#comment-17525667 ] Andres de la Peña commented on CASSANDRA-11871: --- One last detail: we should probably add an entry in {{NEWS.txt}}. > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP > BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the > SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same > order. As we can have to process a large amount of data, we want to try to > limit ourselves to the cases where we can build the groups on the fly (which > is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we > will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. The problem is that if the user inverts the functions by > mistake, like this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results. 
> The only way to prevent that would be to check the function order and make > sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), > second(time)}}). > In my opinion a function like {{floor(<column>, <duration>)}} will be > much better, as it does not allow for this type of mistake and is much more > flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, 5m), count() FROM Trades WHERE symbol = 'AAPL' AND date = > '2016-01-11' GROUP BY floor(time, 5m);}} > An important aspect to keep in mind with a function like {{floor}} is the > starting point. For a query like: {{SELECT floor(time, 2h), count() FROM > Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' > AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the > result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}. > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
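The starting-point behaviour discussed in the ticket can be sketched outside CQL. This Python fragment is purely illustrative (the function names and the 10-minute sampling are assumptions, not Cassandra code): it anchors 2-hour buckets at the lower bound of the queried range, which is what yields groups at 01:30:00, 03:30:00 and 05:30:00 rather than at even clock hours.

```python
# Illustrative floor-based time bucketing with an explicit starting point.
def floor_time(seconds, size, start=0):
    """Return the bucket start (in seconds) for a timestamp, anchored at `start`."""
    return start + ((seconds - start) // size) * size

def hms(s):
    """Format seconds-since-midnight as HH:MM:SS."""
    return '%02d:%02d:%02d' % (s // 3600, (s % 3600) // 60, s % 60)

lower = 1 * 3600 + 30 * 60           # 01:30:00
upper = 7 * 3600 + 30 * 60           # 07:30:00
two_hours = 2 * 3600

# Sample timestamps every 10 minutes in [01:30:00, 07:30:00) and bucket them.
buckets = sorted({floor_time(t, two_hours, start=lower)
                  for t in range(lower, upper, 600)})
print([hms(b) for b in buckets])     # ['01:30:00', '03:30:00', '05:30:00']
```

Anchoring at `start=0` (midnight) instead would produce buckets at 00:00:00, 02:00:00, 04:00:00 and 06:00:00, which is the subtlety the comment raises.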
[jira] [Commented] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525665#comment-17525665 ] Benjamin Lerer commented on CASSANDRA-17570: Thanks [~brandon.williams]. It seems that we now have two sources for CQL, the {{CQL.textile}} file and the documentation, and neither of them is accurate. The doc nevertheless seems better, so we should probably consider removing the {{CQL.textile}} file. The CQL version needs to be set to 3.4.6 and the changes for that version will need to be mentioned. > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17212) Migrate threshold for minimum keyspace replication factor to guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-17212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525662#comment-17525662 ] Andres de la Peña commented on CASSANDRA-17212: --- [~jmckenzie] I'm not sure we want to always exclude system keyspaces. For example, we probably want to apply the guardrails for restrictions on IN queries even when querying system tables, since those queries can be quite harmful (see CASSANDRA-17187 and CASSANDRA-17186). I guess that our main reason to exclude system keyspaces in cases such as the guardrail for disabling {{ALLOW FILTERING}} is that drivers might internally use the guarded queries for doing their thing. > Migrate threshold for minimum keyspace replication factor to guardrails > --- > > Key: CASSANDRA-17212 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17212 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Guardrails >Reporter: Andres de la Peña >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > > The config property > [{{minimum_keyspace_rf}}|https://github.com/apache/cassandra/blob/5fdadb25f95099b8945d9d9ee11d3e380d3867f4/conf/cassandra.yaml] > that was added by CASSANDRA-14557 can be migrated to guardrails, for example: > {code} > guardrails: > ... > replication_factor: > warn_threshold: 2 > abort_threshold: 3 > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17370) Add flag enabling operators to restrict use of ALLOW FILTERING in queries
[ https://issues.apache.org/jira/browse/CASSANDRA-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525651#comment-17525651 ] Andres de la Peña commented on CASSANDRA-17370: --- Looks good to me, +1 > Add flag enabling operators to restrict use of ALLOW FILTERING in queries > - > > Key: CASSANDRA-17370 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17370 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Semantics, Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > This ticket adds the ability for operators to disallow use of ALLOW FILTERING > predicates in CQL SELECT statements. As queries that ALLOW FILTERING can > place additional load on the database, the flag enables operators to provide > tighter bounds on performance guarantees. The patch includes a new yaml > property, as well as a hot property enabling the value to be modified via JMX > at runtime. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525650#comment-17525650 ] Brandon Williams commented on CASSANDRA-17570: -- Just fyi, cqlsh was bumped for this in CASSANDRA-17432. > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming
[ https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525645#comment-17525645 ] Benjamin Lerer commented on CASSANDRA-15510: Sorry, I was focussing on porting CASSANDRA-15511. I will merge the changes and run CI. If I remember correctly I think that the patch broke some tests. We also need to run CI on Jenkins as I do not think that CircleCi can run the burn tests. > BTree: Improve Building, Inserting and Transforming > --- > > Key: CASSANDRA-15510 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15510 > Project: Cassandra > Issue Type: Improvement > Components: Local/Other >Reporter: Benedict Elliott Smith >Assignee: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0.x, 4.x > > Time Spent: 10h > Remaining Estimate: 0h > > This work was originally undertaken as a follow-up to CASSANDRA-15367 to > ensure performance is strictly improved, but it may no longer be needed for > that purpose. It’s still hugely impactful, however. It remains to be > decided where this should land. > The current {{BTree}} implementation is suboptimal in a number of ways, with > very little focus having been given to its performance besides its > memory-occupancy. This patch aims to address that, specifically improving > the performance and allocations involved in: building, transforming and > inserting into a tree. > To facilitate this work, the {{BTree}} definition is modified slightly, so > that we can perform some simple arithmetic on tree sizes. Specifically, > trees of depth n are defined to have a maximum capacity of {{branchFactor^n - > 1}}, which translates into capping the number of leaf children at > {{branchFactor-1}}, as opposed to {{branchFactor}}. Since {{branchFactor}} > is a power of 2, this permits fast tree size arithmetic, enabling some of > these changes. > h2. 
Building > The static build method has been modified to utilise dedicated > {{buildPerfect}} methods that build either perfectly dense or perfectly > sparse sub-trees. These perfect trees all share their {{sizeMap}} with each > other, and can be built more efficiently than trees of arbitrary size. The > specifics are described in detail in the comments, but this building block > can be used to construct trees of any size, using at most one child at each > level that is not either perfectly sparse or perfectly dense. Bulk methods > are used where possible. > For large trees this can produce up to 30x throughput improvement and 30% > allocation reduction vs 3.0 (TBC, and to be tested vs 4.0). > {{FastBuilder}} is introduced for building a tree in-order (or in reverse) > without duplicate elements to resolve, without necessarily knowing the size > upfront. This meets the needs of most use cases. Data is built directly > into nodes, with up to one already-constructed node, and one partially > constructed node, on each level, being mutated to share their contents in the > event of insufficient data to populate the tree. These builders are > thread-locally shared. This leads to minimal copying, the same sharing of > {{sizeMap}} as above, zero wasted allocations, and results in minimal > difference in performance between utilising the less-ergonomic static build > and builder approach. > For large trees this leads to ~4.5x throughput improvement, and 70% reduction > in allocations vs a normal Builder. For small trees performance is > comparable, but allocations similarly reduced. > h2. Inserting > It turns out that we only ever insert another tree into a tree, so we exploit > this to implement an efficient union of two trees, operating on them directly > via stacks in the transformer, instead of via a collection interface. 
A > builder-like object is introduced that shares functionality with > {{FastBuilder}}, and permits us to build the result of the union directly > into the final nodes, reusing as much of the original trees as possible. > Bulk methods are used where possible. > The result is not _uniformly_ faster, but is _significantly_ faster on > average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean > improvement of 10x. Worst reduction is 30%, and it may be that we can > isolate and alleviate that. Allocations are also reduced significantly, with > a median of 30% and mean of 42% for the tested workloads. As the trees get > larger the improvement drops, but remains uniformly lower. > h2. Transforming > Transformations garbage overhead is minimal, i.e. the main allocations are > those necessary to represent the new tree. It is significantly faster and > particularly more efficient when removing elements, utilising the shared > functionality of th
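The size arithmetic that the redefined capacity enables can be sketched in isolation. The snippet below is a hedged illustration, not the patch's code: the class and method names and the branching factor of 32 are assumptions. It shows why capping a depth-n tree at branchFactor^n - 1 elements, with a power-of-two branching factor, reduces capacity and depth calculations to shifts.

```java
// Hypothetical sketch, not Cassandra's actual BTree code: names and the
// branching factor are invented for illustration. With a power-of-two
// branching factor, defining a depth-n tree's maximum capacity as
// branchFactor^n - 1 lets the size arithmetic reduce to shifts.
public final class BTreeSizeMath {
    static final int BRANCH_SHIFT = 5;                  // assumed branchFactor = 32
    static final int BRANCH_FACTOR = 1 << BRANCH_SHIFT;

    // Maximum element count of a tree of the given depth: branchFactor^depth - 1.
    static long maxCapacity(int depth) {
        return (1L << (BRANCH_SHIFT * depth)) - 1;
    }

    // Smallest depth whose capacity can hold `size` elements.
    static int depthFor(long size) {
        int depth = 1;
        while (maxCapacity(depth) < size)
            depth++;
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(maxCapacity(1)); // 31: a leaf holds at most branchFactor - 1 elements
        System.out.println(maxCapacity(2)); // 1023
        System.out.println(depthFor(1000)); // 2
    }
}
```

Note how capping leaves at branchFactor - 1 (rather than branchFactor) is exactly what makes the capacities land on the shift-friendly values 31, 1023, and so on.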
[jira] [Commented] (CASSANDRA-17560) Migrate track_warnings to more standard naming conventions and use latest configuration types rather than long
[ https://issues.apache.org/jira/browse/CASSANDRA-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525644#comment-17525644 ] Andres de la Peña commented on CASSANDRA-17560: --- Looks good to me. I have just added a comment about how we log the updating of properties; it can be addressed on commit. > Migrate track_warnings to more standard naming conventions and use latest > configuration types rather than long > -- > > Key: CASSANDRA-17560 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17560 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.1 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Track warnings is currently nested, which is discouraged at the moment. It > was also added before the config standards patch, which moved storage-typed > longs to the new DataStorageSpec type; we should migrate the configs there.
[jira] [Commented] (CASSANDRA-17565) Fix test_parallel_upgrade_with_internode_ssl
[ https://issues.apache.org/jira/browse/CASSANDRA-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525641#comment-17525641 ] Brandon Williams commented on CASSANDRA-17565: -- bq. but that didn't work, so ignore this. To clarify, there are 7 failures in that run, but 6 were git errors and one was legit. Not trusting the results, I did another [4000 runs|https://app.circleci.com/pipelines/github/driftx/cassandra/443/workflows/8e7a307a-4a13-4c00-ab45-ca65b48ac602/jobs/5184] and got one failure again...however, examining the line number, that has to be from the 4.0 side, and indeed it needs the same patch. But now the question is, can the upgrade test be run with both a custom 4.0 and trunk branch? If not, perhaps this is enough to commit the trunk side, and then we can run 4k with a custom 4.0 branch against it, which should prove out the whole thing. > Fix test_parallel_upgrade_with_internode_ssl > > > Key: CASSANDRA-17565 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17565 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.x > > > While working on CASSANDRA-17341 I hit this flaky test, very rarely failing > but it is failing on trunk. > More info in this CI run: > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/1563/workflows/61bda0b7-f699-4897-877f-c7d523a03127/jobs/10318
[jira] [Updated] (CASSANDRA-17568) Implement nodetool command to list data directories of existing tables
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Summary: Implement nodetool command to list data directories of existing tables (was: Tool to list data directories) > Implement nodetool command to list data directories of existing tables > -- > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Updated] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Status: Changes Suggested (was: Review In Progress) I have done the first pass and am waiting for the author's feedback. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Updated] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Test and Documentation Plan: unit tests Status: Patch Available (was: In Progress) https://github.com/apache/cassandra/pull/1580 > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Updated] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-17568: -- Reviewers: Stefan Miklosovic Status: Review In Progress (was: Patch Available) > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 5h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-17563) Fix CircleCI Midres config
[ https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525606#comment-17525606 ] Berenguer Blasi commented on CASSANDRA-17563: - [~dcapwell] I am in the middle of something but I will try to look into this asap > Fix CircleCI Midres config > -- > > Key: CASSANDRA-17563 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17563 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.1 > > > During the addition of a new job to the CircleCI config, the midres file got > messy. Two of the immediate issues (but we need to verify all jobs will use > the right executors and resources): > * the new job needs to use higher parallelism as the original in-jvm job > * j8_dtests_with_vnodes should get from midres 50 large but currently > midres makes it run with 25 and medium which fails around 100 tests
[jira] [Commented] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525598#comment-17525598 ] Stefan Miklosovic commented on CASSANDRA-17568: --- I did a first, more thorough pass on the PR. I would love to have all the issues addressed, and then we can consider more seriously what to do with it next. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > Time Spent: 4h > Remaining Estimate: 0h > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may be challenged finding out > which directories belong to existing tables and which may be subject to > removal. The information is available in CQL as well as in MBeans via JMX, > but convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-11871) Allow to aggregate by time intervals
[ https://issues.apache.org/jira/browse/CASSANDRA-11871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525593#comment-17525593 ] Benjamin Lerer commented on CASSANDRA-11871: [~yifanc] Your comment about the CQL.textile file made me realize that we need to upgrade the CQL version for the next release and make sure that all the CQL changes are mentioned in the CQL version change. I opened CASSANDRA-17570 for that. I addressed your comments :-) > Allow to aggregate by time intervals > > > Key: CASSANDRA-11871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11871 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer >Priority: Normal > Fix For: 4.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > For time series data it can be useful to aggregate by time intervals. > The idea would be to add support for one or several functions in the {{GROUP > BY}} clause. > Regarding the implementation, even if in general I also prefer to follow the > SQL syntax, I do not believe it will be a good fit for Cassandra. > If we have a table like: > {code} > CREATE TABLE trades > ( > symbol text, > date date, > time time, > priceMantissa int, > priceExponent tinyint, > volume int, > PRIMARY KEY ((symbol, date), time) > ); > {code} > The trades will be inserted with an increasing time and sorted in the same > order. As we can have to process a large amount of data, we want to try to > limit ourselves to the cases where we can build the groups on the fly (which > is not a requirement in the SQL world). > If we want to get the number of trades per minute with the SQL syntax we > will have to write: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY hour(time), minute(time);}} > which is fine. 
The problem is that if the user inverts the functions by mistake, like > this: > {{SELECT hour(time), minute(time), count() FROM Trades WHERE symbol = 'AAPL' > AND date = '2016-01-11' GROUP BY minute(time), hour(time);}} > the query will return weird results. > The only way to prevent that would be to check the function order and make > sure that we do not allow skipping functions (e.g. {{GROUP BY hour(time), > second(time)}}). > In my opinion a function like {{floor(, )}} would be > much better, as it does not allow this type of mistake and is much more > flexible (you can create 5-minute buckets if you want to). > {{SELECT floor(time, m), count() FROM Trades WHERE symbol = 'AAPL' AND date = > '2016-01-11' GROUP BY floor(time, m);}} > An important aspect to keep in mind with a function like {{floor}} is the > starting point. For a query like: {{SELECT floor(time, m), count() FROM > Trades WHERE symbol = 'AAPL' AND date = '2016-01-11' AND time >= '01:30:00' > AND time <= '07:30:00' GROUP BY floor(time, 2h);}}, I think that ideally the > result should return 3 groups: {{01:30:00}}, {{03:30:00}} and {{05:30:00}}. >
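The bucket-anchoring behaviour argued for above can be sketched in plain Java. This is a hedged illustration under invented names (`TimeBuckets`, `floorTo`); it is not the proposed CQL function, only the flooring semantics it describes: times are floored into fixed-duration buckets anchored at a starting point, so a 2h bucket over a range starting at 01:30:00 produces the groups 01:30:00, 03:30:00 and 05:30:00.

```java
// Hypothetical sketch of the bucketing the proposed floor function would
// perform; names are invented and this is not the CQL implementation.
import java.time.Duration;
import java.time.LocalTime;

public final class TimeBuckets {
    // Floor `time` to the start of its fixed-size bucket, with buckets
    // anchored at `origin` rather than at midnight.
    static LocalTime floorTo(LocalTime time, Duration bucket, LocalTime origin) {
        long nanosFromOrigin = Duration.between(origin, time).toNanos();
        long floored = nanosFromOrigin - Math.floorMod(nanosFromOrigin, bucket.toNanos());
        return origin.plusNanos(floored);
    }

    public static void main(String[] args) {
        LocalTime origin = LocalTime.parse("01:30:00");
        Duration twoHours = Duration.ofHours(2);
        System.out.println(floorTo(LocalTime.parse("04:15:00"), twoHours, origin)); // 03:30
        System.out.println(floorTo(LocalTime.parse("07:29:59"), twoHours, origin)); // 05:30
    }
}
```

Anchoring at the range start rather than at a fixed epoch is exactly the "starting point" question raised in the comment.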
[jira] [Updated] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-17570: --- Change Category: Semantic Complexity: Low Hanging Fruit Status: Open (was: Triage Needed) > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version.
[jira] [Created] (CASSANDRA-17570) Update the CQL version for the 4.1 release
Benjamin Lerer created CASSANDRA-17570: -- Summary: Update the CQL version for the 4.1 release Key: CASSANDRA-17570 URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 Project: Cassandra Issue Type: Improvement Components: CQL/Syntax Reporter: Benjamin Lerer We made several changes to CQL during that version. We need to document those changes in the {{CQL.textile}} file and update the version.
[jira] [Updated] (CASSANDRA-17570) Update the CQL version for the 4.1 release
[ https://issues.apache.org/jira/browse/CASSANDRA-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-17570: --- Fix Version/s: 4.1 > Update the CQL version for the 4.1 release > -- > > Key: CASSANDRA-17570 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17570 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Benjamin Lerer >Priority: Normal > Fix For: 4.1 > > > We made several changes to CQL during that version. We need to document those > changes in the {{CQL.textile}} file and update the version.