[jira] [Commented] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821546#comment-17821546
 ] 

Stefan Miklosovic commented on CASSANDRA-19427:
---

[CASSANDRA-19427-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19427-4.1]
{noformat}
java11_pre-commit_tests 
java11_separate_tests
java8_pre-commit_tests  
  ✓ j8_build                        8m 18s
  ✓ j8_cqlsh_dtests_py3             7m 13s
  ✓ j8_cqlsh_dtests_py311           8m 45s
  ✓ j8_cqlsh_dtests_py311_vnode     8m 8s
  ✓ j8_cqlsh_dtests_py38            6m 41s
  ✓ j8_cqlsh_dtests_py38_vnode      6m 1s
  ✓ j8_cqlsh_dtests_py3_vnode       6m 59s
  ✓ j8_cqlshlib_cython_tests       13m 37s
  ✓ j8_cqlshlib_tests               8m 39s
  ✓ j8_dtests                      34m 1s
  ✓ j8_dtests_vnode                35m 20s
  ✓ j8_jvm_dtests                  18m 29s
  ✓ j8_jvm_dtests_vnode            16m 11s
  ✓ j8_simulator_dtests             1m 33s
  ✓ j11_jvm_dtests_vnode           12m 35s
  ✓ j11_jvm_dtests                 15m 32s
  ✓ j11_dtests_vnode               37m 3s
  ✓ j11_dtests                     33m 48s
  ✓ j11_cqlshlib_tests              6m 41s
  ✓ j11_cqlshlib_cython_tests       7m 14s
  ✓ j11_cqlsh_dtests_py3_vnode      5m 37s
  ✓ j11_cqlsh_dtests_py38_vnode     6m 7s
  ✓ j11_cqlsh_dtests_py38           5m 42s
  ✓ j11_cqlsh_dtests_py311_vnode    5m 48s
  ✓ j11_cqlsh_dtests_py311          5m 43s
  ✓ j11_cqlsh_dtests_py3            5m 48s
  ✕ j8_unit_tests                  10m 29s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j8_utests_system_keyspace_directory 11m 13s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j11_unit_tests                 11m 23s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
java8_separate_tests 
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3933/workflows/92e861db-4f2c-4bdf-83ed-1966aca7a3f7]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3933/workflows/b12c07ea-62f1-492d-8959-4c2737ce53c6]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3933/workflows/65dba7d5-40d9-46a1-8322-388a70e1e2aa]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3933/workflows/4b5b14bf-5d78-4c0c-9de9-322ca6a9aab5]


> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries 
> with multiple coordinator-local partitions
> 
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic 
> ArrayIndexOutOfBoundsExceptions:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>  
>  
> The error was in a Runnable, so the stacktrace didn't directly indicate where 
> the error was coming from. We enabled JFR to log the underlying exception 
> that was thrown:
>  
> 
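The stack trace above stops at the runnable wrapper, but the underlying failure mode of unsynchronized list appends can be replayed deterministically. The following is an illustrative sketch only, not Cassandra's actual ClientWarn code: it mimics the unsynchronized "check capacity, then elementData[size++] = e" step of an ArrayList-style add after two racing threads have both passed the capacity check.

```java
// Illustrative sketch only (not Cassandra's ClientWarn implementation): the
// generic AIOBE failure mode when two threads race through an unsynchronized
// ArrayList-style add. If both threads pass the capacity check before either
// writes, the second write lands past the end of the backing array.
public class RacyAddDemo
{
    static String simulateRacyAdds()
    {
        Object[] elementData = new Object[1]; // backing array with capacity 1
        int size = 0;
        // Replay the interleaving sequentially: both "threads" observed
        // size == 0, so neither grew the array before writing.
        elementData[size++] = "warning from thread A"; // ok: writes index 0
        try
        {
            elementData[size++] = "warning from thread B"; // index 1, capacity 1
            return "no error";
        }
        catch (ArrayIndexOutOfBoundsException e)
        {
            return "AIOBE";
        }
    }

    public static void main(String[] args)
    {
        System.out.println(simulateRacyAdds()); // prints "AIOBE"
    }
}
```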

[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow

2024-02-27 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821535#comment-17821535
 ] 

Berenguer Blasi commented on CASSANDRA-19414:
-

Thx for the reviews

> Skinny dev circle workflow
> --
>
> Key: CASSANDRA-19414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19414
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> CircleCI runs are getting pretty heavy. During dev iterations we trigger 
> many pre-commit CI jobs, which is overkill.
> The purpose of this ticket is to purge from the pre-commit workflow all 
> variations of the test matrix except the vanilla one. That should give us a 
> quick and cheap way to iterate *during dev*; it is not a substitute for 
> pre-commit. This ticket's work will serve as the basis for the upcoming 
> changes being discussed 
> [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19414) Skinny dev circle workflow

2024-02-27 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-19414:

  Fix Version/s: 5.0-rc
 (was: 5.0.x)
Source Control Link: 
https://github.com/apache/cassandra/commit/ab25cae4c568312a4a2a5798296b7e97300306fd
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Skinny dev circle workflow
> --
>
> Key: CASSANDRA-19414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19414
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> CircleCI runs are getting pretty heavy. During dev iterations we trigger 
> many pre-commit CI jobs, which is overkill.
> The purpose of this ticket is to purge from the pre-commit workflow all 
> variations of the test matrix except the vanilla one. That should give us a 
> quick and cheap way to iterate *during dev*; it is not a substitute for 
> pre-commit. This ticket's work will serve as the basis for the upcoming 
> changes being discussed 
> [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0]






[jira] [Updated] (CASSANDRA-19414) Skinny dev circle workflow

2024-02-27 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-19414:

Status: Ready to Commit  (was: Review In Progress)

> Skinny dev circle workflow
> --
>
> Key: CASSANDRA-19414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19414
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> CircleCI runs are getting pretty heavy. During dev iterations we trigger 
> many pre-commit CI jobs, which is overkill.
> The purpose of this ticket is to purge from the pre-commit workflow all 
> variations of the test matrix except the vanilla one. That should give us a 
> quick and cheap way to iterate *during dev*; it is not a substitute for 
> pre-commit. This ticket's work will serve as the basis for the upcoming 
> changes being discussed 
> [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0]






(cassandra) branch trunk updated (ce963bc991 -> c26be50107)

2024-02-27 Thread bereng
This is an automated email from the ASF dual-hosted git repository.

bereng pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from ce963bc991 Merge branch 'cassandra-5.0' into trunk
 new ab25cae4c5 Skinny dev circle workflow
 new c26be50107 Merge branch 'cassandra-5.0' into trunk

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .circleci/generate.sh | 88 +--
 1 file changed, 86 insertions(+), 2 deletions(-)





(cassandra) branch cassandra-5.0 updated: Skinny dev circle workflow

2024-02-27 Thread bereng
This is an automated email from the ASF dual-hosted git repository.

bereng pushed a commit to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-5.0 by this push:
 new ab25cae4c5 Skinny dev circle workflow
ab25cae4c5 is described below

commit ab25cae4c568312a4a2a5798296b7e97300306fd
Author: Bereng 
AuthorDate: Wed Feb 21 09:27:42 2024 +0100

Skinny dev circle workflow

patch by Berenguer Blasi; reviewed by Ekaterina Dimitrova, Stefan 
Miklosovic for CASSANDRA-19414
---
 .circleci/generate.sh | 88 +--
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/.circleci/generate.sh b/.circleci/generate.sh
index 1352f5839f..98cc2aec70 100755
--- a/.circleci/generate.sh
+++ b/.circleci/generate.sh
@@ -34,6 +34,8 @@ print_help()
   echo "   -a Generate the config.yml, config.yml.FREE and config.yml.PAID 
expanded configuration"
   echo "  files from the main config_template.yml reusable configuration 
file."
   echo "  Use this for permanent changes in config.yml that will be 
committed to the main repo."
+  echo "   -d Minimal development checks only. Sanity check during your dev 
before sending it to review for speed and cost reductions."
+  echo "  Submitting cleaning pre-commit clean CI run is still a 
requirement when the patch is ready for review"
   echo "   -f Generate config.yml for tests compatible with the CircleCI free 
tier resources"
   echo "   -p Generate config.yml for tests compatible with the CircleCI paid 
tier resources"
   echo "   -b Specify the base git branch for comparison when determining 
changed tests to"
@@ -80,15 +82,18 @@ print_help()
 all=false
 free=false
 paid=false
+dev_min=false
 env_vars=""
 has_env_vars=false
 check_env_vars=true
 detect_changed_tests=true
-while getopts "e:afpib:s" opt; do
+while getopts "e:afpdib:s" opt; do
   case $opt in
   a ) all=true
   detect_changed_tests=false
   ;;
+  d ) dev_min=true
+  ;;
   f ) free=true
   ;;
   p ) paid=true
@@ -253,7 +258,7 @@ if $has_env_vars; then
 fi
 
 # Define function to remove unneeded jobs.
-# The first argument is the file name, and the second arguemnt is the job name.
+# The first argument is the file name, and the second argument is the job name.
 delete_job()
 {
   delete_yaml_block()
@@ -332,8 +337,87 @@ delete_repeated_jobs()
   fi
 }
 
+# Update the workflow names
+rename_workflow()
+{
+  file="$BASEDIR/$1"
+  echo "Updating workflow names in the configuration $2 -> $3"
+
+  sed -Ei.bak "s/$2/$3/g" "$file"
+}
+
+# Define function to leave only a single config run for each test group.
+# This builds a minimal sanity check config for dev only for time and cost 
purposes.
+# The first and only argument is the file name.
+build_dev_min_jobs()
+{
+  delete_job "$1" "j11_cqlsh_dtests_py311_offheap"
+  delete_job "$1" "j11_cqlsh_dtests_py38_offheap"
+  delete_job "$1" "j17_cqlsh_dtests_py311_offheap"
+  delete_job "$1" "j17_cqlsh_dtests_py38_offheap"
+  delete_job "$1" "j11_cqlsh_dtests_py311_vnode"
+  delete_job "$1" "j11_cqlsh_dtests_py38_vnode"
+  delete_job "$1" "j11_cqlsh_dtests_py38"
+  delete_job "$1" "j11_cqlshlib_cython_tests"
+  delete_job "$1" "j17_cqlsh_dtests_py311_vnode"
+  delete_job "$1" "j17_cqlsh_dtests_py311"
+  delete_job "$1" "j17_cqlsh_dtests_py38_vnode"
+  delete_job "$1" "j17_cqlsh_dtests_py38"
+  delete_job "$1" "j17_cqlshlib_tests"
+  delete_job "$1" "j17_cqlshlib_cython_tests"
+  delete_job "$1" "j11_dtests_vnode"
+  delete_job "$1" "j11_dtests_large_vnode"
+  delete_job "$1" "j11_dtests_offheap"
+  delete_job "$1" "j17_dtests_vnode"
+  delete_job "$1" "j17_dtests_large"
+  delete_job "$1" "j17_dtests_large_vnode"
+  delete_job "$1" "j17_dtests_offheap"
+  delete_job "$1" "j17_dtests"
+  delete_job "$1" "j11_jvm_dtests_vnode"
+  delete_job "$1" "j17_jvm_dtests_vnode"
+  delete_job "$1" "j17_jvm_dtests"
+  delete_job "$1" "j11_utests_oa"
+  delete_job "$1" "j11_utests_cdc"
+  delete_job "$1" "j11_utests_compression"
+  delete_job "$1" "j11_utests_fqltool"
+  delete_job "$1" "j11_utests_long"
+  delete_job "$1" "j11_utests_stress"
+  delete_job "$1" "j11_utests_trie"
+  delete_job "$1" "j11_utests_system_keyspace_directory"
+  delete_job "$1" "j17_unit_tests"
+  delete_job "$1" "j17_utests_oa"
+  delete_job "$1" "j17_utests_cdc"
+  delete_job "$1" "j17_utests_compression"
+  delete_job "$1" "j17_utests_fqltool"
+  delete_job "$1" "j17_utests_long"
+  delete_job "$1" "j17_utests_stress"
+  delete_job "$1" "j17_utests_trie"
+  delete_job "$1" "j17_utests_trie"
+  delete_job "$1" "j17_utests_system_keyspace_directory"
+  delete_job "$1" "start_utests_trie"
+  delete_job "$1" "start_utests_system_keyspace_directory"
+  delete_job "$1" "start_utests_stress"
+  delete_job "$1" "start_utests_long"
+  delete_job "$1" "start_utests_fqltool"
+  delete_job "$1" 

(cassandra) 01/01: Merge branch 'cassandra-5.0' into trunk

2024-02-27 Thread bereng
This is an automated email from the ASF dual-hosted git repository.

bereng pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit c26be50107f9489ceb49531f2b914b30267a48bf
Merge: ce963bc991 ab25cae4c5
Author: Bereng 
AuthorDate: Wed Feb 28 08:01:21 2024 +0100

Merge branch 'cassandra-5.0' into trunk

* cassandra-5.0:
  Skinny dev circle workflow

 .circleci/generate.sh | 88 +--
 1 file changed, 86 insertions(+), 2 deletions(-)






[jira] [Comment Edited] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS

2024-02-27 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821497#comment-17821497
 ] 

Maxwell Guo edited comment on CASSANDRA-19426 at 2/28/24 3:40 AM:
--

I agree with [~brandon.williams], but it would be better if there were some 
test data on larger clusters, such as 1000 nodes. It seems [~jjirsa] has made a 
version of the code change; I don't know whether there is any test data with the 
probability set to 1. 


was (Author: maxwellguo):
I agree with [~brandon.williams], but it would be better if there were some 
test data on larger clusters, such as 1000 nodes. It seems [~jjirsa] has made a 
version of the code change; I don't know whether there is any test data. 

> Fix Double Type issues in the Gossiper#maybeGossipToCMS
> ---
>
> Key: CASSANDRA-19426
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19426
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _*issue-1:*_
> if liveEndpoints.size() = unreachableEndpoints.size() = 0, probability will be 
> {*}_Infinity_{*}.
> randDbl <= probability will then always be true, and sendGossip will run.
> _*issue-2:*_ 
> comparing two doubles with *<* or {*}>{*} is safe. However, accuracy is lost 
> if we compare two doubles for equality by 
> intuition ({*}=={*}). For example:
> {code:java}
> double probability = 0.1;
> double randDbl = 0.10000000000000001; // Nominally greater than probability, but rounds to 0.1
> if (randDbl <= probability)
> {
>     System.out.println("randDbl <= probability (always here)");
> }
> else
> {
>     System.out.println("randDbl > probability");
> }
> {code}
> A good example from: _*Gossiper#maybeGossipToUnreachableMember*_
> {code:java}
> if (randDbl < prob)
> {
>     sendGossip(message, Sets.filter(unreachableEndpoints.keySet(),
>                                     ep -> !isDeadState(getEndpointStateMap().get(ep))));
> }{code}
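Both pitfalls described in the ticket can be demonstrated deterministically in plain Java. This is an illustrative sketch only and does not use Gossiper's actual fields; the local variables are stand-ins for the endpoint set sizes:

```java
// Illustrative sketch only (not Gossiper's actual code): replays the two
// double-arithmetic pitfalls from the ticket deterministically.
public class DoublePitfalls
{
    // issue-1: with liveEndpoints.size() + unreachableEndpoints.size() == 0,
    // the division yields Double.POSITIVE_INFINITY, and every finite random
    // double is <= Infinity, so the send branch is always taken.
    static boolean alwaysSends(double randDbl)
    {
        int liveSize = 0, unreachableSize = 0;                   // stand-ins for an empty cluster view
        double probability = 1.0 / (liveSize + unreachableSize); // Infinity
        return randDbl <= probability;
    }

    // issue-2: testing doubles for equality is unreliable; strict < or >
    // comparisons avoid the precision trap.
    static boolean naiveEquality()
    {
        return 0.1 + 0.2 == 0.3; // false: 0.1 + 0.2 evaluates to 0.30000000000000004
    }

    public static void main(String[] args)
    {
        System.out.println(alwaysSends(0.999999)); // true: the check always passes
        System.out.println(naiveEquality());       // false: equality comparison misses
    }
}
```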






[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS

2024-02-27 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821497#comment-17821497
 ] 

Maxwell Guo commented on CASSANDRA-19426:
-

I agree with [~brandon.williams], but it would be better if there were some 
test data on larger clusters, such as 1000 nodes. It seems [~jjirsa] has made a 
version of the code change; I don't know whether there is any test data. 

> Fix Double Type issues in the Gossiper#maybeGossipToCMS
> ---
>
> Key: CASSANDRA-19426
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19426
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _*issue-1:*_
> if liveEndpoints.size() = unreachableEndpoints.size() = 0, probability will be 
> {*}_Infinity_{*}.
> randDbl <= probability will then always be true, and sendGossip will run.
> _*issue-2:*_ 
> comparing two doubles with *<* or {*}>{*} is safe. However, accuracy is lost 
> if we compare two doubles for equality by 
> intuition ({*}=={*}). For example:
> {code:java}
> double probability = 0.1;
> double randDbl = 0.10000000000000001; // Nominally greater than probability, but rounds to 0.1
> if (randDbl <= probability)
> {
>     System.out.println("randDbl <= probability (always here)");
> }
> else
> {
>     System.out.println("randDbl > probability");
> }
> {code}
> A good example from: _*Gossiper#maybeGossipToUnreachableMember*_
> {code:java}
> if (randDbl < prob)
> {
>     sendGossip(message, Sets.filter(unreachableEndpoints.keySet(),
>                                     ep -> !isDeadState(getEndpointStateMap().get(ep))));
> }{code}






[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821491#comment-17821491
 ] 

Maxwell Guo commented on CASSANDRA-19417:
-

Hi [~skoppu],
[https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlListUsers.html] 
seems to be a DataStax page, not Apache Cassandra documentation.

I think a DISCUSS thread on the ML may bring us new information, and it is 
also a way to let others know that we are going to add new grammar.

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Develop a new CQL command, LIST SUPERUSERS, to return the list of roles with 
> superuser privilege, including roles that acquired superuser privilege through 
> the role hierarchy. 
> Context: the LIST ROLES cql command lists roles and their membership details and 
> displays super=true for immediate superusers. But there can be roles that 
> acquired superuser privilege through a grant. The LIST ROLES command won't display 
> super=true for such roles, and the only way to recognize them is to look 
> for at least one row with super=true in the output of the LIST ROLES OF <role 
> name> command. While this works to check whether a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* that 
> need to maintain the list of roles with superuser privilege. There is no 
> existing command/tool to retrieve those roles. Hence this command, which 
> returns all roles having superuser privilege.






[jira] [Commented] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache

2024-02-27 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821475#comment-17821475
 ] 

Paulo Motta commented on CASSANDRA-17401:
-

Thanks for the detailed reports and repro steps. I've taken a look and this 
looks to me to be a legitimate race condition that can cause a re-prepare storm 
under large concurrency and unlucky timing.

My understanding is that [these evict 
statements|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L735]
 are not required for the correctness of the upgrade compatibility logic and 
can be safely removed. Would you have some cycles to confirm this [~ifesdjeen] ?

In addition to this, I think there's a pending issue from CASSANDRA-17248 that 
can leak prepared statements between keyspaces during mixed upgrade mode. Since 
these issues are in a related area I think it makes sense to address them 
together (in separate commits) to ensure these changes are tested together.

I think the {{PreparedStatementCollisionTest}} suite from [this 
commit|https://github.com/apache/cassandra/pull/1872/commits/758bc4a89d7ca9d0bfe27e6f41000484724261bc]
 can help improve the validation coverage of this logic. That change looks 
correct to me but may need some cleanup. We should probably keep the metric 
changes out of this to keep the scope of this patch to a minimum.

After proper review and validation I think there's value in including these 
fixes in the final 3.X releases to address these outstanding issues as users 
will still do upgrade cycles as 5.x release approaches. This will make 
resolution more laborious as we will need to provide patches for 3.x all the 
way up to trunk + CI for all branches. What do you think [~brandon.williams] 
[~stefan.miklosovic]  ?

> Race condition in QueryProcessor causes just prepared statement not to be in 
> the prepared statements cache
> --
>
> Key: CASSANDRA-17401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Ivan Senic
>Assignee: Jaydeepkumar Chovatia
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The changes in the 
> [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638]
>  method that were introduced in versions *4.0.2* and *3.11.12* can cause a 
> race condition between two threads trying to concurrently prepare the same 
> statement. This race condition can cause removing of a prepared statement 
> from the cache, after one of the threads has received the result of the 
> prepare and eventually uses MD5Digest to call 
> [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215].
> The race condition looks like this:
>  * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false
>  * Thread1 executes eviction of hashes
>  * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false
>  * Thread1 prepares the statement and caches it
>  * Thread1 returns the result of the prepare
>  * Thread2 executes eviction of hashes
>  * Thread1 tries to execute the prepared statement with the received 
> MD5Digest, but statement is not in the cache as it was evicted by Thread2
> I tried to reproduce this by using a Java driver, but hitting this case from 
> a client side is highly unlikely and I can not simulate the needed race 
> condition. However, we can easily reproduce this in Stargate (details 
> [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to 
> QueryProcessor.
> Reproducing this in a unit test is fairly easy. I am happy to showcase this 
> if needed.
> Note that the issue can occur only when  safeToReturnCached is resolved as 
> false.
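The interleaving described in the ticket can be replayed deterministically with a plain map standing in for the prepared statement cache. This is an illustrative sketch only, not QueryProcessor's real code; the digest string and statement text are placeholders:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only (not QueryProcessor's real code): replay the
// Thread1/Thread2 interleaving with a map standing in for the prepared
// statement cache, keyed by a placeholder digest string.
public class PrepareRaceDemo
{
    static boolean clientLookupHits()
    {
        Map<String, String> cache = new ConcurrentHashMap<>();
        String digest = "md5-of-statement"; // placeholder for the MD5Digest

        // Thread1: resolved safeToReturnCached == false, evicted, prepared, cached.
        cache.remove(digest);
        cache.put(digest, "SELECT * FROM ks.tbl WHERE pk = ?");
        // Thread1 has now returned the digest to its client.

        // Thread2: entered prepare() before Thread1 cached; its eviction runs late.
        cache.remove(digest);

        // Thread1's client executes using the digest it was handed: cache miss,
        // so the statement has to be re-prepared.
        return cache.get(digest) != null;
    }

    public static void main(String[] args)
    {
        System.out.println(clientLookupHits() ? "hit" : "miss");
    }
}
```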






[jira] [Updated] (CASSANDRA-17401) Race condition in QueryProcessor causes just prepared statement not to be in the prepared statements cache

2024-02-27 Thread Paulo Motta (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-17401:

 Bug Category: Parent values: Correctness(12982)Level 1 values: Transient 
Incorrect Response(12987)
   Complexity: Normal
  Component/s: Messaging/Client
Discovered By: User Report
Reviewers: Paulo Motta
 Severity: Normal
 Assignee: Jaydeepkumar Chovatia
   Status: Open  (was: Triage Needed)

> Race condition in QueryProcessor causes just prepared statement not to be in 
> the prepared statements cache
> --
>
> Key: CASSANDRA-17401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Ivan Senic
>Assignee: Jaydeepkumar Chovatia
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The changes in the 
> [QueryProcessor#prepare|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L575-L638]
>  method that were introduced in versions *4.0.2* and *3.11.12* can cause a 
> race condition between two threads trying to concurrently prepare the same 
> statement. This race condition can cause removing of a prepared statement 
> from the cache, after one of the threads has received the result of the 
> prepare and eventually uses MD5Digest to call 
> [QueryProcessor#getPrepared|https://github.com/apache/cassandra/blame/cassandra-4.0.2/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L212-L215].
> The race condition looks like this:
>  * Thread1 enters _prepare_ method and resolves _safeToReturnCached_ as false
>  * Thread1 executes eviction of hashes
>  * Thread2 enters _prepare_ method and resolves _safeToReturnCached_ as false
>  * Thread1 prepares the statement and caches it
>  * Thread1 returns the result of the prepare
>  * Thread2 executes eviction of hashes
>  * Thread1 tries to execute the prepared statement with the received 
> MD5Digest, but statement is not in the cache as it was evicted by Thread2
> I tried to reproduce this by using a Java driver, but hitting this case from 
> a client side is highly unlikely and I can not simulate the needed race 
> condition. However, we can easily reproduce this in Stargate (details 
> [here|https://github.com/stargate/stargate/pull/1647]), as it's closer to 
> QueryProcessor.
> Reproducing this in a unit test is fairly easy. I am happy to showcase this 
> if needed.
> Note that the issue can occur only when  safeToReturnCached is resolved as 
> false.






[jira] [Commented] (CASSANDRA-19398) Test Failure: org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading

2024-02-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821463#comment-17821463
 ] 

Brandon Williams commented on CASSANDRA-19398:
--

This bisects to CASSANDRA-18645, which can be seen failing 
[here|https://app.circleci.com/pipelines/github/driftx/cassandra/1486/workflows/eb7d55c8-165a-45c9-84b8-001a9e9b603c/jobs/73902]
 but 
[passing|https://app.circleci.com/pipelines/github/driftx/cassandra/1487/workflows/0c210947-133c-4ec0-945c-af064d74f691/jobs/74004]
 on the commit just before. I wasn't able to reproduce this locally in many 
thousands of runs, which seems a bit odd. Circle doesn't have the actual 
logs, but we know from the assertion that they contain "Compaction interrupted"; 
it still doesn't really make sense why the Guava upgrade would cause this.

> Test Failure: 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading
> --
>
> Key: CASSANDRA-19398
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19398
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2646/workflows/bc2bba74-9e56-4bea-8de7-4ff840c4f450/jobs/56028/tests#failed-test-0]
> {code:java}
> junit.framework.AssertionFailedError at 
> org.apache.cassandra.distributed.test.UpgradeSSTablesTest.truncateWhileUpgrading(UpgradeSSTablesTest.java:220)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821460#comment-17821460
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


From ~500k to ~150k ops. I am using all default settings, with no optimizations on 
Cassandra or the OS.

Yes, that instance has a 96-core CPU, but CPU utilization is around 25% when 
it reaches 150k ops. In addition, this is not specific to Graviton 
CPUs, since it also happens on Intel.

I have tested the 50/50 R/W setting with:

 
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact 
keyspace1   && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=50,read=50\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html {code}

Results:

 

- 4.1.3 released:
{code:java}
Results:
Op rate                   :  142,571 op/s  [READ: 71,293 op/s, WRITE: 71,278 
op/s]
Partition rate            :  142,571 pk/s  [READ: 71,293 pk/s, WRITE: 71,278 
pk/s]
Row rate                  :  142,571 row/s [READ: 71,293 row/s, WRITE: 71,278 
row/s]
Latency mean              :    0.7 ms [READ: 1.3 ms, WRITE: 0.1 ms]
Latency median            :    0.2 ms [READ: 1.2 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    2.0 ms [READ: 2.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.9 ms, WRITE: 0.2 ms]
Latency 99.9th percentile :    8.4 ms [READ: 9.6 ms, WRITE: 0.4 ms]
Latency max               :   51.0 ms [READ: 51.0 ms, WRITE: 47.7 ms]
Total partitions          : 85,661,309 [READ: 42,835,266, WRITE: 42,826,043]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,310
Total GC memory           : 2067.821 GiB
Total GC time             :    9.1 seconds
Avg GC time               :    7.0 ms
StdDev GC time            :    3.6 ms
Total operation time      : 00:10:00 {code}

- 4.1.3 with patch:
{code:java}
Results:
Op rate                   :  459,728 op/s  [READ: 229,910 op/s, WRITE: 229,818 
op/s]
Partition rate            :  459,728 pk/s  [READ: 229,910 pk/s, WRITE: 229,818 
pk/s]
Row rate                  :  459,728 row/s [READ: 229,910 row/s, WRITE: 229,818 
row/s]
Latency mean              :    0.2 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency median            :    0.2 ms [READ: 0.2 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.4 ms [READ: 0.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    8.4 ms [READ: 8.9 ms, WRITE: 7.4 ms]
Latency max               : 1887.4 ms [READ: 1,887.4 ms, WRITE: 48.1 ms]
Total partitions          : 275,966,298 [READ: 138,010,917, WRITE: 137,955,381]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,438
Total GC memory           : 6971.464 GiB
Total GC time             :   33.7 seconds
Avg GC time               :    7.6 ms
StdDev GC time            :    3.8 ms
Total operation time      : 00:10:00 {code}

Increasing the percentage of writes in the workload increases the performance gap between the patched and unpatched versions (up to 3.2x).

 


Did you have the chance to test it with big instances on your end?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> 
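The contention pattern described in the quoted report can be sketched in isolation. The following is a minimal, hypothetical Java illustration (these are not Cassandra's actual classes): a capacity getter that must take a shared eviction-style lock serializes every hot-path caller, while a lock-free size counter does not.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the contention pattern: a capacity getter that
// synchronizes on a shared lock versus a lock-free size read.
final class ContendedCache {
    private final Object evictionLock = new Object();
    private long capacity = 1024;
    private final LongAdder size = new LongAdder();

    // Analogous to a getCapacity() that must acquire a shared lock:
    // concurrent readers serialize here.
    long getCapacity() {
        synchronized (evictionLock) {
            return capacity;
        }
    }

    // Analogous to the patched call site using size(): no lock is taken.
    long size() {
        return size.sum();
    }

    void put() {
        size.increment();
    }
}
```

Under many concurrent readers, every getCapacity() call queues on evictionLock, which is consistent with a profile showing millions of lock acquisitions per minute; reading size() involves no shared lock at all.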

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821448#comment-17821448
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Down to ~150k from what number exactly? I don't think flushes should have such an impact. Your machine is a beast, so maybe flushing is the bottleneck? What are memtable_flush_writers, memtable_cleanup_threshold, memtable_allocation_type, flush_compression, and similar configuration parameters set to?
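For readers following along, these are the knobs in question in cassandra.yaml. This is an illustrative fragment only; the values shown are the commonly documented 4.1-era defaults, not a tuning recommendation:

```yaml
# Flush/memtable tuning knobs relevant to the question above
# (values shown are illustrative defaults).
memtable_flush_writers: 2               # default scales with the number of data directories
memtable_allocation_type: heap_buffers  # alternatives: offheap_buffers, offheap_objects
flush_compression: fast                 # alternatives: none, table
# memtable_cleanup_threshold is deprecated; when unset it is derived as
# 1 / (memtable_flush_writers + 1)
```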

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Updated] (CASSANDRA-19362) An "include" is broken on the Storage Engine documentation page

2024-02-27 Thread Arun Ganesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Ganesh updated CASSANDRA-19362:

Authors: Arun Ganesh  (was: Lorina Poland)
Impacts: Docs  (was: None)
Test and Documentation Plan: See attached screenshots
 Status: Patch Available  (was: Open)

> An "include" is broken on the Storage Engine documentation page
> ---
>
> Key: CASSANDRA-19362
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19362
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Jeremy Hanna
>Assignee: Lorina Poland
>Priority: Normal
> Attachments: 3.11.png, 4.0.png, 4.1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The example code at the bottom of the "Storage Engine" page doesn't appear to 
> be including the code properly.  See 
> https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#example-code






[jira] [Commented] (CASSANDRA-19362) An "include" is broken on the Storage Engine documentation page

2024-02-27 Thread Arun Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821446#comment-17821446
 ] 

Arun Ganesh commented on CASSANDRA-19362:
-

Hi [~polandll],

I'm a new contributor, and I was looking for something to contribute. Hope you 
don't mind :) I've attached screenshots of the fix as well.

[4.1 PR|https://github.com/apache/cassandra/pull/3148]
[4.0 PR|https://github.com/apache/cassandra/pull/3149]
[3.11 PR|https://github.com/apache/cassandra/pull/3150]

> An "include" is broken on the Storage Engine documentation page
> ---
>
> Key: CASSANDRA-19362
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19362
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Jeremy Hanna
>Assignee: Lorina Poland
>Priority: Normal
> Attachments: 3.11.png, 4.0.png, 4.1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The example code at the bottom of the "Storage Engine" page doesn't appear to 
> be including the code properly.  See 
> https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#example-code






[jira] [Comment Edited] (CASSANDRA-19362) An "include" is broken on the Storage Engine documentation page

2024-02-27 Thread Arun Ganesh (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821446#comment-17821446
 ] 

Arun Ganesh edited comment on CASSANDRA-19362 at 2/27/24 10:54 PM:
---

Hi [~polandll],

I'm a new contributor, and I was looking for something to contribute. Hope you 
don't mind :) I've attached screenshots of the fix as well.

[4.1 PR|https://github.com/apache/cassandra/pull/3148]
[4.0 PR|https://github.com/apache/cassandra/pull/3149]
[3.11 PR|https://github.com/apache/cassandra/pull/3150]


was (Author: JIRAUSER303038):
Hi [~polandll],

I'm a new contributor, and I was looking for something to contribute. Hope you 
don't mind :) I've attached screenshots of the fix as well.

[4.1 PR|https://github.com/apache/cassandra/pull/3148]
[4.0 PR|https://github.com/apache/cassandra/pull/3149]
[3.11 PR|https://github.com/apache/cassandra/pull/3150]

> An "include" is broken on the Storage Engine documentation page
> ---
>
> Key: CASSANDRA-19362
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19362
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Jeremy Hanna
>Assignee: Lorina Poland
>Priority: Normal
> Attachments: 3.11.png, 4.0.png, 4.1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The example code at the bottom of the "Storage Engine" page doesn't appear to 
> be including the code properly.  See 
> https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#example-code






[jira] [Updated] (CASSANDRA-19362) An "include" is broken on the Storage Engine documentation page

2024-02-27 Thread Arun Ganesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Ganesh updated CASSANDRA-19362:

Attachment: 3.11.png
4.0.png
4.1.png

> An "include" is broken on the Storage Engine documentation page
> ---
>
> Key: CASSANDRA-19362
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19362
> Project: Cassandra
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Jeremy Hanna
>Assignee: Lorina Poland
>Priority: Normal
> Attachments: 3.11.png, 4.0.png, 4.1.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The example code at the bottom of the "Storage Engine" page doesn't appear to 
> be including the code properly.  See 
> https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#example-code






[jira] [Updated] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19222:

Description: 
[https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62534/tests]
{noformat}
ERROR [Strong-Reference-Leak-Detector:1] 2023-12-21 09:50:33,072 Strong 
self-ref loop detected 
[/tmp/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/oa-1-big
private java.util.List 
org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.closeables-java.util.ArrayList
transient java.lang.Object[] java.util.ArrayList.elementData-[Ljava.lang.Object;
transient java.lang.Object[] 
java.util.ArrayList.elementData-org.apache.cassandra.io.util.FileHandle
final org.apache.cassandra.utils.concurrent.Ref 
org.apache.cassandra.utils.concurrent.SharedCloseableImpl.ref-org.apache.cassandra.utils.concurrent.Ref
final org.apache.cassandra.utils.concurrent.Ref$State 
org.apache.cassandra.utils.concurrent.Ref.state-org.apache.cassandra.utils.concurrent.Ref$State
final org.apache.cassandra.utils.concurrent.Ref$GlobalState 
org.apache.cassandra.utils.concurrent.Ref$State.globalState-org.apache.cassandra.utils.concurrent.Ref$GlobalState
private final org.apache.cassandra.utils.concurrent.RefCounted$Tidy 
org.apache.cassandra.utils.concurrent.Ref$GlobalState.tidy-org.apache.cassandra.io.util.FileHandle$Cleanup
final java.util.Optional 
org.apache.cassandra.io.util.FileHandle$Cleanup.chunkCache-java.util.Optional
private final java.lang.Object 
java.util.Optional.value-org.apache.cassandra.cache.ChunkCache
private final org.apache.cassandra.utils.memory.BufferPool 
org.apache.cassandra.cache.ChunkCache.bufferPool-org.apache.cassandra.utils.memory.BufferPool
private final java.util.Set 
org.apache.cassandra.utils.memory.BufferPool.localPoolReferences-java.util.Collections$SetFromMap
private final java.util.Map 
java.util.Collections$SetFromMap.m-java.util.concurrent.ConcurrentHashMap
private final java.util.Map 
java.util.Collections$SetFromMap.m-org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef
private final org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks 
org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef.chunks-org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks
private org.apache.cassandra.utils.memory.BufferPool$Chunk 
org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.chunk0-org.apache.cassandra.utils.memory.BufferPool$Chunk
private volatile org.apache.cassandra.utils.memory.BufferPool$LocalPool 
org.apache.cassandra.utils.memory.BufferPool$Chunk.owner-org.apache.cassandra.utils.memory.BufferPool$LocalPool
private final java.lang.Thread 
org.apache.cassandra.utils.memory.BufferPool$LocalPool.owningThread-io.netty.util.concurrent.FastThreadLocalThread
private java.lang.Runnable 
java.lang.Thread.target-io.netty.util.concurrent.FastThreadLocalRunnable
private final java.lang.Runnable 
io.netty.util.concurrent.FastThreadLocalRunnable.runnable-java.util.concurrent.ThreadPoolExecutor$Worker
final java.util.concurrent.ThreadPoolExecutor 
java.util.concurrent.ThreadPoolExecutor$Worker.this$0-org.apache.cassandra.concurrent.ScheduledThreadPoolExecutorPlus
private final java.util.concurrent.BlockingQueue 
java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue
private final java.util.concurrent.BlockingQueue 
java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
private java.util.concurrent.Callable 
java.util.concurrent.FutureTask.callable-java.util.concurrent.Executors$RunnableAdapter
private final java.lang.Runnable 
java.util.concurrent.Executors$RunnableAdapter.task-org.apache.cassandra.concurrent.ExecutionFailure$1
final java.lang.Runnable 
org.apache.cassandra.concurrent.ExecutionFailure$1.val$wrap-org.apache.cassandra.hints.HintsService$$Lambda$1142/0x000801576aa0
private final org.apache.cassandra.hints.HintsService 
org.apache.cassandra.hints.HintsService$$Lambda$1142/0x000801576aa0.arg$1-org.apache.cassandra.hints.HintsService
private final org.apache.cassandra.hints.HintsWriteExecutor 
org.apache.cassandra.hints.HintsService.writeExecutor-org.apache.cassandra.hints.HintsWriteExecutor
private final org.apache.cassandra.concurrent.ExecutorPlus 
org.apache.cassandra.hints.HintsWriteExecutor.executor-org.apache.cassandra.concurrent.SingleThreadExecutorPlus
private final java.util.HashSet 
java.util.concurrent.ThreadPoolExecutor.workers-java.util.HashSet
private transient java.util.HashMap java.util.HashSet.map-java.util.HashMap
transient java.util.HashMap$Node[] 
java.util.HashMap.table-[Ljava.util.HashMap$Node;
transient java.util.HashMap$Node[] 
java.util.HashMap.table-java.util.HashMap$Node
final 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821439#comment-17821439
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


Yeah, interesting... let me re-test it on my side.

One thing I have noticed is that at the beginning of the benchmark both versions have similar performance. After a few minutes (and a few flushes), the released version's performance starts to drop to ~150k and then stays constant. Since the read-only workload triggers no flushes, it shows no drop in performance. I will try to understand this further.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821437#comment-17821437
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 10:22 PM:
-

How so? We are making changes in SSTableReader, which _reads_, right? So the read path should be the one affected.

Yet we see the performance differ only in mixed workloads? It seems the read and write paths try to acquire the same lock, and that is what the patch works around?

What happens when you use 50:50 reads:writes? I would expect the overall number of operations to be disproportionately smaller.


was (Author: smiklosovic):
How so? We are making changes in SSTableReader, which _reads_, right? So the read path should be the one affected.

Yet we see the performance differ only in mixed workloads? It seems the read and write paths try to acquire the same lock, and that is what the patch works around?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821437#comment-17821437
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 10:21 PM:
-

How so? We are making changes in SSTableReader, which _reads_, right? So the read path should be the one affected.

Yet we see the performance differ only in mixed workloads? It seems the read and write paths try to acquire the same lock, and that is what the patch works around?


was (Author: smiklosovic):
How so? We are making changes in SSTableReader, which _reads_, right? So the read path should be the one affected.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821437#comment-17821437
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

How so? We are making changes in SSTableReader, which _reads_, right? So the read path should be the one affected.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821436#comment-17821436
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


Test without compaction after the writes, with nodetool disableautocompaction, and READs only.

Cmd:
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool 
disableautocompaction && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=0,read=100\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html {code}

Results using Ubuntu 22.04 on r8g.24xlarge, with the stress test colocated on the same instance:
 * 4.1.3 released:

 
{code:java}
Results:
Op rate                   :  637,636 op/s  [READ: 637,636 op/s]
Partition rate            :  637,636 pk/s  [READ: 637,636 pk/s]
Row rate                  :  637,636 row/s [READ: 637,636 row/s]
Latency mean              :    0.1 ms [READ: 0.1 ms]
Latency median            :    0.1 ms [READ: 0.1 ms]
Latency 95th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99.9th percentile :    2.6 ms [READ: 2.6 ms]
Latency max               :   21.8 ms [READ: 21.8 ms]
Total partitions          : 382,809,565 [READ: 382,809,565]
Total errors              :          0 [READ: 0]
Total GC count            : 3,379
Total GC memory           : 5406.356 GiB
Total GC time             :    5.5 seconds
Avg GC time               :    1.6 ms
StdDev GC time            :    0.8 ms
Total operation time      : 00:10:00 {code}
 
 * 4.1.3 with patch:

{code:java}
Results:
Op rate                   :  636,043 op/s  [READ: 636,043 op/s]
Partition rate            :  636,043 pk/s  [READ: 636,043 pk/s]
Row rate                  :  636,043 row/s [READ: 636,043 row/s]
Latency mean              :    0.1 ms [READ: 0.1 ms]
Latency median            :    0.1 ms [READ: 0.1 ms]
Latency 95th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99.9th percentile :    2.7 ms [READ: 2.7 ms]
Latency max               :   16.2 ms [READ: 16.2 ms]
Total partitions          : 381,776,733 [READ: 381,776,733]
Total errors              :          0 [READ: 0]
Total GC count            : 3,396
Total GC memory           : 5433.623 GiB
Total GC time             :    5.6 seconds
Avg GC time               :    1.6 ms
StdDev GC time            :    0.5 ms
Total operation time      : 00:10:00 {code}

The patch doesn't have any effect on the read-only workload.

 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquisitions is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquisitions per 
> 60 seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 
> 22.04), this limits the CPU utilization of the system to under 50% when 
> testing at full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to a 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
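The fix described above amounts to avoiding a synchronized capacity read on the hot read path. A minimal sketch of the pattern (the CacheSketch class below is hypothetical and illustrative only, not the actual InstrumentingCache code):

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only (hypothetical CacheSketch class, not the real
// InstrumentingCache): getCapacity() funnels every caller through a single
// monitor, so calling it on the hot read path serializes reader threads,
// while size() reads lock-free ConcurrentHashMap state.
public class CacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Object lock = new Object();
    private long capacity = 1024;

    // contended: one lock acquisition per call, shared by all readers
    public long getCapacity() {
        synchronized (lock) {
            return capacity;
        }
    }

    // uncontended: ConcurrentHashMap.size() takes no global lock
    public long size() {
        return map.size();
    }

    public void put(K key, V value) {
        map.put(key, value);
    }
}
```

The patch replaces the hot-path `getCapacity` call in SSTableReader with `size`, removing the shared lock acquisition without changing cache behavior.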
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && 

[jira] [Updated] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Abe Ratnofsky (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abe Ratnofsky updated CASSANDRA-19427:
--
Reviewers: Caleb Rackliffe, Stefan Miklosovic  (was: Stefan Miklosovic)

> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries 
> with multiple coordinator-local partitions
> 
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic 
> ArrayIndexOutOfBoundsExceptions:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>  
>  
> The error was in a Runnable, so the stacktrace didn't directly indicate where 
> the error was coming from. We enabled JFR to log the underlying exception 
> that was thrown:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]" 
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for length 0
> at java.base/java.util.ArrayList.add(ArrayList.java:487)
> at java.base/java.util.ArrayList.add(ArrayList.java:499)
> at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
> at 
> org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
> at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
> at 
> org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
> at 
> org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
> at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
> ... 6 more"{code}
>  
>  
> An AIOBE on ArrayList.add(E) should only be possible when multiple threads 
> attempt to call the method at the same time.
>  
> This was seen while executing a SELECT WHERE IN query with multiple partition 
> keys. This exception could happen when multiple local reads are dispatched by 
> the coordinator in 
> org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. In this 
> case, multiple local reads exceed the tombstone warning threshold, so 
> multiple tombstone warnings are added to the same ClientWarn.State reference. 
>  Currently, org.apache.cassandra.service.ClientWarn.State#warnings is an 
> ArrayList, which isn't safe for concurrent modification, causing the AIOBE to 
> be thrown.
>  
> I have a patch available for this, and I'm preparing it now. The patch is 
> simple - it just changes 
> org.apache.cassandra.service.ClientWarn.State#warnings to a thread-safe 
> CopyOnWriteArrayList. I also have a jvm-dtest that demonstrates the 

[jira] [Commented] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821428#comment-17821428
 ] 

Abe Ratnofsky commented on CASSANDRA-19427:
---

Had some discussion with [~maedhroz] about CopyOnWriteArrayList vs. 
Collections.synchronizedList. There's a reasonable argument that lock 
acquisition in synchronizedList is better than copying the entire list on 
mutation in CopyOnWriteArrayList, especially since CopyOnWriteArrayList also 
synchronizes when mutating the list. But the list should stay small in number 
of elements (there aren't many client warnings in the first place, and it's 
unlikely that many will be returned on a single response), and each element 
is small (truncated to a fixed maximum length when added).

The main reason I chose CopyOnWriteArrayList is because of synchronizedList's 
semantics for iteration: 
[https://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#synchronizedList-java.util.List-]

> It is imperative that the user manually synchronize on the returned list when 
>iterating over it

I just don't love these semantics - even though we don't iterate over the list 
until we serialize the warnings into the response, it feels too easy to miss 
the requirement to synchronize on iteration.
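A minimal sketch of the trade-off (the WarningsSketch class below is a hypothetical stand-in, far simpler than the real ClientWarn.State): CopyOnWriteArrayList makes add() safe under concurrent callers, and its iterators operate on an array snapshot, so reads need no external synchronization, unlike Collections.synchronizedList.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified sketch (not the actual ClientWarn.State class).
public class WarningsSketch {
    private final List<String> warnings = new CopyOnWriteArrayList<>();

    public void add(String warning) {
        warnings.add(warning); // safe when multiple local reads warn at once
    }

    public List<String> snapshot() {
        return List.copyOf(warnings);
    }

    public static void main(String[] args) throws InterruptedException {
        WarningsSketch w = new WarningsSketch();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    w.add("warning-" + id + "-" + j);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // A plain ArrayList here could lose adds or throw
        // ArrayIndexOutOfBoundsException; the copy-on-write list keeps all adds.
        System.out.println(w.snapshot().size()); // prints 4000
    }
}
```

The copy cost on each add is acceptable here precisely because the list stays small, as argued above.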


[jira] [Commented] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821427#comment-17821427
 ] 

Caleb Rackliffe commented on CASSANDRA-19427:
-

+1 on all branches


[jira] [Commented] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821424#comment-17821424
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19222:
-

I was looking into CASSANDRA-17056, and I stumbled into CASSANDRA-18737, where 
it is mentioned that there are some leaks to be investigated, but I cannot find 
any other info. [~jlewandowski], do you remember anything?

> Leak - Strong self-ref loop detected in BTI
> ---
>
> Key: CASSANDRA-19222
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19222
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Jacek Lewandowski
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62534/tests
> {noformat}
> ERROR [Strong-Reference-Leak-Detector:1] 2023-12-21 09:50:33,072 Strong 
> self-ref loop detected 
> [/tmp/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/oa-1-big
> private java.util.List 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.closeables-java.util.ArrayList
> transient java.lang.Object[] 
> java.util.ArrayList.elementData-[Ljava.lang.Object;
> transient java.lang.Object[] 
> java.util.ArrayList.elementData-org.apache.cassandra.io.util.FileHandle
> final org.apache.cassandra.utils.concurrent.Ref 
> org.apache.cassandra.utils.concurrent.SharedCloseableImpl.ref-org.apache.cassandra.utils.concurrent.Ref
> final org.apache.cassandra.utils.concurrent.Ref$State 
> org.apache.cassandra.utils.concurrent.Ref.state-org.apache.cassandra.utils.concurrent.Ref$State
> final org.apache.cassandra.utils.concurrent.Ref$GlobalState 
> org.apache.cassandra.utils.concurrent.Ref$State.globalState-org.apache.cassandra.utils.concurrent.Ref$GlobalState
> private final org.apache.cassandra.utils.concurrent.RefCounted$Tidy 
> org.apache.cassandra.utils.concurrent.Ref$GlobalState.tidy-org.apache.cassandra.io.util.FileHandle$Cleanup
> final java.util.Optional 
> org.apache.cassandra.io.util.FileHandle$Cleanup.chunkCache-java.util.Optional
> private final java.lang.Object 
> java.util.Optional.value-org.apache.cassandra.cache.ChunkCache
> private final org.apache.cassandra.utils.memory.BufferPool 
> org.apache.cassandra.cache.ChunkCache.bufferPool-org.apache.cassandra.utils.memory.BufferPool
> private final java.util.Set 
> org.apache.cassandra.utils.memory.BufferPool.localPoolReferences-java.util.Collections$SetFromMap
> private final java.util.Map 
> java.util.Collections$SetFromMap.m-java.util.concurrent.ConcurrentHashMap
> private final java.util.Map 
> java.util.Collections$SetFromMap.m-org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef
> private final org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks 
> org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef.chunks-org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks
> private org.apache.cassandra.utils.memory.BufferPool$Chunk 
> org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.chunk0-org.apache.cassandra.utils.memory.BufferPool$Chunk
> private volatile org.apache.cassandra.utils.memory.BufferPool$LocalPool 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.owner-org.apache.cassandra.utils.memory.BufferPool$LocalPool
> private final java.lang.Thread 
> org.apache.cassandra.utils.memory.BufferPool$LocalPool.owningThread-io.netty.util.concurrent.FastThreadLocalThread
> private java.lang.Runnable 
> java.lang.Thread.target-io.netty.util.concurrent.FastThreadLocalRunnable
> private final java.lang.Runnable 
> io.netty.util.concurrent.FastThreadLocalRunnable.runnable-java.util.concurrent.ThreadPoolExecutor$Worker
> final java.util.concurrent.ThreadPoolExecutor 
> java.util.concurrent.ThreadPoolExecutor$Worker.this$0-org.apache.cassandra.concurrent.ScheduledThreadPoolExecutorPlus
> private final java.util.concurrent.BlockingQueue 
> java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue
> private final java.util.concurrent.BlockingQueue 
> java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> private java.util.concurrent.Callable 
> java.util.concurrent.FutureTask.callable-java.util.concurrent.Executors$RunnableAdapter
> private final java.lang.Runnable 
> java.util.concurrent.Executors$RunnableAdapter.task-org.apache.cassandra.concurrent.ExecutionFailure$1
> final java.lang.Runnable 
> org.apache.cassandra.concurrent.ExecutionFailure$1.val$wrap-org.apache.cassandra.hints.HintsService$$Lambda$1142/0x000801576aa0
> private final org.apache.cassandra.hints.HintsService 
> 

[jira] [Updated] (CASSANDRA-19447) Register the measurements of the bootstrap as Dropwizard metrics

2024-02-27 Thread Maxim Muzafarov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Muzafarov updated CASSANDRA-19447:

Test and Documentation Plan: Run BootstrapTest and FailedBootstrapTest, 
which are part of the patch. Run CI.
 Status: Patch Available  (was: Open)

> Register the measurements of the bootstrap as Dropwizard metrics
> 
>
> Key: CASSANDRA-19447
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19447
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/JMX, Observability/Metrics
>Reporter: Maxim Muzafarov
>Assignee: Maxim Muzafarov
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we can view the node's bootstrap state in the following ways:
> - via the nodetool cli tool, e.g. by running the "resume" command;
> - by querying the bootstrapped column of the system_views.local virtual table;
> In addition, we can also expose the status and state of the node's bootstrap 
> via JMX. This is used by third-party tools that rely entirely on the JMX API 
> and don't have access to the CQL interface. The operator will be able to get 
> all the information they need from the dashboards without having to use CLIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (CASSANDRA-19447) Register the measurements of the bootstrap as Dropwizard metrics

2024-02-27 Thread Maxim Muzafarov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Muzafarov updated CASSANDRA-19447:

Fix Version/s: 5.x







[jira] [Updated] (CASSANDRA-19447) Register the measurements of the bootstrap as Dropwizard metrics

2024-02-27 Thread Maxim Muzafarov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Muzafarov updated CASSANDRA-19447:

Change Category: Operability
 Complexity: Normal
Component/s: Observability/JMX
 Observability/Metrics
 Status: Open  (was: Triage Needed)







[jira] [Created] (CASSANDRA-19447) Register the measurements of the bootstrap as Dropwizard metrics

2024-02-27 Thread Maxim Muzafarov (Jira)
Maxim Muzafarov created CASSANDRA-19447:
---

 Summary: Register the measurements of the bootstrap as Dropwizard 
metrics
 Key: CASSANDRA-19447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19447
 Project: Cassandra
  Issue Type: Improvement
Reporter: Maxim Muzafarov
Assignee: Maxim Muzafarov


Currently, we can view the node's bootstrap state in the following ways:
- via the nodetool cli tool, e.g. by running the "resume" command;
- by querying the bootstrapped column of the system_views.local virtual table;

In addition, we can also expose the status and state of the node's bootstrap 
via JMX. This is used by third-party tools that rely entirely on the JMX API 
and don't have access to the CQL interface. The operator will be able to get 
all the information they need from the dashboards without having to use CLIs.
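A rough, stdlib-only sketch of the idea (the real patch would register com.codahale.metrics.Gauge instances with Cassandra's Dropwizard MetricRegistry, which is already exported over JMX; the metric name and state strings below are illustrative assumptions):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative stand-in for Dropwizard gauge registration: a gauge is just a
// supplier that is re-evaluated on every poll, so a JMX-only tool always sees
// the node's current bootstrap state. Names and states here are hypothetical.
public class BootstrapMetricsSketch {
    private final Map<String, Supplier<Object>> registry = new ConcurrentHashMap<>();
    private volatile String bootstrapState = "NEEDS_BOOTSTRAP";

    public BootstrapMetricsSketch() {
        // register once; each read of the gauge reflects live state
        registry.put("Bootstrap.State", () -> bootstrapState);
    }

    public Object read(String metricName) {
        return registry.get(metricName).get();
    }

    public void markCompleted() {
        bootstrapState = "COMPLETED";
    }
}
```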






[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821390#comment-17821390
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19414:
-

PRs approved, thanks

> Skinny dev circle workflow
> --
>
> Key: CASSANDRA-19414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19414
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> CircleCI runs are getting pretty heavy. During dev iterations we trigger 
> many CI pre-commit jobs, which is just overkill.
> The purpose of this ticket is to purge from the pre-commit workflow all 
> variations of the test matrix but the vanilla one. That should enable quick 
> and cheap iteration *during dev*; this is not a substitute for pre-commit. 
> This ticket's work will serve as the basis for the upcoming changes being 
> discussed 
> [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0]






[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821388#comment-17821388
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


2. Test without compaction after writes and with nodetool disableautocompaction
Cmd:

 
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool 
disableautocompaction && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html

## Compact and re-run it
bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html |& tee stress.txt
 {code}
 

Results using Ubuntu 22.04 on r8g.24xlarge, with the stress test co-located on 
the same instance:
 * 4.1.3 released:

{code:java}
Results:
Op rate                   :  135,805 op/s  [READ: 122,231 op/s, WRITE: 13,574 
op/s]
Partition rate            :  135,805 pk/s  [READ: 122,231 pk/s, WRITE: 13,574 
pk/s]
Row rate                  :  135,805 row/s [READ: 122,231 row/s, WRITE: 13,574 
row/s]
Latency mean              :    0.7 ms [READ: 0.8 ms, WRITE: 0.2 ms]
Latency median            :    0.6 ms [READ: 0.7 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    1.9 ms [READ: 2.0 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    7.0 ms [READ: 7.2 ms, WRITE: 1.3 ms]
Latency max               :   51.3 ms [READ: 51.3 ms, WRITE: 48.8 ms]
Total partitions          : 81,488,855 [READ: 73,343,700, WRITE: 8,145,155]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,583
Total GC memory           : 2522.153 GiB
Total GC time             :    7.4 seconds
Avg GC time               :    4.7 ms
StdDev GC time            :    2.2 ms
Total operation time      : 00:10:00

## Compact and re-run it
bin/nodetool compact keyspace1   && sleep 30s && tools/bin/cassandra-stress 
mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node 
localhost -log file=result.log -graph file=graph.html |& tee stress.txt
...
Results:
Op rate                   :  136,878 op/s  [READ: 123,177 op/s, WRITE: 13,701 
op/s]
Partition rate            :  136,878 pk/s  [READ: 123,177 pk/s, WRITE: 13,701 
pk/s]
Row rate                  :  136,878 row/s [READ: 123,177 row/s, WRITE: 13,701 
row/s]
Latency mean              :    0.7 ms [READ: 0.8 ms, WRITE: 0.2 ms]
Latency median            :    0.6 ms [READ: 0.7 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    1.9 ms [READ: 2.0 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    6.5 ms [READ: 6.7 ms, WRITE: 1.2 ms]
Latency max               :   52.6 ms [READ: 52.6 ms, WRITE: 50.2 ms]
Total partitions          : 82,197,489 [READ: 73,969,820, WRITE: 8,227,669]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,395
Total GC memory           : 2225.329 GiB
Total GC time             :    6.6 seconds
Avg GC time               :    4.7 ms
StdDev GC time            :    2.2 ms
Total operation time      : 00:10:00{code}
 
 * 4.1.3 with patch:

{code:java}
Results:
Op rate                   :  241,176 op/s  [READ: 217,059 op/s, WRITE: 24,117 
op/s]
Partition rate            :  241,176 pk/s  [READ: 217,059 pk/s, WRITE: 24,117 
pk/s]
Row rate                  :  241,176 row/s [READ: 217,059 row/s, WRITE: 24,117 
row/s]
Latency mean              :    0.4 ms [READ: 0.4 ms, WRITE: 0.2 ms]
Latency median            :    0.3 ms [READ: 0.3 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    0.7 ms [READ: 0.7 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.8 ms [READ: 0.8 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    7.2 ms [READ: 7.3 ms, WRITE: 5.1 ms]
Latency max               : 5003.8 ms [READ: 5,003.8 ms, WRITE: 48.5 ms]
Total partitions          : 144,931,367 [READ: 130,438,344, WRITE: 14,493,023]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,186
Total GC memory           : 6673.759 GiB
Total GC time             :   23.3 seconds
Avg GC time               :    5.6 ms
StdDev GC time            :    3.7 ms
Total operation time      : 00:10:00

## Compact and re-run it
bin/nodetool compact keyspace1   && sleep 30s && tools/bin/cassandra-stress 
mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node 
localhost -log file=result.log -graph file=graph.html |& tee stress.txt
...
Results:
Op rate                   :  232,130 op/s  [READ: 208,904 op/s, 

[jira] [Commented] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821387#comment-17821387
 ] 

Stefan Miklosovic commented on CASSANDRA-19427:
---

[CASSANDRA-19427-4.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19427-4.0]
{noformat}
java11_pre-commit_tests 
  ✓ j11_build1m 40s
  ✓ j11_cqlsh-dtests-py2-no-vnodes6m 0s
  ✓ j11_cqlsh-dtests-py2-with-vnodes 5m 16s
  ✓ j11_cqlsh_dtests_py3  5m 8s
  ✓ j11_cqlsh_dtests_py311   5m 16s
  ✓ j11_cqlsh_dtests_py311_vnode 5m 45s
  ✓ j11_cqlsh_dtests_py385m 31s
  ✓ j11_cqlsh_dtests_py38_vnode  5m 25s
  ✓ j11_cqlsh_dtests_py3_vnode   5m 23s
  ✓ j11_cqlshlib_tests   7m 11s
  ✓ j11_dtests  32m 56s
  ✓ j11_dtests_vnode34m 37s
  ✓ j11_jvm_dtests  11m 59s
  ✕ j11_unit_tests8m 6s
  org.apache.cassandra.net.ConnectionTest testTimeout
  org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace
java11_separate_tests
java8_pre-commit_tests  
java8_separate_tests 
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/155c855b-bfcb-4c9e-a940-2848966d6bb1]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/cd8a246e-5e62-48eb-bca3-7370d135b2dd]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/ada174c4-2dc0-449a-9c88-162f69bdf7b5]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3932/workflows/f3e98051-a2d4-4da0-bbc3-1f900166ce96]


> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries 
> with multiple coordinator-local partitions
> 
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic 
> ArrayIndexOutOfBoundsExceptions:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>  
>  
> The error was in a Runnable, so the stacktrace didn't directly indicate where 
> the error was coming from. We enabled JFR to log the underlying exception 
> that was thrown:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]" 
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for 

[jira] [Updated] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dipietro Salvatore updated CASSANDRA-19429:
---
Attachment: Screenshot 2024-02-27 at 11.29.41.png

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}
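The contention described above is a classic hot-path pattern: a metric accessor that takes a lock on every read, versus a lock-free counter read. A minimal sketch of the two variants (hypothetical `CacheStats` class for illustration only, not the actual `InstrumentingCache`/`SSTableReader` code):

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the contention pattern described in the ticket.
public class CacheStats {
    private final LongAdder entries = new LongAdder();
    private final long capacity = 1000;

    // Contended variant: every reader serializes on the same monitor,
    // so under many concurrent readers threads queue on the lock.
    public synchronized long getCapacity() {
        return capacity;
    }

    // Uncontended variant: LongAdder sums per-thread cells, no shared lock.
    public long size() {
        return entries.sum();
    }

    public void put() {
        entries.increment();
    }

    public static void main(String[] args) throws InterruptedException {
        CacheStats stats = new CacheStats();
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) stats.put();
            });
            writers[i].start();
        }
        for (Thread t : writers) t.join();
        // size() is exact once all writers have joined.
        System.out.println("size=" + stats.size() + " capacity=" + stats.getCapacity());
    }
}
```

The proposed patch swaps the locking accessor for a plain size read on the hot path, which is why CPU utilization (previously capped under 50% by lock waits) can climb and throughput roughly doubles.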



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821384#comment-17821384
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I will try to provision a similar machine in our AWS infrastructure and will 
let you know. I just need to see this increase on my own; don't take it 
personally.







[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821383#comment-17821383
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


1. Test without compaction after writes.
Cmd used:
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && sleep 30s && 
tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE 
-rate threads=100 -node localhost -log file=result.log -graph file=graph.html  
{code}
 

Results using Ubuntu 22.04 on r8g.24xlarge, with the stress test colocated on 
the same instance:

- 4.1.3 released:
{code:java}
Results:
Op rate                   :  167,478 op/s  [READ: 150,727 op/s, WRITE: 16,750 
op/s]
Partition rate            :  167,478 pk/s  [READ: 150,727 pk/s, WRITE: 16,750 
pk/s]
Row rate                  :  167,478 row/s [READ: 150,727 row/s, WRITE: 16,750 
row/s]
Latency mean              :    0.6 ms [READ: 0.6 ms, WRITE: 0.2 ms]
Latency median            :    0.4 ms [READ: 0.5 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    1.6 ms [READ: 1.6 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.2 ms [READ: 2.2 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    7.1 ms [READ: 7.3 ms, WRITE: 1.2 ms]
Latency max               :   48.3 ms [READ: 48.3 ms, WRITE: 46.0 ms]
Total partitions          : 100,600,039 [READ: 90,538,533, WRITE: 10,061,506]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,515
Total GC memory           : 2411.407 GiB
Total GC time             :    8.3 seconds
Avg GC time               :    5.5 ms
StdDev GC time            :    2.4 ms
Total operation time      : 00:10:00{code}
 

- 4.1.3 with patch:
{code:java}
Results:
Op rate                   :  435,180 op/s  [READ: 391,655 op/s, WRITE: 43,525 
op/s]
Partition rate            :  435,180 pk/s  [READ: 391,655 pk/s, WRITE: 43,525 
pk/s]
Row rate                  :  435,180 row/s [READ: 391,655 row/s, WRITE: 43,525 
row/s]
Latency mean              :    0.2 ms [READ: 0.2 ms, WRITE: 0.2 ms]
Latency median            :    0.2 ms [READ: 0.2 ms, WRITE: 0.2 ms]
Latency 95th percentile   :    0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.4 ms [READ: 0.4 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    6.7 ms [READ: 6.7 ms, WRITE: 6.2 ms]
Latency max               : 1057.5 ms [READ: 1,057.5 ms, WRITE: 47.7 ms]
Total partitions          : 261,410,615 [READ: 235,265,301, WRITE: 26,145,314]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,543
Total GC memory           : 7225.988 GiB
Total GC time             :   25.7 seconds
Avg GC time               :    5.7 ms
StdDev GC time            :    3.1 ms
Total operation time      : 00:10:00{code}

The patch seems to have a huge benefit in this case as well (2.6x).
Also noticed a huge benefit in P99 latency (5.5x decrease).


[jira] [Commented] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821381#comment-17821381
 ] 

Stefan Miklosovic commented on CASSANDRA-19427:
---

[CASSANDRA-19427-3.11|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19427-3.11]
{noformat}
pre-commit_tests
  ✓ build6m 36s
  ✓ j8_cqlshlib_cython_tests 8m 10s
  ✓ j8_cqlshlib_tests7m 59s
  ✓ j8_unit_tests8m 52s
  ✕ j8_dtests36m 4s
  repair_tests.repair_test.TestRepair test_dead_sync_initiator
  ✕ j8_dtests_vnode 39m 49s
  repair_tests.repair_test.TestRepair test_dead_sync_initiator
  ✕ j8_jvm_dtests8m 20s
  org.apache.cassandra.distributed.test.ReprepareOldBehaviourTest 
testReprepareMixedVersionWithoutReset
separate_tests   
{noformat}

[pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3931/workflows/1fb872ef-8cc0-4c00-848a-14b7704e5ae9]
[separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3931/workflows/cb1cca77-41e7-4b84-956b-ddfa3831d57e]


> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries 
> with multiple coordinator-local partitions
> 
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic 
> ArrayIndexOutOfBoundsExceptions:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>  
>  
> The error was in a Runnable, so the stacktrace didn't directly indicate where 
> the error was coming from. We enabled JFR to log the underlying exception 
> that was thrown:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]" 
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for length 0
> at java.base/java.util.ArrayList.add(ArrayList.java:487)
> at java.base/java.util.ArrayList.add(ArrayList.java:499)
> at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
> at 
> org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
> at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
> at 
> org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
> at 
> org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
> at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
> at 
> 
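The frames above show an unsynchronized `ArrayList.add` reached concurrently from multiple ReadStage threads, which is exactly the situation where `ArrayList` can throw `ArrayIndexOutOfBoundsException` or silently drop elements. A minimal sketch of the hazard and one conventional mitigation (hypothetical `Warnings` holder for illustration; not the actual `ClientWarn` fix):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a plain ArrayList is not safe for concurrent add();
// racing adds can observe a stale size and index past the backing array
// (the AIOBE in the stack trace above) or lose writes. Wrapping the list
// gives each add() mutual exclusion.
public class Warnings {
    private final List<String> warnings =
            Collections.synchronizedList(new ArrayList<>());

    public void warn(String w) {
        warnings.add(w);
    }

    public int count() {
        return warnings.size();
    }

    public static void main(String[] args) throws InterruptedException {
        Warnings state = new Warnings();
        Thread[] readers = new Thread[4]; // simulate concurrent read stages
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) state.warn("tombstone warning");
            });
            readers[i].start();
        }
        for (Thread t : readers) t.join();
        // With synchronization, no warning is lost and no AIOBE is thrown.
        System.out.println("warnings=" + state.count());
    }
}
```

With a bare `ArrayList` in place of the synchronized wrapper, the same program can intermittently crash or under-count, which matches the "rare but periodic" failures reported above.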

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821377#comment-17821377
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


> But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge .
Not sure what your HW is, but based on our tests the benefit of removing this 
lock starts at 4xlarge instances and above, with a ~1.40x improvement (tested on 
r7g.4xl and r7i.4xl). On small instances like r7i.xlarge, no significant 
difference has been recorded.


> [~dipiets] I think the mistake you do is that you do "nodetool compact" after 
>you write the data and then you run mixed workload against that.

Not sure I got this completely. Why does "nodetool compact" make such a huge 
difference with the patch and not with the released version? I see that it uses 
1 SSTable, but why does it not have similar performance if this is the problem?

I am testing the patch as you suggested.







[jira] [Updated] (CASSANDRA-19368) Add way for SAI to disable row to token index so internal tables may leverage SAI

2024-02-27 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19368:
--
Resolution: Won't Fix
Status: Resolved  (was: Open)

There was a conversation in Slack and the takeaway is that the Accord work is 
far too custom; it makes more sense to build its own index. We can refactor SAI 
so that building new indexes is lower effort, but trying to bend SAI to work 
with the Accord requirements is not worth the effort.

What were the Accord requirements?

* The Accord index only works on a single system table and serves a single query.
* System tables use LocalPartitioner, while SAI only works with Murmur.
* We write a "blob" which Accord happens to know is an internal data structure 
called a "route"; we index the sub-elements of this route and search off those 
sub-elements. SAI only knows "blob", so we are trying to trick SAI into working 
for us (we have a blob that is ordered using blob ordering, but routes are 
ordered differently, so we need to construct invalid blobs to match the SAI 
order so filtering doesn't break us).

What Accord gains from SAI is the "storage attached index", not the query 
simplification, so refactoring the management lets Accord leverage SAI but 
define its own custom semantics.

> Add way for SAI to disable row to token index so internal tables may leverage 
> SAI
> -
>
> Key: CASSANDRA-19368
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19368
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/2i Index
>Reporter: David Capwell
>Priority: Normal
> Fix For: 5.x
>
>
> Internal tables tend to use LocalPartitioner and may not actually have murmur 
> tokens but rather LocalPartitioner tokens, which are variable-length byte 
> tokens!  For internal use cases we don’t always care about paging, so we don’t 
> really need this index to function.
> The use case motivating this work is Accord: we wish to add a custom SAI 
> index on the system_accord.commands#routes column.  Since this logic is 
> purely internal we don’t care about paging, but we cannot leverage SAI at 
> this moment as it hard-codes murmur tokens and fails during memtable flush.






Re: [PR] Include support for comma-separated strings for basic.contact-points [cassandra-java-driver]

2024-02-27 Thread via GitHub


aratno commented on PR #1897:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1897#issuecomment-1967354072

   Is there a reason you can't configure Pekko to use a custom session? It 
looks like that should be possible from this documentation: 
https://pekko.apache.org/docs/pekko-connectors/current/cassandra.html#custom-session-creation
   
   You can create a custom CqlSessionProvider that sets contact points based on 
the environment: 
https://pekko.apache.org/api/pekko-connectors/snapshot/org/apache/pekko/stream/connectors/cassandra/CqlSessionProvider.html
   
   The main drawbacks of the implementation in this PR are:
   1. Changing the parsing behavior of `basic.contact-points` could break 
existing client configurations
   2. Allowing some list-typed configurations to support comma-separated 
strings when others don't is confusing for users
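For reference, the comma-separated splitting that the PR proposes for `basic.contact-points` is mechanically simple; the drawbacks above are about configuration semantics, not implementation. A standalone sketch of that kind of parsing (illustrative only, not the driver's actual config code, and the host names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a single comma-separated contact-points value
// into individual host:port entries, trimming whitespace and skipping
// empty segments. Not the java-driver's actual parser.
public class ContactPoints {
    public static List<String> parse(String raw) {
        List<String> points = new ArrayList<>();
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) points.add(trimmed);
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> points =
                parse("cas1.example.com:9042, cas2.example.com:9042");
        System.out.println(points.size() + " " + points.get(0));
    }
}
```

The concern in the review is that today each element of the list-typed `basic.contact-points` is taken verbatim, so silently applying a split like this would change the meaning of existing configurations.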


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[jira] [Comment Edited] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821362#comment-17821362
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-19222 at 2/27/24 5:55 PM:
--

Jenkins has something like 30 runs of history and the issue was observed in 
CircleCI, so I ran some tests trying to reproduce it there, with no success, 
running repeated runs on current 5.0 and trunk, plus on top of the original 
commit where it was reported (wondering whether it might have been fixed 
somewhere in between):

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk]

We do have the leak detector trace in the description, which is what we use to 
solve strong self-ref loops (examples: CASSANDRA-11120, CASSANDRA-17205, 
CASSANDRA-12413), but I am not sure what test to add to reproduce it (so we can 
also test a potential fix). [~jlewandowski], any thoughts here? You've worked 
way more in that part of the code than I have.


was (Author: e.dimitrova):
Jenkins has something like 30 runs history and the issue is reported observed 
in CircleCI, so I ran some tests trying to reproduce it there with no success 
running repeated runs on current 5.0 and trunk plus on top of the original 
commit where it was reported (thinking whether it might have been just fixed 
somewhere in between):

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk]

We do have the leak detector trace in the description which is what we use to 
solve the strong self-ref loops but I am not sure what test to add to reproduce 
it (so we can test also potential fix). [~jlewandowski] , any thoughts here? 
You've worked way more in that part of the code than I've. 

> Leak - Strong self-ref loop detected in BTI
> ---
>
> Key: CASSANDRA-19222
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19222
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Jacek Lewandowski
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62534/tests
> {noformat}
> ERROR [Strong-Reference-Leak-Detector:1] 2023-12-21 09:50:33,072 Strong 
> self-ref loop detected 
> [/tmp/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/oa-1-big
> private java.util.List 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.closeables-java.util.ArrayList
> transient java.lang.Object[] 
> java.util.ArrayList.elementData-[Ljava.lang.Object;
> transient java.lang.Object[] 
> java.util.ArrayList.elementData-org.apache.cassandra.io.util.FileHandle
> final org.apache.cassandra.utils.concurrent.Ref 
> org.apache.cassandra.utils.concurrent.SharedCloseableImpl.ref-org.apache.cassandra.utils.concurrent.Ref
> final org.apache.cassandra.utils.concurrent.Ref$State 
> org.apache.cassandra.utils.concurrent.Ref.state-org.apache.cassandra.utils.concurrent.Ref$State
> final org.apache.cassandra.utils.concurrent.Ref$GlobalState 
> org.apache.cassandra.utils.concurrent.Ref$State.globalState-org.apache.cassandra.utils.concurrent.Ref$GlobalState
> private final org.apache.cassandra.utils.concurrent.RefCounted$Tidy 
> org.apache.cassandra.utils.concurrent.Ref$GlobalState.tidy-org.apache.cassandra.io.util.FileHandle$Cleanup
> final java.util.Optional 
> org.apache.cassandra.io.util.FileHandle$Cleanup.chunkCache-java.util.Optional
> private final java.lang.Object 
> java.util.Optional.value-org.apache.cassandra.cache.ChunkCache
> private final org.apache.cassandra.utils.memory.BufferPool 
> org.apache.cassandra.cache.ChunkCache.bufferPool-org.apache.cassandra.utils.memory.BufferPool
> private final java.util.Set 
> org.apache.cassandra.utils.memory.BufferPool.localPoolReferences-java.util.Collections$SetFromMap
> private final java.util.Map 
> java.util.Collections$SetFromMap.m-java.util.concurrent.ConcurrentHashMap
> private final java.util.Map 
> java.util.Collections$SetFromMap.m-org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef
> private final org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks 
> 

[jira] [Comment Edited] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821362#comment-17821362
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-19222 at 2/27/24 5:53 PM:
--

Jenkins keeps only about 30 runs of history and the issue was reported as observed 
in CircleCI, so I ran repeated runs there trying to reproduce it, on current 5.0 
and trunk plus on top of the original commit where it was reported (wondering 
whether it might have been fixed somewhere in between), with no success:

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk]

We do have the leak detector trace in the description, which is what we use to 
solve the strong self-ref loops, but I am not sure what test to add to reproduce 
it (so we can also test a potential fix). [~jlewandowski], any thoughts here? 
You've worked far more in that part of the code than I have. 


was (Author: e.dimitrova):
Jenkins has something like 30 runs history and the issue is reported observed 
in CircleCI, so I ran some tests trying to reproduce it there with no success 
running repeated runs on current 5.0 and trunk plus on top of the original 
commit where it was reported (thinking whether it might have been just fixed 
somewhere in between):

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk]

We do have the leak detector trace in the description which is what we use to 
solve the strong self-ref loops but I am not sure what test to add to reproduce 
it. [~jlewandowski] , any thoughts here? You've been working way more in that 
part of the code than me. 


[jira] [Comment Edited] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821362#comment-17821362
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-19222 at 2/27/24 5:47 PM:
--

Jenkins keeps only about 30 runs of history and the issue was reported as observed 
in CircleCI, so I ran repeated runs there trying to reproduce it, on current 5.0 
and trunk plus on top of the original commit where it was reported (wondering 
whether it might have been fixed somewhere in between), with no success:

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk]

We do have the leak detector trace in the description, which is what we use to 
solve the strong self-ref loops, but I am not sure what test to add to reproduce 
it. [~jlewandowski], any thoughts here? You've been working far more in that 
part of the code than I have. 


was (Author: e.dimitrova):
Jenkins has something like 30 runs history and the issue is reported observed 
in CircleCI, so I ran some tests trying to reproduce it there with no success 
running repeated runs on current 5.0 and trunk plus on top of the original 
commit where it was reported (it might have been just fixed somewhere in 
between):

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk

We do have the leak detector trace in the description which is what we use to 
solve the strong self-ref loops but I am not sure what test to add to reproduce 
it. [~jlewandowski] , any thoughts here? You've been working way more in that 
part of the code than me. 


[jira] [Commented] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821362#comment-17821362
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19222:
-

Jenkins keeps only about 30 runs of history and the issue was reported as observed 
in CircleCI, so I ran repeated runs there trying to reproduce it, on current 5.0 
and trunk plus on top of the original commit where it was reported (it might have 
simply been fixed somewhere in between), with no success:

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-5.0]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19222-trunk]

[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-5.0]

https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19197-trunk

We do have the leak detector trace in the description, which is what we use to 
solve the strong self-ref loops, but I am not sure what test to add to reproduce 
it. [~jlewandowski], any thoughts here? You've been working far more in that 
part of the code than I have. 


[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Since Version:   (was: 4.0)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: NA
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
>  
> {code:java}
> Error Message
> failed on teardown with "TypeError: not all arguments converted during string 
> formatting"
> Stacktrace
> request =  test_basic_snapshot_and_restore>>
> dtest_config = 
> fixture_dtest_setup_overrides =  object at 0x7f27a43a6550>
> fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
> fixture_dtest_create_cluster_func =  at 0x7f27a81a2790>
> @pytest.fixture(scope='function', autouse=False)
> def fixture_dtest_setup(request,
> dtest_config,
> fixture_dtest_setup_overrides,
> fixture_logging_setup,
> fixture_dtest_cluster_name,
> fixture_dtest_create_cluster_func):
> if running_in_docker():
> cleanup_docker_environment_before_test_execution()
> 
> # do all of our setup operations to get the enviornment ready for the 
> actual test
> # to run (e.g. bring up a cluster with the necessary config, populate 
> variables, etc)
> initial_environment = copy.deepcopy(os.environ)
> dtest_setup = DTestSetup(dtest_config=dtest_config,
>  
> setup_overrides=fixture_dtest_setup_overrides,
>  cluster_name=fixture_dtest_cluster_name)
> dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)
> 
> if not dtest_config.disable_active_log_watching:
> dtest_setup.begin_active_log_watch()
> 
> # at this point we're done with our setup operations in this fixture
> # yield to allow the actual test to run
> yield dtest_setup
> 
> # phew! we're back after executing the test, now we need to do
> # all of our teardown and cleanup operations
> 
> reset_environment_vars(initial_environment)
> dtest_setup.jvm_args = []
> 
> for con in dtest_setup.connections:
> con.cluster.shutdown()
> dtest_setup.connections = []
> 
> failed = False
> try:
> if not dtest_setup.allow_log_errors:
> errors = check_logs_for_errors(dtest_setup)
> if len(errors) > 0:
> failed = True
> pytest.fail('Unexpected error found in node logs (see 
> stdout for full details). Errors: [{errors}]'
> .format(errors=str.join(", ", errors)), 
> pytrace=False)
> finally:
> try:
> # save the logs for inspection
> if failed or not dtest_config.delete_logs:
> >   copy_logs(request, dtest_setup.cluster)
> conftest.py:371: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> conftest.py:291: in copy_logs
> shutil.copyfile(file, os.path.join(logdir, target_name))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> src = 
> '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
> dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'
> def copyfile(src, dst, *, follow_symlinks=True):
> """Copy data from src to dst in the most efficient way possible.
> 
> If follow_symlinks is not set and src is a symbolic link, a new
> symlink will be created instead of copying the file it points to.
> 
> """
> sys.audit("shutil.copyfile", src, dst)
> 
> if _samefile(src, dst):
> raise SameFileError("{!r} and {!r} are the same file".format(src, 
> dst))
> 
> file_size = 0
> for i, fn in enumerate([src, dst]):
> try:
> st = _stat(fn)
> except OSError:
> # File most likely does not exist
> pass
> else:
> # XXX What about other special files? (sockets, devices...)
> if 

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

  Since Version: 4.0
Source Control Link: 
https://github.com/apache/cassandra-dtest/commit/e1846961b652979517d5a89e7f1731cacfb7a5c6
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)


[jira] [Commented] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821345#comment-17821345
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19446:
-

Committed, thanks

https://github.com/apache/cassandra-dtest/commit/e1846961b652979517d5a89e7f1731cacfb7a5c6


(cassandra-dtest) branch trunk updated: TypeError in logging fixed

2024-02-27 Thread edimitrova
This is an automated email from the ASF dual-hosted git repository.

edimitrova pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/trunk by this push:
 new e1846961 TypeError in logging fixed
e1846961 is described below

commit e1846961b652979517d5a89e7f1731cacfb7a5c6
Author: Ekaterina Dimitrova 
AuthorDate: Tue Feb 27 10:14:36 2024 -0500

TypeError in logging fixed

patch by Ekaterina Dimitrova; reviewed by Brandon Williams for 
CASSANDRA-19446
---
 conftest.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/conftest.py b/conftest.py
index f324b3d5..be4f02c1 100644
--- a/conftest.py
+++ b/conftest.py
@@ -370,7 +370,7 @@ def fixture_dtest_setup(request,
 if failed or not dtest_config.delete_logs:
 copy_logs(request, dtest_setup.cluster)
 except Exception as e:
-logger.error("Error saving log:", str(e))
+logger.error("Error saving log: %s", str(e))
 finally:
 dtest_setup.cleanup_cluster(request, failed)
 
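The root cause of CASSANDRA-19446 is visible in the one-line diff above: the 
original call passed an argument to logger.error without a matching %-style 
placeholder, so rendering the log record (effectively msg % args) raised the 
"not all arguments converted during string formatting" TypeError seen in the 
dtest teardown. A minimal reproduction of the failure mode, using plain string 
formatting and a hypothetical error message ("disk full"):

```python
broken_msg, args = "Error saving log:", ("disk full",)
try:
    # This is effectively what logging does when rendering the record:
    # msg % args. With no %s placeholder, the extra argument is unused.
    broken_msg % args
except TypeError as e:
    print(e)  # not all arguments converted during string formatting

# The committed fix adds the placeholder so the argument is consumed:
fixed_msg = "Error saving log: %s"
print(fixed_msg % args)  # Error saving log: disk full
```

Note that the stdlib logger defers this formatting until the record is emitted, 
which is why the original bug only surfaced when an exception was actually 
logged during teardown.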


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Reviewers: Brandon Williams  (was: Brandon Williams, Ekaterina Dimitrova)

> _ 
> conftest.py:291: in copy_logs
> shutil.copyfile(file, os.path.join(logdir, target_name))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> src = 
> '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
> dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'
> def copyfile(src, dst, *, follow_symlinks=True):
> """Copy data from src to dst in the most efficient way possible.
> 
> If follow_symlinks is not set and src is a symbolic link, a new
> symlink will be created instead of copying the file it points to.
> 
> """
> sys.audit("shutil.copyfile", src, dst)
> 
> if _samefile(src, dst):
> raise SameFileError("{!r} and {!r} are the same file".format(src, 
> dst))
> 
> file_size = 0
> for i, fn in enumerate([src, dst]):
> try:
> st = _stat(fn)
> except OSError:
> # File most likely does not exist
> pass
> else:
> # XXX What about other special files? (sockets, 

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Status: Ready to Commit  (was: Review In Progress)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Reviewers: Brandon Williams, Ekaterina Dimitrova
   Status: Review In Progress  (was: Patch Available)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Fix Version/s: NA
   (was: 5.0.x)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: NA

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Test and Documentation Plan: 
I suspect the missing log file could be an environment issue, but it exposed a tiny 
bug in the error logging, which I fixed here: 
[https://github.com/ekaterinadimitrova2/cassandra-dtest/commit/2a8c384861f4b5015d1882000f7f157a9d75704c]

Running the test in a loop here: 
[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19446-5.0]
The CI run finished fully green.
 Status: Patch Available  (was: In Progress)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x

[jira] [Commented] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-02-27 Thread Zbyszek Z (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821339#comment-17821339
 ] 

Zbyszek Z commented on CASSANDRA-19445:
---

As I cannot share the full log publicly, I have filtered some paxos-related entries 
and removed keyspace names. These logs represent what happens in a cycle every 5 
minutes. I have also attached a full log line so you can see how massive this is. 
[^paxos-multiple.txt]

> Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"
> --
>
> Key: CASSANDRA-19445
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19445
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Zbyszek Z
>Priority: Normal
> Attachments: paxos-entry.txt, paxos-multiple.txt
>
>
> Hello,
> On our cluster logs are flooded with: 
> {code:java}
> INFO  [OptionalTasks:1] 2024-02-27 14:27:51,213 
> PaxosCleanupLocalCoordinator.java:185 - Completed 0 uncommitted paxos 
> instances for X on ranges 
> [(9210458530128018597,-9222146739399525061], 
> (-9222146739399525061,-9174246180597321488], 
> (-9174246180597321488,-9155837684527496840], 
> (-9155837684527496840,-9148981328078890812], 
> (-9148981328078890812,-9141853035919151700], 
> (-9141853035919151700,-9138872620588476741], {code}
> I cannot find anything in the docs regarding this log line. Also, these are huge 
> log payloads that heavily flood system.log. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-02-27 Thread Zbyszek Z (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zbyszek Z updated CASSANDRA-19445:
--
Attachment: paxos-multiple.txt
paxos-entry.txt

> Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"
> --
>
> Key: CASSANDRA-19445
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19445
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Zbyszek Z
>Priority: Normal
> Attachments: paxos-entry.txt, paxos-multiple.txt






[jira] [Commented] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821326#comment-17821326
 ] 

Brandon Williams commented on CASSANDRA-19446:
--

+1

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x

[jira] [Updated] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19427:
--
Reviewers: Stefan Miklosovic  (was: Stefan Miklosovic)
   Status: Review In Progress  (was: Patch Available)

> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries 
> with multiple coordinator-local partitions
> 
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic 
> ArrayIndexOutOfBoundsExceptions:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>  
>  
> The error was in a Runnable, so the stacktrace didn't directly indicate where 
> the error was coming from. We enabled JFR to log the underlying exception 
> that was thrown:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]" 
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for length 0
> at java.base/java.util.ArrayList.add(ArrayList.java:487)
> at java.base/java.util.ArrayList.add(ArrayList.java:499)
> at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
> at 
> org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
> at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
> at 
> org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
> at 
> org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
> at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
> ... 6 more"{code}
>  
>  
> An AIOBE on ArrayList.add(E) should only be possible when multiple threads 
> attempt to call the method at the same time.
>  
> This was seen while executing a SELECT WHERE IN query with multiple partition 
> keys. This exception could happen when multiple local reads are dispatched by 
> the coordinator in 
> org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. In this 
> case, multiple local reads exceed the tombstone warning threshold, so 
> multiple tombstone warnings are added to the same ClientWarn.State reference. 
>  Currently, org.apache.cassandra.service.ClientWarn.State#warnings is an 
> ArrayList, which isn't safe for concurrent modification, causing the AIOBE to 
> be thrown.
>  
> I have a patch available for this, and I'm preparing it now. The patch is 
> simple - it just changes 
> 
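
The report above pins the AIOBE on unsynchronized `ArrayList.add` when several coordinator-local reads append tombstone warnings to the same `ClientWarn.State`. A minimal sketch of the remedy, not the actual Cassandra patch (class and method names here are hypothetical): guard the warning list so concurrent appends are atomic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClientWarnSketch {
    // Hypothetical stand-in for ClientWarn.State#warnings: a bare ArrayList can
    // corrupt under concurrent add() calls (the reported AIOBE); wrapping it with
    // Collections.synchronizedList makes each append atomic.
    public static int concurrentAdds(int threads, int addsPerThread) throws InterruptedException {
        List<String> warnings = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    warnings.add("tombstone warning");  // concurrent appends
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();  // wait for all appends to finish
        }
        return warnings.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Every append is retained; with an unsynchronized ArrayList this count
        // could come up short, or add() could throw ArrayIndexOutOfBoundsException.
        System.out.println(concurrentAdds(4, 10_000)); // prints 40000
    }
}
```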

[jira] [Commented] (CASSANDRA-14572) Expose all table metrics in virtual table

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821317#comment-17821317
 ] 

Stefan Miklosovic commented on CASSANDRA-14572:
---

I like the non-processor approach more (PR 3137). I think the building logic 
is complicated enough already, and putting more logic on top of that to 
generate classes ... nah. Also, how often would we need to update the code 
around this, if ever, to justify the generation?

I was playing with the -walker branch, trying to make e.g. 
CounterMetricRow implement RowWalker so the row would visit 
itself with a given visitor, but I hit a dead end on some typing and gave 
up; it is probably just a matter of trying harder.

That way, we could logically couple a Row with its Walker and drop the whole 
walker package completely. I think it is better if all the logic around a metric 
is centralized in one class rather than spread across multiple places.

My walker branch, with a lot of other polishing etc., is here: 
https://github.com/smiklosovic/cassandra/commits/CASSANDRA-14572-walker/

> Expose all table metrics in virtual table
> -
>
> Key: CASSANDRA-14572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14572
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Legacy/Observability, Observability/Metrics
>Reporter: Chris Lohfink
>Assignee: Maxim Muzafarov
>Priority: Low
>  Labels: virtual-tables
> Fix For: 5.x
>
> Attachments: flight_recording_1270017199_13.jfr, keyspayces_group 
> responses times.png, keyspayces_group summary.png, select keyspaces_group by 
> string prefix.png, select keyspaces_group compare with wo.png, select 
> keyspaces_group without value.png, systemv_views.metrics_dropped_message.png, 
> thread_pools benchmark.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> While we want a number of virtual tables to display data in a way that's great 
> and intuitive, like in nodetool, there is also much value in being able to expose 
> the metrics we have to tooling via CQL instead of JMX. This is more for 
> tooling and ad hoc advanced users who know exactly what they are looking for.
> *Schema:*
> Initial idea is to expose data via {{((keyspace, table), metric)}} with a 
> column for each metric value. Could also use a Map or UDT instead of the 
> column-based approach to be a bit more specific to each metric type. To that end 
> there could be a {{metric_type}} column and then a UDT for each metric type 
> filled in, or a single value with more of a Map style. I am 
> proposing the column approach though, as with {{ALLOW FILTERING}} it allows 
> more extensive query capabilities.
> *Implementations:*
> * Use reflection to grab all the metrics from TableMetrics (see: 
> CASSANDRA-7622 impl). This is easiest and least abrasive towards new metric 
> implementors... but it's reflection, and kind of a bad idea.
> * Add a hook in TableMetrics to register with this virtual table when 
> registering
> * Pull from the CassandraMetrics registry (either a reporter, or iterate 
> through the metrics on read of the virtual table)
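The "pull from the registry on read" option can be illustrated with a small standalone sketch. All names here are hypothetical (Cassandra's real TableMetrics and virtual-table APIs differ): iterate a registry keyed by keyspace/table/metric and emit one row per ((keyspace, table), metric).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of iterating a metrics registry on read of a
// virtual table; registry keys are assumed to look like "keyspace.table.metric".
public class MetricRowSketch {

    public static List<String> rows(Map<String, Double> registry) {
        List<String> out = new ArrayList<>();
        // TreeMap gives a deterministic, sorted row order, like a CQL result.
        for (Map.Entry<String, Double> e : new TreeMap<>(registry).entrySet()) {
            String[] parts = e.getKey().split("\\.", 3);  // keyspace, table, metric
            out.add(String.format("(%s, %s) %s = %s",
                                  parts[0], parts[1], parts[2], e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> reg = new HashMap<>();
        reg.put("ks1.tbl1.TombstoneScannedHistogram", 12.0);
        reg.put("ks1.tbl1.ReadLatency", 0.5);
        // Rows keyed by ((keyspace, table), metric), sorted by registry name.
        for (String row : rows(reg))
            System.out.println(row);
    }
}
```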



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19427) Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries with multiple coordinator-local partitions

2024-02-27 Thread Abe Ratnofsky (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821316#comment-17821316
 ] 

Abe Ratnofsky commented on CASSANDRA-19427:
---

Added comment, updated patch on all branches:

3.11: [https://github.com/apache/cassandra/pull/3142]

4.0: [https://github.com/apache/cassandra/pull/3143]

4.1: [https://github.com/apache/cassandra/pull/3144]

5.0: [https://github.com/apache/cassandra/pull/3145]

trunk: [https://github.com/apache/cassandra/pull/3129]

 

All the 3.11-5.0 patches are the same; trunk is different since the 
introduction of CASSANDRA-18330 / TCM.

> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries 
> with multiple coordinator-local partitions
> 
>
> Key: CASSANDRA-19427
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>Reporter: Abe Ratnofsky
>Assignee: Abe Ratnofsky
>Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic 
> ArrayIndexOutOfBoundsExceptions:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"{code}
>  
>  
> The error was in a Runnable, so the stacktrace didn't directly indicate where 
> the error was coming from. We enabled JFR to log the underlying exception 
> that was thrown:
>  
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]" 
> exception="java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for length 0
> at java.base/java.util.ArrayList.add(ArrayList.java:487)
> at java.base/java.util.ArrayList.add(ArrayList.java:499)
> at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
> at 
> org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
> at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
> at 
> org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
> at 
> org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
> at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
> ... 6 more"{code}
>  
>  
> An AIOBE on ArrayList.add(E) should only be possible when multiple threads 
> attempt to call the method at the same time.
>  
> This was seen while executing a SELECT WHERE IN query with multiple partition 
> keys. This exception could happen when multiple local reads are dispatched by 
> the coordinator in 
> org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. In this 
> case, multiple local reads exceed the tombstone warning threshold, so 
> multiple tombstone warnings are added to the same ClientWarn.State reference. 
>  Currently, 

[jira] [Comment Edited] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821293#comment-17821293
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-19446 at 2/27/24 3:50 PM:
--

I suspect the missing log file could be something with the environment, but it 
exhibited a tiny bug in the error logging which I fixed here: 
[https://github.com/ekaterinadimitrova2/cassandra-dtest/commit/2a8c384861f4b5015d1882000f7f157a9d75704c]

Running the test in a loop now here - 
[https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19446-5.0]
The CI run finished fully green.
So I believe the dtest fix is the only one we need here. [~brandonwilliams], do 
you mind reviewing it?


was (Author: e.dimitrova):
I suspect the missing log file could be something with the environment, but it 
exhibited a tiny bug in the error logging which I fixed here: 
[https://github.com/ekaterinadimitrova2/cassandra-dtest/commit/2a8c384861f4b5015d1882000f7f157a9d75704c]

Running the test in a loop now here - 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19446-5.0

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
>  
> {code:java}
> Error Message
> failed on teardown with "TypeError: not all arguments converted during string 
> formatting"
> Stacktrace
> request =  test_basic_snapshot_and_restore>>
> dtest_config = 
> fixture_dtest_setup_overrides =  object at 0x7f27a43a6550>
> fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
> fixture_dtest_create_cluster_func =  at 0x7f27a81a2790>
> @pytest.fixture(scope='function', autouse=False)
> def fixture_dtest_setup(request,
> dtest_config,
> fixture_dtest_setup_overrides,
> fixture_logging_setup,
> fixture_dtest_cluster_name,
> fixture_dtest_create_cluster_func):
> if running_in_docker():
> cleanup_docker_environment_before_test_execution()
> 
> # do all of our setup operations to get the enviornment ready for the 
> actual test
> # to run (e.g. bring up a cluster with the necessary config, populate 
> variables, etc)
> initial_environment = copy.deepcopy(os.environ)
> dtest_setup = DTestSetup(dtest_config=dtest_config,
>  
> setup_overrides=fixture_dtest_setup_overrides,
>  cluster_name=fixture_dtest_cluster_name)
> dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)
> 
> if not dtest_config.disable_active_log_watching:
> dtest_setup.begin_active_log_watch()
> 
> # at this point we're done with our setup operations in this fixture
> # yield to allow the actual test to run
> yield dtest_setup
> 
> # phew! we're back after executing the test, now we need to do
> # all of our teardown and cleanup operations
> 
> reset_environment_vars(initial_environment)
> dtest_setup.jvm_args = []
> 
> for con in dtest_setup.connections:
> con.cluster.shutdown()
> dtest_setup.connections = []
> 
> failed = False
> try:
> if not dtest_setup.allow_log_errors:
> errors = check_logs_for_errors(dtest_setup)
> if len(errors) > 0:
> failed = True
> pytest.fail('Unexpected error found in node logs (see 
> stdout for full details). Errors: [{errors}]'
> .format(errors=str.join(", ", errors)), 
> pytrace=False)
> finally:
> try:
> # save the logs for inspection
> if failed or not dtest_config.delete_logs:
> >   copy_logs(request, dtest_setup.cluster)
> conftest.py:371: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> conftest.py:291: in copy_logs
> shutil.copyfile(file, os.path.join(logdir, target_name))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> src = 
> 

[jira] [Updated] (CASSANDRA-14476) ShortType and ByteType are incorrectly considered variable-length types

2024-02-27 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-14476:
--
Fix Version/s: 5.0.x
   5.1

> ShortType and ByteType are incorrectly considered variable-length types
> ---
>
> Key: CASSANDRA-14476
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14476
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Vladimir Krivopalov
>Assignee: Jacek Lewandowski
>Priority: Low
>  Labels: lhf
> Fix For: 5.0.x, 5.1
>
>
> The AbstractType class has a method valueLengthIfFixed() that returns -1 for 
> data types with a variable length and a positive value for types with a fixed 
> length. This is primarily used for efficient serialization and 
> deserialization. 
>  
> It turns out that there is an inconsistency in types ShortType and ByteType 
> as those are in fact fixed-length types (2 bytes and 1 byte, respectively) 
> but they don't have the valueLengthIfFixed() method overridden, so it returns 
> -1 as if they were of variable length.
>  
> It would be good to fix that at some appropriate point, for example, when 
> introducing a new version of SSTables format, to keep the meaning of the 
> function consistent across data types. Saving some bytes in serialized format 
> is a minor but pleasant bonus.
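The contract described above can be illustrated with a hedged standalone sketch. The class names only mirror Cassandra's AbstractType hierarchy and are not its code: the base default of -1 marks a type as variable length, and fixed-length types override it with their width in bytes, which is exactly the override ShortType and ByteType are missing.

```java
// Hedged sketch of the valueLengthIfFixed() contract (standalone illustration,
// not Cassandra's actual AbstractType code).
abstract class TypeSketch {
    // Default: -1 signals a variable-length type.
    int valueLengthIfFixed() { return -1; }
}

// The fix the ticket asks for: fixed-length types override the default
// and report their serialized width in bytes.
class ShortTypeSketch extends TypeSketch {
    @Override
    int valueLengthIfFixed() { return 2; }  // a short is 2 bytes
}

class ByteTypeSketch extends TypeSketch {
    @Override
    int valueLengthIfFixed() { return 1; }  // a byte is 1 byte
}

public class FixedLengthSketch {
    public static void main(String[] args) {
        System.out.println(new ShortTypeSketch().valueLengthIfFixed()); // prints 2
        System.out.println(new ByteTypeSketch().valueLengthIfFixed());  // prints 1
    }
}
```

A serializer can then branch on the returned value: a non-negative result lets it skip writing a per-value length, which is the "saving some bytes" bonus the description mentions.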






[jira] [Commented] (CASSANDRA-14476) ShortType and ByteType are incorrectly considered variable-length types

2024-02-27 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821308#comment-17821308
 ] 

Jacek Lewandowski commented on CASSANDRA-14476:
---

In 5.0 the problem affects more types: {{ByteType}}, {{ShortType}}, 
{{SimpleDateType}}, {{TimeType}}, {{TimestampType}}. I'm going to fix it and 
move the method that checks whether the type's serialization is variable or 
fixed length directly to {{TypeSerializer}}. I'll also provide some 
upgrade tests to make sure the old sstables can be read without problems. I 
don't think we need to bump the SSTable version, because this does not change 
anything about serialization. It may certainly break some implicit casting in 
CQL though.


> ShortType and ByteType are incorrectly considered variable-length types
> ---
>
> Key: CASSANDRA-14476
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14476
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Vladimir Krivopalov
>Assignee: Jacek Lewandowski
>Priority: Low
>  Labels: lhf
>
> The AbstractType class has a method valueLengthIfFixed() that returns -1 for 
> data types with a variable length and a positive value for types with a fixed 
> length. This is primarily used for efficient serialization and 
> deserialization. 
>  
> It turns out that there is an inconsistency in types ShortType and ByteType 
> as those are in fact fixed-length types (2 bytes and 1 byte, respectively) 
> but they don't have the valueLengthIfFixed() method overridden, so it returns 
> -1 as if they were of variable length.
>  
> It would be good to fix that at some appropriate point, for example, when 
> introducing a new version of SSTables format, to keep the meaning of the 
> function consistent across data types. Saving some bytes in serialized format 
> is a minor but pleasant bonus.






[jira] [Assigned] (CASSANDRA-14476) ShortType and ByteType are incorrectly considered variable-length types

2024-02-27 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski reassigned CASSANDRA-14476:
-

Assignee: Jacek Lewandowski  (was: Jearvon Dharrie)

> ShortType and ByteType are incorrectly considered variable-length types
> ---
>
> Key: CASSANDRA-14476
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14476
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Vladimir Krivopalov
>Assignee: Jacek Lewandowski
>Priority: Low
>  Labels: lhf
>
> The AbstractType class has a method valueLengthIfFixed() that returns -1 for 
> data types with a variable length and a positive value for types with a fixed 
> length. This is primarily used for efficient serialization and 
> deserialization. 
>  
> It turns out that there is an inconsistency in types ShortType and ByteType 
> as those are in fact fixed-length types (2 bytes and 1 byte, respectively) 
> but they don't have the valueLengthIfFixed() method overridden, so it returns 
> -1 as if they were of variable length.
>  
> It would be good to fix that at some appropriate point, for example, when 
> introducing a new version of SSTables format, to keep the meaning of the 
> function consistent across data types. Saving some bytes in serialized format 
> is a minor but pleasant bonus.






(cassandra) branch cassandra-14572-walker deleted (was 03a90b40a2)

2024-02-27 Thread mmuzaf
This is an automated email from the ASF dual-hosted git repository.

mmuzaf pushed a change to branch cassandra-14572-walker
in repository https://gitbox.apache.org/repos/asf/cassandra.git


 was 03a90b40a2 CASSANDRA-14572 Expose all table metrics in virtual tables

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.





[jira] [Commented] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821293#comment-17821293
 ] 

Ekaterina Dimitrova commented on CASSANDRA-19446:
-

I suspect the missing log file could be something with the environment, but it 
exhibited a tiny bug in the error logging which I fixed here: 
[https://github.com/ekaterinadimitrova2/cassandra-dtest/commit/2a8c384861f4b5015d1882000f7f157a9d75704c]

Running the test in a loop now here - 
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra?branch=19446-5.0

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
>  
> {code:java}
> Error Message
> failed on teardown with "TypeError: not all arguments converted during string 
> formatting"
> Stacktrace
> request =  test_basic_snapshot_and_restore>>
> dtest_config = 
> fixture_dtest_setup_overrides =  object at 0x7f27a43a6550>
> fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
> fixture_dtest_create_cluster_func =  at 0x7f27a81a2790>
> @pytest.fixture(scope='function', autouse=False)
> def fixture_dtest_setup(request,
> dtest_config,
> fixture_dtest_setup_overrides,
> fixture_logging_setup,
> fixture_dtest_cluster_name,
> fixture_dtest_create_cluster_func):
> if running_in_docker():
> cleanup_docker_environment_before_test_execution()
> 
> # do all of our setup operations to get the enviornment ready for the 
> actual test
> # to run (e.g. bring up a cluster with the necessary config, populate 
> variables, etc)
> initial_environment = copy.deepcopy(os.environ)
> dtest_setup = DTestSetup(dtest_config=dtest_config,
>  
> setup_overrides=fixture_dtest_setup_overrides,
>  cluster_name=fixture_dtest_cluster_name)
> dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)
> 
> if not dtest_config.disable_active_log_watching:
> dtest_setup.begin_active_log_watch()
> 
> # at this point we're done with our setup operations in this fixture
> # yield to allow the actual test to run
> yield dtest_setup
> 
> # phew! we're back after executing the test, now we need to do
> # all of our teardown and cleanup operations
> 
> reset_environment_vars(initial_environment)
> dtest_setup.jvm_args = []
> 
> for con in dtest_setup.connections:
> con.cluster.shutdown()
> dtest_setup.connections = []
> 
> failed = False
> try:
> if not dtest_setup.allow_log_errors:
> errors = check_logs_for_errors(dtest_setup)
> if len(errors) > 0:
> failed = True
> pytest.fail('Unexpected error found in node logs (see 
> stdout for full details). Errors: [{errors}]'
> .format(errors=str.join(", ", errors)), 
> pytrace=False)
> finally:
> try:
> # save the logs for inspection
> if failed or not dtest_config.delete_logs:
> >   copy_logs(request, dtest_setup.cluster)
> conftest.py:371: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> conftest.py:291: in copy_logs
> shutil.copyfile(file, os.path.join(logdir, target_name))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> src = 
> '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
> dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'
> def copyfile(src, dst, *, follow_symlinks=True):
> """Copy data from src to dst in the most efficient way possible.
> 
> If follow_symlinks is not set and src is a symbolic link, a new
> symlink will be created instead of copying the file it points to.
> 
> """
> sys.audit("shutil.copyfile", src, dst)
> 
> if _samefile(src, dst):
> raise SameFileError("{!r} and {!r} are the same 

[jira] [Updated] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-02-27 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19445:
-
 Bug Category: Parent values: Degradation(12984)Level 1 values: Resource 
Management(12995)
   Complexity: Normal
  Component/s: Feature/Lightweight Transactions
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

[~kolargol] can you attach a full log please?

> Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"
> --
>
> Key: CASSANDRA-19445
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19445
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Zbyszek Z
>Priority: Normal
>
> Hello,
> On our cluster logs are flooded with: 
> {code:java}
> INFO  [OptionalTasks:1] 2024-02-27 14:27:51,213 
> PaxosCleanupLocalCoordinator.java:185 - Completed 0 uncommitted paxos 
> instances for X on ranges 
> [(9210458530128018597,-9222146739399525061], 
> (-9222146739399525061,-9174246180597321488], 
> (-9174246180597321488,-9155837684527496840], 
> (-9155837684527496840,-9148981328078890812], 
> (-9148981328078890812,-9141853035919151700], 
> (-9141853035919151700,-9138872620588476741], {code}
> I cannot find anything in the docs regarding this log line. Also, these are huge 
> log payloads that heavily flood system.log.






[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Severity: Low  (was: Normal)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
>  
> {code:java}
> Error Message
> failed on teardown with "TypeError: not all arguments converted during string 
> formatting"
> Stacktrace
> request =  test_basic_snapshot_and_restore>>
> dtest_config = 
> fixture_dtest_setup_overrides =  object at 0x7f27a43a6550>
> fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
> fixture_dtest_create_cluster_func =  at 0x7f27a81a2790>
> @pytest.fixture(scope='function', autouse=False)
> def fixture_dtest_setup(request,
> dtest_config,
> fixture_dtest_setup_overrides,
> fixture_logging_setup,
> fixture_dtest_cluster_name,
> fixture_dtest_create_cluster_func):
> if running_in_docker():
> cleanup_docker_environment_before_test_execution()
> 
> # do all of our setup operations to get the enviornment ready for the 
> actual test
> # to run (e.g. bring up a cluster with the necessary config, populate 
> variables, etc)
> initial_environment = copy.deepcopy(os.environ)
> dtest_setup = DTestSetup(dtest_config=dtest_config,
>  
> setup_overrides=fixture_dtest_setup_overrides,
>  cluster_name=fixture_dtest_cluster_name)
> dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)
> 
> if not dtest_config.disable_active_log_watching:
> dtest_setup.begin_active_log_watch()
> 
> # at this point we're done with our setup operations in this fixture
> # yield to allow the actual test to run
> yield dtest_setup
> 
> # phew! we're back after executing the test, now we need to do
> # all of our teardown and cleanup operations
> 
> reset_environment_vars(initial_environment)
> dtest_setup.jvm_args = []
> 
> for con in dtest_setup.connections:
> con.cluster.shutdown()
> dtest_setup.connections = []
> 
> failed = False
> try:
> if not dtest_setup.allow_log_errors:
> errors = check_logs_for_errors(dtest_setup)
> if len(errors) > 0:
> failed = True
> pytest.fail('Unexpected error found in node logs (see 
> stdout for full details). Errors: [{errors}]'
> .format(errors=str.join(", ", errors)), 
> pytrace=False)
> finally:
> try:
> # save the logs for inspection
> if failed or not dtest_config.delete_logs:
> >   copy_logs(request, dtest_setup.cluster)
> conftest.py:371: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> conftest.py:291: in copy_logs
> shutil.copyfile(file, os.path.join(logdir, target_name))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> src = 
> '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
> dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'
> def copyfile(src, dst, *, follow_symlinks=True):
> """Copy data from src to dst in the most efficient way possible.
> 
> If follow_symlinks is not set and src is a symbolic link, a new
> symlink will be created instead of copying the file it points to.
> 
> """
> sys.audit("shutil.copyfile", src, dst)
> 
> if _samefile(src, dst):
> raise SameFileError("{!r} and {!r} are the same file".format(src, 
> dst))
> 
> file_size = 0
> for i, fn in enumerate([src, dst]):
> try:
> st = _stat(fn)
> except OSError:
> # File most likely does not exist
> pass
> else:
> # XXX What about other special files? (sockets, devices...)
> if 

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Complexity: Low Hanging Fruit  (was: Normal)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 5.0.x
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
>  
> {code:java}
> Error Message
> failed on teardown with "TypeError: not all arguments converted during string 
> formatting"
> Stacktrace
> request =  test_basic_snapshot_and_restore>>
> dtest_config = 
[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

Description: 
Seen here:
[https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]

 
{code:java}
Error Message

failed on teardown with "TypeError: not all arguments converted during string formatting"
Stacktrace

request = >
dtest_config = 
fixture_dtest_setup_overrides = 
fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
fixture_dtest_create_cluster_func = 

@pytest.fixture(scope='function', autouse=False)
def fixture_dtest_setup(request,
dtest_config,
fixture_dtest_setup_overrides,
fixture_logging_setup,
fixture_dtest_cluster_name,
fixture_dtest_create_cluster_func):
if running_in_docker():
cleanup_docker_environment_before_test_execution()

# do all of our setup operations to get the enviornment ready for the actual test
# to run (e.g. bring up a cluster with the necessary config, populate variables, etc)
initial_environment = copy.deepcopy(os.environ)
dtest_setup = DTestSetup(dtest_config=dtest_config,
 setup_overrides=fixture_dtest_setup_overrides,
 cluster_name=fixture_dtest_cluster_name)
dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)

if not dtest_config.disable_active_log_watching:
dtest_setup.begin_active_log_watch()

# at this point we're done with our setup operations in this fixture
# yield to allow the actual test to run
yield dtest_setup

# phew! we're back after executing the test, now we need to do
# all of our teardown and cleanup operations

reset_environment_vars(initial_environment)
dtest_setup.jvm_args = []

for con in dtest_setup.connections:
con.cluster.shutdown()
dtest_setup.connections = []

failed = False
try:
if not dtest_setup.allow_log_errors:
errors = check_logs_for_errors(dtest_setup)
if len(errors) > 0:
failed = True
pytest.fail('Unexpected error found in node logs (see stdout for full details). Errors: [{errors}]'
            .format(errors=str.join(", ", errors)), pytrace=False)
finally:
try:
# save the logs for inspection
if failed or not dtest_config.delete_logs:
>   copy_logs(request, dtest_setup.cluster)

conftest.py:371: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
conftest.py:291: in copy_logs
shutil.copyfile(file, os.path.join(logdir, target_name))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

src = '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'

def copyfile(src, dst, *, follow_symlinks=True):
"""Copy data from src to dst in the most efficient way possible.

If follow_symlinks is not set and src is a symbolic link, a new
symlink will be created instead of copying the file it points to.

"""
sys.audit("shutil.copyfile", src, dst)

if _samefile(src, dst):
raise SameFileError("{!r} and {!r} are the same file".format(src, dst))

file_size = 0
for i, fn in enumerate([src, dst]):
try:
st = _stat(fn)
except OSError:
# File most likely does not exist
pass
else:
# XXX What about other special files? (sockets, devices...)
if stat.S_ISFIFO(st.st_mode):
fn = fn.path if isinstance(fn, os.DirEntry) else fn
raise SpecialFileError("`%s` is a named pipe" % fn)
if _WINDOWS and i == 0:
file_size = st.st_size

if not follow_symlinks and _islink(src):
os.symlink(os.readlink(src), dst)
else:
>   with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
E   FileNotFoundError: [Errno 2] No such file or directory: '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'

/usr/lib/python3.8/shutil.py:264: FileNotFoundError

During handling of the above exception, another exception occurred:

request = >
dtest_config = 
fixture_dtest_setup_overrides = 
fixture_logging_setup = None, 
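
# The teardown failure above reduces to copy_logs racing with log rotation:
# gc.log existed when the node's log directory was listed, but was gone by the
# time shutil.copyfile tried to open it. A minimal sketch of a tolerant copy
# (safe_copy is a hypothetical helper, not the actual conftest.py code):

```python
import os
import shutil

def safe_copy(src, dst):
    # Skip files that vanished between directory listing and copy,
    # e.g. a gc.log rotated away while teardown is collecting logs.
    if os.path.exists(src):
        shutil.copyfile(src, dst)
        return True
    return False
```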

[jira] [Created] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)
Ekaterina Dimitrova created CASSANDRA-19446:
---

 Summary: Test Failure: 
dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
 Key: CASSANDRA-19446
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
 Project: Cassandra
  Issue Type: Bug
Reporter: Ekaterina Dimitrova


Seen here:
[https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
{code:java}
Error Message

failed on teardown with "TypeError: not all arguments converted during string formatting"

Stacktrace

request = >
dtest_config = 
fixture_dtest_setup_overrides = 
fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
fixture_dtest_create_cluster_func = 

@pytest.fixture(scope='function', autouse=False)
def fixture_dtest_setup(request,
                        dtest_config,
                        fixture_dtest_setup_overrides,
                        fixture_logging_setup,
                        fixture_dtest_cluster_name,
                        fixture_dtest_create_cluster_func):
    if running_in_docker():
        cleanup_docker_environment_before_test_execution()

    # do all of our setup operations to get the enviornment ready for the actual test
    # to run (e.g. bring up a cluster with the necessary config, populate variables, etc)
    initial_environment = copy.deepcopy(os.environ)
    dtest_setup = DTestSetup(dtest_config=dtest_config,
                             setup_overrides=fixture_dtest_setup_overrides,
                             cluster_name=fixture_dtest_cluster_name)
    dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)

    if not dtest_config.disable_active_log_watching:
        dtest_setup.begin_active_log_watch()

    # at this point we're done with our setup operations in this fixture
    # yield to allow the actual test to run
    yield dtest_setup

    # phew! we're back after executing the test, now we need to do
    # all of our teardown and cleanup operations

    reset_environment_vars(initial_environment)
    dtest_setup.jvm_args = []

    for con in dtest_setup.connections:
        con.cluster.shutdown()
    dtest_setup.connections = []

    failed = False
    try:
        if not dtest_setup.allow_log_errors:
            errors = check_logs_for_errors(dtest_setup)
            if len(errors) > 0:
                failed = True
                pytest.fail('Unexpected error found in node logs (see stdout for full details). Errors: [{errors}]'
                            .format(errors=str.join(", ", errors)), pytrace=False)
    finally:
        try:
            # save the logs for inspection
            if failed or not dtest_config.delete_logs:
>               copy_logs(request, dtest_setup.cluster)

conftest.py:371:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
conftest.py:291: in copy_logs
    shutil.copyfile(file, os.path.join(logdir, target_name))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

src = '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'

def copyfile(src, dst, *, follow_symlinks=True):
    """Copy data from src to dst in the most efficient way possible.

    If follow_symlinks is not set and src is a symbolic link, a new
    symlink will be created instead of copying the file it points to.

    """
    sys.audit("shutil.copyfile", src, dst)

    if _samefile(src, dst):
        raise SameFileError("{!r} and {!r} are the same file".format(src, dst))

    file_size = 0
    for i, fn in enumerate([src, dst]):
        try:
            st = _stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                fn = fn.path if isinstance(fn, os.DirEntry) else fn
                raise SpecialFileError("`%s` is a named pipe" % fn)
            if _WINDOWS and i == 0:
                file_size = st.st_size

    if not follow_symlinks and _islink(src):
        os.symlink(os.readlink(src), dst)
    else:
>       with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
E       FileNotFoundError: [Errno 2] No such file or directory: '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'

/usr/lib/python3.8/shutil.py:264: FileNotFoundError

During handling of the above exception, another exception occurred:

request = >
dtest_config = 
fixture_dtest_setup_overrides = 
fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
fixture_dtest_create_cluster_func = 

@pytest.fixture(scope='function', autouse=False)
def fixture_dtest_setup(request, dtest_config, fixture_dtest_setup_overrides,
                        fixture_logging_setup, fixture_dtest_cluster_name,
                        fixture_dtest_create_cluster_func):
    if running_in_docker():
        cleanup_docker_environment_before_test_execution()

    # do all of our setup operations to get the enviornment ready for the actual test
    # to run (e.g. bring up a cluster with the necessary config, populate variables, etc)
    initial_environment = copy.deepcopy(os.environ)
    dtest_setup = DTestSetup(dtest_config=dtest_config,
                             setup_overrides=fixture_dtest_setup_overrides,
                             cluster_name=fixture_dtest_cluster_name)

[jira] [Updated] (CASSANDRA-19446) Test Failure: dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19446:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
  Component/s: CI
Discovered By: User Report
Fix Version/s: 5.0.x
 Severity: Normal
 Assignee: Ekaterina Dimitrova
   Status: Open  (was: Triage Needed)

> Test Failure: 
> dtest-offheap.snapshot_test.TestSnapshot.test_basic_snapshot_and_restore
> --
>
> Key: CASSANDRA-19446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19446
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 5.0.x
>
>
> Seen here:
> [https://ci-cassandra.apache.org/job/Cassandra-5.0/173/testReport/junit/dtest-offheap.snapshot_test/TestSnapshot/test_basic_snapshot_and_restore/]
> {code:java}
> Error Message
>
> failed on teardown with "TypeError: not all arguments converted during string formatting"
>
> Stacktrace
>
> request =  test_basic_snapshot_and_restore>>
> dtest_config =  object at 0x7f27a8053520>
> fixture_dtest_setup_overrides = 
> fixture_logging_setup = None, fixture_dtest_cluster_name = 'test'
> fixture_dtest_create_cluster_func =  at 0x7f27a81a2790>
>
> @pytest.fixture(scope='function', autouse=False)
> def fixture_dtest_setup(request, dtest_config, fixture_dtest_setup_overrides,
>                         fixture_logging_setup, fixture_dtest_cluster_name,
>                         fixture_dtest_create_cluster_func):
>     if running_in_docker():
>         cleanup_docker_environment_before_test_execution()
>
>     # do all of our setup operations to get the enviornment ready for the actual test
>     # to run (e.g. bring up a cluster with the necessary config, populate variables, etc)
>     initial_environment = copy.deepcopy(os.environ)
>     dtest_setup = DTestSetup(dtest_config=dtest_config,
>                              setup_overrides=fixture_dtest_setup_overrides,
>                              cluster_name=fixture_dtest_cluster_name)
>     dtest_setup.initialize_cluster(fixture_dtest_create_cluster_func)
>
>     if not dtest_config.disable_active_log_watching:
>         dtest_setup.begin_active_log_watch()
>
>     # at this point we're done with our setup operations in this fixture
>     # yield to allow the actual test to run
>     yield dtest_setup
>
>     # phew! we're back after executing the test, now we need to do
>     # all of our teardown and cleanup operations
>
>     reset_environment_vars(initial_environment)
>     dtest_setup.jvm_args = []
>
>     for con in dtest_setup.connections:
>         con.cluster.shutdown()
>     dtest_setup.connections = []
>
>     failed = False
>     try:
>         if not dtest_setup.allow_log_errors:
>             errors = check_logs_for_errors(dtest_setup)
>             if len(errors) > 0:
>                 failed = True
>                 pytest.fail('Unexpected error found in node logs (see stdout for full details). Errors: [{errors}]'
>                             .format(errors=str.join(", ", errors)), pytrace=False)
>     finally:
>         try:
>             # save the logs for inspection
>             if failed or not dtest_config.delete_logs:
> >               copy_logs(request, dtest_setup.cluster)
>
> conftest.py:371:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> conftest.py:291: in copy_logs
>     shutil.copyfile(file, os.path.join(logdir, target_name))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> src = '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
> dst = 'logs/1708958581606_test_basic_snapshot_and_restore/node1_gc.log'
>
> def copyfile(src, dst, *, follow_symlinks=True):
>     """Copy data from src to dst in the most efficient way possible.
>
>     If follow_symlinks is not set and src is a symbolic link, a new
>     symlink will be created instead of copying the file it points to.
>
>     """
>     sys.audit("shutil.copyfile", src, dst)
>
>     if _samefile(src, dst):
>         raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
>
>     file_size = 0
>     for i, fn in enumerate([src, dst]):
>         try:
>             st = _stat(fn)
>         except OSError:
>             # File most likely does not exist
>             pass
>         else:
>             # XXX What about other special files? (sockets, devices...)
>             if stat.S_ISFIFO(st.st_mode):
>                 fn = fn.path if isinstance(fn, os.DirEntry) else fn
>                 raise SpecialFileError("`%s` is a named pipe" % fn)
>             if _WINDOWS and i == 0:
>                 file_size = st.st_size
>
>     if not follow_symlinks and _islink(src):
>         os.symlink(os.readlink(src), dst)
>     else:
> >       with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
> E       FileNotFoundError: [Errno 2] No such file or directory: '/home/cassandra/cassandra/build/run-python-dtest.PRhg7u/dtest-pqe8_k2h/test/node1/logs/gc.log'
>
> /usr/lib/python3.8/shutil.py:264: FileNotFoundError
>
> During handling of the above exception, another exception occurred:
>
> request =  'fixture_dtest_setup' for >
> dtest_config = 

[jira] [Created] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-02-27 Thread Zbyszek Z (Jira)
Zbyszek Z created CASSANDRA-19445:
-

 Summary: Cassandra 4.1.4 floods logs with "Completed 0 uncommitted 
paxos instances for"
 Key: CASSANDRA-19445
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19445
 Project: Cassandra
  Issue Type: Bug
Reporter: Zbyszek Z


Hello,

On our cluster logs are flooded with: 
{code:java}
INFO  [OptionalTasks:1] 2024-02-27 14:27:51,213 
PaxosCleanupLocalCoordinator.java:185 - Completed 0 uncommitted paxos instances 
for X on ranges [(9210458530128018597,-9222146739399525061], 
(-9222146739399525061,-9174246180597321488], 
(-9174246180597321488,-9155837684527496840], 
(-9155837684527496840,-9148981328078890812], 
(-9148981328078890812,-9141853035919151700], 
(-9141853035919151700,-9138872620588476741], {code}
I cannot find anything in the documentation regarding this log line. Also, these are 
huge log payloads that heavily flood system.log. 
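
Until the root cause is addressed, the noise can be silenced per-logger. A hedged sketch: the logger name below is inferred from the 4.1 package layout (the message is emitted by PaxosCleanupLocalCoordinator.java) and should be verified against the running version:

```xml
<!-- logback.xml fragment: raise the threshold for the paxos cleanup logger
     so the "Completed 0 uncommitted paxos instances" INFO lines are dropped -->
<logger name="org.apache.cassandra.service.paxos.cleanup.PaxosCleanupLocalCoordinator"
        level="WARN"/>
```

If a restart is undesirable, the same change can usually be applied at runtime with `nodetool setlogginglevel <logger> WARN`, assuming that logger name is correct for your build.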



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-18851) Test failure: junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-18851:

Fix Version/s: 5.0.x

> Test failure: 
> junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit
> ---
>
> Key: CASSANDRA-18851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Berenguer Blasi
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> See CASSANDRA-18707 Where this test is 
> [proven|https://issues.apache.org/jira/browse/CASSANDRA-18707?focusedCommentId=17761803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17761803]
>  flaky
> failure in testLocalSerialLocalCommit:
> {noformat}
> junit.framework.AssertionFailedError: numWritten: 2 < 3
>   at 
> org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialCommit(CASMultiDCTest.java:111)
>   at 
> org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit(CASMultiDCTest.java:121)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}






[jira] [Commented] (CASSANDRA-18851) Test failure: junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit

2024-02-27 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821263#comment-17821263
 ] 

Ekaterina Dimitrova commented on CASSANDRA-18851:
-

Seen also on 5.0:
https://ci-cassandra.apache.org/job/Cassandra-5.0/174/testReport/junit/org.apache.cassandra.distributed.test/CASMultiDCTest/testLocalSerialLocalCommit__jdk17_x86_64/

> Test failure: 
> junit.framework.TestSuite.org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit
> ---
>
> Key: CASSANDRA-18851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18851
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Berenguer Blasi
>Priority: Normal
> Fix For: 4.1.x, 5.x
>
>
> See CASSANDRA-18707 Where this test is 
> [proven|https://issues.apache.org/jira/browse/CASSANDRA-18707?focusedCommentId=17761803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17761803]
>  flaky
> failure in testLocalSerialLocalCommit:
> {noformat}
> junit.framework.AssertionFailedError: numWritten: 2 < 3
>   at 
> org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialCommit(CASMultiDCTest.java:111)
>   at 
> org.apache.cassandra.distributed.test.CASMultiDCTest.testLocalSerialLocalCommit(CASMultiDCTest.java:121)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}






[jira] [Updated] (CASSANDRA-19409) Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*

2024-02-27 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-19409:

Authors: Berenguer Blasi, Ekaterina Dimitrova  (was: Berenguer Blasi)

> Test Failure: dtest-upgrade.upgrade_tests.upgrade_through_versions_test.*
> -
>
> Key: CASSANDRA-19409
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19409
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0.13, 4.1.5, 5.0-rc, 5.x
>
>
> Failing in Jenkins:
>  * 
> [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/]
>  * 
> [dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode-large.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/]
>  * 
> [dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test.TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade-novnode.upgrade_tests.upgrade_through_versions_test/TestProtoV3Upgrade_AllVersions_EndsAt_Trunk_HEAD/test_parallel_upgrade/]
>  * 
> [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade/]
>  * 
> [dtest-upgrade.upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD.test_parallel_upgrade_with_internode_ssl|https://ci-cassandra.apache.org/job/Cassandra-5.0/170/testReport/junit/dtest-upgrade.upgrade_tests.upgrade_through_versions_test/TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD/test_parallel_upgrade_with_internode_ssl/]






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821245#comment-17821245
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 1:44 PM:


[~dipiets] I think the mistake you are making is that you run "nodetool compact" after 
you write the data and then run a mixed workload against that.

nodetool compact will produce just 1 SSTable instead of however many you had before.

I tested this locally too and, indeed, if you have just one SSTable, that will be 
way more performance-friendly than having multiple of them, because ... there 
is just one SSTable, so Cassandra does not need to look into any other - which is 
more performant by definition. 

If you do not run nodetool compact after the writes, try to also run nodetool 
disableautocompaction, then run the tests. After it is finished, compact it all 
into one SSTable and run it again. You will see that it is slower, and I do not 
think that looking into capacity etc. has anything to do with it.

Also don't do a mixed workload, do just reads.

I mean ... sure, we see less locking etc. and we might add that change there, 
but in general I do not think the effect is anywhere near 2x.


was (Author: smiklosovic):
[~dipiets] I think the mistake you do is that you do "nodetool compact" after 
you write the data and then you run mixed workload against that.

nodetool compact will make just 1 SSTable instead of whatever number of them 
you have there.

I tested this locally too and, indeed, if you have just one table, that will be 
way more performance-friendly than having multiple of them, because ... there 
is just one table. So Cassandra does not need to look into any other - which is 
more performant by definition. 

If you do not run nodetool compact after writes, try to also do nodetool 
disableautocompaction, then run the tests. After it is finished, compact it all 
into one SSTable and run it again. You will see that it is slower and I do not 
think that looking into capacity etc has anything to do with it.

I mean ... sure, we see less locking etc and we might add that change there, 
but in general I do not think that the effect it has is 2x  hardly.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}
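
The pattern the report describes - every reader serializing on a monitor just to read a cached size - is language-agnostic. A minimal Python sketch of the two shapes (Cassandra's actual InstrumentingCache/SSTableReader code is Java; the class and method names here are illustrative only, not the project's API):

```python
import threading

class LockedCache:
    """Contended shape: every reader acquires the lock just to read a count."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = 0

    def put(self, key):
        with self._lock:
            self._entries += 1

    def get_capacity(self):
        with self._lock:       # hot read path serializes here
            return self._entries

class LockFreeReadCache:
    """Patched shape: writers still synchronize, readers just load the value."""
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = 0

    def put(self, key):
        with self._lock:
            self._entries += 1

    def size(self):
        return self._entries   # plain read, no lock acquisition
```

Under many concurrent readers the second shape removes the lock hand-off on the read path entirely, which matches the report's observation that replacing the locked `getCapacity` read with `size` lifted the CPU-utilization ceiling.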






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821245#comment-17821245
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 1:43 PM:


[~dipiets] I think the mistake you do is that you do "nodetool compact" after 
you write the data and then you run mixed workload against that.

nodetool compact will make just 1 SSTable instead of whatever number of them 
you have there.

I tested this locally too and, indeed, if you have just one table, that will be 
way more performance-friendly than having multiple of them, because ... there 
is just one table. So Cassandra does not need to look into any other - which is 
more performant by definition. 

If you do not run nodetool compact after writes, try to also do nodetool 
disableautocompaction, then run the tests. After it is finished, compact it all 
into one SSTable and run it again. You will see that it is slower and I do not 
think that looking into capacity etc has anything to do with it.

I mean ... sure, we see less locking etc and we might add that change there, 
but in general I do not think that the effect it has is 2x  hardly.


was (Author: smiklosovic):
[~dipiets] I think the mistake you do is that you do "nodetool compact" after 
you write the data and they you run mixed workload against that.

nodetool compact will make just 1 SSTable instead of whatever number of them 
you have there.

I tested this locally too and, indeed, if you have just one table, that will be 
way more performance-friendly than having multiple of them, because ... there 
is just one table. So Cassandra does not need to look into any other - which is 
more performant by definition. 

If you do not run nodetool compact after writes, try to also do nodetool 
disableautocompaction, then run the tests. After it is finished, compact it all 
into one SSTable and run it again. You will see that it is slower and I do not 
think that looking into capacity etc has anything to do with it.

I mean ... sure, we see less locking etc and we might add that change there, 
but in general I do not think that the effect it has is 2x  hardly.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}
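For context, the kind of contention described in the report can be modelled in miniature. The class and method names below are hypothetical (this is an illustrative sketch, not Cassandra's actual InstrumentingCache or SSTableReader API): a metric exposed through a synchronized method forces every reader thread through the same monitor, while a plain volatile read, in the spirit of the getCapacity-to-size change, involves no lock at all.

```java
import java.util.concurrent.atomic.LongAdder;

// Miniature model of the reported hotspot (hypothetical names): a metric read
// behind a synchronized method serializes all reader threads, while a single
// volatile read is contention-free. Functionally the two paths are identical.
public class LockContentionSketch {
    private static final Object lock = new Object();
    private static long capacity = 1024;
    private static volatile long size = 1024;

    // Contended path: all callers serialize on one monitor.
    static long getCapacityLocked() {
        synchronized (lock) { return capacity; }
    }

    // Uncontended path: one volatile read, no monitor involved.
    static long getSizeLockFree() {
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        final int threads = 8, iters = 100_000;
        LongAdder sum = new LongAdder();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                long local = 0;
                for (int j = 0; j < iters; j++)
                    local += getSizeLockFree(); // swap in getCapacityLocked() to profile the locked variant
                sum.add(local);
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // Both variants return the same value; only their scalability differs.
        System.out.println(sum.sum() == (long) threads * iters * 1024);
    }
}
```

Only a lock profiler (such as the async-profiler capture attached to the ticket) distinguishes the two variants; their results are interchangeable.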



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821245#comment-17821245
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

[~dipiets] I think the mistake you make is that you run "nodetool compact" 
after you write the data and then run the mixed workload against that.

nodetool compact will leave just 1 SSTable instead of however many you had 
before.

I tested this locally too and, indeed, if you have just one SSTable, that will 
be way more performance-friendly than having multiple of them, because ... 
there is just one SSTable, so Cassandra does not need to look into any other - 
which is more performant by definition.

If you do not run nodetool compact after the writes, try to also run nodetool 
disableautocompaction, then run the tests. After they finish, compact 
everything into one SSTable and run them again. You will see that it is 
slower, and I do not think that looking into capacity etc. has anything to do 
with it.

I mean ... sure, we see less locking and we might add that change there, but 
in general I hardly think its effect is 2x.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821211#comment-17821211
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Well ... third time is a charm.

I think I am suffering from what is called confirmation bias. I turned off 
autocompaction (nodetool disableautocompaction) and ran the test with reads 
only (0 writes, 100 reads) for a longer time, and the difference between 
before and after is really negligible, at least in my case. I am wondering 
where [~dipiets] can see 3x here.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821196#comment-17821196
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 12:02 PM:
-

I don't see the speedup. I was just testing the same commands on my local PC; 
the before/after numbers are basically the same, definitely not 2x or 3x.

I used just 100 threads.

But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge.


was (Author: smiklosovic):
I don't see the speedup. I was just testing the same commands on my local PC; 
the before/after numbers are basically the same, definitely not 2x or 3x.

I used just 100 threads.

But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge, but I 
would expect that I would also see some speedup already, no? Even on some 
Ryzen 7 and running it in Docker.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821200#comment-17821200
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 12:02 PM:
-

Well, I do see some speedup, just not 2x / 3x. I think this change's effect is 
amplified on more powerful machines.

without this patch

{noformat}
Op rate   :   89,392 op/s  [READ: 80,439 op/s, WRITE: 8,953 
op/s]
Partition rate:   89,392 pk/s  [READ: 80,439 pk/s, WRITE: 8,953 
pk/s]
Row rate  :   89,392 row/s [READ: 80,439 row/s, WRITE: 8,953 
row/s]
Latency mean  :2.2 ms [READ: 2.2 ms, WRITE: 2.2 ms]
Latency median:1.6 ms [READ: 1.6 ms, WRITE: 1.6 ms]
Latency 95th percentile   :5.8 ms [READ: 5.8 ms, WRITE: 5.9 ms]
Latency 99th percentile   :   10.6 ms [READ: 10.6 ms, WRITE: 10.7 ms]
Latency 99.9th percentile :   23.5 ms [READ: 23.4 ms, WRITE: 23.8 ms]
Latency max   :  180.5 ms [READ: 180.5 ms, WRITE: 122.7 ms]
Total partitions  :  5,408,128 [READ: 4,866,473, WRITE: 541,655]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

with this patch, two independent runs:

{noformat}
Op rate   :  119,782 op/s  [READ: 107,849 op/s, WRITE: 11,933 
op/s]
Partition rate:  119,782 pk/s  [READ: 107,849 pk/s, WRITE: 11,933 
pk/s]
Row rate  :  119,782 row/s [READ: 107,849 row/s, WRITE: 11,933 
row/s]
Latency mean  :1.7 ms [READ: 1.6 ms, WRITE: 1.7 ms]
Latency median:1.3 ms [READ: 1.3 ms, WRITE: 1.4 ms]
Latency 95th percentile   :3.8 ms [READ: 3.8 ms, WRITE: 4.0 ms]
Latency 99th percentile   :7.7 ms [READ: 7.7 ms, WRITE: 8.0 ms]
Latency 99.9th percentile :   13.7 ms [READ: 13.7 ms, WRITE: 14.1 ms]
Latency max   :  114.6 ms [READ: 61.5 ms, WRITE: 114.6 ms]
Total partitions  :  7,188,152 [READ: 6,472,051, WRITE: 716,101]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

{noformat}
Results:
Op rate   :  104,456 op/s  [READ: 94,016 op/s, WRITE: 10,440 
op/s]
Partition rate:  104,456 pk/s  [READ: 94,016 pk/s, WRITE: 10,440 
pk/s]
Row rate  :  104,456 row/s [READ: 94,016 row/s, WRITE: 10,440 
row/s]
Latency mean  :1.9 ms [READ: 1.9 ms, WRITE: 2.0 ms]
Latency median:1.5 ms [READ: 1.4 ms, WRITE: 1.5 ms]
Latency 95th percentile   :4.7 ms [READ: 4.6 ms, WRITE: 4.8 ms]
Latency 99th percentile   :8.6 ms [READ: 8.6 ms, WRITE: 8.8 ms]
Latency 99.9th percentile :   13.9 ms [READ: 13.8 ms, WRITE: 14.1 ms]
Latency max   :   85.4 ms [READ: 77.2 ms, WRITE: 85.4 ms]
Total partitions  :  6,268,822 [READ: 5,642,258, WRITE: 626,564]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

so the speedup is around 20%, which is quite nice already.


was (Author: smiklosovic):
Well, I do see some speedup, just not 2x / 3x. I think this change's effect is 
amplified on more powerful machines.

without this patch

{noformat}
Op rate   :   89,392 op/s  [READ: 80,439 op/s, WRITE: 8,953 
op/s]
Partition rate:   89,392 pk/s  [READ: 80,439 pk/s, WRITE: 8,953 
pk/s]
Row rate  :   89,392 row/s [READ: 80,439 row/s, WRITE: 8,953 
row/s]
Latency mean  :2.2 ms [READ: 2.2 ms, WRITE: 2.2 ms]
Latency median:1.6 ms [READ: 1.6 ms, WRITE: 1.6 ms]
Latency 95th percentile   :5.8 ms [READ: 5.8 ms, WRITE: 5.9 ms]
Latency 99th percentile   :   10.6 ms [READ: 10.6 ms, WRITE: 10.7 ms]
Latency 99.9th percentile :   23.5 ms [READ: 23.4 ms, WRITE: 23.8 ms]
Latency max   :  180.5 ms [READ: 180.5 ms, WRITE: 122.7 ms]
Total partitions  :  5,408,128 [READ: 4,866,473, WRITE: 541,655]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

with this patch, two independent runs:

{noformat}
Op rate   :  119,782 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821200#comment-17821200
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Well, I do see some speedup, just not 2x / 3x. I think this change's effect is 
amplified on more powerful machines.

without this patch

{noformat}
Op rate   :   89,392 op/s  [READ: 80,439 op/s, WRITE: 8,953 
op/s]
Partition rate:   89,392 pk/s  [READ: 80,439 pk/s, WRITE: 8,953 
pk/s]
Row rate  :   89,392 row/s [READ: 80,439 row/s, WRITE: 8,953 
row/s]
Latency mean  :2.2 ms [READ: 2.2 ms, WRITE: 2.2 ms]
Latency median:1.6 ms [READ: 1.6 ms, WRITE: 1.6 ms]
Latency 95th percentile   :5.8 ms [READ: 5.8 ms, WRITE: 5.9 ms]
Latency 99th percentile   :   10.6 ms [READ: 10.6 ms, WRITE: 10.7 ms]
Latency 99.9th percentile :   23.5 ms [READ: 23.4 ms, WRITE: 23.8 ms]
Latency max   :  180.5 ms [READ: 180.5 ms, WRITE: 122.7 ms]
Total partitions  :  5,408,128 [READ: 4,866,473, WRITE: 541,655]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

with this patch, two independent runs:

{noformat}
Op rate   :  119,782 op/s  [READ: 107,849 op/s, WRITE: 11,933 
op/s]
Partition rate:  119,782 pk/s  [READ: 107,849 pk/s, WRITE: 11,933 
pk/s]
Row rate  :  119,782 row/s [READ: 107,849 row/s, WRITE: 11,933 
row/s]
Latency mean  :1.7 ms [READ: 1.6 ms, WRITE: 1.7 ms]
Latency median:1.3 ms [READ: 1.3 ms, WRITE: 1.4 ms]
Latency 95th percentile   :3.8 ms [READ: 3.8 ms, WRITE: 4.0 ms]
Latency 99th percentile   :7.7 ms [READ: 7.7 ms, WRITE: 8.0 ms]
Latency 99.9th percentile :   13.7 ms [READ: 13.7 ms, WRITE: 14.1 ms]
Latency max   :  114.6 ms [READ: 61.5 ms, WRITE: 114.6 ms]
Total partitions  :  7,188,152 [READ: 6,472,051, WRITE: 716,101]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

{noformat}
Results:
Op rate   :  104,456 op/s  [READ: 94,016 op/s, WRITE: 10,440 
op/s]
Partition rate:  104,456 pk/s  [READ: 94,016 pk/s, WRITE: 10,440 
pk/s]
Row rate  :  104,456 row/s [READ: 94,016 row/s, WRITE: 10,440 
row/s]
Latency mean  :1.9 ms [READ: 1.9 ms, WRITE: 2.0 ms]
Latency median:1.5 ms [READ: 1.4 ms, WRITE: 1.5 ms]
Latency 95th percentile   :4.7 ms [READ: 4.6 ms, WRITE: 4.8 ms]
Latency 99th percentile   :8.6 ms [READ: 8.6 ms, WRITE: 8.8 ms]
Latency 99.9th percentile :   13.9 ms [READ: 13.8 ms, WRITE: 14.1 ms]
Latency max   :   85.4 ms [READ: 77.2 ms, WRITE: 85.4 ms]
Total partitions  :  6,268,822 [READ: 5,642,258, WRITE: 626,564]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

so the speedup is around 20%, which is quite nice already.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 

[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821196#comment-17821196
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 11:52 AM:
-

I don't see the speedup. I was just testing the same commands on my local PC; 
the before/after numbers are basically the same, definitely not 2x or 3x.

I used just 100 threads.

But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge, but I 
would expect that I would also see some speedup already, no? Even on some 
Ryzen 7 and running it in Docker.


was (Author: smiklosovic):
I dont see the speedup, I was just testing same commands on my local PC, before 
/ after numbers are basically more or less same, definitely not 2x or 3x. 

I used just 100 threads.

But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge but I would 
expect that I would also seem some speedup already, no? Even on some Ryzen 7 
and running it in Docker.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Comment Edited] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821196#comment-17821196
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19429 at 2/27/24 11:51 AM:
-

I don't see the speedup. I was just testing the same commands on my local PC; 
the before/after numbers are basically the same, definitely not 2x or 3x.

I used just 100 threads.

But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge, but I 
would expect that I would also see some speedup already, no? Even on some 
Ryzen 7 and running it in Docker.


was (Author: smiklosovic):
I don't see the speedup. I was just testing the same commands on my local PC; 
the before/after numbers are basically the same, definitely not 2x or 3x.

I used just 100 threads.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821196#comment-17821196
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I don't see the speedup. I was just testing the same commands on my local PC; 
the before/after numbers are basically the same, definitely not 2x or 3x.

I used just 100 threads.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}






[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS

2024-02-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821195#comment-17821195
 ] 

Brandon Williams commented on CASSANDRA-19426:
--

Maybe now is the time to remove the probability, à la CASSANDRA-9206.

> Fix Double Type issues in the Gossiper#maybeGossipToCMS
> ---
>
> Key: CASSANDRA-19426
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19426
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _*issue-1:*_
> If liveEndpoints.size() == 0 and unreachableEndpoints.size() == 0, 
> probability will be {*}_Infinity_{*}, so randDbl <= probability is always 
> true and sendGossip is called.
> _*issue-2:*_ 
> Comparing two doubles with *<* or {*}>{*} is safe. However, accuracy is lost 
> when we compare two doubles for equality ({*}={*}) by intuition. For example:
> {code:java}
> double probability = 0.1;
> double randDbl = 0.10000000000000001; // parses to the same IEEE-754 double 
> as 0.1
> if (randDbl <= probability)
> {
>     System.out.println("randDbl <= probability(always here)");
> }
> else
> {
>     System.out.println("randDbl > probability");
> }
> {code}
> A good example from: _*Gossiper#maybeGossipToUnreachableMember*_
> {code:java}
> if (randDbl < prob)
> {
> sendGossip(message, Sets.filter(unreachableEndpoints.keySet(),
>                                 ep -> 
> !isDeadState(getEndpointStateMap().get(ep))));
> }{code}
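The rounding behaviour behind issue-2 can be verified standalone (an illustrative sketch, not Cassandra code): a decimal literal that is "slightly greater" on paper can round to the exact same IEEE-754 double as the threshold it is compared against.

```java
// Demonstrates why equality-style comparisons of doubles are misleading:
// nearest-double rounding maps the literal below onto exactly 0.1.
public class DoubleEqualityDemo {
    public static void main(String[] args) {
        double probability = 0.1;
        // Greater than 0.1 in decimal, but identical as a double.
        double randDbl = 0.10000000000000001;
        System.out.println(randDbl == probability);  // true
        System.out.println(randDbl <= probability);  // true
        // A strict '<', as in maybeGossipToUnreachableMember, excludes the tie.
        System.out.println(randDbl < probability);   // false
    }
}
```

This is why the `<` form used in _*Gossiper#maybeGossipToUnreachableMember*_ is the safer pattern.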






[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821178#comment-17821178
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

https://github.com/apache/cassandra/pull/3140

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to a 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}
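The contention pattern described in the report (a synchronized getter on a hot read path) can be sketched outside Cassandra. The class and field names below are hypothetical, for illustration only; this is not the actual InstrumentingCache/SSTableReader code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of why a locked capacity getter contends under load
// while a lock-free size read does not. Names are invented for the example.
public class CacheContentionSketch {
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();
    private long capacity = 1 << 20;

    // Every caller serializes on the monitor, even though a rarely-updated
    // long could be read lock-free (e.g. as a volatile field).
    public synchronized long getCapacity() { return capacity; }

    // ConcurrentHashMap.mappingCount() takes no lock on the read path.
    public long size() { return map.mappingCount(); }

    public static void main(String[] args) {
        CacheContentionSketch c = new CacheContentionSketch();
        c.map.put("k", new byte[8]);
        System.out.println(c.getCapacity()); // 1048576
        System.out.println(c.size());        // 1
    }
}
```

With many reader threads, the synchronized getter becomes a serialization point exactly like the lock acquires seen in the profile, which is why swapping it for a lock-free read lifts the CPU ceiling.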






[jira] [Comment Edited] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821173#comment-17821173
 ] 

Stefan Miklosovic edited comment on CASSANDRA-19417 at 2/27/24 10:58 AM:
-

I do not see any deprecation notes in `ListUsersStatement` in vanilla 
Cassandra, nor in the NEWS.txt. However, I see in cql_singlefile.adoc that it 
says that "This statement is equivalent to LIST ROLES". 

What we might do is officially deprecate LIST USERS and implement this for 
LIST ROLES only. BTW, at the code level, the ListUsersStatement class extends 
ListRolesStatement, so we would probably get this in "LIST USERS" virtually for 
free if there are no other implementation nuances around that.


was (Author: smiklosovic):
I do not see any deprecation notes in `ListUsersStatement` in vanilla 
Cassandra, nor in the NEWS.txt. However, I see in cql_singlefile.adoc that it 
says that "This statement is equivalent to LIST ROLES". 

What we might do is officially deprecate LIST USERS and implement this for 
LIST ROLES only. BTW, at the code level, the ListUsersStatement class extends 
ListRolesStatement, so we would probably get this in "LIST ROLES" virtually for 
free if there are no other implementation nuances around that.

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Developing a new CQL command LIST SUPERUSERS to return list of roles with 
> superuser privilege. This includes roles who acquired superuser privilege in 
> the hierarchy. 
> Context: LIST ROLES cql command lists roles, their membership details and 
> displays super=true for immediate superusers. But there can be roles who 
> acquired superuser privilege due to a grant. LIST ROLES command won't display 
> super=true for such roles and the only way to recognize such roles is to look 
> for at least one row with super=true in the output of LIST ROLES OF <role name> command. While this works to check if a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* and 
> may need to maintain list of roles with superuser privilege. There is no 
> existing command/tool to retrieve such roles details. Hence developing this 
> command which returns all roles having superuser privilege.






[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821173#comment-17821173
 ] 

Stefan Miklosovic commented on CASSANDRA-19417:
---

I do not see any deprecation notes in `ListUsersStatement` in vanilla 
Cassandra, nor in the NEWS.txt. However, I see in cql_singlefile.adoc that it 
says that "This statement is equivalent to LIST ROLES". 

What we might do is officially deprecate LIST USERS and implement this for 
LIST ROLES only. BTW, at the code level, the ListUsersStatement class extends 
ListRolesStatement, so we would probably get this in "LIST ROLES" virtually for 
free if there are no other implementation nuances around that.

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Developing a new CQL command LIST SUPERUSERS to return list of roles with 
> superuser privilege. This includes roles who acquired superuser privilege in 
> the hierarchy. 
> Context: LIST ROLES cql command lists roles, their membership details and 
> displays super=true for immediate superusers. But there can be roles who 
> acquired superuser privilege due to a grant. LIST ROLES command won't display 
> super=true for such roles and the only way to recognize such roles is to look 
> for at least one row with super=true in the output of LIST ROLES OF <role name> command. While this works to check if a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* and 
> may need to maintain list of roles with superuser privilege. There is no 
> existing command/tool to retrieve such roles details. Hence developing this 
> command which returns all roles having superuser privilege.






[jira] [Commented] (CASSANDRA-19426) Fix Double Type issues in the Gossiper#maybeGossipToCMS

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821167#comment-17821167
 ] 

Stefan Miklosovic commented on CASSANDRA-19426:
---

in maybeGossipToSeed, this 

{noformat}
double probability = seeds.size() / (double) (liveEndpoints.size() + 
unreachableEndpoints.size());
{noformat}

will never be Infinity, because liveEndpoints.size() will not be 0. If it were, 
we would never have reached that "else" branch.

If we have this in maybeGossipToCMS

{noformat}
Set<InetAddressAndPort> cms = 
ClusterMetadata.current().fullCMSMembers();
if (cms.contains(getBroadcastAddressAndPort()))
return;

double probability = cms.size() / (double) (liveEndpoints.size() + 
unreachableEndpoints.size());
{noformat}

The cms set can be e.g. 2 nodes; if cms does not contain the current node, then 
liveEndpoints will be non-zero too, no? In other words, can CMS members be "not 
live"? If they are not live, then they will be counted as unreachable, so that 
again will not produce Infinity there.

I keep returning to this, but I just cannot see how it could be Infinity. 
Both liveEndpoints and unreachableEndpoints would have to be 0. Under what 
scenario would that be true?
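For reference, the division in question only produces Infinity when the denominator is exactly 0.0, and floating-point division by zero does not throw. A minimal standalone sketch (illustrative, not Cassandra code):

```java
public class GossipProbabilityDemo {
    public static void main(String[] args) {
        int seeds = 2;
        int live = 0, unreachable = 0;

        // Denominator 0.0: double division yields Infinity, not an exception.
        double probability = seeds / (double) (live + unreachable);
        System.out.println(Double.isInfinite(probability)); // true

        // Any non-zero denominator gives a finite probability.
        System.out.println(2 / (double) (1 + 1)); // 1.0
    }
}
```

So the argument above reduces to whether both endpoint sets can simultaneously be empty at that call site; if they cannot, the Infinity case is unreachable.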

> Fix Double Type issues in the Gossiper#maybeGossipToCMS
> ---
>
> Key: CASSANDRA-19426
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19426
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Low
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _*issue-1:*_
> if liveEndpoints.size()=unreachableEndpoints.size()=0; probability will be 
> {*}_Infinity_{*}.
> randDbl <= probability will always be true, then sendGossip
> _*issue-2:*_ 
> Comparing two doubles is safe using *<* or {*}>{*}. However, accuracy 
> is lost if we intuitively compare two doubles for equality 
> ({*}=={*}). For example:
> {code:java}
> double probability = 0.1;
> double randDbl = 0.10000000000000000001; // Slightly greater than probability on paper, but rounds to exactly 0.1 as a double
> if (randDbl <= probability)
> {
>     System.out.println("randDbl <= probability(always here)");
> }
> else
> {
>     System.out.println("randDbl > probability");
> }
> {code}
> A good example from: _*Gossiper#maybeGossipToUnreachableMember*_
> {code:java}
> if (randDbl < prob)
> {
> sendGossip(message, Sets.filter(unreachableEndpoints.keySet(),
>                                 ep -> !isDeadState(getEndpointStateMap().get(ep))));
> }{code}
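To make the rounding point concrete, here is a minimal standalone Java sketch (illustrative, not Cassandra code) of a literal that is greater than 0.1 on paper but rounds to exactly 0.1 as a 64-bit double:

```java
public class DoubleEqualityDemo {
    public static void main(String[] args) {
        double probability = 0.1;
        // The extra digits exceed double precision, so this literal
        // rounds to exactly the same 64-bit value as 0.1.
        double randDbl = 0.10000000000000000001;

        System.out.println(randDbl == probability); // true: both literals are the same double
        System.out.println(randDbl <= probability); // true, while a strict '<' would be false
    }
}
```

This is why `<=` and `<` behave differently here: the two branches disagree precisely on values that collapse onto the same double representation.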






[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Shailaja Koppu (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821165#comment-17821165
 ] 

Shailaja Koppu commented on CASSANDRA-19417:


[~maxwellguo] [~smiklosovic] as per my understanding from 
[https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlListUsers.html], 
the LIST USERS command is deprecated. Do you mean adding an option to the LIST 
ROLES command?

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Developing a new CQL command LIST SUPERUSERS to return list of roles with 
> superuser privilege. This includes roles who acquired superuser privilege in 
> the hierarchy. 
> Context: LIST ROLES cql command lists roles, their membership details and 
> displays super=true for immediate superusers. But there can be roles who 
> acquired superuser privilege due to a grant. LIST ROLES command won't display 
> super=true for such roles and the only way to recognize such roles is to look 
> for at least one row with super=true in the output of LIST ROLES OF <role name> command. While this works to check if a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* and 
> may need to maintain list of roles with superuser privilege. There is no 
> existing command/tool to retrieve such roles details. Hence developing this 
> command which returns all roles having superuser privilege.






[jira] [Commented] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821140#comment-17821140
 ] 

Stefan Miklosovic commented on CASSANDRA-19417:
---

[~skoppu] would you please initiate an ML thread on dev@ to increase 
visibility? We might also discuss there all the details Maxwell mentions. I 
think modifying LIST USERS to accept options is also possible and probably 
preferable to introducing a new CQL statement.

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Developing a new CQL command LIST SUPERUSERS to return list of roles with 
> superuser privilege. This includes roles who acquired superuser privilege in 
> the hierarchy. 
> Context: LIST ROLES cql command lists roles, their membership details and 
> displays super=true for immediate superusers. But there can be roles who 
> acquired superuser privilege due to a grant. LIST ROLES command won't display 
> super=true for such roles and the only way to recognize such roles is to look 
> for at least one row with super=true in the output of LIST ROLES OF <role name> command. While this works to check if a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* and 
> may need to maintain list of roles with superuser privilege. There is no 
> existing command/tool to retrieve such roles details. Hence developing this 
> command which returns all roles having superuser privilege.






[jira] [Commented] (CASSANDRA-19222) Leak - Strong self-ref loop detected in BTI

2024-02-27 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821136#comment-17821136
 ] 

Berenguer Blasi commented on CASSANDRA-19222:
-

5.0 and trunk in Jenkins show a 100% pass. There's nothing in Butler, there are 
no artifacts, and I don't remember seeing anything like this. There's nothing to 
go on, so I suggest we close this and keep an eye out in case it resurfaces.

> Leak - Strong self-ref loop detected in BTI
> ---
>
> Key: CASSANDRA-19222
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19222
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Jacek Lewandowski
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> https://app.circleci.com/pipelines/github/jacek-lewandowski/cassandra/1233/workflows/bb617340-f1da-4550-9c87-5541469972c4/jobs/62534/tests
> {noformat}
> ERROR [Strong-Reference-Leak-Detector:1] 2023-12-21 09:50:33,072 Strong 
> self-ref loop detected 
> [/tmp/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/oa-1-big
> private java.util.List 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.closeables-java.util.ArrayList
> transient java.lang.Object[] 
> java.util.ArrayList.elementData-[Ljava.lang.Object;
> transient java.lang.Object[] 
> java.util.ArrayList.elementData-org.apache.cassandra.io.util.FileHandle
> final org.apache.cassandra.utils.concurrent.Ref 
> org.apache.cassandra.utils.concurrent.SharedCloseableImpl.ref-org.apache.cassandra.utils.concurrent.Ref
> final org.apache.cassandra.utils.concurrent.Ref$State 
> org.apache.cassandra.utils.concurrent.Ref.state-org.apache.cassandra.utils.concurrent.Ref$State
> final org.apache.cassandra.utils.concurrent.Ref$GlobalState 
> org.apache.cassandra.utils.concurrent.Ref$State.globalState-org.apache.cassandra.utils.concurrent.Ref$GlobalState
> private final org.apache.cassandra.utils.concurrent.RefCounted$Tidy 
> org.apache.cassandra.utils.concurrent.Ref$GlobalState.tidy-org.apache.cassandra.io.util.FileHandle$Cleanup
> final java.util.Optional 
> org.apache.cassandra.io.util.FileHandle$Cleanup.chunkCache-java.util.Optional
> private final java.lang.Object 
> java.util.Optional.value-org.apache.cassandra.cache.ChunkCache
> private final org.apache.cassandra.utils.memory.BufferPool 
> org.apache.cassandra.cache.ChunkCache.bufferPool-org.apache.cassandra.utils.memory.BufferPool
> private final java.util.Set 
> org.apache.cassandra.utils.memory.BufferPool.localPoolReferences-java.util.Collections$SetFromMap
> private final java.util.Map 
> java.util.Collections$SetFromMap.m-java.util.concurrent.ConcurrentHashMap
> private final java.util.Map 
> java.util.Collections$SetFromMap.m-org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef
> private final org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks 
> org.apache.cassandra.utils.memory.BufferPool$LocalPoolRef.chunks-org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks
> private org.apache.cassandra.utils.memory.BufferPool$Chunk 
> org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.chunk0-org.apache.cassandra.utils.memory.BufferPool$Chunk
> private volatile org.apache.cassandra.utils.memory.BufferPool$LocalPool 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.owner-org.apache.cassandra.utils.memory.BufferPool$LocalPool
> private final java.lang.Thread 
> org.apache.cassandra.utils.memory.BufferPool$LocalPool.owningThread-io.netty.util.concurrent.FastThreadLocalThread
> private java.lang.Runnable 
> java.lang.Thread.target-io.netty.util.concurrent.FastThreadLocalRunnable
> private final java.lang.Runnable 
> io.netty.util.concurrent.FastThreadLocalRunnable.runnable-java.util.concurrent.ThreadPoolExecutor$Worker
> final java.util.concurrent.ThreadPoolExecutor 
> java.util.concurrent.ThreadPoolExecutor$Worker.this$0-org.apache.cassandra.concurrent.ScheduledThreadPoolExecutorPlus
> private final java.util.concurrent.BlockingQueue 
> java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue
> private final java.util.concurrent.BlockingQueue 
> java.util.concurrent.ThreadPoolExecutor.workQueue-java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> private java.util.concurrent.Callable 
> java.util.concurrent.FutureTask.callable-java.util.concurrent.Executors$RunnableAdapter
> private final java.lang.Runnable 
> java.util.concurrent.Executors$RunnableAdapter.task-org.apache.cassandra.concurrent.ExecutionFailure$1
> final java.lang.Runnable 
> org.apache.cassandra.concurrent.ExecutionFailure$1.val$wrap-org.apache.cassandra.hints.HintsService$$Lambda$1142/0x000801576aa0
> private final org.apache.cassandra.hints.HintsService 
> 

[jira] [Commented] (CASSANDRA-19259) upgrade_tests.upgrade_through_versions_test consistently failing on circleci

2024-02-27 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821115#comment-17821115
 ] 

Berenguer Blasi commented on CASSANDRA-19259:
-

Theory: CASSANDRA-19409 unearthed that timeout decorations on test methods were 
not being observed. A dirty post-timeout env could somehow cause cross-talk 
between tests/nodes. Now that that ticket is merged, let's see what happens; 
let's give it a few days.

> upgrade_tests.upgrade_through_versions_test consistently failing on circleci
> 
>
> Key: CASSANDRA-19259
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19259
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Other
>Reporter: Paulo Motta
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> This suite is consistently failing in  
> [4.0|https://app.circleci.com/pipelines/github/driftx/cassandra/1454/workflows/0357136e-cee3-42e4-900b-3347fc8d42d3/jobs/71008/tests]
>  and 
> [4.1|https://app.circleci.com/pipelines/github/driftx/cassandra/1453/workflows/dd1732df-271c-43bc-bc5f-8577c605c746/jobs/71009/tests]
>  with the following stack trace:
> {noformat}
> self = 
> process = 
> def _update_pid(self, process):
> """
> Reads pid from cassandra.pid file and stores in the self.pid
> After setting up pid updates status (UP, DOWN, etc) and node.conf
> """
> pidfile = os.path.join(self.get_path(), 'cassandra.pid')
> 
> start = time.time()
> while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0):
> if (time.time() - start > 30.0):
> common.error("Timed out waiting for pidfile to be filled 
> (current time is {})".format(datetime.now()))
> break
> else:
> time.sleep(0.1)
> 
> try:
> >   with open(pidfile, 'rb') as f:
> E   FileNotFoundError: [Errno 2] No such file or directory: 
> '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid'
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2100: FileNotFoundError
> During handling of the above exception, another exception occurred:
> self = 
>   object at 0x7f4c01419438>
> def test_parallel_upgrade(self):
> """
> Test upgrading cluster all at once (requires cluster downtime).
> """
> >   self.upgrade_scenario()
> upgrade_tests/upgrade_through_versions_test.py:387: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:491: in upgrade_scenario
> self.upgrade_to_version(version_meta, internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:580: in upgrade_to_version
> jvm_args=['-Dcassandra.disable_max_protocol_auto_override=true'])  # 
> prevent protocol capping in mixed version clusters
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:906: in start
> if not self._wait_for_running(process, timeout_s=7):
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:931: in _wait_for_running
> self._update_pid(process)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> process = 
> def _update_pid(self, process):
> """
> Reads pid from cassandra.pid file and stores in the self.pid
> After setting up pid updates status (UP, DOWN, etc) and node.conf
> """
> pidfile = os.path.join(self.get_path(), 'cassandra.pid')
> 
> start = time.time()
> while not (os.path.isfile(pidfile) and os.stat(pidfile).st_size > 0):
> if (time.time() - start > 30.0):
> common.error("Timed out waiting for pidfile to be filled 
> (current time is {})".format(datetime.now()))
> break
> else:
> time.sleep(0.1)
> 
> try:
> with open(pidfile, 'rb') as f:
> if 
> common.is_modern_windows_install(self.get_base_cassandra_version()):
> self.pid = 
> int(f.readline().strip().decode('utf-16').strip())
> else:
> self.pid = int(f.readline().strip())
> except IOError as e:
> >   raise NodeError('Problem starting node %s due to %s' % 
> > (self.name, e), process)
> E   ccmlib.node.NodeError: Problem starting node node1 due to [Errno 
> 2] No such file or directory: '/tmp/dtest-_8rdmjs0/test/node1/cassandra.pid'
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2106: NodeError
> {noformat}
> It's not clear whether this reproduces locally or just on circleci.
> We should address these failures before next 4.0.12 and 4.1.4 releases.





[jira] [Comment Edited] (CASSANDRA-19417) LIST SUPERUSERS cql command

2024-02-27 Thread Maxwell Guo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821051#comment-17821051
 ] 

Maxwell Guo edited comment on CASSANDRA-19417 at 2/27/24 8:46 AM:
--

Do we need to raise a DISCUSS thread on the ML, as we are introducing new CQL grammar?

Besides, can we add some options to the original LIST USERS grammar to achieve 
this goal instead of introducing a new syntax? E.g. LIST USERS 
superuseronly=true; 


was (Author: maxwellguo):
Do we need to raise a DISCUSS thread on the ML, as we are introducing new CQL grammar?

> LIST SUPERUSERS cql command
> ---
>
> Key: CASSANDRA-19417
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19417
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/cqlsh
>Reporter: Shailaja Koppu
>Assignee: Shailaja Koppu
>Priority: Normal
>  Labels: CQL
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Developing a new CQL command LIST SUPERUSERS to return list of roles with 
> superuser privilege. This includes roles who acquired superuser privilege in 
> the hierarchy. 
> Context: LIST ROLES cql command lists roles, their membership details and 
> displays super=true for immediate superusers. But there can be roles who 
> acquired superuser privilege due to a grant. LIST ROLES command won't display 
> super=true for such roles and the only way to recognize such roles is to look 
> for at least one row with super=true in the output of LIST ROLES OF <role name> command. While this works to check if a given role has superuser 
> privilege, there may be services (for example, Sidecar) working with C* and 
> may need to maintain list of roles with superuser privilege. There is no 
> existing command/tool to retrieve such roles details. Hence developing this 
> command which returns all roles having superuser privilege.






[jira] [Comment Edited] (CASSANDRA-19414) Skinny dev circle workflow

2024-02-27 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821056#comment-17821056
 ] 

Berenguer Blasi edited comment on CASSANDRA-19414 at 2/27/24 8:29 AM:
--

Trunk PR attached, everything rebased and green CI runs attached. Both PRs are 
the same. I'll merge tomorrow unless you spot anything wrong. Thx for the 
reviews!


was (Author: bereng):
Trunk PR attached, everything rebased and gree CI runs attached. Both PRs are 
the same. I'll merge tomorrow unless you spot anything wrong. Thx for the 
reviews!

> Skinny dev circle workflow
> --
>
> Key: CASSANDRA-19414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19414
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> CircleCI runs are getting pretty heavy. During dev iterations we trigger 
> many CI pre-commit jobs, which is just overkill.
> This ticket's purpose is to purge from the pre-commit workflow all 
> variations of the test matrix but the vanilla one. That should give us a 
> quick and cheap way to iterate *during dev*; this is not a substitute for 
> pre-commit. This ticket's work will serve as the basis for the upcoming 
> changes being discussed 
> [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0]






[jira] [Commented] (CASSANDRA-19414) Skinny dev circle workflow

2024-02-27 Thread Berenguer Blasi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821056#comment-17821056
 ] 

Berenguer Blasi commented on CASSANDRA-19414:
-

Trunk PR attached, everything rebased and gree CI runs attached. Both PRs are 
the same. I'll merge tomorrow unless you spot anything wrong. Thx for the 
reviews!

> Skinny dev circle workflow
> --
>
> Key: CASSANDRA-19414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19414
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Berenguer Blasi
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> CircleCI runs are getting pretty heavy. During dev iterations we trigger 
> many CI pre-commit jobs, which is just overkill.
> This ticket's purpose is to purge from the pre-commit workflow all 
> variations of the test matrix but the vanilla one. That should give us a 
> quick and cheap way to iterate *during dev*; this is not a substitute for 
> pre-commit. This ticket's work will serve as the basis for the upcoming 
> changes being discussed 
> [atm|https://lists.apache.org/thread/qf5c3hhz6qkpyqvbd3sppzlmftlc0bw0]





