(cassandra-website) branch asf-staging updated (26d2c71e0 -> b36348e19)

2024-05-31 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


 discard 26d2c71e0 generate docs for 2884dab5
 new b36348e19 generate docs for 2884dab5

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (26d2c71e0)
            \
             N -- N -- N   refs/heads/asf-staging (b36348e19)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../managing/tools/nodetool/disablebinary.html |   5 -
 .../managing/tools/nodetool/disablebinary.html |   5 -
 .../managing/tools/nodetool/disablebinary.html |   5 -
 .../managing/tools/nodetool/disablebinary.html |   5 -
 content/search-index.js                        |   2 +-
 site-ui/build/ui-bundle.zip                    | Bin 4883646 -> 4883646 bytes
 6 files changed, 17 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-05-31 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851199#comment-17851199
 ] 

Blake Eggleston commented on CASSANDRA-19445:
-

So there are 3 processes that start a paxos repair. They're run as part of
{{nodetool repair}}, they're run as part of topology changes
(move/bootstrap/decommission), and they're run every 5 minutes as part of an
automatic cleanup process. I think that logging should remain at INFO for paxos
repairs that originate from repairs and topology changes, since these have
important implications for correctness. However, I don't have any problems with
logging the auto cleanup process repair info at DEBUG, since those repairs are
just for convenience / cleanup and are also what's generating most of the noise
you're seeing.
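A minimal sketch of the split proposed in the comment, assuming a hypothetical `PaxosRepairLogging` class and `Origin` enum (Cassandra's real code may track where a paxos repair came from differently): repairs started by `nodetool repair` or by topology changes keep logging at INFO, while the periodic auto cleanup is demoted to DEBUG. It uses the JDK's `System.Logger` so it runs without dependencies; Cassandra itself logs through SLF4J.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

// Hypothetical sketch: pick the log level for a completed paxos repair by its origin.
final class PaxosRepairLogging {
    // Hypothetical enum mirroring the three origins described in the comment.
    enum Origin { NODETOOL_REPAIR, TOPOLOGY_CHANGE, AUTO_CLEANUP }

    private static final Logger LOG = System.getLogger(PaxosRepairLogging.class.getName());

    // Repairs and topology changes matter for correctness, so they stay at INFO;
    // the periodic auto cleanup is housekeeping, so it only logs at DEBUG.
    static Level levelFor(Origin origin) {
        return switch (origin) {
            case AUTO_CLEANUP -> Level.DEBUG;
            case NODETOOL_REPAIR, TOPOLOGY_CHANGE -> Level.INFO;
        };
    }

    // Emits the "Completed N uncommitted paxos instances" line at the chosen level.
    static void logCompleted(Origin origin, int uncommitted, String table) {
        LOG.log(levelFor(origin), "Completed {0} uncommitted paxos instances for {1}", uncommitted, table);
    }
}
```

With the default JDK logging configuration, DEBUG messages are not printed, so the every-5-minutes cleanup stops flooding the output while repair and topology-change messages remain visible.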

> Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"
> --
>
> Key: CASSANDRA-19445
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19445
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Zbyszek Z
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: paxos-entry.txt, paxos-multiple.txt
>
>
> Hello,
> On our cluster logs are flooded with: 
> {code:java}
> INFO  [OptionalTasks:1] 2024-02-27 14:27:51,213 PaxosCleanupLocalCoordinator.java:185 - Completed 0 uncommitted paxos instances for X on ranges 
> [(9210458530128018597,-9222146739399525061], 
> (-9222146739399525061,-9174246180597321488], 
> (-9174246180597321488,-9155837684527496840], 
> (-9155837684527496840,-9148981328078890812], 
> (-9148981328078890812,-9141853035919151700], 
> (-9141853035919151700,-9138872620588476741], {code}
> I cannot find anything in the docs regarding this log line. Also, these are huge
> log payloads that heavily flood system.log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Updated] (CASSANDRA-19674) CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which causes issues once evicted from the cache

2024-05-31 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19674:
--
Test and Documentation Plan: new tests
 Status: Patch Available  (was: In Progress)

> CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which 
> causes issues once evicted from the cache
> 
>
> Key: CASSANDRA-19674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Bootstrap creates a LocalOnly txn and executes it rather than following the
> normal txn flow. This is a problem because these mutations are not stored in
> the journal, so evicting the LocalOnly txn from the cache causes a failure, as
> we don’t know how to reconstruct it.



[jira] [Updated] (CASSANDRA-19675) Avoid streams in the common case for UpdateTransaction creation

2024-05-31 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-19675:

Description: 
Some recent Accord profiling highlighted some easily addressable inefficiencies 
in the way we create new {{UpdateTransaction}} objects in 
{{SecondaryIndexManager}} that have existed since the introduction of index 
groups for SAI. We should be able to clean this up by avoiding stream creation 
or even iteration over the groups when there is a single index group, which is 
going to be the most common case with SAI anyway. If we do have to iterate, 
there should also be no reason to copy the collection of index groups via 
{{listIndexGroups()}}, although that copying can remain in the method itself 
for external callers.

 !new_update_txn_streams.png! 

  was:Some recent Accord profiling highlighted some easily addressable 
inefficiencies in the way we create new {{UpdateTransaction}} objects in 
{{SecondaryIndexManager}} that have existed since the introduction of index 
groups for SAI. We should be able to clean this up by avoiding stream creation 
or even iteration over the groups when there is a single index group, which is 
going to be the most common case with SAI anyway. If we do have to iterate, 
there should also be no reason to copy the collection of index groups via 
{{listIndexGroups()}}, although that copying can remain in the method itself 
for external callers.


> Avoid streams in the common case for UpdateTransaction creation
> ---
>
> Key: CASSANDRA-19675
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19675
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SAI
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: new_update_txn_streams.png
>
>
> Some recent Accord profiling highlighted some easily addressable inefficiencies 
> in the way we create new {{UpdateTransaction}} objects in 
> {{SecondaryIndexManager}} that have existed since the introduction of index 
> groups for SAI. We should be able to clean this up by avoiding stream 
> creation or even iteration over the groups when there is a single index 
> group, which is going to be the most common case with SAI anyway. If we do 
> have to iterate, there should also be no reason to copy the collection of 
> index groups via {{listIndexGroups()}}, although that copying can remain in 
> the method itself for external callers.
>  !new_update_txn_streams.png! 



[jira] [Updated] (CASSANDRA-19675) Avoid streams in the common case for UpdateTransaction creation

2024-05-31 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-19675:

Change Category: Performance
 Complexity: Low Hanging Fruit
Component/s: Feature/SAI
  Fix Version/s: 5.0.x
 5.x
 Status: Open  (was: Triage Needed)

> Avoid streams in the common case for UpdateTransaction creation
> ---
>
> Key: CASSANDRA-19675
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19675
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SAI
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: new_update_txn_streams.png
>
>
> Some recent Accord profiling highlighted some easily addressable inefficiencies 
> in the way we create new {{UpdateTransaction}} objects in 
> {{SecondaryIndexManager}} that have existed since the introduction of index 
> groups for SAI. We should be able to clean this up by avoiding stream 
> creation or even iteration over the groups when there is a single index 
> group, which is going to be the most common case with SAI anyway. If we do 
> have to iterate, there should also be no reason to copy the collection of 
> index groups via {{listIndexGroups()}}, although that copying can remain in 
> the method itself for external callers.



[jira] [Updated] (CASSANDRA-19675) Avoid streams in the common case for UpdateTransaction creation

2024-05-31 Thread Caleb Rackliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-19675:

Attachment: new_update_txn_streams.png

> Avoid streams in the common case for UpdateTransaction creation
> ---
>
> Key: CASSANDRA-19675
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19675
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/SAI
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
> Attachments: new_update_txn_streams.png
>
>
> Some recent Accord profiling highlighted some easily addressable inefficiencies 
> in the way we create new {{UpdateTransaction}} objects in 
> {{SecondaryIndexManager}} that have existed since the introduction of index 
> groups for SAI. We should be able to clean this up by avoiding stream 
> creation or even iteration over the groups when there is a single index 
> group, which is going to be the most common case with SAI anyway. If we do 
> have to iterate, there should also be no reason to copy the collection of 
> index groups via {{listIndexGroups()}}, although that copying can remain in 
> the method itself for external callers.



[jira] [Created] (CASSANDRA-19675) Avoid streams in the common case for UpdateTransaction creation

2024-05-31 Thread Caleb Rackliffe (Jira)
Caleb Rackliffe created CASSANDRA-19675:
---

 Summary: Avoid streams in the common case for UpdateTransaction 
creation
 Key: CASSANDRA-19675
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19675
 Project: Cassandra
  Issue Type: Improvement
Reporter: Caleb Rackliffe
Assignee: Caleb Rackliffe


Some recent Accord profiling highlighted some easily addressable inefficiencies 
in the way we create new {{UpdateTransaction}} objects in 
{{SecondaryIndexManager}} that have existed since the introduction of index 
groups for SAI. We should be able to clean this up by avoiding stream creation 
or even iteration over the groups when there is a single index group, which is 
going to be the most common case with SAI anyway. If we do have to iterate, 
there should also be no reason to copy the collection of index groups via 
{{listIndexGroups()}}, although that copying can remain in the method itself 
for external callers.
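A minimal sketch of the fast path the ticket describes. `IndexGroup`, `Txn`, `GroupTxn`, `CompositeTxn` and `NO_OP` are hypothetical stand-ins, not the real `SecondaryIndexManager` or `UpdateTransaction` types: with no groups it returns a shared no-op, with exactly one group it delegates directly without a stream or a loop, and only with several groups does it iterate, over the live collection rather than a copy.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class UpdateTxnSketch {
    // Hypothetical stand-ins for UpdateTransaction and index groups.
    interface Txn { }
    record GroupTxn(String group) implements Txn { }
    record CompositeTxn(List<Txn> parts) implements Txn { }
    interface IndexGroup { Txn newTxn(); }

    static final Txn NO_OP = new Txn() { };

    static Txn newUpdateTransaction(Collection<IndexGroup> groups) {
        // No indexes: nothing to update, so reuse a shared instance.
        if (groups.isEmpty())
            return NO_OP;

        // Common SAI case: a single group, so skip the stream and the loop entirely.
        if (groups.size() == 1)
            return groups.iterator().next().newTxn();

        // Several groups: a plain loop over the collection, without copying it first.
        List<Txn> parts = new ArrayList<>(groups.size());
        for (IndexGroup group : groups)
            parts.add(group.newTxn());
        return new CompositeTxn(parts);
    }
}
```

The caller is assumed to pass the manager's internal collection directly; any defensive copy, as the ticket notes, would stay in the public accessor used by external callers.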



[jira] [Updated] (CASSANDRA-19674) CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which causes issues once evicted from the cache

2024-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-19674:
---
Labels: pull-request-available  (was: )

> CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which 
> causes issues once evicted from the cache
> 
>
> Key: CASSANDRA-19674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>
> Bootstrap creates a LocalOnly txn and executes it rather than following the
> normal txn flow. This is a problem because these mutations are not stored in
> the journal, so evicting the LocalOnly txn from the cache causes a failure, as
> we don’t know how to reconstruct it.



[jira] [Updated] (CASSANDRA-19674) CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which causes issues once evicted from the cache

2024-05-31 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19674:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: 
Unrecoverable Corruption / Loss(13161)
   Complexity: Low Hanging Fruit
Discovered By: Code Inspection
Fix Version/s: NA
 Severity: Critical
   Status: Open  (was: Triage Needed)

> CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which 
> causes issues once evicted from the cache
> 
>
> Key: CASSANDRA-19674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: NA
>
>
> Bootstrap creates a LocalOnly txn and executes it rather than following the
> normal txn flow. This is a problem because these mutations are not stored in
> the journal, so evicting the LocalOnly txn from the cache causes a failure, as
> we don’t know how to reconstruct it.



[jira] [Created] (CASSANDRA-19674) CEP-15: (Accord) Bootstrap's LocalOnly txn is not written to journal which causes issues once evicted from the cache

2024-05-31 Thread David Capwell (Jira)
David Capwell created CASSANDRA-19674:
-

 Summary: CEP-15: (Accord) Bootstrap's LocalOnly txn is not written 
to journal which causes issues once evicted from the cache
 Key: CASSANDRA-19674
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19674
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: David Capwell
Assignee: David Capwell


Bootstrap creates a LocalOnly txn and executes it rather than following the
normal txn flow. This is a problem because these mutations are not stored in
the journal, so evicting the LocalOnly txn from the cache causes a failure, as
we don’t know how to reconstruct it.
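The failure mode can be modelled with a small sketch, assuming a bounded LRU cache in front of a journal. `JournalBackedCache`, `execute` and `load` are hypothetical names, not Accord's API: a transaction that was journaled can be rebuilt after eviction, while one that bypassed the journal (as the bootstrap LocalOnly txn does) cannot.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class JournalBackedCache {
    // Durable record of transactions (stands in for the Accord journal).
    private final Map<Long, String> journal = new HashMap<>();
    // Bounded LRU cache of live transactions.
    private final Map<Long, String> cache;

    JournalBackedCache(int capacity) {
        this.cache = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    // journaled == false models the bootstrap LocalOnly txn, which skips the journal.
    void execute(long txnId, String txn, boolean journaled) {
        if (journaled)
            journal.put(txnId, txn);
        cache.put(txnId, txn);
    }

    // Returns the txn from the cache, or reconstructs it from the journal after eviction.
    Optional<String> load(long txnId) {
        String cached = cache.get(txnId);
        if (cached != null)
            return Optional.of(cached);
        String durable = journal.get(txnId);
        if (durable == null)
            return Optional.empty(); // evicted and never journaled: cannot be rebuilt
        cache.put(txnId, durable);
        return Optional.of(durable);
    }
}
```

In this model the fix amounts to calling `execute(..., true)` for the bootstrap transaction as well, so that eviction is always recoverable.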



[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-31 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
  Fix Version/s: 5.x
  Since Version: 5.x
Source Control Link: 
https://github.com/apache/cassandra/commit/3b99044d6d5491304d4a25d8dcea54510cfd3215
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed as Cassandra  and Accord 
[4e8bcae81f9751b9d732fd5056bce31c97ad58f3|https://github.com/apache/cassandra-accord/commit/4e8bcae81f9751b9d732fd5056bce31c97ad58f3].

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things
> to fix.



[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-31 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
Status: Ready to Commit  (was: Review In Progress)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things
> to fix.



[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-31 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
Reviewers: Benedict Elliott Smith, Ariel Weisberg
   Benedict Elliott Smith, Ariel Weisberg  (was: Ariel Weisberg, 
Benedict Elliott Smith)
   Status: Review In Progress  (was: Patch Available)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things
> to fix.



[PR] JAVA-3109, JAVA-2980: Support binding collections of attributes in IN clause query builder [cassandra-java-driver]

2024-05-31 Thread via GitHub


lukasz-antoniak opened a new pull request, #1935:
URL: https://github.com/apache/cassandra-java-driver/pull/1935

   JIRA Links:
   - https://datastax-oss.atlassian.net/browse/JAVA-3109
   - https://datastax-oss.atlassian.net/browse/JAVA-2980


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (CASSANDRA-19673) Investigate stream pipelines in hot paths

2024-05-31 Thread Stefan Miklosovic (Jira)
Stefan Miklosovic created CASSANDRA-19673:
-

 Summary: Investigate stream pipelines in hot paths
 Key: CASSANDRA-19673
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19673
 Project: Cassandra
  Issue Type: Task
Reporter: Stefan Miklosovic
Assignee: Stefan Miklosovic


As per the discussion in (1), map where we are with the stream pipelines API.
Based on that, the next step should be to get rid of them and replace them with
plain "for" loops, but we are not there yet.

 

https://lists.apache.org/thread/65glsjzkmpktzmns6j9wvr4nczvskx36
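To illustrate the kind of rewrite under discussion, here is a hypothetical hot-path helper (`StreamVsLoop` and its methods are illustrative, not Cassandra code) written both ways. Each call to the stream version allocates a stream, its pipeline stages, lambdas and a collector; the loop version does the same work with only the result list. Whether that difference is measurable on a given path is exactly what the ticket proposes to investigate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class StreamVsLoop {
    // Stream pipeline: concise, but allocates pipeline objects on every call.
    static List<String> nonEmptyUpperStream(List<String> names) {
        return names.stream()
                    .filter(n -> !n.isEmpty())
                    .map(n -> n.toUpperCase(Locale.ROOT))
                    .collect(Collectors.toList());
    }

    // Equivalent plain "for" loop: same result, only the output list is allocated.
    static List<String> nonEmptyUpperLoop(List<String> names) {
        List<String> out = new ArrayList<>(names.size());
        for (String n : names) {
            if (!n.isEmpty())
                out.add(n.toUpperCase(Locale.ROOT));
        }
        return out;
    }
}
```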



[jira] [Updated] (CASSANDRA-19673) Investigate stream pipelines in hot paths

2024-05-31 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-19673:
--
Change Category: Performance
 Complexity: Normal
Component/s: Legacy/Core
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)

> Investigate stream pipelines in hot paths
> -
>
> Key: CASSANDRA-19673
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19673
> Project: Cassandra
>  Issue Type: Task
>  Components: Legacy/Core
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>
> As per the discussion in (1), map where we are with the stream pipelines API.
> Based on that, the next step should be to get rid of them and replace them
> with plain "for" loops, but we are not there yet.
>  
> https://lists.apache.org/thread/65glsjzkmpktzmns6j9wvr4nczvskx36



[jira] [Commented] (CASSANDRA-19664) Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851123#comment-17851123
 ] 

Aleksey Yeschenko commented on CASSANDRA-19664:
---

+1

> Accord Journal Determinism: PreAccept replay stability 
> ---
>
> Key: CASSANDRA-19664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> Currently, some messages, such as PreAccept, can have some of their context 
> initialized on replay. This patch adds a concept of Context to Journal that 
> can be used for arbitrary information necessary for replaying them just the 
> way they were executed the first time.
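A minimal sketch of the idea, assuming the context is some value derived from local state at first execution (modelled here as a clock reading). `ReplayContextSketch`, `Context` and `Entry` are hypothetical names, not the actual `AccordJournal` API: persisting the context next to each message lets replay reproduce the original outcome instead of re-deriving it from whatever the state happens to be at replay time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class ReplayContextSketch {
    // Hypothetical context captured once, at first execution.
    record Context(long observedAt) { }
    // A journal entry stores the message together with its context.
    record Entry(String message, Context context) { }

    private final List<Entry> journal = new ArrayList<>();
    private final LongSupplier clock;

    ReplayContextSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // First execution: derive the context once and persist it with the message.
    String execute(String message) {
        Context ctx = new Context(clock.getAsLong());
        journal.add(new Entry(message, ctx));
        return process(message, ctx);
    }

    // Replay: reuse the stored context rather than reading the clock again.
    List<String> replay() {
        List<String> results = new ArrayList<>();
        for (Entry e : journal)
            results.add(process(e.message(), e.context()));
        return results;
    }

    private static String process(String message, Context ctx) {
        return message + "@" + ctx.observedAt();
    }
}
```

Because the clock has moved on by the time `replay()` runs, re-deriving the context would give a different result; the stored context keeps replay identical to the original execution.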



[jira] [Commented] (CASSANDRA-19661) Cannot restart Cassandra 5 after creating a vector table and index

2024-05-31 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851119#comment-17851119
 ] 

Benjamin Lerer commented on CASSANDRA-19661:


[~sergrua] Can you reproduce the problem with the latest C* version?


> Cannot restart Cassandra 5 after creating a vector table and index
> --
>
> Key: CASSANDRA-19661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19661
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/SAI
>Reporter: Sergio Rua
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>
> I'm using llama-index and llama3 to train a model. I'm using a very simple
> script that reads some *.txt files from local disk, uploads them to Cassandra,
> and then creates the index:
>  
> {code:python}
> # Create the index from documents
> index = VectorStoreIndex.from_documents(
>     documents,
>     service_context=vector_store.service_context,
>     storage_context=storage_context,
>     show_progress=True,
> ) {code}
> This works well, and I'm able to use a chat app to get responses from the
> Cassandra data. However, right after that, I cannot restart Cassandra; it
> breaks with the following error:
>  
> {code:java}
> INFO  [PerDiskMemtableFlushWriter_0:7] 2024-05-23 08:23:20,102 Flushing.java:179 - Completed flushing /data/cassandra/data/gpt/docs_20240523-10c8eaa018d811ef8dadf75182f3e2b4/da-6-bti-Data.db (124.236MiB) for commitlog position CommitLogPosition(segmentId=1716452305636, position=15336)
> [...]
> WARN  [MemtableFlushWriter:1] 2024-05-23 08:28:29,575 MemtableIndexWriter.java:92 - [gpt.docs.idx_vector_docs] Aborting index memtable flush for /data/cassandra/data/gpt/docs-aea77a80184b11ef8dadf75182f3e2b4/da-3-bti...{code}
> {code:java}
> java.lang.IllegalStateException: null
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
>         at org.apache.cassandra.index.sai.disk.v1.vector.VectorPostings.computeRowIds(VectorPostings.java:76)
>         at org.apache.cassandra.index.sai.disk.v1.vector.OnHeapGraph.writeData(OnHeapGraph.java:313)
>         at org.apache.cassandra.index.sai.memory.VectorMemoryIndex.writeDirect(VectorMemoryIndex.java:272)
>         at org.apache.cassandra.index.sai.memory.MemtableIndex.writeDirect(MemtableIndex.java:110)
>         at org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.flushVectorIndex(MemtableIndexWriter.java:192)
>         at org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.complete(MemtableIndexWriter.java:117)
>         at org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.complete(StorageAttachedIndexWriter.java:185)
>         at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>         at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
>         at org.apache.cassandra.io.sstable.format.SSTableWriter.commit(SSTableWriter.java:289)
>         at org.apache.cassandra.db.compaction.unified.ShardedMultiWriter.commit(ShardedMultiWriter.java:219)
>         at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1323)
>         at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1222)
>         at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829) {code}
> The table created by the script is as follows:
>  
> {noformat}
> CREATE TABLE gpt.docs (
> partition_id text,
> row_id text,
> attributes_blob text,
> body_blob text,
> vector vector,
> metadata_s map,
> PRIMARY KEY (partition_id, row_id)
> ) WITH CLUSTERING ORDER BY (row_id ASC)
> AND additional_write_policy = '99p'
> AND allow_auto_snapshot = true
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND cdc = false
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy', 
> 'scaling_parameters': 'T4', 'target_sstable_size': '1GiB'}
> AND compression = {'chunk_length_in_kb': '16', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND memtable = 'default'
> AND crc_check_chance = 1.0
> AND default_time_to_live = 0
> AND extensions = {}
> AND gc_grace_seconds = 864000
> AND incremental_backups =

[jira] [Updated] (CASSANDRA-19664) Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19664:

  Fix Version/s: 5.1-alpha1
  Since Version: 5.1-alpha1
Source Control Link: 
https://github.com/apache/cassandra/commit/b0ca509e7add760d187fcc5a9908d93d7c4fd6ec
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Accord Journal Determinism: PreAccept replay stability 
> ---
>
> Key: CASSANDRA-19664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> Currently, some messages, such as PreAccept, can have some of their context 
> initialized on replay. This patch adds a concept of Context to Journal that 
> can be used for arbitrary information necessary for replaying them just the 
> way they were executed the first time.



[jira] [Updated] (CASSANDRA-19664) Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19664:

Status: Ready to Commit  (was: Review In Progress)

Based on Aleksey's +1 on both patches, merging.

> Accord Journal Determinism: PreAccept replay stability 
> ---
>
> Key: CASSANDRA-19664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> Currently, some messages, such as PreAccept, can have some of their context 
> initialized on replay. This patch adds a concept of Context to Journal that 
> can be used for arbitrary information necessary for replaying them just the 
> way they were executed the first time.



(cassandra) branch cep-15-accord updated: Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch cep-15-accord
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cep-15-accord by this push:
 new b0ca509e7a Accord Journal Determinism: PreAccept replay stability
b0ca509e7a is described below

commit b0ca509e7add760d187fcc5a9908d93d7c4fd6ec
Author: Alex Petrov 
AuthorDate: Wed May 29 19:16:26 2024 +0200

Accord Journal Determinism: PreAccept replay stability

Patch by Alex Petrov; reviewed by Aleksey Yeschenko for CASSANDRA-19664
---
 modules/accord |   2 +-
 .../apache/cassandra/journal/RecordPointer.java|  66 +++
 .../service/accord/AccordCommandStore.java |   4 +-
 .../cassandra/service/accord/AccordJournal.java| 206 +
 .../service/accord/AccordSafeCommandStore.java |  49 -
 5 files changed, 197 insertions(+), 130 deletions(-)

diff --git a/modules/accord b/modules/accord
index 4e8bcae81f..84e89bd91c 160000
--- a/modules/accord
+++ b/modules/accord
@@ -1 +1 @@
-Subproject commit 4e8bcae81f9751b9d732fd5056bce31c97ad58f3
+Subproject commit 84e89bd91cf1b058fbf314b750336a1ec1096b18
diff --git a/src/java/org/apache/cassandra/journal/RecordPointer.java b/src/java/org/apache/cassandra/journal/RecordPointer.java
new file mode 100644
index 0000000000..2b3e8ea6b8
--- /dev/null
+++ b/src/java/org/apache/cassandra/journal/RecordPointer.java
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.journal;
+
+import com.google.common.primitives.Ints;
+import com.google.common.primitives.Longs;
+
+// TODO: make this available in the accord table as an ID
+public class RecordPointer implements Comparable<RecordPointer>
+{
+    public final long segment; // unique segment id
+    public final int position; // record start position within the segment
+
+    public RecordPointer(long segment, int position)
+    {
+        this.segment = segment;
+        this.position = position;
+    }
+
+    @Override
+    public boolean equals(Object other)
+    {
+        if (this == other)
+            return true;
+        if (!(other instanceof RecordPointer))
+            return false;
+        RecordPointer that = (RecordPointer) other;
+        return this.segment == that.segment
+               && this.position == that.position;
+    }
+
+    @Override
+    public int hashCode()
+    {
+        return Long.hashCode(segment) + position * 31;
+    }
+
+    @Override
+    public String toString()
+    {
+        return "(" + segment + ", " + position + ')';
+    }
+
+    @Override
+    public int compareTo(RecordPointer that)
+    {
+        int cmp = Longs.compare(this.segment, that.segment);
+        return cmp != 0 ? cmp : Ints.compare(this.position, that.position);
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/service/accord/AccordCommandStore.java b/src/java/org/apache/cassandra/service/accord/AccordCommandStore.java
index 2a67ba656d..c846038fd8 100644
--- a/src/java/org/apache/cassandra/service/accord/AccordCommandStore.java
+++ b/src/java/org/apache/cassandra/service/accord/AccordCommandStore.java
@@ -488,7 +488,9 @@ public class AccordCommandStore extends CommandStore implements CacheSize
         timestampsForKeys.values().forEach(AccordSafeState::preExecute);
         if (commandsForRanges != null)
             commandsForRanges.preExecute();
-        current = new AccordSafeCommandStore(preLoadContext, commands, timestampsForKeys, commandsForKeys, commandsForRanges, this);
+
+        current = AccordSafeCommandStore.create(preLoadContext, commands, timestampsForKeys, commandsForKeys, commandsForRanges, this);
+
         return current;
     }
 
diff --git a/src/java/org/apache/cassandra/service/accord/AccordJournal.java b/src/java/org/apache/cassandra/service/accord/AccordJournal.java
index ce90b26747..0c31afbb4c 100644
--- a/src/java/org/apache/cassandra/service/accord/AccordJournal.java
+++ b/src/java/org/apache/cassandra/service/accord/AccordJournal.java
@@ -40,22 +40,16 @@ import com.google.common.collect.ImmutableMap;
 imp

Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


lukasz-antoniak commented on PR #1623:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1623#issuecomment-2142203185

   Any optimisation is fine. The changes you proposed make the code cleaner and 
the initialisation of the execution profile easier to follow. Let us just 
address the one comment I had regarding the reference in the callback class.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





(cassandra-accord) branch trunk updated: Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-accord.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 84e89bd9 Accord Journal Determinism: PreAccept replay stability
84e89bd9 is described below

commit 84e89bd91cf1b058fbf314b750336a1ec1096b18
Author: Alex Petrov 
AuthorDate: Wed May 29 14:32:33 2024 +0200

Accord Journal Determinism: PreAccept replay stability

Patch by Alex Petrov; reviewed by Aleksey Yeschenko for CASSANDRA-19664
---
 .../src/main/java/accord/local/CommandStore.java   |  6 +
 .../main/java/accord/local/SafeCommandStore.java   |  5 
 .../java/accord/messages/ExecutionContext.java | 29 --
 .../src/main/java/accord/messages/Propagate.java   |  1 -
 .../main/java/accord/messages/ReplyContext.java|  1 +
 .../src/main/java/accord/messages/TxnRequest.java  |  5 
 6 files changed, 12 insertions(+), 35 deletions(-)

diff --git a/accord-core/src/main/java/accord/local/CommandStore.java b/accord-core/src/main/java/accord/local/CommandStore.java
index 1d5a3913..c7baf2f5 100644
--- a/accord-core/src/main/java/accord/local/CommandStore.java
+++ b/accord-core/src/main/java/accord/local/CommandStore.java
@@ -319,13 +319,9 @@ public abstract class CommandStore implements AgentExecutor
      */
     final Timestamp preaccept(TxnId txnId, Seekables keys, SafeCommandStore safeStore, boolean permitFastPath)
     {
-        // TODO (expected): make preAcceptTimeout() be a part of SafeCommandStore, initiated from ExecutionContext;
-        //  preAcceptTimeout can be subject to local configuration changes, which would break determinism of repeated
-        //  message processing, if, say, replayed from a log.
-
         NodeTimeService time = safeStore.time();
 
-        boolean isExpired = time.now() - txnId.hlc() >= agent().preAcceptTimeout() && !txnId.kind().isSyncPoint();
+        boolean isExpired = time.now() - txnId.hlc() >= safeStore.preAcceptTimeout() && !txnId.kind().isSyncPoint();
         if (rejectBefore != null && !isExpired)
             isExpired = null == rejectBefore.foldl(keys, (rejectIfBefore, test) -> rejectIfBefore.compareTo(test) > 0 ? null : test, txnId, Objects::isNull);
 
diff --git a/accord-core/src/main/java/accord/local/SafeCommandStore.java b/accord-core/src/main/java/accord/local/SafeCommandStore.java
index 5c8e2834..139bf088 100644
--- a/accord-core/src/main/java/accord/local/SafeCommandStore.java
+++ b/accord-core/src/main/java/accord/local/SafeCommandStore.java
@@ -183,6 +183,11 @@ public abstract class SafeCommandStore
         return maybeTruncate(safeCfk);
     }
 
+    public long preAcceptTimeout()
+    {
+        return agent().preAcceptTimeout();
+    }
+
     protected abstract SafeCommand getInternal(TxnId txnId);
     protected abstract SafeCommand getInternalIfLoadedAndInitialised(TxnId txnId);
     protected abstract SafeCommandsForKey getInternal(Key key);
diff --git a/accord-core/src/main/java/accord/messages/ExecutionContext.java b/accord-core/src/main/java/accord/messages/ExecutionContext.java
deleted file mode 100644
index dbf4c2db..00000000
--- a/accord-core/src/main/java/accord/messages/ExecutionContext.java
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package accord.messages;
-
-/**
- * Necessary context to allow for deterministic repeated execution of requests (e.g. when re-applying from a log)
- */
-public interface ExecutionContext
-{
-    /**
-     * @return PreAccept timeout as it was at request execution
-     */
-    long preAcceptTimeout();
-}
diff --git a/accord-core/src/main/java/accord/messages/Propagate.java b/accord-core/src/main/java/accord/messages/Propagate.java
index 351c636e..c67d5fa2 100644
--- a/accord-core/src/main/java/accord/messages/Propagate.java
+++ b/accord-core/src/main/java/accord/messages/Propagate.java
@@ -53,7 +53,6 @@ import static accord.local.SaveStatus.Stable;
 import static accord.local.SaveStatus.Uninitialised;
 import static accord.local.Status.NotDefined;
 import static accord.local.Status.Phase

Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


chibenwa commented on PR #1623:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1623#issuecomment-2142174927

   For the record, with https://github.com/apache/cassandra-java-driver/pull/1622 
merged, the remaining resolveExecutionProfile calls account for only 0.07% of 
the Apache James CPU footprint.
   
   In my view, that falls below the threshold where optimizing is worthwhile.
   
   Shall I close this PR?





Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


chibenwa commented on PR #1623:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1623#issuecomment-2142165069

   I force-pushed this PR to remove the now-merged commit.
   
   Sorry for the confusion.





Re: [PR] Conversions: resolveExecutionProfile only when needed [cassandra-java-driver]

2024-05-31 Thread via GitHub


chibenwa closed pull request #1622: Conversions: resolveExecutionProfile only 
when needed
URL: https://github.com/apache/cassandra-java-driver/pull/1622





[PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


chibenwa opened a new pull request, #1623:
URL: https://github.com/apache/cassandra-java-driver/pull/1623

   These repeated calls account for a non-negligible portion of my application's
   CPU (0.6%). The execution profile can definitely be kept in a final field so
   that it is resolved only once per CqlRequestHandler.
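The change proposed here, resolving a value once per request handler and keeping it in a final field, can be illustrated with a minimal sketch. `resolveExecutionProfile` below is a hypothetical stand-in that only counts its invocations, not the driver's `Conversions.resolveExecutionProfile`, and neither handler class is the real `CqlRequestHandler`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: these are not the java-driver's classes or methods.
final class ProfileCachingSketch
{
    // Counts how often the (pretend) expensive resolution runs.
    static final AtomicInteger resolveCalls = new AtomicInteger();

    // Stand-in for an expensive per-request lookup.
    static String resolveExecutionProfile(String statement)
    {
        resolveCalls.incrementAndGet();
        return "profile-for-" + statement;
    }

    // Before: every access re-resolves the profile.
    static final class UncachedHandler
    {
        private final String statement;

        UncachedHandler(String statement) { this.statement = statement; }

        String profile() { return resolveExecutionProfile(statement); }
    }

    // After: resolved once in the constructor and kept in a final field.
    static final class CachedHandler
    {
        private final String profile;

        CachedHandler(String statement) { this.profile = resolveExecutionProfile(statement); }

        String profile() { return profile; }
    }

    public static void main(String[] args)
    {
        UncachedHandler uncached = new UncachedHandler("q");
        for (int i = 0; i < 5; i++)
            uncached.profile();
        int uncachedCalls = resolveCalls.getAndSet(0);

        CachedHandler cached = new CachedHandler("q");
        for (int i = 0; i < 5; i++)
            cached.profile();
        int cachedCalls = resolveCalls.get();

        System.out.println(uncachedCalls + " " + cachedCalls); // prints "5 1"
    }
}
```

A final field also makes the resolved value safely visible to other threads once the handler is constructed, which suits a handler whose callbacks may run on different threads.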





[jira] [Updated] (CASSANDRA-19672) some unit tests should generate files in the tmp directory

2024-05-31 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19672:
-
Fix Version/s: 5.x
   (was: 5.1)

> some unit tests should generate files in the tmp directory
> --
>
> Key: CASSANDRA-19672
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19672
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Normal
> Fix For: 5.x
>
>
> I ran "{*}_ant test_{*}" to execute the whole test suite on my local machine 
> and found that some UTs generated files in the current directory instead of 
> the tmp directory.
>  
> {code:java}
> [root@vm-24-5-centos cassandra]# git status
> # audit/
> # compaction.log
> # import_cql_test_keyspace_table_testcopyonlythoserowsthatmatchvectortyp_04.err
> {code}
>  
> The problematic UTs are:
>  
> {code:java}
> ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerNotFound
> ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerTransitions
> ant testsome -Dtest.name=org.apache.cassandra.tools.CompactionStressTest -Dtest.methods=testWriteAndCompact
> ant testsome -Dtest.name=org.apache.cassandra.tools.cqlsh.CqlshTest -Dtest.methods=testCopyOnlyThoseRowsThatMatchVectorTypeSize
> {code}
>  
> The patch aims to fix this by generating these files in the tmp directory.






[jira] [Created] (CASSANDRA-19672) some unit tests should generate files in the tmp directory

2024-05-31 Thread Ling Mao (Jira)
Ling Mao created CASSANDRA-19672:


 Summary: some unit tests should generate files in the tmp directory
 Key: CASSANDRA-19672
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19672
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/unit
Reporter: Ling Mao


I ran "{*}_ant test_{*}" to execute the whole test suite on my local machine 
and found that some UTs generated files in the current directory instead of 
the tmp directory.

 
{code:java}
[root@vm-24-5-centos cassandra]# git status
# audit/
# compaction.log
# import_cql_test_keyspace_table_testcopyonlythoserowsthatmatchvectortyp_04.err
{code}
 

The problematic UTs are:

 
{code:java}
ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerNotFound
ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerTransitions
ant testsome -Dtest.name=org.apache.cassandra.tools.CompactionStressTest -Dtest.methods=testWriteAndCompact
ant testsome -Dtest.name=org.apache.cassandra.tools.cqlsh.CqlshTest -Dtest.methods=testCopyOnlyThoseRowsThatMatchVectorTypeSize
{code}
 

The patch aims to fix this by generating these files in the tmp directory.
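One common way to keep such artifacts out of the source tree is to give each test its own directory from `java.nio.file.Files.createTempDirectory` and write there. The sketch below is generic and hypothetical (`TmpDirTestSketch`, `writeArtifact` and `deleteRecursively` are illustrative names); it is not the change made by the patch, which is not shown in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not the code from the actual CASSANDRA-19672 patch.
final class TmpDirTestSketch
{
    // Writes a test artifact inside the given directory instead of the CWD.
    static Path writeArtifact(Path dir, String name, String contents) throws IOException
    {
        Path file = dir.resolve(name);
        Files.writeString(file, contents);
        return file;
    }

    // Deletes the directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException
    {
        try (Stream<Path> paths = Files.walk(dir))
        {
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator)
                Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException
    {
        // A fresh, uniquely named directory under java.io.tmpdir.
        Path tmp = Files.createTempDirectory("cassandra-test-");
        try
        {
            Path log = writeArtifact(tmp, "compaction.log", "done\n");
            System.out.println("wrote " + log);
        }
        finally
        {
            deleteRecursively(tmp);
        }
    }
}
```

Cleaning up in a `finally` block keeps repeated runs from accumulating directories under the system temp location.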






[jira] [Updated] (CASSANDRA-19672) some unit tests should generate files in the tmp directory

2024-05-31 Thread Ling Mao (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ling Mao updated CASSANDRA-19672:
-
Fix Version/s: 5.1

> some unit tests should generate files in the tmp directory
> --
>
> Key: CASSANDRA-19672
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19672
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Ling Mao
>Priority: Normal
> Fix For: 5.1
>
>
> I ran "{*}_ant test_{*}" to execute the whole test suite on my local machine 
> and found that some UTs generated files in the current directory instead of 
> the tmp directory.
>  
> {code:java}
> [root@vm-24-5-centos cassandra]# git status
> # audit/
> # compaction.log
> # import_cql_test_keyspace_table_testcopyonlythoserowsthatmatchvectortyp_04.err
> {code}
>  
> The problematic UTs are:
>  
> {code:java}
> ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerNotFound
> ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerTransitions
> ant testsome -Dtest.name=org.apache.cassandra.tools.CompactionStressTest -Dtest.methods=testWriteAndCompact
> ant testsome -Dtest.name=org.apache.cassandra.tools.cqlsh.CqlshTest -Dtest.methods=testCopyOnlyThoseRowsThatMatchVectorTypeSize
> {code}
>  
> The patch aims to fix this by generating these files in the tmp directory.






[jira] [Assigned] (CASSANDRA-19672) some unit tests should generate files in the tmp directory

2024-05-31 Thread Ling Mao (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ling Mao reassigned CASSANDRA-19672:


Assignee: Ling Mao

> some unit tests should generate files in the tmp directory
> --
>
> Key: CASSANDRA-19672
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19672
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Ling Mao
>Assignee: Ling Mao
>Priority: Normal
> Fix For: 5.1
>
>
> I ran "{*}_ant test_{*}" to execute the whole test suite on my local machine 
> and found that some UTs generated files in the current directory instead of 
> the tmp directory.
>  
> {code:java}
> [root@vm-24-5-centos cassandra]# git status
> # audit/
> # compaction.log
> # import_cql_test_keyspace_table_testcopyonlythoserowsthatmatchvectortyp_04.err
> {code}
>  
> The problematic UTs are:
>  
> {code:java}
> ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerNotFound
> ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testAuditLogEnableLoggerTransitions
> ant testsome -Dtest.name=org.apache.cassandra.tools.CompactionStressTest -Dtest.methods=testWriteAndCompact
> ant testsome -Dtest.name=org.apache.cassandra.tools.cqlsh.CqlshTest -Dtest.methods=testCopyOnlyThoseRowsThatMatchVectorTypeSize
> {code}
>  
> The patch aims to fix this by generating these files in the tmp directory.






Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


lukasz-antoniak commented on PR #1623:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1623#issuecomment-2141940006

   I think you closed the wrong PR. This one is mostly fine :).





Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


chibenwa commented on PR #1623:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1623#issuecomment-2141936438

   :+1: 





Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


chibenwa closed pull request #1623: Limit calls to 
Conversions.resolveExecutionProfile
URL: https://github.com/apache/cassandra-java-driver/pull/1623





Re: [PR] Limit calls to Conversions.resolveExecutionProfile [cassandra-java-driver]

2024-05-31 Thread via GitHub


lukasz-antoniak commented on PR #1623:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1623#issuecomment-2141774747

   Do you mind closing 
https://github.com/apache/cassandra-java-driver/pull/1622, as from what I see, 
this PR contains those changes?





[jira] [Comment Edited] (CASSANDRA-19664) Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850974#comment-17850974
 ] 

Alex Petrov edited comment on CASSANDRA-19664 at 5/31/24 9:39 AM:
--

[~aleksey] uploaded the latest CI run; there are some JDK17 failures that seem 
to be related to {{add-opens}}; three dtest failures are unrelated. 


was (Author: ifesdjeen):
[~aleksey] uploaded the latest CI run; there are some JDK17 failures that seem 
to be related to {add-opens}; three dtest failures are unrelated. 

> Accord Journal Determinism: PreAccept replay stability 
> ---
>
> Key: CASSANDRA-19664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> Currently, some messages, such as PreAccept, can have some of their context 
> initialized on replay. This patch adds a concept of Context to Journal that 
> can be used for arbitrary information necessary for replaying them just the 
> way they were executed the first time.






[jira] [Updated] (CASSANDRA-19215) "Query start time" in native transport request threads should be the task enqueue time

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19215:

Status: Open  (was: Patch Available)

> "Query start time" in native transport request threads should be the task 
> enqueue time
> --
>
> Key: CASSANDRA-19215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19215
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Runtian Liu
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> Recently, our Cassandra 4.0.6 cluster experienced an outage due to a surge in 
> expensive traffic from the application side. This surge involved a large 
> volume of costly read queries, which took a considerable amount of time to 
> process on the server side. The client had timeout settings; if a request 
> timed out, it might trigger the sending of new requests. Since the server 
> nodes were overloaded, numerous nodes had hundreds of thousands of tasks 
> queued in the Native-Transport-Request pending queue. I expected that once 
> the application ceased sending requests, the server node would quickly return 
> to normal, as most requests in the queue were over half an hour old and 
> should have timed out rapidly, clearing the queue. However, it actually took 
> an hour to clear the native transport's pending queue, even with native 
> transport disabled. Upon examining the code, I noticed that for read/write 
> requests, the 
> [queryStartNanoTime|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/transport/Dispatcher.java#L78],
>  which determines if a request has timed out, only begins when the task 
> starts processing. This means that no matter how long a request has been 
> pending, it doesn't contribute to the timeout. I believe this is incorrect. 
> The timer should start when the Cassandra server receives the request or when 
> it enqueues the task, not when the request/task begins processing. This way, 
> an overloaded node with many pending tasks can quickly discard timed-out 
> requests and recover from an outage once new requests stop.
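The requested behaviour can be modelled in a few lines: stamp each task with its enqueue time and measure the timeout from that stamp, so a backlog of already-expired tasks is discarded without doing their work. This is a hypothetical sketch with a simulated nanosecond clock (`EnqueueTimeoutSketch`, `Task` and `drain` are illustrative names); it is not Cassandra's `Dispatcher` code or the change referenced from CASSANDRA-19534.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a request queue; not Cassandra's native transport code.
final class EnqueueTimeoutSketch
{
    static final class Task
    {
        final String id;
        final long enqueuedAtNanos; // the timeout clock starts when the task is queued

        Task(String id, long enqueuedAtNanos)
        {
            this.id = id;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }
    }

    // Drains the queue at time nowNanos. Tasks older than timeoutNanos (measured
    // from enqueue) are dropped without doing any work; the rest are "processed".
    static List<String> drain(Queue<Task> queue, long nowNanos, long timeoutNanos)
    {
        List<String> processed = new ArrayList<>();
        Task t;
        while ((t = queue.poll()) != null)
        {
            if (nowNanos - t.enqueuedAtNanos > timeoutNanos)
                continue; // the client has already given up on this request
            processed.add(t.id);
        }
        return processed;
    }

    public static void main(String[] args)
    {
        Queue<Task> queue = new ArrayDeque<>();
        queue.add(new Task("stale", 0));
        queue.add(new Task("fresh", 9_000));
        System.out.println(drain(queue, 10_000, 5_000)); // prints "[fresh]"
    }
}
```

Because age is measured from enqueue rather than from dequeue, time spent waiting in the queue counts toward the timeout, which is exactly what the report asks for.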






[jira] [Updated] (CASSANDRA-19215) "Query start time" in native transport request threads should be the task enqueue time

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19215:

Resolution: Fixed
Status: Resolved  (was: Open)

> "Query start time" in native transport request threads should be the task 
> enqueue time
> --
>
> Key: CASSANDRA-19215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19215
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Runtian Liu
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> Recently, our Cassandra 4.0.6 cluster experienced an outage due to a surge in 
> expensive traffic from the application side. This surge involved a large 
> volume of costly read queries, which took a considerable amount of time to 
> process on the server side. The client had timeout settings; if a request 
> timed out, it might trigger the sending of new requests. Since the server 
> nodes were overloaded, numerous nodes had hundreds of thousands of tasks 
> queued in the Native-Transport-Request pending queue. I expected that once 
> the application ceased sending requests, the server node would quickly return 
> to normal, as most requests in the queue were over half an hour old and 
> should have timed out rapidly, clearing the queue. However, it actually took 
> an hour to clear the native transport's pending queue, even with native 
> transport disabled. Upon examining the code, I noticed that for read/write 
> requests, the 
> [queryStartNanoTime|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/transport/Dispatcher.java#L78],
>  which determines if a request has timed out, only begins when the task 
> starts processing. This means that no matter how long a request has been 
> pending, it doesn't contribute to the timeout. I believe this is incorrect. 
> The timer should start when the Cassandra server receives the request or when 
> it enqueues the task, not when the request/task begins processing. This way, 
> an overloaded node with many pending tasks can quickly discard timed-out 
> requests and recover from an outage once new requests stop.






[jira] [Assigned] (CASSANDRA-19215) "Query start time" in native transport request threads should be the task enqueue time

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov reassigned CASSANDRA-19215:
---

Assignee: Alex Petrov

> "Query start time" in native transport request threads should be the task 
> enqueue time
> --
>
> Key: CASSANDRA-19215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19215
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Runtian Liu
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> Recently, our Cassandra 4.0.6 cluster experienced an outage due to a surge in 
> expensive traffic from the application side. This surge involved a large 
> volume of costly read queries, which took a considerable amount of time to 
> process on the server side. The client had timeout settings; if a request 
> timed out, it might trigger the sending of new requests. Since the server 
> nodes were overloaded, numerous nodes had hundreds of thousands of tasks 
> queued in the Native-Transport-Request pending queue. I expected that once 
> the application ceased sending requests, the server node would quickly return 
> to normal, as most requests in the queue were over half an hour old and 
> should have timed out rapidly, clearing the queue. However, it actually took 
> an hour to clear the native transport's pending queue, even with native 
> transport disabled. Upon examining the code, I noticed that for read/write 
> requests, the 
> [queryStartNanoTime|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/transport/Dispatcher.java#L78],
>  which determines if a request has timed out, only begins when the task 
> starts processing. This means that no matter how long a request has been 
> pending, it doesn't contribute to the timeout. I believe this is incorrect. 
> The timer should start when the Cassandra server receives the request or when 
> it enqueues the task, not when the request/task begins processing. This way, 
> an overloaded node with many pending tasks can quickly discard timed-out 
> requests and recover from an outage once new requests stop.
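The proposal above amounts to measuring a request's age from the moment it is enqueued. Below is a minimal sketch of that idea. It assumes a single worker, an injectable nanosecond clock, and the hypothetical classes `QueuedRequest` and `EnqueueTimeDispatcher`. None of these come from Cassandra's `Dispatcher`, and this is not the code that eventually addressed the ticket.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical request wrapper: the start time is captured when the request is
// enqueued, not when a worker thread later picks it up.
final class QueuedRequest {
    final String id;
    final long enqueuedAtNanos;

    QueuedRequest(String id, long enqueuedAtNanos) {
        this.id = id;
        this.enqueuedAtNanos = enqueuedAtNanos;
    }
}

// Hypothetical single-worker dispatcher: time spent waiting in the queue
// counts against the request timeout, so stale work is dropped cheaply.
final class EnqueueTimeDispatcher {
    private final Queue<QueuedRequest> pending = new ArrayDeque<>();
    private final LongSupplier clock; // e.g. System::nanoTime in production
    private final long timeoutNanos;
    private int executed = 0;
    private int dropped = 0;

    EnqueueTimeDispatcher(LongSupplier clock, long timeout, TimeUnit unit) {
        this.clock = clock;
        this.timeoutNanos = unit.toNanos(timeout);
    }

    void enqueue(String id) {
        pending.add(new QueuedRequest(id, clock.getAsLong()));
    }

    // Drains the queue. A request whose age, measured from enqueue time,
    // exceeds the timeout is discarded without doing any query work.
    void drain() {
        QueuedRequest r;
        while ((r = pending.poll()) != null) {
            long age = clock.getAsLong() - r.enqueuedAtNanos;
            if (age > timeoutNanos) {
                dropped++;  // the client has already given up on this request
            } else {
                executed++; // placeholder for actually processing the query
            }
        }
    }

    int executed() { return executed; }
    int dropped() { return dropped; }
}
```

Consider a 5 s timeout, with one request enqueued at t = 0 s and another at t = 3 s, and a worker that reaches the queue only at t = 7 s. This sketch drops the first request and runs the second. If the timer instead started when processing began, both requests would run, which is the behavior the ticket describes.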



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19215) "Query start time" in native transport request threads should be the task enqueue time

2024-05-31 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851018#comment-17851018
 ] 

Alex Petrov commented on CASSANDRA-19215:
-

This should be fixed by [CASSANDRA-19534].

> "Query start time" in native transport request threads should be the task 
> enqueue time
> --
>
> Key: CASSANDRA-19215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19215
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Runtian Liu
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>
> Recently, our Cassandra 4.0.6 cluster experienced an outage due to a surge in 
> expensive traffic from the application side. This surge involved a large 
> volume of costly read queries, which took a considerable amount of time to 
> process on the server side. The client had timeout settings; if a request 
> timed out, it might trigger the sending of new requests. Since the server 
> nodes were overloaded, numerous nodes had hundreds of thousands of tasks 
> queued in the Native-Transport-Request pending queue. I expected that once 
> the application ceased sending requests, the server node would quickly return 
> to normal, as most requests in the queue were over half an hour old and 
> should have timed out rapidly, clearing the queue. However, it actually took 
> an hour to clear the native transport's pending queue, even with native 
> transport disabled. Upon examining the code, I noticed that for read/write 
> requests, the 
> [queryStartNanoTime|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/transport/Dispatcher.java#L78],
>  which determines if a request has timed out, only begins when the task 
> starts processing. This means that no matter how long a request has been 
> pending, it doesn't contribute to the timeout. I believe this is incorrect. 
> The timer should start when the Cassandra server receives the request or when 
> it enqueues the task, not when the request/task begins processing. This way, 
> an overloaded node with many pending tasks can quickly discard timed-out 
> requests and recover from an outage once new requests stop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19534) Unbounded queues in native transport requests lead to node instability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19534:

Since Version: 3.0.0  (was: 4.1.5)

> Unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary-4.1.html, 
> ci_summary-5.0.html, ci_summary-trunk.html, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should be shedding load much more 
> aggressively and use a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100 -r 1 --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100 -r 1 --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h
> {noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                 Reads                                  Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11
> {noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
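The bounded-queue-plus-load-shedding approach proposed above can be illustrated with `java.util.concurrent.ArrayBlockingQueue`. This is a minimal sketch only. The class name `BoundedRequestQueue` is hypothetical, and the code is not the patch that resolved this ticket.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical admission gate: a fixed-capacity queue that rejects new work
// immediately instead of letting the backlog grow without bound.
final class BoundedRequestQueue {
    private final BlockingQueue<Runnable> queue;
    private final AtomicLong rejected = new AtomicLong();

    BoundedRequestQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // offer() never blocks: it returns false when the queue is full. A server
    // would translate that into an "overloaded" error for the client.
    boolean submit(Runnable task) {
        boolean accepted = queue.offer(task);
        if (!accepted) {
            rejected.incrementAndGet();
        }
        return accepted;
    }

    // Runs everything currently queued on the calling thread; returns the count.
    int runPending() {
        int ran = 0;
        Runnable task;
        while ((task = queue.poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    long rejectedCount() { return rejected.get(); }
}
```

The sketch uses `offer()` rather than `put()`. When the queue is full, `put()` would block the submitting thread, which moves the backlog somewhere else instead of shedding it.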



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19534) Unbounded queues in native transport requests lead to node instability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19534:

  Since Version: 4.1.5
Source Control Link: 
https://github.com/apache/cassandra/commit/dc17c29724d86547538cc8116ff1a90d36a0bf3a
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed to 4.1 with 
[dc17c29724d86547538cc8116ff1a90d36a0bf3a|https://github.com/apache/cassandra/commit/dc17c29724d86547538cc8116ff1a90d36a0bf3a]
 and merged up to 
[5.0|https://github.com/apache/cassandra/commit/617a75843c9bfaf241249514f9604466f6c8ccab]
 and 
[trunk|https://github.com/apache/cassandra/commit/d10008d54bfb301ba12d022037b1caf78f18418b].

> Unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary-4.1.html, 
> ci_summary-5.0.html, ci_summary-trunk.html, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> timeout than is configured.  We should be shedding load much more 
> aggressively and use a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                
>                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | 
>   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 | 
>       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 | 
>       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 | 
>       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 | 
>       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 | 
>       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch trunk updated (8ba2f9e8c0 -> d10008d54b)

2024-05-31 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from 8ba2f9e8c0 Consolidate logging on trace level
 add 9ebe0aa08a Replace getStderr calls with getCleanedStderr calls in tests checking for emptiness
 add dc17c29724 Add native transport deadline, an ultimate deadline for all tasks related to a specific request
 add 617a75843c Merge branch 'cassandra-4.1' into cassandra-5.0
 add d10008d54b Merge branch 'cassandra-5.0' into trunk

No new revisions were added by this update.
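Commit `dc17c29724` describes a native transport deadline as "an ultimate deadline for all tasks related to a specific request". Below is a minimal sketch of that idea. It assumes a hypothetical `RequestDeadline` class and an injectable clock, neither of which is taken from the commit. The deadline is computed once, when the request arrives, and every stage checks the same value.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical deadline created once when a request arrives and passed to
// every task spawned on its behalf.
final class RequestDeadline {
    private final LongSupplier clock;
    private final long deadlineNanos;

    RequestDeadline(LongSupplier clock, long arrivalNanos, long budget, TimeUnit unit) {
        this.clock = clock;
        this.deadlineNanos = arrivalNanos + unit.toNanos(budget);
    }

    // Compare via subtraction so the check stays correct if nanoTime wraps.
    boolean expired() {
        return clock.getAsLong() - deadlineNanos > 0;
    }

    long remainingNanos() {
        return Math.max(0L, deadlineNanos - clock.getAsLong());
    }
}

final class DeadlineAwareTask {
    // Each stage checks the shared deadline before doing work, so no stage can
    // extend the request's lifetime past its original budget.
    static String runStage(String stage, RequestDeadline deadline) {
        if (deadline.expired()) {
            return stage + ": skipped (deadline passed)";
        }
        long leftMs = TimeUnit.NANOSECONDS.toMillis(deadline.remainingNanos());
        return stage + ": ran with " + leftMs + " ms left";
    }
}
```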

Summary of changes:
 CHANGES.txt|   2 +-
 .../cassandra/auth/CIDRGroupsMappingManager.java   |   5 +-
 .../cassandra/auth/CIDRPermissionsManager.java |   4 +-
 .../apache/cassandra/auth/CassandraAuthorizer.java |   7 +-
 .../cassandra/auth/CassandraNetworkAuthorizer.java |   4 +-
 .../cassandra/auth/CassandraRoleManager.java   |  27 +-
 .../cassandra/auth/PasswordAuthenticator.java  |   4 +-
 .../apache/cassandra/batchlog/BatchlogManager.java |   8 +-
 .../cassandra/concurrent/DebuggableTask.java   |   9 +
 .../apache/cassandra/concurrent/FutureTask.java|  28 ++
 .../cassandra/concurrent/ResizableThreadPool.java  |   5 +
 .../apache/cassandra/concurrent/SEPExecutor.java   |  15 +
 src/java/org/apache/cassandra/config/Config.java   |  20 ++
 .../cassandra/config/DatabaseDescriptor.java   | 103 ++
 .../org/apache/cassandra/cql3/CQLStatement.java|   5 +-
 .../cql3/CustomPayloadMirroringQueryHandler.java   |  16 +-
 .../org/apache/cassandra/cql3/QueryHandler.java|   7 +-
 .../org/apache/cassandra/cql3/QueryProcessor.java  |  42 +--
 .../apache/cassandra/cql3/UntypedResultSet.java|   5 +-
 .../cql3/statements/AuthenticationStatement.java   |   5 +-
 .../cql3/statements/AuthorizationStatement.java|   4 +-
 .../cassandra/cql3/statements/BatchStatement.java  |  34 +-
 .../cql3/statements/DescribeStatement.java |   3 +-
 .../cql3/statements/ModificationStatement.java |  50 +--
 .../cassandra/cql3/statements/SelectStatement.java |  30 +-
 .../cql3/statements/TruncateStatement.java |   4 +-
 .../cassandra/cql3/statements/UseStatement.java|   7 +-
 .../statements/schema/AlterSchemaStatement.java|   4 +-
 .../cassandra/db/CounterMutationVerbHandler.java   |   6 +-
 .../apache/cassandra/db/MutationVerbHandler.java   |   9 +
 .../cassandra/db/PartitionRangeReadCommand.java|   7 +-
 src/java/org/apache/cassandra/db/ReadCommand.java  |  20 +-
 src/java/org/apache/cassandra/db/ReadQuery.java|   6 +-
 .../cassandra/db/SinglePartitionReadCommand.java   |  17 +-
 .../org/apache/cassandra/db/view/TableViews.java   |   5 +-
 .../apache/cassandra/db/view/ViewBuilderTask.java  |   4 +-
 .../db/virtual/CIDRFilteringMetricsTable.java  |   4 +-
 .../locator/AbstractReplicationStrategy.java   |  15 +-
 .../apache/cassandra/metrics/ClientMetrics.java|  32 +-
 .../cassandra/metrics/ThreadPoolMetrics.java   |   7 +
 .../cassandra/net/InboundMessageHandler.java   |  12 +-
 src/java/org/apache/cassandra/net/Message.java |  45 ++-
 .../apache/cassandra/repair/RepairCoordinator.java |   3 +-
 .../service/AbstractWriteResponseHandler.java  |  43 ++-
 .../cassandra/service/BatchlogResponseHandler.java |   5 +-
 .../apache/cassandra/service/CassandraDaemon.java  |  16 +-
 .../DatacenterSyncWriteResponseHandler.java|   5 +-
 .../service/DatacenterWriteResponseHandler.java|   5 +-
 .../cassandra/service/NativeTransportService.java  |   7 +-
 .../org/apache/cassandra/service/StorageProxy.java | 346 +++--
 .../apache/cassandra/service/StorageService.java   |  71 -
 .../cassandra/service/StorageServiceMBean.java |  19 +-
 .../cassandra/service/WriteResponseHandler.java|   9 +-
 .../service/pager/AbstractQueryPager.java  |   5 +-
 .../service/pager/AggregationQueryPager.java   |  41 ++-
 .../service/pager/MultiPartitionPager.java |  19 +-
 .../apache/cassandra/service/pager/QueryPager.java |   5 +-
 .../org/apache/cassandra/service/paxos/Paxos.java  |  18 +-
 .../service/paxos/v1/AbstractPaxosCallback.java|  16 +-
 .../service/paxos/v1/PrepareCallback.java  |   5 +-
 .../service/paxos/v1/ProposeCallback.java  |   5 +-
 .../service/reads/AbstractReadExecutor.java|  70 +++--
 .../cassandra/service/reads/DataResolver.java  |  13 +-
 .../cassandra/service/reads/DigestResolver.java|   7 +-
 .../cassandra/service/reads/ReadCallback.java  |  42 ++-
 .../service/reads/ReplicaFilteringProtection.java  |  15 +-
 .../cassandra/service/reads/ResponseResolver.java  |   7 +-
 .../reads/ShortReadPartitionsProtection.java   |  15 +-
 .../service/reads/ShortReadProtection.java |   5 +-
 .../service/reads/range/RangeCommandIterator.java  |  26 +-
 .../service/reads/range/RangeCommands.java

(cassandra) branch cassandra-4.1 updated (aa20c9ab11 -> dc17c29724)

2024-05-31 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch cassandra-4.1
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from aa20c9ab11 Merge branch 'cassandra-4.0' into cassandra-4.1
 add 9ebe0aa08a Replace getStderr calls with getCleanedStderr calls in tests checking for emptiness
 add dc17c29724 Add native transport deadline, an ultimate deadline for all tasks related to a specific request

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt|   2 +-
 .../apache/cassandra/auth/CassandraAuthorizer.java |   8 +-
 .../cassandra/auth/CassandraNetworkAuthorizer.java |   5 +-
 .../cassandra/auth/CassandraRoleManager.java   |   5 +-
 .../cassandra/auth/PasswordAuthenticator.java  |   5 +-
 .../apache/cassandra/batchlog/BatchlogManager.java |   8 +-
 .../cassandra/concurrent/DebuggableTask.java   |   9 +
 .../apache/cassandra/concurrent/FutureTask.java|  28 ++
 .../cassandra/concurrent/ResizableThreadPool.java  |   5 +
 .../apache/cassandra/concurrent/SEPExecutor.java   |  15 +
 src/java/org/apache/cassandra/config/Config.java   |  16 +
 .../cassandra/config/DatabaseDescriptor.java   |  96 +-
 .../org/apache/cassandra/cql3/CQLStatement.java|   5 +-
 .../cql3/CustomPayloadMirroringQueryHandler.java   |  16 +-
 .../org/apache/cassandra/cql3/QueryHandler.java|   7 +-
 .../org/apache/cassandra/cql3/QueryProcessor.java  |  42 +--
 .../apache/cassandra/cql3/UntypedResultSet.java|   5 +-
 .../cql3/statements/AuthenticationStatement.java   |   5 +-
 .../cql3/statements/AuthorizationStatement.java|   4 +-
 .../cassandra/cql3/statements/BatchStatement.java  |  34 +-
 .../cql3/statements/DescribeStatement.java |   3 +-
 .../cql3/statements/ModificationStatement.java |  50 +--
 .../cassandra/cql3/statements/SelectStatement.java |  30 +-
 .../cql3/statements/TruncateStatement.java |   4 +-
 .../cassandra/cql3/statements/UseStatement.java|   7 +-
 .../statements/schema/AlterSchemaStatement.java|   4 +-
 .../cassandra/db/CounterMutationVerbHandler.java   |   6 +-
 .../apache/cassandra/db/MutationVerbHandler.java   |   9 +
 .../cassandra/db/PartitionRangeReadCommand.java|   7 +-
 src/java/org/apache/cassandra/db/ReadCommand.java  |  20 +-
 src/java/org/apache/cassandra/db/ReadQuery.java|   5 +-
 .../cassandra/db/SinglePartitionReadCommand.java   |  17 +-
 .../org/apache/cassandra/db/view/TableViews.java   |   5 +-
 .../apache/cassandra/db/view/ViewBuilderTask.java  |   4 +-
 .../locator/AbstractReplicationStrategy.java   |  15 +-
 .../apache/cassandra/metrics/ClientMetrics.java|  45 ++-
 .../cassandra/metrics/ThreadPoolMetrics.java   |   7 +
 .../cassandra/net/InboundMessageHandler.java   |  12 +-
 src/java/org/apache/cassandra/net/Message.java |  31 +-
 .../apache/cassandra/repair/RepairRunnable.java|   4 +-
 .../service/AbstractWriteResponseHandler.java  |  20 +-
 .../cassandra/service/BatchlogResponseHandler.java |   5 +-
 .../DatacenterSyncWriteResponseHandler.java|   5 +-
 .../service/DatacenterWriteResponseHandler.java|   5 +-
 .../org/apache/cassandra/service/StorageProxy.java | 343 +++--
 .../apache/cassandra/service/StorageService.java   |  79 +
 .../cassandra/service/StorageServiceMBean.java |  19 ++
 .../cassandra/service/WriteResponseHandler.java|   9 +-
 .../service/pager/AbstractQueryPager.java  |   5 +-
 .../service/pager/AggregationQueryPager.java   |  41 ++-
 .../service/pager/MultiPartitionPager.java |  18 +-
 .../apache/cassandra/service/pager/QueryPager.java |   5 +-
 .../org/apache/cassandra/service/paxos/Paxos.java  |  18 +-
 .../service/paxos/v1/AbstractPaxosCallback.java|  13 +-
 .../service/paxos/v1/PrepareCallback.java  |   5 +-
 .../service/paxos/v1/ProposeCallback.java  |   5 +-
 .../service/reads/AbstractReadExecutor.java|  71 +++--
 .../cassandra/service/reads/DataResolver.java  |  13 +-
 .../cassandra/service/reads/DigestResolver.java|   7 +-
 .../cassandra/service/reads/ReadCallback.java  |  31 +-
 .../service/reads/ReplicaFilteringProtection.java  |  15 +-
 .../cassandra/service/reads/ResponseResolver.java  |   7 +-
 .../reads/ShortReadPartitionsProtection.java   |  15 +-
 .../service/reads/ShortReadProtection.java |   5 +-
 .../service/reads/range/RangeCommandIterator.java  |  24 +-
 .../service/reads/range/RangeCommands.java |   9 +-
 .../service/reads/repair/AbstractReadRepair.java   |  20 +-
 .../service/reads/repair/BlockingReadRepair.java   |   9 +-
 .../service/reads/repair/ReadOnlyReadRepair.java   |   5 +-
 .../cassandra/service/reads/repair/ReadRepair.java |   7 +-
 .../service/reads/repair/ReadRepairStrategy.java   |   9 +-
 .../apache/cassandra/tracing/TraceStateImpl.java   |   4 +-
 .../cassandra/transport/CQLMessageHand

[jira] [Updated] (CASSANDRA-19534) Unbounded queues in native transport requests lead to node instability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19534:

Status: Ready to Commit  (was: Review In Progress)

> Unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary-4.1.html, 
> ci_summary-5.0.html, ci_summary-trunk.html, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> timeout than is configured.  We should be shedding load much more 
> aggressively and use a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                
>                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | 
>   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 | 
>       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 | 
>       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 | 
>       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 | 
>       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 | 
>       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cassandra-5.0 updated (fbfa77e70f -> 617a75843c)

2024-05-31 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch cassandra-5.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


from fbfa77e70f Merge branch 'cassandra-4.1' into cassandra-5.0
 add 9ebe0aa08a Replace getStderr calls with getCleanedStderr calls in tests checking for emptiness
 add dc17c29724 Add native transport deadline, an ultimate deadline for all tasks related to a specific request
 add 617a75843c Merge branch 'cassandra-4.1' into cassandra-5.0

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt|   2 +-
 .../cassandra/auth/CIDRGroupsMappingManager.java   |   5 +-
 .../cassandra/auth/CIDRPermissionsManager.java |   4 +-
 .../apache/cassandra/auth/CassandraAuthorizer.java |   7 +-
 .../cassandra/auth/CassandraNetworkAuthorizer.java |   4 +-
 .../cassandra/auth/CassandraRoleManager.java   |   4 +-
 .../cassandra/auth/PasswordAuthenticator.java  |   4 +-
 .../apache/cassandra/batchlog/BatchlogManager.java |   8 +-
 .../cassandra/concurrent/DebuggableTask.java   |   9 +
 .../apache/cassandra/concurrent/FutureTask.java|  28 ++
 .../cassandra/concurrent/ResizableThreadPool.java  |   5 +
 .../apache/cassandra/concurrent/SEPExecutor.java   |  15 +
 src/java/org/apache/cassandra/config/Config.java   |  20 ++
 .../cassandra/config/DatabaseDescriptor.java   | 103 ++
 .../org/apache/cassandra/cql3/CQLStatement.java|   5 +-
 .../cql3/CustomPayloadMirroringQueryHandler.java   |  16 +-
 .../org/apache/cassandra/cql3/QueryHandler.java|   7 +-
 .../org/apache/cassandra/cql3/QueryProcessor.java  |  42 +--
 .../apache/cassandra/cql3/UntypedResultSet.java|   5 +-
 .../cql3/statements/AuthenticationStatement.java   |   5 +-
 .../cql3/statements/AuthorizationStatement.java|   4 +-
 .../cassandra/cql3/statements/BatchStatement.java  |  34 +-
 .../cql3/statements/DescribeStatement.java |   3 +-
 .../cql3/statements/ModificationStatement.java |  50 +--
 .../cassandra/cql3/statements/SelectStatement.java |  30 +-
 .../cql3/statements/TruncateStatement.java |   4 +-
 .../cassandra/cql3/statements/UseStatement.java|   7 +-
 .../statements/schema/AlterSchemaStatement.java|   4 +-
 .../cassandra/db/CounterMutationVerbHandler.java   |   6 +-
 .../apache/cassandra/db/MutationVerbHandler.java   |   9 +
 .../cassandra/db/PartitionRangeReadCommand.java|   7 +-
 src/java/org/apache/cassandra/db/ReadCommand.java  |  20 +-
 src/java/org/apache/cassandra/db/ReadQuery.java|   6 +-
 .../cassandra/db/SinglePartitionReadCommand.java   |  17 +-
 .../org/apache/cassandra/db/view/TableViews.java   |   5 +-
 .../apache/cassandra/db/view/ViewBuilderTask.java  |   4 +-
 .../db/virtual/CIDRFilteringMetricsTable.java  |   4 +-
 .../locator/AbstractReplicationStrategy.java   |  15 +-
 .../apache/cassandra/metrics/ClientMetrics.java|  44 ++-
 .../cassandra/metrics/ThreadPoolMetrics.java   |   7 +
 .../cassandra/net/InboundMessageHandler.java   |  12 +-
 src/java/org/apache/cassandra/net/Message.java |  31 +-
 .../apache/cassandra/repair/RepairCoordinator.java |   3 +-
 .../service/AbstractWriteResponseHandler.java  |  41 ++-
 .../cassandra/service/BatchlogResponseHandler.java |   5 +-
 .../apache/cassandra/service/CassandraDaemon.java  |  16 +-
 .../DatacenterSyncWriteResponseHandler.java|   5 +-
 .../service/DatacenterWriteResponseHandler.java|   5 +-
 .../cassandra/service/NativeTransportService.java  |   7 +-
 .../org/apache/cassandra/service/StorageProxy.java | 348 +++--
 .../apache/cassandra/service/StorageService.java   |  71 -
 .../cassandra/service/StorageServiceMBean.java |  21 +-
 .../cassandra/service/WriteResponseHandler.java|   9 +-
 .../service/pager/AbstractQueryPager.java  |   5 +-
 .../service/pager/AggregationQueryPager.java   |  41 ++-
 .../service/pager/MultiPartitionPager.java |  19 +-
 .../apache/cassandra/service/pager/QueryPager.java |   5 +-
 .../org/apache/cassandra/service/paxos/Paxos.java  |  18 +-
 .../service/paxos/v1/AbstractPaxosCallback.java|  16 +-
 .../service/paxos/v1/PrepareCallback.java  |   5 +-
 .../service/paxos/v1/ProposeCallback.java  |   5 +-
 .../service/reads/AbstractReadExecutor.java|  73 +++--
 .../cassandra/service/reads/DataResolver.java  |  13 +-
 .../cassandra/service/reads/DigestResolver.java|   7 +-
 .../cassandra/service/reads/ReadCallback.java  |  40 ++-
 .../service/reads/ReplicaFilteringProtection.java  |  15 +-
 .../cassandra/service/reads/ResponseResolver.java  |   7 +-
 .../reads/ShortReadPartitionsProtection.java   |  15 +-
 .../service/reads/ShortReadProtection.java |   5 +-
 .../service/reads/range/RangeCommandIterator.java  |  26 +-
 .../service/reads/range/RangeCommands.java |  11 +-
 .../reads/range/ScanAllRange

[jira] [Updated] (CASSANDRA-19664) Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19664:

Attachment: ci_summary-1.html

> Accord Journal Determinism: PreAccept replay stability 
> ---
>
> Key: CASSANDRA-19664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> Currently, some messages, such as PreAccept, can have some of their context 
> initialized on replay. This patch adds a concept of Context to Journal that 
> can be used to store arbitrary information needed to replay messages exactly 
> as they were executed the first time.
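To make the replay-determinism idea concrete, here is a minimal sketch. It assumes that the "context" is a timestamp captured at first execution, and it uses hypothetical `JournalEntry` and `ReplayableJournal` classes. None of this is Accord's Journal API. The point it illustrates is that a value derived from the live environment is recorded alongside the message, and replay reuses that value instead of deriving it again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical journal record: the message plus the context value (here a
// timestamp) that was chosen when the message was first executed.
final class JournalEntry {
    final String message;
    final long contextTimestamp;

    JournalEntry(String message, long contextTimestamp) {
        this.message = message;
        this.contextTimestamp = contextTimestamp;
    }
}

final class ReplayableJournal {
    private final List<JournalEntry> entries = new ArrayList<>();
    private final LongSupplier clock;

    ReplayableJournal(LongSupplier clock) {
        this.clock = clock;
    }

    // Stand-in for message processing whose result depends on the context.
    static String execute(String message, long timestamp) {
        return message + "@" + timestamp;
    }

    // First execution: derive the context from the live environment, persist it
    // next to the message, then execute.
    String executeAndRecord(String message) {
        long ts = clock.getAsLong();
        entries.add(new JournalEntry(message, ts));
        return execute(message, ts);
    }

    // Replay: reuse the recorded context rather than reading the clock again, so
    // each message produces the same result as its original execution.
    List<String> replay() {
        List<String> results = new ArrayList<>();
        for (JournalEntry e : entries) {
            results.add(execute(e.message, e.contextTimestamp));
        }
        return results;
    }
}
```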



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19664) Accord Journal Determinism: PreAccept replay stability

2024-05-31 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850974#comment-17850974
 ] 

Alex Petrov commented on CASSANDRA-19664:
-

[~aleksey] uploaded the latest CI run; there are some JDK17 failures that seem 
to be related to {add-opens}; three dtest failures are unrelated. 

> Accord Journal Determinism: PreAccept replay stability 
> ---
>
> Key: CASSANDRA-19664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19664
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> Currently, some messages, such as PreAccept can have some of their context 
> initialized on replay. This patch adds a concept of Context to Journal that 
> can be used for arbitrary information necessary for replaying them just the 
> way they were executed the first time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19534) Unbounded queues in native transport requests lead to node instability

2024-05-31 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-19534:

Summary: Unbounded queues in native transport requests lead to node 
instability  (was: unbounded queues in native transport requests lead to node 
instability)

> Unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary-4.1.html, 
> ci_summary-5.0.html, ci_summary-trunk.html, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like they can take much longer to 
> time out than is configured. We should be shedding load much more 
> aggressively and using a bounded queue for incoming work. This is extremely 
> evident when we combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100 -r 1 --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100 -r 1 --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                Writes                                 Reads                                 Deletes                           Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
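The two remedies the CASSANDRA-19534 description asks for, a bounded queue for incoming work and shedding requests that have already outlived their timeout, can be sketched in Java under stated assumptions: an `ArrayBlockingQueue` caps the backlog so `offer` rejects work immediately once the queue is full, and each request is stamped on admission so the worker drops it at dequeue if it has waited longer than the timeout. `BoundedRequestQueue`, `TimedTask`, `submit`, and `runNext` are hypothetical names; this is not Cassandra's native-transport implementation or its configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class BoundedRequestQueue {
    // Hypothetical request wrapper: the work plus the time it was admitted.
    record TimedTask(Runnable work, long enqueuedAtNanos) {}

    private final BlockingQueue<TimedTask> queue;
    private final long timeoutNanos;
    private final LongSupplier nanoClock;                 // injectable so the sketch is testable
    private final AtomicLong rejected = new AtomicLong(); // shed at admission: queue full
    private final AtomicLong expired = new AtomicLong();  // shed at dequeue: waited too long

    BoundedRequestQueue(int capacity, long timeout, TimeUnit unit, LongSupplier nanoClock) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.timeoutNanos = unit.toNanos(timeout);
        this.nanoClock = nanoClock;
    }

    // Admission control: never block the caller; offer() returns false when the queue is full.
    boolean submit(Runnable work) {
        boolean accepted = queue.offer(new TimedTask(work, nanoClock.getAsLong()));
        if (!accepted) rejected.incrementAndGet();
        return accepted;
    }

    // Worker side: returns true only if a task was taken and actually executed.
    boolean runNext() {
        TimedTask task = queue.poll();
        if (task == null) return false;
        if (nanoClock.getAsLong() - task.enqueuedAtNanos() > timeoutNanos) {
            expired.incrementAndGet(); // past its timeout: skip it rather than answer late
            return false;
        }
        task.work().run();
        return true;
    }

    long rejected() { return rejected.get(); }
    long expired() { return expired.get(); }

    public static void main(String[] args) {
        long[] now = {0};
        BoundedRequestQueue q =
            new BoundedRequestQueue(2, 10, TimeUnit.MILLISECONDS, () -> now[0]);
        int[] executed = {0};

        q.submit(() -> executed[0]++);
        q.submit(() -> executed[0]++);
        q.submit(() -> executed[0]++); // queue full: rejected immediately

        q.runNext();                   // executed: no time has passed
        now[0] += TimeUnit.MILLISECONDS.toNanos(50);
        q.runNext();                   // dropped: waited 50 ms, timeout is 10 ms

        System.out.println(executed[0] + " " + q.rejected() + " " + q.expired()); // prints "1 1 1"
    }
}
```

Rejecting at `offer` gives the client an immediate error instead of a response that arrives long after it stopped waiting, while the dequeue-time check covers work that was admitted but sat in the queue too long, the symptom the description reports where requests take much longer to time out than configured.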


