[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-05-13 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845930#comment-17845930
 ] 

Alex Petrov edited comment on CASSANDRA-19534 at 5/13/24 1:47 PM:
--

[~maedhroz] thank you for the review! 
Pushed a new commit that should address your comments.


was (Author: ifesdjeen):
Pushed a new commit that should address your comments [~maedhroz]

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-05-03 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843377#comment-17843377
 ] 

Jon Haddad edited comment on CASSANDRA-19534 at 5/3/24 10:51 PM:
-

Switched the entire cluster to use the patch, started the workloads back up 
again.  I've also added this to the mix to be particularly brutal:

{noformat}
easy-cass-stress run KeyValue -d 2h --rate 25k -r 0
{noformat}

 !screenshot-6.png! 

Within seconds the NTR queue is empty and the cluster has fully recovered.

Switching back to 5.0-HEAD, I've restarted the test.

After just a few minutes the queue looks like this:

 !screenshot-7.png! 

Took about 20 seconds to recover after stopping the test.



was (Author: rustyrazorblade):
Switched the entire cluster to use the patch, started the workloads back up 
again.  I've also added this to the mix to be particularly brutal:

{noformat}
easy-cass-stress run KeyValue -d 2h --rate 25k -r 0
{noformat}

 !screenshot-6.png! 

Within seconds the NTR queue is empty and the cluster has fully recovered.

Switching back to 5.0-HEAD, I've restarted the test.

After just a few minutes the queue looks like this:

 !screenshot-7.png! 

Took about 20 seconds to recover.


> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, 
> screenshot-5.png, screenshot-6.png, screenshot-7.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-05-03 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843376#comment-17843376
 ] 

Jon Haddad edited comment on CASSANDRA-19534 at 5/3/24 10:37 PM:
-

I'm running these two workloads concurrently:

{noformat}
easy-cass-stress run RandomPartitionAccess --rate 50k -r .5 -d 10h  
--workload.rows=10 --workload.select=partition
easy-cass-stress run KeyValue --rate 50k -r .5 -d 24h -p 10m --populate 100m
{noformat}

In this screenshot, the top node is running the branch, the other two are 
running 5.0-HEAD.

The first node has more completed native transport requests with a 
significantly less backed up queue:

 !screenshot-1.png! 

The cluster has reached a point where it's failing a ton, so I've stopped the 
workload to see how fast it recovers.  The cassandra0 node with the branch 
recovered almost immediately.  The other nodes took approximately 10 seconds.
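
As a side note for anyone reproducing this: one way to watch that queue depth outside of the dashboards is the Native-Transport-Requests pool in {{nodetool tpstats}}, for example:

{noformat}
# Pending here is the NTR queue depth shown in the screenshots
nodetool tpstats | grep -E 'Pool Name|Native-Transport-Requests'
{noformat}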

I restarted the above two workloads and added a third:

{noformat}
easy-cass-stress run KeyValue --keyspace test1 
--field.keyvalue.value='random(1024,2048)' -p 1m -r .5 --populate 1m
{noformat}

The mixed nature of expensive and cheap reads is an easy way to create a deep 
queue for NTR.  It wasn't long before I got to this:

 !screenshot-2.png! 

It looks like load is being shed much faster off cassandra0:

 !screenshot-3.png! 

Within 10 seconds the first node has fully recovered; it took about 10 
additional seconds for the other two nodes to recover as well.

 !screenshot-4.png! 

I've rerun this several times now and am finding [~ifesdjeen]'s patched version 
to recover quicker than the other two.  The boxes are all running at 99+% CPU, 
and each time cassandra0 continues to complete more requests, maintain a 
shallower queue, and recover first.

 !screenshot-5.png! 

Starting my test with all 3 nodes running the patch.






was (Author: rustyrazorblade):
I'm running these two workloads concurrently:

{noformat}
easy-cass-stress run RandomPartitionAccess --rate 50k -r .5 -d 10h  
--workload.rows=10 --workload.select=partition
easy-cass-stress run KeyValue --rate 50k -r .5 -d 24h -p 10m --populate 100m
{noformat}

In this screenshot, the top node is running the branch, the other two are 
running 5.0-HEAD.

The first node has more completed native transport requests with a 
significantly less backed up queue:

 !screenshot-1.png! 

The cluster has reached a point where it's failing a ton, so I've stopped the 
workload to see how fast it recovers.  The cassandra0 node with the branch 
recovered almost immediately.  The other nodes took approximately 10 seconds.

I restarted the above two workloads and added a third:

{noformat}
easy-cass-stress run KeyValue --keyspace test1 
--field.keyvalue.value='random(1024,2048)' -p 1m -r .5 --populate 1m
{noformat}

The mixed nature of expensive and cheap reads is an easy way to create a deep 
queue for NTR.  It wasn't long before I got to this:

 !screenshot-2.png! 

It looks like load is being shed much faster off cassandra0:

 !screenshot-3.png! 

Within 10 seconds the first node has fully recovered; it took about 10 
additional seconds for the other two nodes to recover as well.

 !screenshot-4.png! 

I've rerun this several times now and am finding [~ifesdjeen]'s patched version 
to recover quicker than the other two.  The boxes are all running at 99+% CPU, 
and each time cassandra0 continues to complete more requests, maintain a 
shallower queue, and recover first.

 !screenshot-5.png! 

Starting my test with all 3 nodes running the patch.





> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, 
> screenshot-5.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node

[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-29 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842112#comment-17842112
 ] 

Alex Petrov edited comment on CASSANDRA-19534 at 4/29/24 5:24 PM:
--

[~brandon.williams] [~rustyrazorblade] would you be so kind as to try running 
your tests against the branch posted above? I suggest setting 
{{native_transport_timeout_in_ms}} to about 10 (or 12 max) seconds, and 
{{internode_timeout}} to {{true}} for starters. If you really want to push the 
limits, I'd suggest setting {{cql_start_time}} to {{REQUEST}}, but this is 
optional, as we will not roll it out with this setting enabled.
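
For anyone trying this, a rough sketch of the corresponding cassandra.yaml 
overrides (option names taken from the suggestion above; the exact keys, units, 
and accepted values on the branch may differ):

{noformat}
# illustrative values only
native_transport_timeout_in_ms: 12000   # ~12 seconds
internode_timeout: true
# optional, only if you want to push the limits:
cql_start_time: REQUEST
{noformat}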


was (Author: ifesdjeen):
[~brandon.williams] [~rustyrazorblade] would you be so kind to try running your 
tests? I suggest setting  {{native_transport_timeout_in_ms}} to about 10 (or 12 
max) seconds, and {{internode_timeout}} to {{true}} for starters. If you really 
want to push the limits, I'd suggest setting {{cql_start_time}} to {{REQUEST}}, 
but this is optional, as we will not roll it out with this setting enabled.

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-24 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840173#comment-17840173
 ] 

Alex Petrov edited comment on CASSANDRA-19534 at 4/24/24 7:17 AM:
--

Sorry for the lack of clarity; before this patch, there was no deadline at all. 
Tasks would live in the system essentially forever, clogging queues and doing 
busy work. I was intending to post a patch; it is currently in my CI queue but 
otherwise ready to go.

I believe that with the 12-second default, users will only see an improvement and 
there will be no learning curve at all. All configuration options are for people 
who understand their request lifetimes and want to get an even better profile.
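
To make the deadline idea concrete, here is a minimal, hypothetical Java sketch 
of deadline-based shedding on a bounded queue. It is only an illustration of the 
approach, not the actual patch; the class and field names are invented:

{noformat}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: a bounded queue where each task remembers when it was
// enqueued, and the worker sheds tasks whose deadline has already passed
// instead of doing busy work for clients that have long since timed out.
final class DeadlineQueue
{
    private static final long DEADLINE_NANOS = TimeUnit.SECONDS.toNanos(12); // hypothetical 12s default
    private final BlockingQueue<Task> queue = new ArrayBlockingQueue<>(1024); // bounded, unlike before

    static final class Task
    {
        final long enqueuedAtNanos = System.nanoTime();
        final Runnable work;
        Task(Runnable work) { this.work = work; }
    }

    /** Returns false when the queue is full, so the caller can reject or apply backpressure. */
    boolean offer(Runnable work)
    {
        return queue.offer(new Task(work));
    }

    /** Worker loop step: run the task if it is still within its deadline, otherwise shed it. */
    void drainOnce() throws InterruptedException
    {
        Task task = queue.take();
        if (System.nanoTime() - task.enqueuedAtNanos > DEADLINE_NANOS)
            return; // shed: past the deadline, the client has given up anyway
        task.work.run();
    }
}
{noformat}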


was (Author: ifesdjeen):
Sorry for the lack of clarity; today there’s no deadline at all. Tasks will 
live in the system essentially forever clogging queues doing busy work. I was 
intending to post a patch but it is currently in my CI queue; however otherwise 
ready to go.



I believe with 12 seconds default, users will only see an improvement and there 
will be no learning curve at all. All configuration options are for the people who 
understand their request lifetimes and want to get an even better profile.

 

 

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840171#comment-17840171
 ] 

Brandon Williams edited comment on CASSANDRA-19534 at 4/23/24 5:53 PM:
---

I think this all sounds good, though there may be a bit of a learning curve for 
users. Native request deadline is easy enough to understand, but things get a 
bit nuanced past that.

Regarding native_transport_timeout_in_ms:
bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In 
practice, we should use at most 12 seconds.

Do you mean this already exists today with a 100-second default? If not, what is 
the rationale for choosing that default?


was (Author: brandon.williams):
I think this all sounds good, though there may be a bit of a learning curve for 
users. Native request deadline is easy enough to understand, but things get a 
bit nuanced past that.

bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In 
practice, we should use at most 12 seconds.

Do you mean this currently exists at 100? If not, what is the rationale for 
that default?

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg
>
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-04-06 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834560#comment-17834560
 ] 

Alex Petrov edited comment on CASSANDRA-19534 at 4/6/24 8:25 PM:
-

I believe I know the root cause of this and have a nuanced solution. I believe 
the replica-side read and write queues are also overflowing with pending requests 
in this case. I will do my best to post the patch early next week; I was 
intending to do so, but other things got in the way.
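
One way to see those replica-side queues backing up is the Pending column for the 
ReadStage and MutationStage pools in {{nodetool tpstats}}, for example:

{noformat}
# replica-side read/write queue depth
nodetool tpstats | grep -E 'Pool Name|ReadStage|MutationStage'
{noformat}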


was (Author: ifesdjeen):
I believe I know the root cause of this. I will do my best to post the patch 
early next week; I was intending to do so, but other things got in the way.

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> time out than is configured.  We should shed load much more aggressively and 
> use a bounded queue for incoming work.  This is extremely evident when we 
> combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 |       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 |       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 |       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 |       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 |       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org