[jira] [Commented] (CASSANDRA-16364) Joining nodes simultaneously with auto_bootstrap:false can cause token collision

2024-05-17 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847465#comment-17847465
 ] 

Jon Haddad commented on CASSANDRA-16364:


Let me add some additional information.  This is partially based on what I've 
learned while fixing the problem and partially on the accounts of others.

It appears that a token collision happened to a cluster _without_ using 
auto_bootstrap: false.  Two nodes existed in the ring owning conflicting 
tokens.  It appears the cluster ran for months with a split brain, causing 
writes and reads to go to different sets of nodes depending on the 
coordinator.  The operator is fairly certain they waited several minutes 
between adding nodes, but admits it's possible that a bug in the automation 
resulted in the nodes joining at close to the same time.  Over a two-month 
period, some data was deleted, the tombstones were GC'ed, and eventually read 
repair resurrected the original data.

This is a serious flaw in the design of deterministic token allocation: it is 
unsafe by design.  Adding jitter to the tokens by default will prevent this 
kind of data loss.  We can change behavior in an existing release when the 
change addresses a fundamental design flaw, especially when that flaw puts a 
cluster in a wildly unpredictable state.

> Joining nodes simultaneously with auto_bootstrap:false can cause token 
> collision
> 
>
> Key: CASSANDRA-16364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Paulo Motta
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, two nodes chose the same 
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both 
> completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, 
> and the workaround is to avoid parallel bootstrap when using 
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and 
> prevent this situation when possible, since it can break users relying on 
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce the intent to bootstrap via gossip (i.e. add the node to gossip 
> without token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break 
> via node id)
> 4. broadcast tokens and move on with bootstrap
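
For illustration, a minimal sketch of the tie-break in step 3 (the class and 
method names are hypothetical, not the actual Cassandra bootstrap code): two 
joining nodes that detect each other's bootstrap intent compute the same 
ordering on host id, so only one of them proceeds to allocate tokens first.

{code:java}
import java.util.UUID;

// Hypothetical sketch: deterministic tie-break between two simultaneously
// joining nodes, keyed on host id. Both nodes reach the same verdict without
// any extra coordination.
public class BootstrapTieBreakSketch
{
    static boolean shouldAllocateFirst(UUID localHostId, UUID competingHostId)
    {
        return localHostId.compareTo(competingHostId) < 0;
    }

    public static void main(String[] args)
    {
        UUID a = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID b = UUID.fromString("00000000-0000-0000-0000-000000000002");
        System.out.println(shouldAllocateFirst(a, b)); // true  -> node a allocates first
        System.out.println(shouldAllocateFirst(b, a)); // false -> node b waits and re-runs allocation
    }
}
{code}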



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16364) Joining nodes simultaneously with auto_bootstrap:false can cause token collision

2024-05-17 Thread Jeremiah Jordan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847464#comment-17847464
 ] 

Jeremiah Jordan commented on CASSANDRA-16364:
-

The deterministic nature of token allocation is a feature that I know many 
people rely on.  It lets you easily set up duplicate clusters to restore 
backups to, among other things.

Any change to that should be behind a flag, and the default should only change 
in a new major release.

> Joining nodes simultaneously with auto_bootstrap:false can cause token 
> collision
> 
>
> Key: CASSANDRA-16364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Paulo Motta
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, two nodes chose the same 
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both 
> completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, 
> and the workaround is to avoid parallel bootstrap when using 
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and 
> prevent this situation when possible, since it can break users relying on 
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce the intent to bootstrap via gossip (i.e. add the node to gossip 
> without token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break 
> via node id)
> 4. broadcast tokens and move on with bootstrap



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19644) deterministic token allocation combined with slow gossip propagation can lead to data loss

2024-05-17 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19644:
---
Resolution: (was: Fixed)
Status: Open  (was: Resolved)

> deterministic token allocation combined with slow gossip propagation can lead 
> to data loss
> --
>
> Key: CASSANDRA-19644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time 
> window (about a minute) while using the default RF-based token allocation 
> leads to token conflicts.  Unfortunately this can easily go undetected in 
> medium to large clusters.
> When this happens, different nodes in the cluster will have different 
> understandings of the cluster's topology.  I've seen this go unnoticed in a 
> production environment for several months, leading to data loss, data 
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even when 
> multiple nodes start at once, it's still unlikely that they will ever 
> conflict.  Adding a random value between -2^8 and 2^8 to each allocated token 
> makes a collision statistically very unlikely while preserving the balance of 
> token distribution in the ring.  In the case of two nodes starting at the 
> same time, the operator gets a slightly uneven token distribution instead of 
> data loss.
>  
> {noformat}
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19644) deterministic token allocation combined with slow gossip propagation can lead to data loss

2024-05-17 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19644:
---
Resolution: Duplicate
Status: Resolved  (was: Open)

> deterministic token allocation combined with slow gossip propagation can lead 
> to data loss
> --
>
> Key: CASSANDRA-19644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time 
> window (about a minute) while using the default RF-based token allocation 
> leads to token conflicts.  Unfortunately this can easily go undetected in 
> medium to large clusters.
> When this happens, different nodes in the cluster will have different 
> understandings of the cluster's topology.  I've seen this go unnoticed in a 
> production environment for several months, leading to data loss, data 
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even when 
> multiple nodes start at once, it's still unlikely that they will ever 
> conflict.  Adding a random value between -2^8 and 2^8 to each allocated token 
> makes a collision statistically very unlikely while preserving the balance of 
> token distribution in the ring.  In the case of two nodes starting at the 
> same time, the operator gets a slightly uneven token distribution instead of 
> data loss.
>  
> {noformat}
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16364) Joining nodes simultaneously with auto_bootstrap:false can cause token collision

2024-05-17 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847460#comment-17847460
 ] 

Jon Haddad commented on CASSANDRA-16364:


I think we should apply some randomness to the tokens to ensure that even when 
multiple nodes start at once, it's still unlikely that they will ever have a 
conflict.  Adding a random value between -2^8 and 2^8 to each allocated token 
makes a collision statistically very unlikely while also preserving the balance 
of token distribution in the ring.  In the case of two nodes starting at the 
same time, the operator gets a slightly uneven token distribution instead of 
data loss.
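
As a rough illustration of the idea above, a minimal, self-contained sketch 
(this is not the actual token allocator; the class name and edge handling are 
assumptions):

{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: perturb a deterministically allocated Murmur3 token by a
// small random delta in [-2^8, 2^8]. The delta is negligible relative to the
// token space, so ring balance is essentially unchanged, but two nodes that
// compute identical token sets are overwhelmingly unlikely to collide.
public class TokenJitterSketch
{
    static final long JITTER = 1L << 8;

    static long jitter(long allocatedToken)
    {
        long delta = ThreadLocalRandom.current().nextLong(-JITTER, JITTER + 1);
        // Wrap-around at Long.MIN_VALUE/MAX_VALUE is ignored here; real code would
        // need to clamp or wrap tokens at the edges of the Murmur3 range.
        return allocatedToken + delta;
    }

    public static void main(String[] args)
    {
        long token = -1938510198161598815L; // one of the colliding tokens reported in CASSANDRA-19644
        System.out.println(jitter(token));
    }
}
{code}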

> Joining nodes simultaneously with auto_bootstrap:false can cause token 
> collision
> 
>
> Key: CASSANDRA-16364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Paulo Motta
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, two nodes chose the same 
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both 
> completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, 
> and the workaround is to avoid parallel bootstrap when using 
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and 
> prevent this situation when possible, since it can break users relying on 
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce the intent to bootstrap via gossip (i.e. add the node to gossip 
> without token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break 
> via node id)
> 4. broadcast tokens and move on with bootstrap



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16364) Joining nodes simultaneously with auto_bootstrap:false can cause token collision

2024-05-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16364:
-
Fix Version/s: 4.1.x
   5.0.x
   5.x

> Joining nodes simultaneously with auto_bootstrap:false can cause token 
> collision
> 
>
> Key: CASSANDRA-16364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Paulo Motta
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, two nodes chose the same 
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both 
> completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, 
> and the workaround is to avoid parallel bootstrap when using 
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and 
> prevent this situation when possible, since it can break users relying on 
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce the intent to bootstrap via gossip (i.e. add the node to gossip 
> without token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break 
> via node id)
> 4. broadcast tokens and move on with bootstrap



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19644) deterministic token allocation combined with slow gossip propagation can lead to data loss

2024-05-17 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19644:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> deterministic token allocation combined with slow gossip propagation can lead 
> to data loss
> --
>
> Key: CASSANDRA-19644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time 
> window (about a minute) while using the default RF-based token allocation 
> leads to token conflicts.  Unfortunately this can easily go undetected in 
> medium to large clusters.
> When this happens, different nodes in the cluster will have different 
> understandings of the cluster's topology.  I've seen this go unnoticed in a 
> production environment for several months, leading to data loss, data 
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even when 
> multiple nodes start at once, it's still unlikely that they will ever 
> conflict.  Adding a random value between -2^8 and 2^8 to each allocated token 
> makes a collision statistically very unlikely while preserving the balance of 
> token distribution in the ring.  In the case of two nodes starting at the 
> same time, the operator gets a slightly uneven token distribution instead of 
> data loss.
>  
> {noformat}
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19644) deterministic token allocation combined with slow gossip propagation can lead to data loss

2024-05-17 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847459#comment-17847459
 ] 

Jon Haddad commented on CASSANDRA-19644:


Ah.  I didn't see CASSANDRA-16364.  My preferred solution is different from 
what's proposed there, so I'll drop my comment on that one and close this out.

> deterministic token allocation combined with slow gossip propagation can lead 
> to data loss
> --
>
> Key: CASSANDRA-19644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time 
> window (about a minute) while using the default RF-based token allocation 
> leads to token conflicts.  Unfortunately this can easily go undetected in 
> medium to large clusters.
> When this happens, different nodes in the cluster will have different 
> understandings of the cluster's topology.  I've seen this go unnoticed in a 
> production environment for several months, leading to data loss, data 
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even when 
> multiple nodes start at once, it's still unlikely that they will ever 
> conflict.  Adding a random value between -2^8 and 2^8 to each allocated token 
> makes a collision statistically very unlikely while preserving the balance of 
> token distribution in the ring.  In the case of two nodes starting at the 
> same time, the operator gets a slightly uneven token distribution instead of 
> data loss.
>  
> {noformat}
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-05-17 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19534:
---
Bug Category: Parent values: Availability(12983)  (was: Parent values: 
Correctness(12982)Level 1 values: Unrecoverable Corruption / Loss(13161))

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like they can take far longer to 
> time out than is configured.  We should be shedding load much more 
> aggressively and use a bounded queue for incoming work (see the sketch after 
> the results below).  This is extremely evident when we combine a 
> resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                
>                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | 
>   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 | 
>       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 | 
>       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 | 
>       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 | 
>       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 | 
>       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
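
For illustration, a minimal sketch of the bounded-queue / load-shedding idea 
described above, using a plain JDK executor (this is not Cassandra's actual 
native transport dispatcher; the names and sizes are assumptions):

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed worker pool with a bounded backlog. When the
// queue is full, AbortPolicy rejects the task immediately instead of letting
// work pile up far past its timeout.
public class BoundedDispatchSketch
{
    public static void main(String[] args)
    {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1024),
                new ThreadPoolExecutor.AbortPolicy());

        try
        {
            executor.execute(() -> { /* handle one native transport request */ });
        }
        catch (RejectedExecutionException e)
        {
            // Shed load: reply with an overloaded-style error rather than queueing
            // the request indefinitely.
        }
        executor.shutdown();
    }
}
{code}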



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability

2024-05-17 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-19534:
---
Bug Category: Parent values: Correctness(12982)Level 1 values: 
Unrecoverable Corruption / Loss(13161)  (was: Parent values: 
Availability(12983))

> unbounded queues in native transport requests lead to node instability
> --
>
> Key: CASSANDRA-19534
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19534
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Jon Haddad
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - 
> QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, 
> Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html, 
> image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, 
> screenshot-7.png, screenshot-8.png, screenshot-9.png
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like they can take far longer to 
> time out than is configured.  We should be shedding load much more 
> aggressively and use a bounded queue for incoming work.  This is extremely 
> evident when we combine a resource-consuming workload with a smaller one:
> Running 5.0 HEAD on a single node as of today:
> {noformat}
> # populate only
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --maxrlat 100 --populate 
> 10m --rate 50k -n 1
> # workload 1 - larger reads
> easy-cass-stress run RandomPartitionAccess -p 100  -r 1 
> --workload.rows=10 --workload.select=partition --rate 200 -d 1d
> # second workload - small reads
> easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat}
> It appears our results don't time out at the requested server time either:
>  
> {noformat}
>                  Writes                                  Reads                
>                   Deletes                       Errors
>   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) | 
>   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
>  950286       70403.93        634.77 |  789524       70442.07        426.02 | 
>       0              0             0 | 9580484         18980.45
>  952304       70567.62         640.1 |  791072       70634.34        428.36 | 
>       0              0             0 | 9636658         18969.54
>  953146       70767.34         640.1 |  791400       70767.76        428.36 | 
>       0              0             0 | 9695272         18969.54
>  956833       71171.28        623.14 |  794009        71175.6        412.79 | 
>       0              0             0 | 9749377         19002.44
>  959627       71312.58        656.93 |  795703       71349.87        435.56 | 
>       0              0             0 | 9804907         18943.11{noformat}
>  
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19644) deterministic token allocation combined with slow gossip propagation can lead to data loss

2024-05-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847458#comment-17847458
 ] 

Brandon Williams commented on CASSANDRA-19644:
--

Is this different from CASSANDRA-16364?

> deterministic token allocation combined with slow gossip propagation can lead 
> to data loss
> --
>
> Key: CASSANDRA-19644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time 
> window (about a minute) while using the default RF-based token allocation 
> leads to token conflicts.  Unfortunately this can easily go undetected in 
> medium to large clusters.
> When this happens, different nodes in the cluster will have different 
> understandings of the cluster's topology.  I've seen this go unnoticed in a 
> production environment for several months, leading to data loss, data 
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even when 
> multiple nodes start at once, it's still unlikely that they will ever 
> conflict.  Adding a random value between -2^8 and 2^8 to each allocated token 
> makes a collision statistically very unlikely while preserving the balance of 
> token distribution in the ring.  In the case of two nodes starting at the 
> same time, the operator gets a slightly uneven token distribution instead of 
> data loss.
>  
> {noformat}
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - 
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
> 8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19644) deterministic token allocation combined with slow gossip propagation can lead to data loss

2024-05-17 Thread Jon Haddad (Jira)
Jon Haddad created CASSANDRA-19644:
--

 Summary: deterministic token allocation combined with slow gossip 
propagation can lead to data loss
 Key: CASSANDRA-19644
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Haddad


I've seen several cases now where starting nodes within a somewhat short time 
window (about a minute) while using the default RF-based token allocation leads 
to token conflicts.  Unfortunately this can easily go undetected in medium to 
large clusters.

When this happens, different nodes in the cluster will have different 
understandings of the cluster's topology.  I've seen this go unnoticed in a 
production environment for several months, leading to data loss, data 
resurrection, and other odd behavior.

We should apply some randomness to the tokens to ensure that even when multiple 
nodes start at once, it's still unlikely that they will ever have a conflict.  
Adding a random value between -2^8 and 2^8 to each allocated token makes a 
collision statistically very unlikely while also preserving the balance of 
token distribution in the ring.  In the case of two nodes starting at the same 
time, the operator gets a slightly uneven token distribution instead of data 
loss.

 
{noformat}
INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes 
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
-1938510198161598815. /10.0.2.134:7000 is the new owner
INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes 
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
-3478858378222500629. /10.0.2.134:7000 is the new owner
INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes 
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
3562748272064835315. /10.0.2.134:7000 is the new owner
INFO  [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes 
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 
8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19642:
--
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/34232d7bd45761a1c14c7e91d2f8e5ae183bc8e3
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.
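
For context, a minimal, self-contained sketch of the offset-accumulation 
pattern the linked commit moves to (illustrative only; the real code uses 
Accord's CommandSerializers and ByteBufferAccessor rather than these names):

{code:java}
import java.nio.ByteBuffer;

// Hypothetical sketch: a serializer in this style writes at an absolute offset
// and returns the number of bytes written.
public class OffsetAccumulationSketch
{
    static int writeLong(ByteBuffer out, int offset, long value)
    {
        out.putLong(offset, value); // absolute write; does not move position()
        return Long.BYTES;
    }

    public static void main(String[] args)
    {
        ByteBuffer out = ByteBuffer.allocate(64);
        int offset = out.position();
        // Accumulate the returned sizes instead of mixing position() with absolute
        // offsets; that mismatch is what can walk past the buffer and throw
        // IndexOutOfBoundsException.
        offset += writeLong(out, offset, 42L);
        offset += writeLong(out, offset, 43L);
        out.position(offset); // advance the buffer position once at the end
        System.out.println("wrote " + out.position() + " bytes");
    }
}
{code}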



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cep-15-accord updated: IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch cep-15-accord
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cep-15-accord by this push:
 new 34232d7bd4 IndexOutOfBoundsException while serializing CommandsForKey
34232d7bd4 is described below

commit 34232d7bd45761a1c14c7e91d2f8e5ae183bc8e3
Author: David Capwell 
AuthorDate: Fri May 17 13:50:01 2024 -0700

IndexOutOfBoundsException while serializing CommandsForKey

patch by David Capwell; reviewed by Blake Eggleston for CASSANDRA-19642
---
 modules/accord |  2 +-
 .../serializers/CommandsForKeySerializer.java  | 29 ++
 .../serializers/CommandsForKeySerializerTest.java  | 35 --
 3 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/modules/accord b/modules/accord
index d63d06aafe..21cdaf5d28 16
--- a/modules/accord
+++ b/modules/accord
@@ -1 +1 @@
-Subproject commit d63d06aafe2e60e57a9651ff6dd491175bbe6916
+Subproject commit 21cdaf5d280965cfdc690d385375635b498bc9f9
diff --git a/src/java/org/apache/cassandra/service/accord/serializers/CommandsForKeySerializer.java b/src/java/org/apache/cassandra/service/accord/serializers/CommandsForKeySerializer.java
index dbe2f4845f..a81b62b4a3 100644
--- a/src/java/org/apache/cassandra/service/accord/serializers/CommandsForKeySerializer.java
+++ b/src/java/org/apache/cassandra/service/accord/serializers/CommandsForKeySerializer.java
@@ -385,15 +385,18 @@ public class CommandsForKeySerializer
         VIntCoding.writeUnsignedVInt32(unmanagedPendingCommitCount, out);
         VIntCoding.writeUnsignedVInt32(cfk.unmanagedCount() - unmanagedPendingCommitCount, out);
         Unmanaged.Pending pending = unmanagedPendingCommitCount == 0 ? Unmanaged.Pending.APPLY : Unmanaged.Pending.COMMIT;
-        for (int i = 0 ; i < cfk.unmanagedCount() ; ++i)
         {
-            Unmanaged unmanaged = cfk.getUnmanaged(i);
-            Invariants.checkState(unmanaged.pending == pending);
-            CommandSerializers.txnId.serialize(unmanaged.txnId, out, ByteBufferAccessor.instance, out.position());
-            out.position(out.position() + CommandSerializers.txnId.serializedSize());
-            CommandSerializers.timestamp.serialize(unmanaged.waitingUntil, out, ByteBufferAccessor.instance, out.position());
-            out.position(out.position() + CommandSerializers.timestamp.serializedSize());
-            if (--unmanagedPendingCommitCount == 0) pending = Unmanaged.Pending.APPLY;
+            int offset = 0;
+            for (int i = 0 ; i < cfk.unmanagedCount() ; ++i)
+            {
+                Unmanaged unmanaged = cfk.getUnmanaged(i);
+                Invariants.checkState(unmanaged.pending == pending);
+
+                offset += CommandSerializers.txnId.serialize(unmanaged.txnId, out, ByteBufferAccessor.instance, offset);
+                offset += CommandSerializers.timestamp.serialize(unmanaged.waitingUntil, out, ByteBufferAccessor.instance, offset);
+                if (--unmanagedPendingCommitCount == 0) pending = Unmanaged.Pending.APPLY;
+            }
+            out.position(out.position() + offset);
         }

         if ((executeAtCount | missingIdCount) > 0)
@@ -610,15 +613,17 @@ public class CommandsForKeySerializer
         {
             unmanageds = new Unmanaged[unmanagedCount];
             Unmanaged.Pending pending = unmanagedPendingCommitCount == 0 ? Unmanaged.Pending.APPLY : Unmanaged.Pending.COMMIT;
+            int offset = 0;
             for (int i = 0 ; i < unmanagedCount ; ++i)
             {
-                TxnId txnId = CommandSerializers.txnId.deserialize(in, ByteBufferAccessor.instance, in.position());
-                in.position(in.position() + CommandSerializers.txnId.serializedSize());
-                Timestamp waitingUntil = CommandSerializers.timestamp.deserialize(in, ByteBufferAccessor.instance, in.position());
-                in.position(in.position() + CommandSerializers.timestamp.serializedSize());
+                TxnId txnId = CommandSerializers.txnId.deserialize(in, ByteBufferAccessor.instance, offset);
+                offset += CommandSerializers.txnId.serializedSize();
+                Timestamp waitingUntil = CommandSerializers.timestamp.deserialize(in, ByteBufferAccessor.instance, offset);
+                offset += CommandSerializers.timestamp.serializedSize();
                 unmanageds[i] = new Unmanaged(pending, txnId, waitingUntil);
                 if (--unmanagedPendingCommitCount == 0) pending = Unmanaged.Pending.APPLY;
             }
+            in.position(in.position() + offset);
         }

         if (executeAtMasks + missingDepsMasks > 0)
diff --git 

(cassandra-accord) branch trunk updated: IndexOutOfBoundsException while serializing CommandsForKey (#90)

2024-05-17 Thread dcapwell
This is an automated email from the ASF dual-hosted git repository.

dcapwell pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-accord.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 21cdaf5d IndexOutOfBoundsException while serializing CommandsForKey (#90)
21cdaf5d is described below

commit 21cdaf5d280965cfdc690d385375635b498bc9f9
Author: dcapwell 
AuthorDate: Fri May 17 15:16:45 2024 -0700

IndexOutOfBoundsException while serializing CommandsForKey (#90)

patch by David Capwell; reviewed by Blake Eggleston for CASSANDRA-19642
---
 .../src/main/java/accord/local/CommandsForKey.java | 18 +++-
 .../accord/impl/basic/DelayedCommandStores.java| 25 +++---
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/accord-core/src/main/java/accord/local/CommandsForKey.java b/accord-core/src/main/java/accord/local/CommandsForKey.java
index cba170c9..cc44f242 100644
--- a/accord-core/src/main/java/accord/local/CommandsForKey.java
+++ b/accord-core/src/main/java/accord/local/CommandsForKey.java
@@ -161,6 +161,21 @@ public class CommandsForKey implements CommandsSummary
 return c;
 }
 
+@Override
+public boolean equals(Object o)
+{
+if (this == o) return true;
+if (o == null || getClass() != o.getClass()) return false;
+Unmanaged unmanaged = (Unmanaged) o;
+return pending == unmanaged.pending && waitingUntil.equals(unmanaged.waitingUntil) && txnId.equals(unmanaged.txnId);
+}
+
+@Override
+public int hashCode()
+{
+return Objects.hash(pending, waitingUntil, txnId);
+}
+
 @Override
 public String toString()
 {
@@ -1699,7 +1714,8 @@ public class CommandsForKey implements CommandsSummary
 if (o == null || getClass() != o.getClass()) return false;
 CommandsForKey that = (CommandsForKey) o;
 return Objects.equals(key, that.key)
-   && Arrays.equals(txns, that.txns);
+   && Arrays.equals(txns, that.txns)
+   && Arrays.equals(unmanageds, that.unmanageds);
 }
 
 @Override
diff --git a/accord-core/src/test/java/accord/impl/basic/DelayedCommandStores.java b/accord-core/src/test/java/accord/impl/basic/DelayedCommandStores.java
index ea97c5e0..17e18b81 100644
--- a/accord-core/src/test/java/accord/impl/basic/DelayedCommandStores.java
+++ b/accord-core/src/test/java/accord/impl/basic/DelayedCommandStores.java
@@ -48,7 +48,6 @@ import accord.local.PreLoadContext;
 import accord.local.SafeCommandStore;
 import accord.local.SerializerSupport;
 import accord.local.ShardDistributor;
-import accord.messages.Message;
 import accord.primitives.Range;
 import accord.primitives.RoutableKey;
 import accord.primitives.Txn;
@@ -266,22 +265,14 @@ public class DelayedCommandStores extends InMemoryCommandStores.SingleThread
 @Override
 public void postExecute()
 {
-if (context instanceof Message)
-{
-Message m = (Message) context;
-if (m.type() != null && !m.type().hasSideEffects())
-{
-// double check there are no modifications
-commands.entrySet().forEach(e -> {
-InMemorySafeCommand safe = e.getValue();
-if (!safe.isModified()) return;
-commandStore.validateRead(safe.current());
-Command original = safe.original();
-if (original != null)
-commandStore.validateRead(original);
-});
-}
-}
+commands.entrySet().forEach(e -> {
+InMemorySafeCommand safe = e.getValue();
+if (!safe.isModified()) return;
+commandStore.validateRead(safe.current());
+Command original = safe.original();
+if (original != null)
+commandStore.validateRead(original);
+});
 }
 }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19642:
--
Status: Ready to Commit  (was: Review In Progress)

+1 from [~bdeggleston] in GH

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19642:
--
Reviewers: Blake Eggleston, David Capwell
   Status: Review In Progress  (was: Patch Available)

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19642:
--
Reviewers: Blake Eggleston  (was: Blake Eggleston, David Capwell)

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19642:
--
Test and Documentation Plan: updated tests
 Status: Patch Available  (was: In Progress)

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-19642:
---
Labels: pull-request-available  (was: )

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: NA
>
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19643) IndexOutOfBoundsException thrown executing query when bound parameters are missing

2024-05-17 Thread Jon Haddad (Jira)
Jon Haddad created CASSANDRA-19643:
--

 Summary: IndexOutOfBoundsException thrown executing query when 
bound parameters are missing 
 Key: CASSANDRA-19643
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19643
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Haddad


I prepared a query and tried to execute it without binding the parameters, and 
this exception was thrown.  I think we should do a better job of telling the 
user that they didn't pass enough parameters instead of throwing an opaque 
exception (a sketch of such a check follows the stack trace below).

I prepared this query:
{noformat}
SELECT * from system.local  WHERE token(key) > ? AND token(key) < ?{noformat}
Here's the exception:

 
{noformat}
ERROR [Native-Transport-Requests-1] 2024-05-17 13:39:51,786 
ErrorMessage.java:457 - Unexpected exception during request
java.lang.IndexOutOfBoundsException: null
    at java.base/java.nio.Buffer.checkIndex(Buffer.java:693)
    at java.base/java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:458)
    at org.apache.cassandra.utils.ByteBufferUtil.toLong(ByteBufferUtil.java:505)
    at 
org.apache.cassandra.dht.Murmur3Partitioner$2.fromByteArray(Murmur3Partitioner.java:376)
    at 
org.apache.cassandra.cql3.restrictions.StatementRestrictions.getTokenBound(StatementRestrictions.java:913)
    at 
org.apache.cassandra.cql3.restrictions.StatementRestrictions.getPartitionKeyBoundsForTokenRestrictions(StatementRestrictions.java:879)
    at 
org.apache.cassandra.cql3.restrictions.StatementRestrictions.getPartitionKeyBounds(StatementRestrictions.java:841)
    at 
org.apache.cassandra.cql3.statements.SelectStatement.getRangeCommand(SelectStatement.java:793)
    at 
org.apache.cassandra.cql3.statements.SelectStatement.getQuery(SelectStatement.java:408)
    at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:332)
    at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:108)
    at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:256)
    at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:823)
    at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:801)
    at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:167)
    at org.apache.cassandra.transport.Message$Request.execute(Message.java:256)
    at 
org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:194)
    at 
org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:213)
    at 
org.apache.cassandra.transport.Dispatcher.processRequest(Dispatcher.java:240)
    at 
org.apache.cassandra.transport.Dispatcher$RequestProcessor.run(Dispatcher.java:137)
    at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
    at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829){noformat}
Found in cassandra-5.0 branch, not sure how far back it goes.
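
For illustration, a minimal sketch of the kind of up-front check suggested 
above (the class and method names are hypothetical, not the actual Cassandra 
execute path):

{code:java}
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical sketch: validate the values supplied for a prepared statement
// before any of them are decoded, so the user gets a clear error message
// instead of an IndexOutOfBoundsException from deep inside the token logic.
public class BindValidationSketch
{
    static void validateBoundValues(List<ByteBuffer> values, int expectedCount)
    {
        if (values.size() != expectedCount)
            throw new IllegalArgumentException(
                    String.format("Expected %d bind values but got %d", expectedCount, values.size()));
        for (int i = 0; i < values.size(); i++)
            if (values.get(i) == null)
                throw new IllegalArgumentException("Bind variable at index " + i + " was not set");
    }

    public static void main(String[] args)
    {
        // Two markers in "... token(key) > ? AND token(key) < ?", but nothing bound:
        validateBoundValues(List.of(), 2); // throws a clear error instead of IndexOutOfBoundsException
    }
}
{code}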



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-19642:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: 
Unrecoverable Corruption / Loss(13161)
   Complexity: Low Hanging Fruit
Discovered By: Performance Regression Test
Fix Version/s: NA
 Severity: Critical
   Status: Open  (was: Triage Needed)

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: NA
>
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)
David Capwell created CASSANDRA-19642:
-

 Summary: IndexOutOfBoundsException while serializing CommandsForKey
 Key: CASSANDRA-19642
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: David Capwell


When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
didn’t cover this; when anything is unmanaged, serialization fails with an 
IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19642) IndexOutOfBoundsException while serializing CommandsForKey

2024-05-17 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell reassigned CASSANDRA-19642:
-

Assignee: David Capwell

> IndexOutOfBoundsException while serializing CommandsForKey
> --
>
> Key: CASSANDRA-19642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19642
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>
> When serializing CommandsForKey we have a concept of “unmanaged”, but tests 
> didn’t cover this; when anything is unmanaged, serialization fails with an 
> IndexOutOfBoundsException due to using the ValueAccessor API incorrectly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19601) Test failure: test_change_durable_writes

2024-05-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19601:
-
Fix Version/s: 4.0.x
   4.1.x

> Test failure: test_change_durable_writes
> 
>
> Key: CASSANDRA-19601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19601
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> Failing on trunk:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1880/testReport/junit/dtest-latest.configuration_test/TestConfiguration/Tests___dtest_latest_jdk11_31_64___test_change_durable_writes/]
> [https://app.circleci.com/pipelines/github/blerer/cassandra/400/workflows/893a0edb-9181-4981-b542-77228c8bc975/jobs/10941/tests]
> {code:java}
> AssertionError: Commitlog was written with durable writes disabled
> assert 90112 == 86016
>   +90112
>   -86016
> self = 
> @pytest.mark.timeout(60*30)
> def test_change_durable_writes(self):
> """
> @jira_ticket CASSANDRA-9560
> 
> Test that changes to the DURABLE_WRITES option on keyspaces is
> respected in subsequent writes.
> 
> This test starts by writing a dataset to a cluster and asserting that
> the commitlogs have been written to. The subsequent test depends on
> the assumption that this dataset triggers an fsync.
> 
> After checking this assumption, the test destroys the cluster and
> creates a fresh one. Then it tests that DURABLE_WRITES is respected 
> by:
> 
> - creating a keyspace with DURABLE_WRITES set to false,
> - using ALTER KEYSPACE to set its DURABLE_WRITES option to true,
> - writing a dataset to this keyspace that is known to trigger a 
> commitlog fsync,
> - asserting that the commitlog has grown in size since the data was 
> written.
> """
> cluster = self.cluster
> cluster.set_batch_commitlog(enabled=True, use_batch_window = 
> cluster.version() < '5.0')
> 
> cluster.set_configuration_options(values={'commitlog_segment_size_in_mb': 1})
> 
> cluster.populate(1).start()
> durable_node = cluster.nodelist()[0]
> 
> durable_init_size = commitlog_size(durable_node)
> durable_session = self.patient_exclusive_cql_connection(durable_node)
> 
> # test assumption that write_to_trigger_fsync actually triggers a 
> commitlog fsync
> durable_session.execute("CREATE KEYSPACE ks WITH REPLICATION = 
> {'class': 'SimpleStrategy', 'replication_factor': 1} "
> "AND DURABLE_WRITES = true")
> durable_session.execute('CREATE TABLE ks.tab (key int PRIMARY KEY, a 
> int, b int, c int)')
> logger.debug('commitlog size diff = ' + 
> str(commitlog_size(durable_node) - durable_init_size))
> write_to_trigger_fsync(durable_session, 'ks', 'tab')
> logger.debug('commitlog size diff = ' + 
> str(commitlog_size(durable_node) - durable_init_size))
> 
> assert commitlog_size(durable_node) > durable_init_size, \
> "This test will not work in this environment; 
> write_to_trigger_fsync does not trigger fsync."
> 
> durable_session.shutdown()
> cluster.stop()
> cluster.clear()
> 
> cluster.set_batch_commitlog(enabled=True, use_batch_window = 
> cluster.version() < '5.0')
> 
> cluster.set_configuration_options(values={'commitlog_segment_size_in_mb': 1})
> cluster.start()
> node = cluster.nodelist()[0]
> session = self.patient_exclusive_cql_connection(node)
> 
> # set up a keyspace without durable writes, then alter it to use them
> session.execute("CREATE KEYSPACE ks WITH REPLICATION = {'class': 
> 'SimpleStrategy', 'replication_factor': 1} "
> "AND DURABLE_WRITES = false")
> session.execute('CREATE TABLE ks.tab (key int PRIMARY KEY, a int, b 
> int, c int)')
> init_size = commitlog_size(node)
> write_to_trigger_fsync(session, 'ks', 'tab')
> >   assert commitlog_size(node) == init_size, "Commitlog was written with 
> > durable writes disabled"
> E   AssertionError: Commitlog was written with durable writes disabled
> E   assert 90112 == 86016
> E +90112
> E -86016
> configuration_test.py:104: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19601) Test failure: test_change_durable_writes

2024-05-17 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-19601:
-
Test and Documentation Plan: run CI
 Status: Patch Available  (was: Open)

> Test failure: test_change_durable_writes
> 
>
> Key: CASSANDRA-19601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19601
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Failing on trunk:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1880/testReport/junit/dtest-latest.configuration_test/TestConfiguration/Tests___dtest_latest_jdk11_31_64___test_change_durable_writes/]
> [https://app.circleci.com/pipelines/github/blerer/cassandra/400/workflows/893a0edb-9181-4981-b542-77228c8bc975/jobs/10941/tests]
> {code:java}
> AssertionError: Commitlog was written with durable writes disabled
> assert 90112 == 86016
>   +90112
>   -86016
> self = 
> @pytest.mark.timeout(60*30)
> def test_change_durable_writes(self):
> """
> @jira_ticket CASSANDRA-9560
> 
> Test that changes to the DURABLE_WRITES option on keyspaces is
> respected in subsequent writes.
> 
> This test starts by writing a dataset to a cluster and asserting that
> the commitlogs have been written to. The subsequent test depends on
> the assumption that this dataset triggers an fsync.
> 
> After checking this assumption, the test destroys the cluster and
> creates a fresh one. Then it tests that DURABLE_WRITES is respected 
> by:
> 
> - creating a keyspace with DURABLE_WRITES set to false,
> - using ALTER KEYSPACE to set its DURABLE_WRITES option to true,
> - writing a dataset to this keyspace that is known to trigger a 
> commitlog fsync,
> - asserting that the commitlog has grown in size since the data was 
> written.
> """
> cluster = self.cluster
> cluster.set_batch_commitlog(enabled=True, use_batch_window = 
> cluster.version() < '5.0')
> 
> cluster.set_configuration_options(values={'commitlog_segment_size_in_mb': 1})
> 
> cluster.populate(1).start()
> durable_node = cluster.nodelist()[0]
> 
> durable_init_size = commitlog_size(durable_node)
> durable_session = self.patient_exclusive_cql_connection(durable_node)
> 
> # test assumption that write_to_trigger_fsync actually triggers a 
> commitlog fsync
> durable_session.execute("CREATE KEYSPACE ks WITH REPLICATION = 
> {'class': 'SimpleStrategy', 'replication_factor': 1} "
> "AND DURABLE_WRITES = true")
> durable_session.execute('CREATE TABLE ks.tab (key int PRIMARY KEY, a 
> int, b int, c int)')
> logger.debug('commitlog size diff = ' + 
> str(commitlog_size(durable_node) - durable_init_size))
> write_to_trigger_fsync(durable_session, 'ks', 'tab')
> logger.debug('commitlog size diff = ' + 
> str(commitlog_size(durable_node) - durable_init_size))
> 
> assert commitlog_size(durable_node) > durable_init_size, \
> "This test will not work in this environment; 
> write_to_trigger_fsync does not trigger fsync."
> 
> durable_session.shutdown()
> cluster.stop()
> cluster.clear()
> 
> cluster.set_batch_commitlog(enabled=True, use_batch_window = 
> cluster.version() < '5.0')
> 
> cluster.set_configuration_options(values={'commitlog_segment_size_in_mb': 1})
> cluster.start()
> node = cluster.nodelist()[0]
> session = self.patient_exclusive_cql_connection(node)
> 
> # set up a keyspace without durable writes, then alter it to use them
> session.execute("CREATE KEYSPACE ks WITH REPLICATION = {'class': 
> 'SimpleStrategy', 'replication_factor': 1} "
> "AND DURABLE_WRITES = false")
> session.execute('CREATE TABLE ks.tab (key int PRIMARY KEY, a int, b 
> int, c int)')
> init_size = commitlog_size(node)
> write_to_trigger_fsync(session, 'ks', 'tab')
> >   assert commitlog_size(node) == init_size, "Commitlog was written with 
> > durable writes disabled"
> E   AssertionError: Commitlog was written with durable writes disabled
> E   assert 90112 == 86016
> E +90112
> E -86016
> configuration_test.py:104: AssertionError
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 

[jira] [Commented] (CASSANDRA-19601) Test failure: test_change_durable_writes

2024-05-17 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847439#comment-17847439
 ] 

Brandon Williams commented on CASSANDRA-19601:
--

I added an in-jvm dtest to cover durable writes in [this 
branch|https://github.com/driftx/cassandra/commits/CASSANDRA-19601-4.0/].

||Branch||CI||
|[4.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19601-4.0]|[repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1635/workflows/cdf5e244-8952-409e-9d59-69470e9435fb/jobs/89924]|
|[4.1|https://github.com/driftx/cassandra/tree/CASSANDRA-19601-4.1]|[repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1632/workflows/653d31da-f0c3-49f0-8574-30deb2d8d22b/jobs/89925]|
|[5.0|https://github.com/driftx/cassandra/tree/CASSANDRA-19601-5.0]|[repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1633/workflows/d83db009-934b-4abd-818b-637011f2d466/jobs/89928],
 [latest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1633/workflows/d83db009-934b-4abd-818b-637011f2d466/jobs/89930]|
|[trunk|https://github.com/driftx/cassandra/tree/CASSANDRA-19601-trunk]|[repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1634/workflows/2e55fcdd-6a60-48dc-96c8-7e80bb72a293/jobs/89929],
 [latest 
repeat|https://app.circleci.com/pipelines/github/driftx/cassandra/1634/workflows/2e55fcdd-6a60-48dc-96c8-7e80bb72a293/jobs/89931]|

3.0 and 3.11 don't have TableId, and I didn't think going further down that rabbit 
hole was worth it.  Given that we know the python dtest is useless, those branches 
won't be any worse off if we just remove it, which I can do when committing this.


> Test failure: test_change_durable_writes
> 
>
> Key: CASSANDRA-19601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19601
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> Failing on trunk:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/1880/testReport/junit/dtest-latest.configuration_test/TestConfiguration/Tests___dtest_latest_jdk11_31_64___test_change_durable_writes/]
> [https://app.circleci.com/pipelines/github/blerer/cassandra/400/workflows/893a0edb-9181-4981-b542-77228c8bc975/jobs/10941/tests]
> {code:java}
> AssertionError: Commitlog was written with durable writes disabled
> assert 90112 == 86016
>   +90112
>   -86016
> self = 
> @pytest.mark.timeout(60*30)
> def test_change_durable_writes(self):
> """
> @jira_ticket CASSANDRA-9560
> 
> Test that changes to the DURABLE_WRITES option on keyspaces is
> respected in subsequent writes.
> 
> This test starts by writing a dataset to a cluster and asserting that
> the commitlogs have been written to. The subsequent test depends on
> the assumption that this dataset triggers an fsync.
> 
> After checking this assumption, the test destroys the cluster and
> creates a fresh one. Then it tests that DURABLE_WRITES is respected 
> by:
> 
> - creating a keyspace with DURABLE_WRITES set to false,
> - using ALTER KEYSPACE to set its DURABLE_WRITES option to true,
> - writing a dataset to this keyspace that is known to trigger a 
> commitlog fsync,
> - asserting that the commitlog has grown in size since the data was 
> written.
> """
> cluster = self.cluster
> cluster.set_batch_commitlog(enabled=True, use_batch_window = 
> cluster.version() < '5.0')
> 
> cluster.set_configuration_options(values={'commitlog_segment_size_in_mb': 1})
> 
> cluster.populate(1).start()
> durable_node = cluster.nodelist()[0]
> 
> durable_init_size = commitlog_size(durable_node)
> durable_session = self.patient_exclusive_cql_connection(durable_node)
> 
> # test assumption that write_to_trigger_fsync actually triggers a 
> commitlog fsync
> durable_session.execute("CREATE KEYSPACE ks WITH REPLICATION = 
> {'class': 'SimpleStrategy', 'replication_factor': 1} "
> "AND DURABLE_WRITES = true")
> durable_session.execute('CREATE TABLE ks.tab (key int PRIMARY KEY, a 
> int, b int, c int)')
> logger.debug('commitlog size diff = ' + 
> str(commitlog_size(durable_node) - durable_init_size))
> write_to_trigger_fsync(durable_session, 'ks', 'tab')
> logger.debug('commitlog size diff = ' + 
> str(commitlog_size(durable_node) - durable_init_size))
> 
> assert commitlog_size(durable_node) > durable_init_size, \
> "This test will not work in this environment; 
> write_to_trigger_fsync does not trigger fsync."
> 
>

[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-17 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Fuzz Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19383) Restarting a node causes MV unavailable exceptions afterward

2024-05-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847423#comment-17847423
 ] 

Gábor Auth commented on CASSANDRA-19383:


It looks like I ran into the same issue: 
[https://lists.apache.org/thread/qt2q8dmb21lffzhrm3w9ymfrb4ps0qkc]

In my case it is a very small 4-node cluster with minimal data (~100 MB in total). The 
issue occurs only on tables with more than one materialized view, regardless of the 
size of the table; in the thread the affected table has only ~1300 rows and less than 
200 kB of data, so it might be a concurrency issue under the hood rather than heavy 
I/O or CPU usage.

> Restarting a node causes MV unavailable exceptions afterward
> 
>
> Key: CASSANDRA-19383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19383
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Materialized Views
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> If a cluster is started with materialized views enabled and a client is 
> constantly writing to one, when a node is restarted shortly after startup it 
> may throw UEs:
> {quote}
> INFO  [GossipStage:1] 2024-02-09 06:17:10,073 Gossiper.java:1419 - Node 
> /10.5.11.102:7000 has restarted, now UP
> ERROR [MutationStage-2] 2024-02-09 06:17:10,074 Keyspace.java:650 - Unknown 
> exception caught while attempting to update MaterializedView! 
> stresstest.transport_orders
> org.apache.cassandra.exceptions.UnavailableException: Cannot achieve 
> consistency level ONE
> at 
> org.apache.cassandra.exceptions.UnavailableException.create(UnavailableException.java:37)
> at 
> org.apache.cassandra.locator.ReplicaPlans.assureSufficientLiveReplicas(ReplicaPlans.java:169)
> at 
> org.apache.cassandra.locator.ReplicaPlans.assureSufficientLiveReplicasForWrite(ReplicaPlans.java:112)
> at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:353)
> at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:344)
> at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:338)
> at 
> org.apache.cassandra.service.StorageProxy.wrapViewBatchResponseHandler(StorageProxy.java:1417)
> at 
> org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:1077)
> at 
> org.apache.cassandra.db.view.TableViews.pushViewReplicaUpdates(TableViews.java:169)
> at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:645)
> at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:476)
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:223)
> at 
> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54)
> at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
> at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
> at 
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:750)
> {quote}
> It's not immediately clear what the impact of this is since the error is on 
> restart and not communicated back to the client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-17 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19641:
--

 Summary: Accord barriers/inclusive sync points cause failures in 
BurnTest
 Key: CASSANDRA-19641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg


The burn test fails almost every run at the moment; we found several things to 
fix.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra-java-driver) branch 4.x updated: ninja-fix changlog updates for 4.18.1

2024-05-17 Thread absurdfarce
This is an automated email from the ASF dual-hosted git repository.

absurdfarce pushed a commit to branch 4.x
in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git


The following commit(s) were added to refs/heads/4.x by this push:
 new f60e75842 ninja-fix changlog updates for 4.18.1
f60e75842 is described below

commit f60e75842fa99cbb728a716c0236a89caa19b39c
Author: absurdfarce 
AuthorDate: Fri May 17 12:27:53 2024 -0500

ninja-fix changlog updates for 4.18.1
---
 changelog/README.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/changelog/README.md b/changelog/README.md
index 7807ef15f..83ebb4423 100644
--- a/changelog/README.md
+++ b/changelog/README.md
@@ -21,6 +21,16 @@ under the License.
 
 
 
+### 4.18.1
+
+- [improvement] JAVA-3142: Ability to specify ordering of remote local dc's 
via new configuration for graceful automatic failovers
+- [bug] CASSANDRA-19457: Object reference in Micrometer metrics prevent GC 
from reclaiming Session instances
+- [improvement] CASSANDRA-19468: Don't swallow exception during metadata 
refresh
+- [bug] CASSANDRA-19333: Fix data corruption in VectorCodec when using heap 
buffers
+- [improvement] CASSANDRA-19290: Replace uses of AttributeKey.newInstance
+- [improvement] CASSANDRA-19352: Support native_transport_(address|port) + 
native_transport_port_ssl for DSE 6.8 (4.x edition)
+- [improvement] CASSANDRA-19180: Support reloading keystore in 
cassandra-java-driver
+
 ### 4.18.0
 
 - [improvement] PR 1689: Add support for publishing percentile time series for 
the histogram metrics (nparaddi-walmart)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra-java-driver) branch 4.x updated (bc0240606 -> 3151129f7)

2024-05-17 Thread absurdfarce
This is an automated email from the ASF dual-hosted git repository.

absurdfarce pushed a change to branch 4.x
in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git


omit bc0240606 JAVA-3142: Improving the documentation for remote local dc's 
feature
 add 3151129f7 JAVA-3142: Improving the documentation for remote local dc's 
feature

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (bc0240606)
\
 N -- N -- N   refs/heads/4.x (3151129f7)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra-java-driver) branch 4.x updated: JAVA-3142: Improving the documentation for remote local dc's feature

2024-05-17 Thread absurdfarce
This is an automated email from the ASF dual-hosted git repository.

absurdfarce pushed a commit to branch 4.x
in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git


The following commit(s) were added to refs/heads/4.x by this push:
 new bc0240606 JAVA-3142: Improving the documentation for remote local dc's 
feature
bc0240606 is described below

commit bc0240606acf437e8dc5d338bf9cc79548fb92b7
Author: Nitin Chhabra 
AuthorDate: Wed May 8 16:54:43 2024 -0700

JAVA-3142: Improving the documentation for remote local dc's feature
---
 core/src/main/resources/reference.conf | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/core/src/main/resources/reference.conf 
b/core/src/main/resources/reference.conf
index 7a56a18e9..7b1c43f8b 100644
--- a/core/src/main/resources/reference.conf
+++ b/core/src/main/resources/reference.conf
@@ -574,7 +574,9 @@ datastax-java-driver {
   # Modifiable at runtime: no
   # Overridable in a profile: yes
   allow-for-local-consistency-levels = false
+  
   # Ordered preference list of remote dc's (in order) optionally supplied 
for automatic failover. While building a query plan, the driver uses the DC's 
supplied in order together with max-nodes-per-remote-dc
+  # Users are not required to specify all DCs, when listing preferences 
via this config
   # Required: no
   # Modifiable at runtime: no
   # Overridable in a profile: no


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] JAVA-3142: Improving the documentation for remote local dc's feature [cassandra-java-driver]

2024-05-17 Thread via GitHub


absurdfarce merged PR #1933:
URL: https://github.com/apache/cassandra-java-driver/pull/1933


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



Re: [PR] JAVA-3142: Improving the documentation for remote local dc's feature [cassandra-java-driver]

2024-05-17 Thread via GitHub


absurdfarce commented on PR #1933:
URL: 
https://github.com/apache/cassandra-java-driver/pull/1933#issuecomment-2118056886

   Thanks for the follow-up @nitinitt !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM

2024-05-17 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847378#comment-17847378
 ] 

Ariel Weisberg commented on CASSANDRA-19636:


I didn't test this yet (still working on getting the existing changes to run), 
but +1 on what I saw in the PR and its description.

> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user 
> explicitly select JVM
> --
>
> Key: CASSANDRA-19636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19636
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Attachments: CASSANDRA-19636_50_75_ci_summary.html, 
> CASSANDRA-19636_50_75_results_details.tar.xz, 
> CASSANDRA-19636_trunk_76_ci_summary.html, 
> CASSANDRA-19636_trunk_76_results_details.tar.xz
>
>
> CCM fails to select the right Java version for the Cassandra 5 binary 
> distribution.
> There are also two additional changes proposed here:
>  * add a {{--jvm-version}} argument to let the user explicitly select the Java 
> version when starting a node from the command line
>  * fail if the {{java}} command is available on the {{PATH}} and points to a 
> different Java version than the distribution defined in {{JAVA_HOME}}, because 
> there is no obvious way for the user to figure out which one is going 
> to be used
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19637) LWT conditions on MultiCell collections return invalid results

2024-05-17 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847362#comment-17847362
 ] 

Benjamin Lerer commented on CASSANDRA-19637:


[~dcapwell] did you try with an unfrozen collection ?

> LWT conditions on MultiCell collections return invalid results
> --
>
> Key: CASSANDRA-19637
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19637
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Semantics
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
>Priority: Normal
>
> Due to the way multicell collections are implemented, it is not possible to 
> differentiate between {{null}} and empty collections as it is for single-cell 
> (frozen) collections. Therefore an empty multicell collection 
> will always be treated as {{null}}.
> Unfortunately, the way LWT conditions handle this is not consistent with that behaviour.
> For example, for a non-null {{colA list}}: {code}.. IF colA >= null{code} 
> will throw an invalid request error, whereas {code}..IF colA >= []{code} will 
> return {{true}}.
> Moreover, if we insert an empty list through:
> {code}INSERT INTO mytable (pk, colA) VALUES (1, []);{code}
> and use {code}DELETE FROM mytable WHERE pk=1 IF colA >= []{code}, the returned 
> result will be {code}{false, null}{code}, which can be quite confusing.
> The way to fix that behaviour and make it consistent with other operations is 
> to treat an empty multicell collection input as {{null}} and to reject 
> {{null}} input for operators other than {{=}} and {{!=}}.
>   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra) branch cep-15-accord updated: Move preaccept expiration logic away from Agent

2024-05-17 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a commit to branch cep-15-accord
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cep-15-accord by this push:
 new 5b24707c72 Move preaccept expiration logic away from Agent
5b24707c72 is described below

commit 5b24707c729693b0bfdd6e154f70aad7daa2e4ca
Author: Aleksey Yeschenko 
AuthorDate: Mon May 13 15:08:32 2024 +0100

Move preaccept expiration logic away from Agent

patch by Aleksey Yeschenko; reviewed by Alex Petrov and Benedict Elliott 
Smith for CASSANDRA-1
---
 modules/accord |  2 +-
 .../org/apache/cassandra/service/accord/api/AccordAgent.java   | 10 +++---
 .../cassandra/service/accord/SimulatedAccordCommandStore.java  |  4 ++--
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/modules/accord b/modules/accord
index 256b35e27d..d63d06aafe 16
--- a/modules/accord
+++ b/modules/accord
@@ -1 +1 @@
-Subproject commit 256b35e27d170db9fcd8024d5678b4f6e9d3a956
+Subproject commit d63d06aafe2e60e57a9651ff6dd491175bbe6916
diff --git a/src/java/org/apache/cassandra/service/accord/api/AccordAgent.java 
b/src/java/org/apache/cassandra/service/accord/api/AccordAgent.java
index 33f8f2b088..9c4b678996 100644
--- a/src/java/org/apache/cassandra/service/accord/api/AccordAgent.java
+++ b/src/java/org/apache/cassandra/service/accord/api/AccordAgent.java
@@ -35,7 +35,6 @@ import accord.primitives.Seekables;
 import accord.primitives.Timestamp;
 import accord.primitives.Txn;
 import accord.primitives.Txn.Kind;
-import accord.primitives.TxnId;
 import org.apache.cassandra.service.accord.AccordService;
 import org.apache.cassandra.metrics.AccordMetrics;
 import org.apache.cassandra.service.accord.txn.TxnQuery;
@@ -114,13 +113,10 @@ public class AccordAgent implements Agent
 }
 
 @Override
-public boolean isExpired(TxnId initiated, long now)
+public long preAcceptTimeout()
 {
-// TODO: should distinguish between reads and writes
-if (initiated.kind().isSyncPoint())
-return false;
-
-return now - initiated.hlc() > getReadRpcTimeout(MICROSECONDS);
+// TODO: should distinguish between reads and writes (Aleksey: why? 
and why read rpc timeout is being used?)
+return getReadRpcTimeout(MICROSECONDS);
 }
 
 @Override
diff --git 
a/test/unit/org/apache/cassandra/service/accord/SimulatedAccordCommandStore.java
 
b/test/unit/org/apache/cassandra/service/accord/SimulatedAccordCommandStore.java
index 1a1b7f98d2..a0bb647c41 100644
--- 
a/test/unit/org/apache/cassandra/service/accord/SimulatedAccordCommandStore.java
+++ 
b/test/unit/org/apache/cassandra/service/accord/SimulatedAccordCommandStore.java
@@ -149,9 +149,9 @@ public class SimulatedAccordCommandStore implements 
AutoCloseable
 new TestAgent.RethrowAgent()
 {
 @Override
-public boolean isExpired(TxnId 
initiated, long now)
+public long preAcceptTimeout()
 {
-return false;
+return Long.MAX_VALUE;
 }
 
 @Override


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19640) Enhance documentation on storage engine with leading summary

2024-05-17 Thread Brad Schoening (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Schoening updated CASSANDRA-19640:
---
Description: The storage engine 
[documentation|https://github.com/apache/cassandra/blob/trunk/doc/modules/cassandra/pages/architecture/storage-engine.adoc]
  would benefit from an abstract or summary which mentions key points that it 
uses a Log-structured merge (LSM) tree design, is write-oriented, and relies 
upon bloom filters (not B-trees) to optimize the read path.  (was: The storage 
engine 
[documentation|https://github.com/apache/cassandra/blob/trunk/doc/modules/cassandra/pages/architecture/storage-engine.adoc]
  would benefit from an abstract or summary which mentions key points that it 
uses a Log-structured merge tree design, is write-oriented, and relies upon 
bloom filters (not B-trees) to optimize the read path.)

> Enhance documentation on storage engine with leading summary
> 
>
> Key: CASSANDRA-19640
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19640
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Brad Schoening
>Priority: Normal
>
> The storage engine 
> [documentation|https://github.com/apache/cassandra/blob/trunk/doc/modules/cassandra/pages/architecture/storage-engine.adoc]
>   would benefit from an abstract or summary which mentions key points that it 
> uses a Log-structured merge (LSM) tree design, is write-oriented, and relies 
> upon bloom filters (not B-trees) to optimize the read path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19640) Enhance documentation on storage engine with leading summary

2024-05-17 Thread Brad Schoening (Jira)
Brad Schoening created CASSANDRA-19640:
--

 Summary: Enhance documentation on storage engine with leading 
summary
 Key: CASSANDRA-19640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19640
 Project: Cassandra
  Issue Type: Improvement
Reporter: Brad Schoening


The storage engine 
[documentation|https://github.com/apache/cassandra/blob/trunk/doc/modules/cassandra/pages/architecture/storage-engine.adoc]
  would benefit from an abstract or summary which mentions key points that it 
uses a Log-structured merge tree design, is write-oriented, and relies upon 
bloom filters (not B-trees) to optimize the read path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra-accord) branch trunk updated: Move preaccept expiration logic away from Agent

2024-05-17 Thread aleksey
This is an automated email from the ASF dual-hosted git repository.

aleksey pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-accord.git


The following commit(s) were added to refs/heads/trunk by this push:
 new d63d06aa Move preaccept expiration logic away from Agent
d63d06aa is described below

commit d63d06aafe2e60e57a9651ff6dd491175bbe6916
Author: Aleksey Yeschenko 
AuthorDate: Fri May 17 13:33:57 2024 +0100

Move preaccept expiration logic away from Agent

patch by Aleksey Yeschenko; reviewed by Alex Petrov, Benedict Elliott 
Smith, and David Capwell for CASSANDRA-1
---
 accord-core/src/main/java/accord/api/Agent.java|  6 +++--
 .../accord/coordinate/CoordinateTransaction.java   |  2 +-
 .../src/main/java/accord/local/CommandStore.java   |  7 +-
 .../java/accord/messages/ExecutionContext.java | 29 ++
 .../src/test/java/accord/impl/TestAgent.java   |  9 ---
 .../src/test/java/accord/impl/list/ListAgent.java  |  5 ++--
 .../main/java/accord/maelstrom/MaelstromAgent.java | 10 
 7 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/accord-core/src/main/java/accord/api/Agent.java 
b/accord-core/src/main/java/accord/api/Agent.java
index f229ab82..06ed2f98 100644
--- a/accord-core/src/main/java/accord/api/Agent.java
+++ b/accord-core/src/main/java/accord/api/Agent.java
@@ -26,7 +26,6 @@ import accord.primitives.Ranges;
 import accord.primitives.Seekables;
 import accord.primitives.Timestamp;
 import accord.primitives.Txn;
-import accord.primitives.TxnId;
 
 /**
  * Facility for augmenting node behaviour at specific points
@@ -70,7 +69,10 @@ public interface Agent extends UncaughtExceptionListener
 
 void onHandledException(Throwable t);
 
-boolean isExpired(TxnId initiated, long now);
+/**
+ * @return PreAccept timeout with implementation-defined resolution of the 
hybrid logical clock
+ */
+long preAcceptTimeout();
 
 Txn emptyTxn(Txn.Kind kind, Seekables keysOrRanges);
 
diff --git 
a/accord-core/src/main/java/accord/coordinate/CoordinateTransaction.java 
b/accord-core/src/main/java/accord/coordinate/CoordinateTransaction.java
index dc4395c9..0b05657d 100644
--- a/accord-core/src/main/java/accord/coordinate/CoordinateTransaction.java
+++ b/accord-core/src/main/java/accord/coordinate/CoordinateTransaction.java
@@ -84,7 +84,7 @@ public class CoordinateTransaction extends 
CoordinatePreAccept
 //  but by sending accept we rule 
out hybrid fast-path
 // TODO (low priority, efficiency): if we receive an expired 
response, perhaps defer to permit at least one other
 //  node to respond before 
invalidating
-if (executeAt.isRejected() || node.agent().isExpired(txnId, 
executeAt.hlc()))
+if (executeAt.isRejected() || executeAt.hlc() - txnId.hlc() >= 
node.agent().preAcceptTimeout())
 {
 proposeAndCommitInvalidate(node, Ballot.ZERO, txnId, 
route.homeKey(), route, executeAt,this);
 }
diff --git a/accord-core/src/main/java/accord/local/CommandStore.java 
b/accord-core/src/main/java/accord/local/CommandStore.java
index 8e2f18eb..1d5a3913 100644
--- a/accord-core/src/main/java/accord/local/CommandStore.java
+++ b/accord-core/src/main/java/accord/local/CommandStore.java
@@ -319,8 +319,13 @@ public abstract class CommandStore implements AgentExecutor
  */
 final Timestamp preaccept(TxnId txnId, Seekables keys, 
SafeCommandStore safeStore, boolean permitFastPath)
 {
+// TODO (expected): make preAcceptTimeout() be a part of 
SafeCommandStore, initiated from ExecutionContext;
+//  preAcceptTimeout can be subject to local configuration 
changes, which would break determinism of repeated
+//  message processing, if, say, replayed from a log.
+
 NodeTimeService time = safeStore.time();
-boolean isExpired = agent().isExpired(txnId, safeStore.time().now());
+
+boolean isExpired = time.now() - txnId.hlc() >= 
agent().preAcceptTimeout() && !txnId.kind().isSyncPoint();
 if (rejectBefore != null && !isExpired)
 isExpired = null == rejectBefore.foldl(keys, (rejectIfBefore, 
test) -> rejectIfBefore.compareTo(test) > 0 ? null : test, txnId, 
Objects::isNull);
 
diff --git a/accord-core/src/main/java/accord/messages/ExecutionContext.java 
b/accord-core/src/main/java/accord/messages/ExecutionContext.java
new file mode 100644
index ..dbf4c2db
--- /dev/null
+++ b/accord-core/src/main/java/accord/messages/ExecutionContext.java
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 

[jira] [Updated] (CASSANDRA-19212) Better handle Spark job timeouts/process killing in Analytics tests

2024-05-17 Thread Doug Rohrer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Rohrer updated CASSANDRA-19212:

Resolution: Abandoned
Status: Resolved  (was: Open)

No longer spinning up a separate process for Spark, so we don't need this work 
any more.

> Better handle Spark job timeouts/process killing in Analytics tests
> ---
>
> Key: CASSANDRA-19212
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19212
> Project: Cassandra
>  Issue Type: Bug
>  Components: Analytics Library
>Reporter: Doug Rohrer
>Assignee: Doug Rohrer
>Priority: Normal
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Process#destroyForcibly() call in ResiliencyTestBase#bulkWriteData can 
> complete without the process actually exiting (and is documented to do so). 
> We should wait on the process to exit before attempting to read the exit 
> code, which throws if the process hasn’t yet exited.  Otherwise, we can lose 
> the Spark output when the test determines that the job is taking too long.
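
As a hedged sketch of the pattern described above (not the actual ResiliencyTestBase 
code; the 30-second bound and the names are assumptions): Process#destroyForcibly() 
only requests termination, so the caller should wait for the process to exit before 
reading the exit code, since Process#exitValue() throws IllegalThreadStateException 
while the process is still alive.

{code:java}
import java.util.concurrent.TimeUnit;

final class KillAndCollect
{
    // Forcibly stop a child process and return its exit code once it has really exited.
    static int stopAndGetExitCode(Process sparkJob) throws InterruptedException
    {
        sparkJob.destroyForcibly();              // only *requests* termination

        // Block until the process actually exits, bounded so the test cannot hang forever.
        if (!sparkJob.waitFor(30, TimeUnit.SECONDS))
            throw new IllegalStateException("process still alive after destroyForcibly()");

        return sparkJob.exitValue();             // safe now: the process has terminated
    }
}
{code}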



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



(cassandra-website) branch asf-staging updated (d18ff303e -> 1a96cdf82)

2024-05-17 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a change to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git


 discard d18ff303e generate docs for a5b7a878
 new 1a96cdf82 generate docs for a5b7a878

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (d18ff303e)
\
 N -- N -- N   refs/heads/asf-staging (1a96cdf82)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/search-index.js |   2 +-
 site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org