[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-11-13 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785479#comment-17785479
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Status: The issue of TimeUUID() generating different values on each replica 
is a universal problem. I'm halting this work while waiting for discussion in 
#cassandra-accord.
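
For illustration, the determinism needed here, as a minimal self-contained 
sketch. All names below are hypothetical (plain java.util.UUID, not Cassandra's 
actual TimeUUID class), but it shows the idea: deriving the version-1 timestamp 
bits from the transaction's executeAt micros makes every replica compute the 
same UUID, whereas nextTimeUUID()-style generation mixes in local per-node state.

{code:java}
import java.util.UUID;

public final class DeterministicTimeUuidSketch
{
    // Offset between the UUID v1 epoch (1582-10-15) and the Unix epoch, in 100ns units.
    private static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    // Packs a microsecond timestamp into the version-1 UUID timestamp fields.
    // Replicas that start from the same executeAt micros get the same UUID.
    static UUID atMicros(long unixMicros, long lsb)
    {
        long ts = unixMicros * 10 + UUID_EPOCH_OFFSET_100NS; // 60-bit value in 100ns units
        long msb = (ts & 0xFFFFFFFFL) << 32        // time_low
                 | ((ts >>> 32) & 0xFFFFL) << 16   // time_mid
                 | 0x1000L                         // version 1
                 | ((ts >>> 48) & 0x0FFFL);        // time_hi
        return new UUID(msb, lsb);
    }

    public static void main(String[] args)
    {
        long executeAtMicros = 1697642767659000L; // a value taken from the logs in this ticket
        // Two independent "replicas" deriving the path agree:
        System.out.println(atMicros(executeAtMicros, 0).equals(atMicros(executeAtMicros, 0))); // true
    }
}
{code}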

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.x
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1    :n 56    :time 1692607286306445242}
> {code}
> Processes process 6 and process 7 are appending the values 553 and 455 
> respectively. 455

[jira] [Updated] (CASSANDRA-18989) Accord: UX: Force transactions / automatic transactions

2023-11-01 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18989:

Summary: Accord: UX: Force transactions / automatic transactions  (was: 
Accord: Force transactions / automatic transactions)

> Accord: UX: Force transactions / automatic transactions
> ---
>
> Key: CASSANDRA-18989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18989
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Henrik Ingo
>Priority: Normal
> Fix For: 5.x
>
>
> Note: I chose "bug" because I think this is a serious UX issue we should 
> consider. But strictly speaking the implementation is technically working as 
> designed; the UX conclusion rather is that the design needs improvement...
> I'm submitting this based on observing [~antithesis-luis] creating a checker 
> with some accord transactions. A discussion that followed his experience is 
> here: https://the-asf.slack.com/archives/C0459N9R5C6/p1698352614742079
> The tl;dr is that users are likely to expect single SELECT queries, and maybe 
> even single UPDATE/INSERT statements, to be consistent even if they omit the 
> BEGIN...COMMIT around the single statement.
> My proposed fix for improved UX is the ability to force, or default, single 
> statements to also be wrapped in an accord transaction.
> There are two ways to implement this:
> 1. Add a configuration option to reject queries that are not accord 
> transactions. This could be a per-table or per-keyspace option.
> 2. A per-session setting that enables automatic transactions, combined with a 
> global setting to make this behavior the default. MySQL's AUTOCOMMIT is an 
> example of this approach.
> My preference is #2.






[jira] [Updated] (CASSANDRA-18989) Accord: Force transactions / automatic transactions

2023-11-01 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18989:

Description: 
Note: I chose "bug" because I think this is a serious UX issue we should 
consider. But strictly speaking the implementation is technically working as 
designed; the UX conclusion rather is that the design needs improvement...

I'm submitting this based on observing [~antithesis-luis] creating a checker 
with some accord transactions. A discussion that followed his experience is 
here: https://the-asf.slack.com/archives/C0459N9R5C6/p1698352614742079

The tl;dr is that users are likely to expect single SELECT queries, and maybe 
even single UPDATE/INSERT statements, to be consistent even if they omit the 
BEGIN...COMMIT around the single statement.

My proposed fix for improved UX is the ability to force, or default, single 
statements to also be wrapped in an accord transaction.

There are two ways to implement this:

1. Add a configuration option to reject queries that are not accord 
transactions. This could be a per-table or per-keyspace option.
2. A per-session setting that enables automatic transactions, combined with a 
global setting to make this behavior the default. MySQL's AUTOCOMMIT is an 
example of this approach.

My preference is #2.

  was:
Note: I chose "bug" because I think this is a serious UX issue we should 
consider. But strictly speaking the implementation is technically working as 
designed; the UX conclusion rather is that the design needs improvement...

I'm submitting this based on observing [~alfprado] creating a checker with some 
accord transactions. A discussion that followed his experience is here: 
https://the-asf.slack.com/archives/C0459N9R5C6/p1698352614742079

The tl;dr is that users are likely to expect single SELECT queries, and maybe 
even single UPDATE/INSERT statements, to be consistent even if they omit the 
BEGIN...COMMIT around the single statement.

My proposed fix for improved UX is the ability to force, or default, single 
statements to also be wrapped in an accord transaction.

There are two ways to implement this:

1. Add a configuration option to reject queries that are not accord 
transactions. This could be a per-table or per-keyspace option.
2. A per-session setting that enables automatic transactions, combined with a 
global setting to make this behavior the default. MySQL's AUTOCOMMIT is an 
example of this approach.

My preference is #2.


> Accord: Force transactions / automatic transactions
> ---
>
> Key: CASSANDRA-18989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18989
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Henrik Ingo
>Priority: Normal
> Fix For: 5.x
>
>
> Note: I chose "bug" because I think this is a serious UX issue we should 
> consider. But strictly speaking the implementation is technically working as 
> designed; the UX conclusion rather is that the design needs improvement...
> I'm submitting this based on observing [~antithesis-luis] creating a checker 
> with some accord transactions. A discussion that followed his experience is 
> here: https://the-asf.slack.com/archives/C0459N9R5C6/p1698352614742079
> The tl;dr is that users are likely to expect single SELECT queries, and maybe 
> even single UPDATE/INSERT statements, to be consistent even if they omit the 
> BEGIN...COMMIT around the single statement.
> My proposed fix for improved UX is the ability to force, or default, single 
> statements to also be wrapped in an accord transaction.
> There are two ways to implement this:
> 1. Add a configuration option to reject queries that are not accord 
> transactions. This could be a per-table or per-keyspace option.
> 2. A per-session setting that enables automatic transactions, combined with a 
> global setting to make this behavior the default. MySQL's AUTOCOMMIT is an 
> example of this approach.
> My preference is #2.






[jira] [Created] (CASSANDRA-18989) Accord: Force transactions / automatic transactions

2023-10-31 Thread Henrik Ingo (Jira)
Henrik Ingo created CASSANDRA-18989:
---

 Summary: Accord: Force transactions / automatic transactions
 Key: CASSANDRA-18989
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18989
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Henrik Ingo


Note: I chose "bug" because I think this is a serious UX issue we should 
consider. But strictly speaking the implementation is technically working as 
designed; the UX conclusion rather is that the design needs improvement...

I'm submitting this based on observing [~alfprado] creating a checker with some 
accord transactions. A discussion that followed his experience is here: 
https://the-asf.slack.com/archives/C0459N9R5C6/p1698352614742079

The tl;dr is that users are likely to expect single SELECT queries, and maybe 
even single UPDATE/INSERT statements, to be consistent even if they omit the 
BEGIN...COMMIT around the single statement.

My proposed fix for improved UX is the ability to force, or default, single 
statements to also be wrapped in an accord transaction.

There are two ways to implement this:

1. Add a configuration option to reject queries that are not accord 
transactions. This could be a per-table or per-keyspace option.
2. A per-session setting that enables automatic transactions, combined with a 
global setting to make this behavior the default. MySQL's AUTOCOMMIT is an 
example of this approach.

My preference is #2.
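
To make the options concrete, here is a self-contained toy sketch of option 
#1's rejection behavior. All of these names are hypothetical, not real 
Cassandra APIs; option #2 would, instead of throwing, transparently wrap the 
bare statement in BEGIN TRANSACTION ... COMMIT TRANSACTION.

{code:java}
// Hypothetical sketch only; none of these classes or options exist in Cassandra.
final class AccordOnlyGuard
{
    record TableOptions(boolean accordOnly) {}

    static void checkStatement(String table, TableOptions opts, boolean insideAccordTxn)
    {
        // Option #1: a per-table (or per-keyspace) flag that rejects bare statements.
        if (opts.accordOnly() && !insideAccordTxn)
            throw new IllegalStateException(table +
                " requires BEGIN TRANSACTION ... COMMIT TRANSACTION around every statement");
    }

    public static void main(String[] args)
    {
        TableOptions opts = new TableOptions(true);
        checkStatement("accord.list_append", opts, true);  // ok: inside an accord transaction
        checkStatement("accord.list_append", opts, false); // throws: bare statement rejected
    }
}
{code}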






[jira] [Updated] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-26 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18798:

Test and Documentation Plan: 
Added 2 unit tests
Retest with the list append Elle test
Does not impact documentation
 Status: Patch Available  (was: In Progress)

Ok, the above PR is now ready for review, [~jlewandowski]. Let me know if I 
need to squash commits or anything first.

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1    :n 56    :time 1692607286306445242}
> {code}
> Processes proc

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-20 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1896#comment-1896
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

New branch btw: https://github.com/apache/cassandra/pull/2830

This seems to work now. The key was to understand what each method in the 
TimeUUID class really does.

I'll add some source code commentary in that regard on Monday, then submit for 
review. 

[~kijanowski] to confirm whether the Elle test now passes

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1   

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-18 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776780#comment-17776780
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Okay, the bug was actually in my own code after all. I had lost 10 microseconds 
of granularity. It seems to work now; I pushed a commit and am taking a break.
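
(For the record, the failure mode was the usual lossy unit round trip; a toy 
illustration of the class of bug, not the actual code:)

{code:java}
import java.util.concurrent.TimeUnit;

public final class GranularityLoss
{
    public static void main(String[] args)
    {
        long micros = 1697642767659123L; // microsecond-precision timestamp

        // Dropping to a coarser unit and back silently floors the value, so two
        // transactions within the same coarse tick end up with equal timestamps:
        long viaMillis = TimeUnit.MILLISECONDS.toMicros(TimeUnit.MICROSECONDS.toMillis(micros));
        System.out.println(micros - viaMillis); // 123: granularity lost

        // Converting through the 100ns units used by v1 UUID timestamps is lossless:
        long via100ns = (micros * 10) / 10;
        System.out.println(micros - via100ns);  // 0
    }
}
{code}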

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1    :n 56    :time 1692607286306445242}
> {code}
> Processes process 6 and process 7 are appending the values 553 and 455 
> respectively. 455 succeeded and a read by process 5 confirms that. But then 
> also 553 is append

[jira] [Updated] (CASSANDRA-18937) Two accord transactions have the exact same transaction id

2023-10-18 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18937:

Resolution: Invalid
Status: Resolved  (was: Triage Needed)

Closing: the bug was in code I added myself.

> Two accord transactions have the exact same transaction id
> --
>
> Key: CASSANDRA-18937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18937
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Henrik Ingo
>Priority: Normal
>
> When testing solutions for CASSANDRA-18798 I noticed that two independent 
> transactions running at the same time in two parallel threads ended up having 
> the exact same transaction id:
> {code}
> public void testListAddition() throws Exception
> {
>     SHARED_CLUSTER.schemaChange("CREATE TABLE " + currentTable + " (k int PRIMARY KEY, l list<int>)");
>     SHARED_CLUSTER.forEach(node -> node.runOnInstance(() -> AccordService.instance().setCacheSize(0)));
>     CountDownLatch latch = CountDownLatch.newCountDownLatch(1);
>     Vector<Integer> completionOrder = new Vector<>();
>     try
>     {
>         for (int i = 0; i < 100; i++)
>         {
>             ForkJoinTask<?> add1 = ForkJoinPool.commonPool().submit(() -> {
>                 latch.awaitThrowUncheckedOnInterrupt();
>                 SHARED_CLUSTER.get(1).executeInternal("BEGIN TRANSACTION " +
>                                                       "UPDATE " + currentTable + " SET l = l + [1] WHERE k = 1; " +
>                                                       "COMMIT TRANSACTION");
>                 completionOrder.add(1);
>             });
>             ForkJoinTask<?> add2 = ForkJoinPool.commonPool().submit(() -> {
>                 try {
>                     Thread.sleep(0);
> {code}
> Adding some logging in TxnWrite.java reveals the two threads have identical 
> executeAt and unix timestamps:
> {noformat}
> lastmicros 0
> DEBUG [node2_Messaging-EventLoop-3-4] node2 2023-10-18 18:26:08,954 
> AccordVerbHandler.java:54 - Receiving Apply{kind:Minimal, 
> txnId:[10,1697642767659000,10,1], deps:[distributed_test_keyspace:[(-Inf,-1], 
> (-1,9223372036854775805], (9223372036854775805,+Inf]]]:{}, {}, 
> executeAt:[10,1697642767659000,10,1], 
> writes:TxnWrites{executeAt:[10,1697642767659000,10,1], 
> keys:[distributed_test_keyspace:DecoratedKey(-4069959284402364209, 
> 0001)], write:TxnWrite{}}, result:accord.api.Result$1@253c102e} from 
> /127.0.0.1:7012
> raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
> lastExecutedTimestamp [0,0,0,0]
> lastmicros 1697642767659000
> raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
> lastExecutedTimestamp [10,1697642767659000,10,1]
> DEBUG [node2_CommandStore[1]:1] node2 2023-10-18 18:26:09,023 
> AccordMessageSink.java:167 - Replying ACCORD_APPLY_RSP ApplyApplied to 
> /127.0.0.1:7012
> lastmicros 0
> raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
> lastExecutedTimestamp [0,0,0,0]
> lastmicros 1697642767659000
> raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
> lastExecutedTimestamp [10,1697642767659000,10,1]
> timestamp 1697642767659000executeAt[10,1697642767659000,10,1]
> timestamp 1697642767659000executeAt[10,1697642767659000,10,1]
> {noformat}
> Increasing the Thread.sleep() to 9 or 10 helps, so that the transactions get 
> different IDs.






[jira] [Created] (CASSANDRA-18937) Two accord transactions have the exact same transaction id

2023-10-18 Thread Henrik Ingo (Jira)
Henrik Ingo created CASSANDRA-18937:
---

 Summary: Two accord transactions have the exact same transaction id
 Key: CASSANDRA-18937
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18937
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Henrik Ingo


When testing solutions for CASSANDRA-18798 I noticed that two independent 
transactions running at the same time in two parallel threads ended up having 
the exact same transaction id:


{code}
public void testListAddition() throws Exception
{
    SHARED_CLUSTER.schemaChange("CREATE TABLE " + currentTable + " (k int PRIMARY KEY, l list<int>)");
    SHARED_CLUSTER.forEach(node -> node.runOnInstance(() -> AccordService.instance().setCacheSize(0)));

    CountDownLatch latch = CountDownLatch.newCountDownLatch(1);

    Vector<Integer> completionOrder = new Vector<>();
    try
    {
        for (int i = 0; i < 100; i++)
        {
            ForkJoinTask<?> add1 = ForkJoinPool.commonPool().submit(() -> {
                latch.awaitThrowUncheckedOnInterrupt();
                SHARED_CLUSTER.get(1).executeInternal("BEGIN TRANSACTION " +
                                                      "UPDATE " + currentTable + " SET l = l + [1] WHERE k = 1; " +
                                                      "COMMIT TRANSACTION");
                completionOrder.add(1);
            });

            ForkJoinTask<?> add2 = ForkJoinPool.commonPool().submit(() -> {
                try {
                    Thread.sleep(0);
{code}

Adding some logging in TxnWrite.java reveals the two threads have identical 
executeAt and unix timestamps:

{noformat}
lastmicros 0
DEBUG [node2_Messaging-EventLoop-3-4] node2 2023-10-18 18:26:08,954 
AccordVerbHandler.java:54 - Receiving Apply{kind:Minimal, 
txnId:[10,1697642767659000,10,1], deps:[distributed_test_keyspace:[(-Inf,-1], 
(-1,9223372036854775805], (9223372036854775805,+Inf]]]:{}, {}, 
executeAt:[10,1697642767659000,10,1], 
writes:TxnWrites{executeAt:[10,1697642767659000,10,1], 
keys:[distributed_test_keyspace:DecoratedKey(-4069959284402364209, 0001)], 
write:TxnWrite{}}, result:accord.api.Result$1@253c102e} from /127.0.0.1:7012
raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [0,0,0,0]
lastmicros 1697642767659000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697642767659000,10,1]
DEBUG [node2_CommandStore[1]:1] node2 2023-10-18 18:26:09,023 
AccordMessageSink.java:167 - Replying ACCORD_APPLY_RSP ApplyApplied to 
/127.0.0.1:7012
lastmicros 0
raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [0,0,0,0]
lastmicros 1697642767659000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697642767659000,10,1]
timestamp 1697642767659000executeAt[10,1697642767659000,10,1]
timestamp 1697642767659000executeAt[10,1697642767659000,10,1]
{noformat}


Increasing the Thread.sleep() to 9 or 10 helps, so that the transactions get 
different IDs.
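
For context, the property I would expect the timestamp source to provide here 
is per-node monotonic uniqueness, as in this minimal sketch (the standard 
pattern, not accord's actual implementation):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public final class MonotonicMicros
{
    private static final AtomicLong lastMicros = new AtomicLong();

    // Wall-clock micros, atomically bumped so no two callers ever observe the
    // same value, even if they race within the same clock tick.
    static long nextUniqueMicros()
    {
        long now = System.currentTimeMillis() * 1000;
        return lastMicros.updateAndGet(prev -> Math.max(prev + 1, now));
    }

    public static void main(String[] args)
    {
        System.out.println(nextUniqueMicros() < nextUniqueMicros()); // always true
    }
}
{code}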






[jira] [Comment Edited] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-17 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776200#comment-17776200
 ] 

Henrik Ingo edited comment on CASSANDRA-18798 at 10/17/23 2:06 PM:
---

Pushed new snapshot of progress: 
https://github.com/henrikingo/cassandra/commit/4b2292bfa52ed713163abbc4f72b8300bf630e8e

This commit "fixes" the issue in the sense that 
{{updateAllTimestampAndLocalDeletionTime()}} will now also update the {{path}} 
variable for elements of a ListType. However, this does not actualy fix the 
issue. In the unit test that's also part of the patch, the transactions end up 
always having the same timestamp, and hence generate the same TimeUUID().

(Note that separately we might wonder what would happen if we append 2 list 
elements in the same transaction?)
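
(One way that could work, purely as a hypothetical sketch: fold a per-cell 
sequence number into the least significant bits, so paths stay unique within 
the transaction yet remain deterministic across replicas. The packing below is 
illustrative only, not the real TimeUUID layout.)

{code:java}
import java.util.UUID;

public final class CellPathSketch
{
    // Hypothetical: derive each list-cell path from (executeAt micros, cell index).
    static UUID pathFor(long executeAtMicros, int cellSeq)
    {
        long msb = executeAtMicros;               // stand-in for the v1 timestamp packing
        long lsb = 0x8000000000000000L | cellSeq; // variant bits + per-cell sequence
        return new UUID(msb, lsb);
    }

    public static void main(String[] args)
    {
        long executeAt = 1697546374434000L;
        System.out.println(pathFor(executeAt, 0)); // first appended element
        System.out.println(pathFor(executeAt, 1)); // second element: same time, later path
    }
}
{code}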

To emphasize the point that the above does the right thing given the original 
assumptions: if I just use {{nextTimeUUID()}}, which generates new UUIDs rather 
than mapping the current timestamp to a deterministic UUID, then the test 
"passes", though I doubt that would be correct in a real cluster with multiple 
nodes. It works on a single node because this code executes serially in the 
accord execution phase, so newly generated UUIDs are ordered correctly, even if 
they are not the correct UUIDs (as in, derived from the Accord transaction id).

But ok, debugging this I realized another issue, which I first thought was 
with the test setup but might be some kind of race condition: the two 
transactions in the unit test end up executing with the exact same timestamps.

{noformat}
lastmicros 0
DEBUG [node2_CommandStore[1]:1] node2 2023-10-17 15:39:35,579 
AccordMessageSink.java:167 - Replying ACCORD_APPLY_RSP ApplyApplied to 
/127.0.0.1:7012
DEBUG [node1_RequestResponseStage-1] node1 2023-10-17 15:39:35,580 
AccordCallback.java:49 - Received response ApplyApplied from /127.0.0.2:7012
lastmicros 0
raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [0,0,0,0]
lastmicros 1697546374434000
raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [0,0,0,0]
lastExecutedTimestamp [10,1697546374434000,10,1]
lastmicros 1697546374434000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697546374434000,10,1]
timestamp 1697546374434000executeAt[10,1697546374434000,10,1]
timestamp 1697546374434000executeAt[10,1697546374434000,10,1]
{noformat}

But adding a sleep to one thread resolves the issue (and actually makes the 
test pass, too):
{code}
ForkJoinTask<?> add2 = ForkJoinPool.commonPool().submit(() -> {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        // It's ok
    }

    latch.awaitThrowUncheckedOnInterrupt();
    SHARED_CLUSTER.get(1).executeInternal("BEGIN TRANSACTION " +
                                          "UPDATE " + currentTable + " SET l = l + [2] WHERE k = 1; " +
                                          "COMMIT TRANSACTION");
    completionOrder.add(2);
});
{code}

{noformat}
lastmicros 1697544893676000
raw 1697544893676000  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544893676000,10,1]
lastmicros 1697544894677000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544894677000,10,1]
timestamp 1697544894677000executeAt[10,1697544894677000,10,1]
DEBUG [node2_CommandStore[1]:1] node2 2023-10-17 15:14:54,728 
AccordMessageSink.java:167 - Replying ACCORD_APPLY_RSP ApplyApplied to 
/127.0.0.1:7012
DEBUG [node1_RequestResponseStage-1] node1 2023-10-17 15:14:54,728 
AccordCallback.java:49 - Received response ApplyApplied from /127.0.0.1:7012
DEBUG [node2_Messaging-EventLoop-3-4] node2 2023-10-17 15:14:54,728 
AccordVerbHandler.java:54 - Receiving 
PreAccept{txnId:[10,1697544894711000,0,1], 
txn:{read:TxnRead{TxnNamedRead{name='RETURNING:', 
key=distributed_test_keyspace:DecoratedKey(-4069959284402364209, 0001), 
update=Read(distributed_test_keyspace.tbl0 columns=*/[l] rowFilter= limits= 
key=1 filter=names(EMPTY), nowInSec=0)}}}, 
scope:[distributed_test_keyspace:-4069959284402364209]} from /127.0.0.1:7012
DEBUG [node1_CommandStore[1]:1] node1 2023-10-17 15:14:54,730 
AbstractCell.java:144 - timestamp: 1697544894677000   buffer: 0newPath: 
java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
lastmicros 1697544893676000
raw 1697544893676000  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544893676000,10,1]
lastmicros 1697544894677000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544894677000,10,1]
DEBUG [node1_RequestResponseS

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-17 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776200#comment-17776200
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Pushed new snapshot of progress: 
https://github.com/henrikingo/cassandra/commit/4b2292bfa52ed713163abbc4f72b8300bf630e8e

This commit "fixes" the issue in the sense that 
{{updateAllTimestampAndLocalDeletionTime()}} will now also update the {{path}} 
variable for elements of a ListType. However, this does not actualy fix the 
issue. In the unit test that's also part of the patch, the transactions end up 
always having the same timestamp, and hence generate the same TimeUUID().

To emphasize the point that the above does the right thing given the original 
assumptions: if I just use {{nextTimeUUID()}}, which generates new UUIDs rather 
than mapping the current timestamp to a deterministic UUID, then the test 
"passes", though I doubt that would be correct in a real cluster with multiple 
nodes. It works on a single node because this code executes serially in the 
accord execution phase, so newly generated UUIDs are ordered correctly, even if 
they are not the correct UUIDs (as in, derived from the Accord transaction id).

But ok, debugging this I realized another issue, which I first thought was 
with the test setup but might be some kind of race condition: the two 
transactions in the unit test end up executing with the exact same timestamps.

{noformat}
lastmicros 0
DEBUG [node2_CommandStore[1]:1] node2 2023-10-17 15:39:35,579 
AccordMessageSink.java:167 - Replying ACCORD_APPLY_RSP ApplyApplied to 
/127.0.0.1:7012
DEBUG [node1_RequestResponseStage-1] node1 2023-10-17 15:39:35,580 
AccordCallback.java:49 - Received response ApplyApplied from /127.0.0.2:7012
lastmicros 0
raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [0,0,0,0]
lastmicros 1697546374434000
raw 0  (NO_LAST_EXECUTED_HLC=-9223372036854775808
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [0,0,0,0]
lastExecutedTimestamp [10,1697546374434000,10,1]
lastmicros 1697546374434000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697546374434000,10,1]
timestamp 1697546374434000executeAt[10,1697546374434000,10,1]
timestamp 1697546374434000executeAt[10,1697546374434000,10,1]
{noformat}

But adding a sleep to one thread resolves the issue (and actually makes the 
test pass, too):
{code}
ForkJoinTask<?> add2 = ForkJoinPool.commonPool().submit(() -> {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        // It's ok
    }

    latch.awaitThrowUncheckedOnInterrupt();
    SHARED_CLUSTER.get(1).executeInternal("BEGIN TRANSACTION " +
                                          "UPDATE " + currentTable + " SET l = l + [2] WHERE k = 1; " +
                                          "COMMIT TRANSACTION");
    completionOrder.add(2);
});
{code}

{noformat}
lastmicros 1697544893676000
raw 1697544893676000  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544893676000,10,1]
lastmicros 1697544894677000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544894677000,10,1]
timestamp 1697544894677000executeAt[10,1697544894677000,10,1]
DEBUG [node2_CommandStore[1]:1] node2 2023-10-17 15:14:54,728 
AccordMessageSink.java:167 - Replying ACCORD_APPLY_RSP ApplyApplied to 
/127.0.0.1:7012
DEBUG [node1_RequestResponseStage-1] node1 2023-10-17 15:14:54,728 
AccordCallback.java:49 - Received response ApplyApplied from /127.0.0.1:7012
DEBUG [node2_Messaging-EventLoop-3-4] node2 2023-10-17 15:14:54,728 
AccordVerbHandler.java:54 - Receiving 
PreAccept{txnId:[10,1697544894711000,0,1], 
txn:{read:TxnRead{TxnNamedRead{name='RETURNING:', 
key=distributed_test_keyspace:DecoratedKey(-4069959284402364209, 0001), 
update=Read(distributed_test_keyspace.tbl0 columns=*/[l] rowFilter= limits= 
key=1 filter=names(EMPTY), nowInSec=0)}}}, 
scope:[distributed_test_keyspace:-4069959284402364209]} from /127.0.0.1:7012
DEBUG [node1_CommandStore[1]:1] node1 2023-10-17 15:14:54,730 
AbstractCell.java:144 - timestamp: 1697544894677000   buffer: 0newPath: 
java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
lastmicros 1697544893676000
raw 1697544893676000  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544893676000,10,1]
lastmicros 1697544894677000
raw -9223372036854775808  (NO_LAST_EXECUTED_HLC=-9223372036854775808
lastExecutedTimestamp [10,1697544894677000,10,1]
DEBUG [node1_RequestResponseStage-1] node1 2023-10-17 15:14:54,734 
AccordCallback.java:49 - Received response ApplyApplied from /127.0.0.2:7012
timestamp 1697544894677000executeAt[10,16975

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-12 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774673#comment-17774673
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Update: working on what Branimir suggested earlier:

{quote}
Could we do this by adding an updatePathTimestamps method in AbstractType that 
does nothing by default but is implemented by ListType to adjust all the 
timestamp part of its path UUIDs, and call it from 
ColumnData.updateAllTimestamps?
{quote}

Will continue and elaborate tomorrow.
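
The shape of that suggestion, as a self-contained sketch (the real classes are 
{{AbstractType}}, {{ListType}} and {{ColumnData}}; the bodies here are toy 
stand-ins, not the actual implementation):

{code:java}
abstract class AbstractTypeSketch
{
    // No-op by default; only types whose cell paths embed a timestamp override this.
    byte[] updatePathTimestamps(byte[] path, long newTimestampMicros)
    {
        return path;
    }
}

class ListTypeSketch extends AbstractTypeSketch
{
    @Override
    byte[] updatePathTimestamps(byte[] path, long newTimestampMicros)
    {
        // A list cell's path is a TimeUUID: rewrite its timestamp bits so the
        // path agrees with the adjusted write timestamp. Details elided here.
        return path;
    }
}

class ColumnDataSketch
{
    AbstractTypeSketch type;
    byte[] path;
    long timestamp;

    void updateAllTimestamps(long newTimestampMicros)
    {
        timestamp = newTimestampMicros;
        path = type.updatePathTimestamps(path, newTimestampMicros); // the new hook
    }
}
{code}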

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1    :n 56    :tim

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-12 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774632#comment-17774632
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Hmm...

It did pass for me, but you're right: repeating the test multiple times, it 
does fail quite soon, at runs 2 to 4.

Btw I added:

{code}
try
{
    for (int i = 0; i < 100; i++)
    {
        ForkJoinTask<?> add1 = ForkJoinPool.commonPool().submit(() -> {
{code}

...so that the test is practically guaranteed to fail. (Otherwise it would be a 
flaky test that passes 50% of the time...)

I should note that I did rerun the --list-append test, the test that discovered 
this bug in the first place, and it can no longer reproduce the issue. It 
passes even a fairly lengthy run.

...
I would say the addition of 
https://github.com/apache/cassandra/blame/cep-15-accord/src/java/org/apache/cassandra/service/accord/AccordKeyspace.java#L361-L363
clearly helps: the {{BufferCell[] ListType.elements}} now get their 
timestamps updated. But what's missing?

One possibility is that the list items now have their timestamps correctly 
aligned with Accord, but the list is never re-sorted after this?
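
If so, the invariant to restore would be that the cells stay sorted by their 
(rewritten) paths; a toy sketch of that hypothesis, with cell ordering reduced 
to a single long (not the real BufferCell comparator):

{code:java}
import java.util.Arrays;
import java.util.Comparator;

public final class CellOrderSketch
{
    // Stand-in for BufferCell: a list row orders its cells by path, not by timestamp.
    record Cell(long pathMicros) {}

    public static void main(String[] args)
    {
        // Paths rewritten to Accord's executeAt values, but still in insertion order:
        Cell[] elements = { new Cell(1697546374434001L), new Cell(1697546374434000L) };

        // Without an explicit re-sort after rewriting, iteration order no longer
        // matches path order; sorting by path restores the invariant:
        Arrays.sort(elements, Comparator.comparingLong(Cell::pathMicros));
        System.out.println(Arrays.toString(elements));
    }
}
{code}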

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    

[jira] [Updated] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-11 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18798:

Fix Version/s: 5.0-alpha2

> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.0-alpha2
>
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;"
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;"
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 1    :n 56    :time 1692607286306445242}
> {code}
> Processes process 6 and process 7 are appending the values 553 and 455 
> respectively. 455 succeeded and a read by process 5 confirms that. But then 
> also 553 is appended and a read by process 1 confirms that as well, however 
> it sees 553 before 455.
> process 5 reads [... 852 352 455] where as process 1 reads [... 852 352 553 
> 455

[jira] [Updated] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-11 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18798:

Mentor: Jacek Lewandowski
Resolution: Fixed
Status: Resolved  (was: Open)


[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-11 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774270#comment-17774270
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Confirmed:

First, a run with --list-append on an old branch of cep-15-accord:

{code}
hingo@odysseus:~/Documents/github/accordclient$ java -jar elle-cli/target/elle-cli-0.1.6-standalone.jar --model list-append --anomalies G0 --consistency-models strict-serializable --directory out-la --verbose test-la.edn

java.lang.AssertionError: Assert failed: No transaction wrote 5 12
t2
at elle.list_append$dirty_update_cases$fn__1930$fn__1935.invoke(list_append.clj:377)
at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
at clojure.core$reduce.invokeStatic(core.clj:6829)
at clojure.core$reduce.invoke(core.clj:6812)
at elle.list_append$dirty_update_cases$fn__1930.invoke(list_append.clj:372)
at clojure.core$map$fn__5884.invoke(core.clj:2759)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:51)
at clojure.lang.Cons.next(Cons.java:39)
at clojure.lang.RT.boundedLength(RT.java:1793)
at clojure.lang.RestFn.applyTo(RestFn.java:130)
at clojure.core$apply.invokeStatic(core.clj:667)
at clojure.core$mapcat.invokeStatic(core.clj:2787)
at clojure.core$mapcat.doInvoke(core.clj:2787)
at clojure.lang.RestFn.invoke(RestFn.java:423)
at elle.list_append$dirty_update_cases.invokeStatic(list_append.clj:370)
at elle.list_append$dirty_update_cases.invoke(list_append.clj:361)
at elle.list_append$check$dirty_update_task__2257.invoke(list_append.clj:875)
at jepsen.history.task.Task.run(task.clj:282)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
{code}

Then the same run against today's checkout of cep-15-accord:

{code}
hingo@odysseus:~/Documents/github/accordclient$ java -jar elle-cli/target/elle-cli-0.1.6-standalone.jar --model list-append --anomalies G0 --consistency-models strict-serializable --directory out-la --verbose test-la.edn
{"valid?":true}
{code}

(Full list of steps as in the description of this ticket.)


[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-10 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773851#comment-17773851
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

After pulling the most recent cep-15-accord branch, it seems this issue is 
fixed: 
https://github.com/apache/cassandra/blob/cep-15-accord/src/java/org/apache/cassandra/service/accord/AccordKeyspace.java#L361-L363

I'll rerun Jaroslaw's original consistency test tomorrow to verify.


[jira] [Comment Edited] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-05 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772178#comment-17772178
 ] 

Henrik Ingo edited comment on CASSANDRA-18798 at 10/5/23 11:35 AM:
---

Yes. Sorry, Branimir educated me on Friday about this. I thought a fix would be trivial after that, so I didn't bother to summarize it in a comment at the time. Now, 4 days later, it's obvious I should have...

Basically, the TimeUUID also needs to be (re)generated from the Accord 
executeAt timestamp. This way operations like appending to a list will result 
in a correct and consistent ordering of the list elements.
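
To illustrate what I mean, a minimal sketch, assuming we derive the version-1 (time-based) UUID fields purely from executeAt so that every replica computes the same cell path. The class and parameter names are made up for illustration; this is not the actual Cassandra TimeUUID API:

{code}
import java.util.UUID;

// Sketch only: lays out an RFC 4122 version-1 UUID from a fixed timestamp.
final class ExecuteAtTimeUuid
{
    // Offset from the UUID epoch (1582-10-15) to the Unix epoch, in 100ns units.
    private static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    static UUID fromExecuteAt(long executeAtMicros, long nodeAndClockSeq)
    {
        long ts = executeAtMicros * 10 + UUID_EPOCH_OFFSET_100NS;  // 100ns units
        long msb = (ts << 32)                                      // time_low
                 | ((ts >>> 16) & 0xFFFF0000L)                     // time_mid
                 | 0x1000L | ((ts >>> 48) & 0x0FFFL);              // version 1 + time_hi
        // The clock sequence and node fields must also be deterministic across
        // replicas, e.g. derived from the transaction id rather than local state.
        long lsb = 0x8000000000000000L | (nodeAndClockSeq & 0x3FFFFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }
}
{code}

The key property is that the UUID is a pure function of the transaction's executeAt, so concurrently appended list elements sort the same way on every replica.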

Ok, so clearly I cannot brute-force my way through this with late nights. I've pushed a branch which contains my work so far. I'm stuck on the unit test; I expect the actual fix to be a one-liner.

Status:

Blocked by org.apache.cassandra.exceptions.WriteTimeoutException when debugging Accord transactions.

Beyond that, I know what I have to do with regard to how timestamps affect the ordering of List elements. However, in the unit test I created, the ts value is still 0, and therefore all the inserted rows end up deleted.

https://github.com/henrikingo/cassandra/tree/C-18798-ListType-Accord





[jira] [Comment Edited] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-09-27 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769861#comment-17769861
 ] 

Henrik Ingo edited comment on CASSANDRA-18798 at 9/28/23 1:05 AM:
--

[~kijanowski] When you wake up, can you try this:

{code}
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
index 1be3d54558..3b0d7b78cc 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
@@ -230,7 +230,7 @@ public class UnfilteredRowIteratorSerializer
         final SerializationHeader sHeader = header.sHeader;
         return new AbstractUnfilteredRowIterator(metadata, header.key, header.partitionDeletion, sHeader.columns(), header.staticRow, header.isReversed, sHeader.stats())
         {
-            private final Row.Builder builder = BTreeRow.sortedBuilder();
+            private final Row.Builder builder = BTreeRow.unsortedBuilder();
 
             protected Unfiltered computeNext()
             {
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
index d528a70a18..22bdbc745b 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
@@ -455,7 +455,7 @@ public class UnfilteredSerializer
     throws IOException
     {
         // It wouldn't be wrong per-se to use an unsorted builder, but it would be inefficient so make sure we don't do it by mistake
-        assert builder.isSorted();
+        //assert builder.isSorted();
 
         int flags = in.readUnsignedByte();
         if (isEndOfPartition(flags))
{code}

 

Note the funny naming:

sortedBuilder = the data is already sorted, so the builder does not sort
unsortedBuilder = the data is not sorted, so the builder sorts it
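
Or, as a minimal illustration using those two calls (comments are my paraphrase of the naming, not quotes from the source):

{code}
// Which builder to use depends on whether the *input* cells are already in order:
Row.Builder a = BTreeRow.sortedBuilder();   // caller promises sorted input; builder does not sort
Row.Builder b = BTreeRow.unsortedBuilder(); // input may be out of order; builder sorts on build()
{code}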






[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-09-27 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769677#comment-17769677
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Thanks! This somewhat confirms the theory, then. The only exception is that this isn't about loss of precision at all: all of those timestamps are unique, and the problem is just that the ListType isn't sorting at all now.


[jira] [Comment Edited] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-09-27 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769223#comment-17769223
 ] 

Henrik Ingo edited comment on CASSANDRA-18798 at 9/27/23 3:41 PM:
--

Okay, finally got to the bottom of this. Report of findings follows:

 

TL;DR: Accord is correctly propagating the {{executeAt}} timestamp into the legacy {{timestamp}} and {{executeNow}} fields. There is loss of precision, though, so appending to the list twice within the same millisecond is the likely explanation of this particular symptom. Underneath this, however, it's the {{ListType}} itself that is broken, at least on the read path.

Part of the reason to write all of the below is that I'm seeking guidance on the ListType. What is it even supposed to do?

 
h2. Accord

The ListType is internally like a table/partition of BTreeRows that are sorted by their timestamp. This makes lists eventually consistent: the application can append entries to a list from two clients simultaneously, and the ordering of the resulting list, once all elements have "arrived", is deterministic. The initial hypothesis for my research was that the Accord executeAt timestamp isn't correctly propagated into each list element (BTreeRow). However, this is not the case:

 

Once an Accord transaction has determined its transaction id, called executeAt in Cassandra, and we arrive at the write portion of the execution phase, we have this:

{code}
cfk.updateLastExecutionTimestamps(executeAt, true);
long timestamp = cfk.current().timestampMicrosFor(executeAt, true);
// TODO (low priority - do we need to compute nowInSeconds, or can we just use executeAt?)
int nowInSeconds = cfk.current().nowInSecondsFor(executeAt, true);
{code}

_modules/accord/accord-core/src/main/java/accord/primitives/Timestamp.java_

This eventually reaches:

{code}
public Row updateAllTimestamp(long newTimestamp)
{
    LivenessInfo newInfo = primaryKeyLivenessInfo.isEmpty() ? primaryKeyLivenessInfo : primaryKeyLivenessInfo.withUpdatedTimestamp(newTimestamp);
    // If the deletion is shadowable and the row has a timestamp, we'll forced the deletion timestamp to be less than the row one, so we
    // should get rid of said deletion.
    Deletion newDeletion = deletion.isLive() || (deletion.isShadowable() && !primaryKeyLivenessInfo.isEmpty())
                         ? Deletion.LIVE
                         : new Deletion(new DeletionTime(newTimestamp - 1, deletion.time().localDeletionTime()), deletion.isShadowable());

    return transformAndFilter(newInfo, newDeletion, (cd) -> cd.updateAllTimestamp(newTimestamp));
}
{code}

_src/java/org/apache/cassandra/db/rows/BTreeRow.java_

 

The only problem I can see is loss of precision: this call will use the hlc() part of the executeAt timestamp, and not the node id (nor epoch). It seems possible and even likely that two different nodes will append to the same list during the same millisecond. After this, the ordering of those two (BTreeRow) elements is deterministic but arbitrary, and not guaranteed to match the order of the Accord transactions that wrote them.

Also note the loss of precision is unnecessary! Cassandra legacy timestamps are microseconds, but Accord has only millisecond precision. A better implementation here would be to use the last 3 digits of the timestamp field to encode the node id, and maybe also the epoch.
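
For illustration, a sketch of that encoding (the helper name is made up, not an actual API): since the hlc only fills the millisecond digits, the three sub-millisecond digits of a microsecond timestamp are free to carry a small node id:

{code}
// Sketch: pack the Accord node id into the sub-millisecond digits of a
// legacy microsecond timestamp. Assumes node ids stay below 1000.
static long legacyTimestampMicros(long hlcMillis, int nodeId)
{
    assert nodeId >= 0 && nodeId < 1000;
    return hlcMillis * 1000 + nodeId;
}
{code}

Ties within the same millisecond would then resolve by node id, which, as far as I understand, matches how Accord itself breaks ties between equal hlc values.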

The Accord-originated timestamps are easy to spot with their 3 trailing zeros:

{code}
$ tools/bin/sstabledump -d -t data/data/myspace/listtest-8574ceb057c611eeb5787dbb261b6e16/nc-5-big-Data.db
[2]@241 Row[info=[ts=1695222739434337] ]:  | del(names)=deletedAt=1695222739434336, localDeletion=1695222739, [names[177f79d0-57c8-11ee-b578-7dbb261b6e16]=Albert ts=1695222739434337], [names[177f79da-57c8-11ee-b578-7dbb261b6e16]=Ebba ts=1695222739434337], [names[3d4371d0-57c8-11ee-b578-7dbb261b6e16]=poppari ts=1695222802794082]

$ tools/bin/sstabledump -d -t data/data/myspace/listtest-8574ceb057c611eeb5787dbb261b6e16/nc-26-big-Data.db
[2]@0 Row[info=[ts=-9223372036854775808] ]:  | , [names[6d8b88c0-5979-11ee-9ee4-1ff7dd1e5050]=HENKKA ts=1695408855885000]
{code}

 
h2. ListType

My understanding is that a ListType is expected to return the elements of the list sorted by their timestamp. Some elements don't have a timestamp of their own, in which case they use the timestamp from the row header plus their physical order.
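
As a conceptual model of those semantics (my assumption of the intent, not the real storage code):

{code}
import java.util.Comparator;

// Conceptual model only: a list cell sorts by its timestamp, falling back to
// its physical position for cells that share (or lack) a timestamp of their own.
final class ListOrderModel
{
    record ListCell(long timestamp, int physicalOrder, Object value) {}

    static final Comparator<ListCell> EXPECTED_ORDER =
            Comparator.comparingLong(ListCell::timestamp)
                      .thenComparingInt(ListCell::physicalOrder);
}
{code}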

When a ListType is read from disk and deserialized, a good point to start 
observing what happens is in BTreeRow.Builder.build():

 

{code}
public Row build()
{
    if (isSorted || !isSorted)
        getCells().sort();
    // we can avoid resolving if we're sorted and have no complex values
{code}

[jira] [Updated] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-09-27 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18798:

 Bug Category: Parent values: Correctness(12982), Level 1 values: Consistency(12989)
   Complexity: Normal
Discovered By: Adhoc Test
Reviewers: Jaroslaw Kijanowski
 Severity: Normal
 Assignee: Henrik Ingo
   Status: Open  (was: Triage Needed)


[jira] [Comment Edited] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-09-26 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769223#comment-17769223
 ] 

Henrik Ingo edited comment on CASSANDRA-18798 at 9/26/23 5:19 PM:
--

Okay, finally got to the bottom of this. Report of findings follows:

 

TL:DR; Accord is correctly propagating the {{executeAt}} timestamp into the 
legacy {{timestamp}} and {{executeNow}} fields. There's loss of precision 
though, so appending  to the list twice within the same millisecond is the 
likely explanation of this particular symptom. Underneath this, it's however 
the {{ListType}} itself that is broken, at least for the read path. 

Part of the reason to write all of the below is that I'm seeking guidance on 
the ListType. What is it even supposed to do?

 
h2. Accord

The ListType is internally like a table/partition of BTreeRow's, that are 
sorted by their timestamp. This makes lists ecentualy consistent: The 
application can append entries to a list from two clients simultaneously, and 
the ordering of the resulting list, once all elements have "arrived", is 
deterministic. The initial hyptohesis for my research was that the Accord 
executeAt timestamp isn't correctly propagated into each list element 
(BTreeRow). However, this is not the case:

 

Once an Accord transaction has determined its transaction id, called executeAt 
in Cassandra, and we arrive at the write portion of the exeuction phase, we 
have this:

{code:java}
cfk.updateLastExecutionTimestamps(executeAt, true);
long timestamp = cfk.current().timestampMicrosFor(executeAt, true);
// TODO (low priority - do we need to compute nowInSeconds, or can we just use executeAt?)
int nowInSeconds = cfk.current().nowInSecondsFor(executeAt, true);
{code}
_modules/accord/accord-core/src/main/java/accord/primitives/Timestamp.java_

This eventually reaches 

 

{code:java}
public Row updateAllTimestamp(long newTimestamp)
{
    LivenessInfo newInfo = primaryKeyLivenessInfo.isEmpty() ? primaryKeyLivenessInfo : primaryKeyLivenessInfo.withUpdatedTimestamp(newTimestamp);
    // If the deletion is shadowable and the row has a timestamp, we'll forced the deletion timestamp to be less than the row one, so we
    // should get rid of said deletion.
    Deletion newDeletion = deletion.isLive() || (deletion.isShadowable() && !primaryKeyLivenessInfo.isEmpty())
                         ? Deletion.LIVE
                         : new Deletion(new DeletionTime(newTimestamp - 1, deletion.time().localDeletionTime()), deletion.isShadowable());

    return transformAndFilter(newInfo, newDeletion, (cd) -> cd.updateAllTimestamp(newTimestamp));
}
{code}
_src/java/org/apache/cassandra/db/rows/BTreeRow.java_

 

The only problem I can see is loss of precision: this call uses the hlc() part 
of the executeAt timestamp, but not the node id (nor the epoch). It seems 
possible, and even likely, that two different nodes will append to the same 
list during the same millisecond. When that happens, the ordering of those two 
(BTreeRow) elements is deterministic but arbitrary, and not guaranteed to match 
the order of the Accord transactions that wrote them.

Also note that this loss of precision is unnecessary! Cassandra legacy 
timestamps are microseconds, but Accord has only millisecond precision. A 
better implementation would use the last 3 digits of the timestamp field to 
encode the node id, and maybe also the epoch; a minimal sketch of the idea 
follows.
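For illustration only, here is a minimal sketch of that encoding. The names 
{{hlcMillis}} and {{nodeId}} are assumptions of mine, not Accord's actual API, 
and the arithmetic assumes hlc() is a millisecond value:

{code:java}
// Hypothetical helpers; hlcMillis and nodeId are assumed names, not Accord's API.
final class AccordTimestamps
{
    // Current behaviour (as described above): the three trailing microsecond
    // digits are always zero, so two nodes writing in the same millisecond
    // produce identical legacy timestamps.
    static long lossyMicros(long hlcMillis)
    {
        return hlcMillis * 1000;
    }

    // Proposed: spend the otherwise-unused microsecond digits on the node id
    // (and/or epoch), so concurrent appends from different nodes still sort
    // in Accord's order.
    static long micros(long hlcMillis, int nodeId)
    {
        return hlcMillis * 1000 + (nodeId % 1000);
    }
}
{code}

The obvious caveat: three spare digits only disambiguate up to 1,000 node ids 
(fewer if the epoch must fit too), so this shrinks the collision window rather 
than eliminating it.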

The Accord-originated timestamps are easy to spot by their 3 trailing zeros:

{code}
$ tools/bin/sstabledump -d -t data/data/myspace/listtest-8574ceb057c611eeb5787dbb261b6e16/nc-5-big-Data.db
[2]@241 Row[info=[ts=1695222739434337] ]:  | del(names)=deletedAt=1695222739434336, localDeletion=1695222739, [names[177f79d0-57c8-11ee-b578-7dbb261b6e16]=Albert ts=1695222739434337], [names[177f79da-57c8-11ee-b578-7dbb261b6e16]=Ebba ts=1695222739434337], [names[3d4371d0-57c8-11ee-b578-7dbb261b6e16]=poppari ts=1695222802794082]
{code}

 

{code}
$ tools/bin/sstabledump -d -t data/data/myspace/listtest-8574ceb057c611eeb5787dbb261b6e16/nc-26-big-Data.db
[2]@0 Row[info=[ts=-9223372036854775808] ]:  | , [names[6d8b88c0-5979-11ee-9ee4-1ff7dd1e5050]=HENKKA ts=1695408855885000]
{code}

 
h2. ListType

My understanding is that a ListType is expected to return the elements of the 
list sorted by their timestamp. Some elements don't have a timestamp of their 
own, in which case they fall back to the timestamp from the row header plus 
their physical order; in code, my reading is roughly the comparator sketched 
below.
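Stated as code, that expectation looks roughly like the following (a sketch 
with invented names, not Cassandra's actual cell comparator):

{code:java}
import java.util.Comparator;

// Hypothetical view of one list element, for ordering purposes only.
record Element(Long ownTimestamp,   // null when the element has no timestamp of its own
               long rowTimestamp,   // timestamp from the row header
               int physicalIndex,   // position in on-disk/physical order
               String value) {}

class ExpectedListOrder
{
    // Effective timestamp: the element's own, else the row header's.
    static long effectiveTs(Element e)
    {
        return e.ownTimestamp() != null ? e.ownTimestamp() : e.rowTimestamp();
    }

    // Sort by effective timestamp; elements sharing a timestamp keep physical order.
    static final Comparator<Element> EXPECTED =
        Comparator.comparingLong(ExpectedListOrder::effectiveTs)
                  .thenComparingInt(Element::physicalIndex);
}
{code}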

When a ListType is read from disk and deserialized, a good point to start 
observing what happens is in BTreeRow.Builder.build():

 

{code:java}
public Row build()
{
    if (isSorted || !isSorted)
        getCells().sort();
    // we can avoid resolving if we're sorted and have no complex values
{code}

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-09-26 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769223#comment-17769223
 ] 

Henrik Ingo commented on CASSANDRA-18798:
-

Okay, finally got to the bottom of this. Report of findings follows:

 

TL:DR; Accord is correctly propagating the {{executeAt}} timestamp into the 
legacy {{timestamp}} and {{executeNow}} fields. There's loss of precision 
though, so appending  to the list twice within the same millisecond is the 
likely explanation of this particular symptom. Underneath this, it's however 
the {{ListType}} itself that is broken, at least for the read path. 

 
h2. Accord

The ListType is internally like a table/partition of BTreeRow's, that are 
sorted by their timestamp. This makes lists ecentualy consistent: The 
application can append entries to a list from two clients simultaneously, and 
the ordering of the resulting list, once all elements have "arrived", is 
deterministic. The initial hyptohesis for my research was that the Accord 
executeAt timestamp isn't correctly propagated into each list element 
(BTreeRow). However, this is not the case:

 

Once an Accord transaction has determined its transaction id, called executeAt 
in Cassandra, and we arrive at the write portion of the exeuction phase, we 
have this:

{{        cfk.updateLastExecutionTimestamps(executeAt, true);}}
{{        long timestamp = cfk.current().timestampMicrosFor(executeAt, true);}}
{{        // TODO (low priority - do we need to compute nowInSeconds, or can we 
just use executeAt?)}}
{{        int nowInSeconds = cfk.current().nowInSecondsFor(executeAt, true);}}

        
_{{}}modules/accord/accord-core/src/main/java/accord/primitives/Timestamp.java_

This eventually reaches 

 

{code:java}
public Row updateAllTimestamp(long newTimestamp)
{
    LivenessInfo newInfo = primaryKeyLivenessInfo.isEmpty()
                         ? primaryKeyLivenessInfo
                         : primaryKeyLivenessInfo.withUpdatedTimestamp(newTimestamp);
    // If the deletion is shadowable and the row has a timestamp, we'll forced the deletion timestamp
    // to be less than the row one, so we should get rid of said deletion.
    Deletion newDeletion = deletion.isLive() || (deletion.isShadowable() && !primaryKeyLivenessInfo.isEmpty())
                         ? Deletion.LIVE
                         : new Deletion(new DeletionTime(newTimestamp - 1, deletion.time().localDeletionTime()), deletion.isShadowable());

    return transformAndFilter(newInfo, newDeletion, (cd) -> cd.updateAllTimestamp(newTimestamp));
}
{code}
_src/java/org/apache/cassandra/db/rows/BTreeRow.java_

 

The only problem I can see is the loss of precision: this call will use the hlc() part of the executeAt timestamp, and not the node id (nor the epoch). It seems possible and even likely that two different nodes will append to the same list during the same millisecond. After that, the ordering of those two (BTreeRow) elements is deterministic but arbitrary, and not guaranteed to match the order of the Accord transactions that wrote them.

Also note that the loss of precision is unnecessary: Cassandra legacy timestamps are microseconds, but Accord has only millisecond precision, so the last three decimal digits are free. A better implementation here would be to use those last 3 digits of the timestamp field to encode the node id, and maybe also the epoch.
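
To make the suggestion concrete, here is a minimal sketch of such an encoding. This is only my illustration, not the actual implementation: the method name is made up, and node ids are assumed to fit in 3 decimal digits.

{code:java}
// Hypothetical sketch: since Accord's hlc has millisecond precision, the three
// low decimal digits of a microsecond-granularity legacy timestamp are always
// zero, and could carry the node id instead. Appends from different nodes within
// the same millisecond would then order by node id on every replica, which should
// also match how executeAt itself breaks ties.
static long legacyTimestampMicros(long hlcMillis, int nodeId)
{
    assert 0 <= nodeId && nodeId < 1000 : "node id must fit in 3 decimal digits";
    return hlcMillis * 1000 + nodeId;
}
{code}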

 
h2. ListType

 

 

 


[jira] [Commented] (CASSANDRA-18682) Create TLA+ spec of Accord

2023-08-29 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760152#comment-17760152
 ] 

Henrik Ingo commented on CASSANDRA-18682:
-

Ok so after "sleeping on it" quite a few nights, and doing other tasks in between, I returned to this project with some success. I removed all PlusCal code and just did everything in TLA+ syntax. This greatly improves both robustness and clarity.

What I pushed yesterday (https://github.com/henrikingo/cassandra-accord/commit/25c43b98ed15d5762aeeab8aa1539fa0f00b9458) is based on modeling Accord such that each "operator" (I mean function) is a coordinator or "writer" node in a Cassandra cluster, and these steps are connected (or separated) by message queues. This is an intuitive way to think about Accord, and luckily it seems to work. If you try to run it, just note that TLC is less concerned about the implementation completing an end-to-end trip for a given transaction, and more focused on running it through every possible permutation of input variables.

I have also been working on an approach where the algorithm from the white paper is mapped pretty much 1:1 into TLA+ syntax. I believe this is how TLA+ is designed to be used. And it would certainly give more confidence in the results of this project if it is crystal clear that the TLA+ code does the same thing as the algorithm in the paper.

In the latter approach it wasn't at first obvious how the parts that are executed by different nodes/shards... can be modeled. But now that I've seen more, it might be possible.

Even so, I'm going to focus on the message queue approach first.

> Create TLA+ spec of Accord
> --
>
> Key: CASSANDRA-18682
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18682
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: Henrik Ingo
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 5.x
>
>
> Create a TLA+ Spec of Accord.
>  
> For this ticket, goal is just to cover Algorithm 1. No significant 
> discoveries are expected, and to really check correctness, one will have to 
> implement all 4 algorithms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-18682) Create TLA+ spec of Accord

2023-07-28 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748625#comment-17748625
 ] 

Henrik Ingo edited comment on CASSANDRA-18682 at 7/28/23 2:11 PM:
--

Weekly status update:

 
 * Decided to get rid of PlusCal. It's easy: the PlusCal code is compiled into TLA+, so just remove the PlusCal and you're left with the equivalent TLA+ implementation.
 * The TLA Toolbox still feels like it's on shaky ground. Decided to try [PlusPy|https://github.com/tlaplus/PlusPy], a Python implementation of the TLA+ syntax. One motivation here is that by reading the Python code, it could be easier to at least understand what it is trying to do. In the extreme case one could make careful modifications to avoid some of the oddest TLA+ syntax...
 * However, PlusPy raised an exception on me quite early on. After two days I found out what was wrong (variables need to be initialized). Patch below. After fixing that issue, there's now an exception because it doesn't find the Init stage... In summary, this could work, but how confident would we be that it is checking anything correctly at all, if I have to supply N fixes to even run the thing?
 * Took a step back and decided to read up on alternatives to TLA+. Through [Wikipedia|https://en.wikipedia.org/wiki/List_of_model_checking_tools], found [Mazzanti, Franco; Ferrari, Alessio (2018)|https://arxiv.org/abs/1803.10324v1], who implemented the same algorithm in 10 different tools and shared their results. [They later produced a 100+ page report surveying which tools are used the most.|http://www.astrail.eu/download.aspx?id=bb46b81b-a5bf-4036-9018-cc6e7d91e2c2] TLA+ isn't one of them...
 * Based on that report, I'm now curious to test the [Eclipse-based Rodin framework|http://www.event-b.org/] and the Event-B language it uses. That's for next week.

{noformat}
diff --git a/pluspy.py b/pluspy.py
index d1ba07a..2103fcf 100644
--- a/pluspy.py
+++ b/pluspy.py
@@ -2185,7 +2185,10 @@ class VariableExpression(Expression):
             if initializing:
                 return PrimeExpression(expr=subs[self])
             else:
-                return subs[self]
+                v = subs[self]
+                if isinstance(v,bool):
+                    v = ValueExpression(v)
+                return v
 
     def eval(self, containers, boundedvars):
         print("Error: variable", self.id, "not realized", containers, boundedvars)
{noformat}










[jira] [Commented] (CASSANDRA-18682) Create TLA+ spec of Accord

2023-07-24 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746648#comment-17746648
 ] 

Henrik Ingo commented on CASSANDRA-18682:
-

PHP was the 5th or 6th programming language I learned, and also the first time I could pick up a new programming language just by starting to code and reading the manual as needed. They are all variations of each other, and in 2 weeks you could easily learn a 7th and 8th language...

 

TLA+ was... hard. I actually tried to watch the official video tutorials from Leslie Lamport, but the problem is he teaches math, and I gave up when the 3rd video still didn't show any TLA+ syntax... Hillel's learntla.com was better, and got me this far, but... It seems like a joke to think that this language is used to prove the correctness and robustness of algorithms. This is the silliest, most fragile, booby-trapped language I've ever seen. Anyway...

The linked branch contains the beginnings of an implementation, as a TLA+ spec.

Current status is that there's a key range from which a transaction will pick the set of keys that the transaction operates on. (There is no payload, and nothing is done to the keys other than checking whether a concurrent trx might be holding the same key.) Similarly, each transaction has t_0 and T timestamps. A complete transaction

{{    newTrx := <>;}}

is passed to all nodes and back to the coordinator. However...
 * Only the fast path, i.e. the beginning of Algorithm 1, is implemented so far.
 * Partitioning is skipped, so all nodes always hold all keys.
 * deps are not actually collected; the set is always empty.
 * Consequently, there isn't really any checking for conflicting transactions, because the deps to check aren't there.

 

In addition:
 * I chose to implement this in PlusCal, mostly because learntla/Hillel seems to recommend it. (And Leslie recommends TLA+, which is a good reason to NOT use it...)
 * Looking at the above now, I'll probably do a second attempt in just raw TLA+. PlusCal seems to make an already hard language downright whimsical. For example, it introduces two new ways to assign to a variable: := and =. Which one to use depends on the context.
 * I'm considering using PlusPy as the interpreter and checker. This way I could easily follow in Python what is actually happening. I'm also tempted to just redefine TLA+ to be less crazy, starting with using = for assignment. Finally, in PlusPy you have the option to use Python for a section instead of TLA+. For example, choosing a set of numbers of varying, random size should not be a hard problem, but it is in TLA+.

As I'm writing this, it's unclear to me whether I will continue with this tomorrow or whether I'll be asked to context-switch to another Accord-related task.

 







[jira] [Updated] (CASSANDRA-18682) Create TLA+ spec of Accord

2023-07-24 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18682:

Change Category: Quality Assurance
 Complexity: Challenging
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)







[jira] [Created] (CASSANDRA-18682) Create TLA+ spec of Accord

2023-07-24 Thread Henrik Ingo (Jira)
Henrik Ingo created CASSANDRA-18682:
---

 Summary: Create TLA+ spec of Accord
 Key: CASSANDRA-18682
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18682
 Project: Cassandra
  Issue Type: Task
  Components: Accord
Reporter: Henrik Ingo
Assignee: Henrik Ingo


Create a TLA+ Spec of Accord.

 

For this ticket, the goal is just to cover Algorithm 1. No significant discoveries are expected; to really check correctness, one will have to implement all 4 algorithms.






[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-04-28 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717756#comment-17717756
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

Ok, I have added whitespace also to the patch against trunk.

If I'm following correctly, I've addressed all comments, except for the DEBUG log message, which isn't introduced by this patch; it is merely adjacent, and was copied from one branch to another when backporting.

So I would suggest we merge and close this, and if there is sufficient motivation to remove the DEBUG message, that can easily be done in a separate commit later.

> Add details to Error message: Not enough space for compaction 
> --
>
> Key: CASSANDRA-18260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18260
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Observability/Logging
>Reporter: Brad Schoening
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When compaction fails, the log message should list a) the free space 
> available on disk at that point in time and b) perhaps the number and/or size 
> of the source sstables being compacted.
> Free space can change from one moment to the next, so when the below 
> compaction failed because it needed 161GB, upon checking the server a few 
> minutes later, it had 184GB free.  Similarly, the error message mentions it 
> was writing out one SSTable on this STCS table, but its not clear if it was 
> combining X -> 1 tables, or something else.
> {noformat}
> [INFO ] [CompactionExecutor:77758] cluster_id=87 ip_address=127.1.1.1  
> CompactionTask.java:241 - Compacted (8a1cffe0-abb5-11ed-b3fc-8d2ac2c52f0d) 1 
> sstables to [...] to level=0.  86.997GiB to 86.997GiB (~99% of original) in 
> 1,508,155ms.  Read Throughput = 59.069MiB/s, Write Throughput = 59.069MiB/s, 
> Row Throughput = ~20,283/s.  21,375 total partitions merged to 21,375.  
> Partition merge counts were \{1:21375, }
> [ERROR] [CompactionExecutor:4] cluster_id=87 ip_address=127.1.1.1  
> CassandraDaemon.java:581 - Exception in thread 
> Thread[CompactionExecutor:4,1,main]
> java.lang.RuntimeException: Not enough space for compaction, estimated 
> sstables = 1, expected write size = 161228934093
>     at 
> org.apache.cassandra.db.compaction.CompactionTask.buildCompactionCandidatesForAvailableDiskSpace(CompactionTask.java:386)
>     at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:126)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:77)
>     at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100)
>     at 
> org.apache.cassandra.db.compaction.CompactionManager$7.execute(CompactionManager.java:613)
>     at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:377)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}






[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-04-20 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714610#comment-17714610
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

[~maxwellguo] For reference, this is the patch against trunk: 
[https://github.com/apache/cassandra/pull/2244]

 

And yes indeed, they don't have much in common.







[jira] [Updated] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-04-19 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18260:

Mentor: Michael Semb Wever
Status: Review In Progress  (was: Changes Suggested)







[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-04-19 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714151#comment-17714151
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

Okay, here are the 4.1 and 4.0 "backports". I tried to make the log messages as similar as possible, but since the code actually does something different compared to trunk, the message text reflects that.

 

[https://github.com/apache/cassandra/pull/2285]

[https://github.com/apache/cassandra/pull/2284]

 

The above two are pretty much identical; just a simple one-line change was needed. The difference from these two to the trunk PR is huge. I spent a couple of hours pondering whether some other approach would be more correct (for trunk), but in the end they're just different. But here we have them now.

 

Maybe my main criticism against these is the growing number of lines asserting log output. If anything changes in the surrounding code, you have 20+ assertions to update as well.







[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-04-14 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712263#comment-17712263
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

Actually... this isn't a bug fix. Why should it be merged to stable branches?

I would understand a motivation to keep branches in sync w.r.t. trivial changes, but if the patch already doesn't apply, why should we merge new functionality to a stable branch?







[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-04-03 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708131#comment-17708131
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

[~mck] I believe what you are seeing here is that the mock {{FakeFileStore}} class didn't implement a {{toString()}} method. CASSANDRA-18287 seems to have an example of what the same message looks like in production use.







[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-03-28 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706191#comment-17706191
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

Ok that's better. Updated PR to use it.







[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-03-28 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705963#comment-17705963
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

[~bschoeni] sure, here you go:



{quote}FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@6ddee60f has 30 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@1f87607c has 30 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@4b862408 has 30 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@6ddee60f has 30 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@1f87607c has 19 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@1f87607c has only 0 MiB available, but 0 MiB is needed
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@4b862408 has 30 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@6ddee60f has 30 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@1f87607c has 19 bytes available, checking if we can write 20 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@1f87607c has only 0 MiB available, but 0 MiB is needed
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@4b862408 has 20971511 bytes available, checking if we can write 26214409 bytes
FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@4b862408 has only 20 MiB available, but 25 MiB is needed
{quote}
The new row is this one:
{quote}FileStore org.apache.cassandra.db.DirectoriesTest$FakeFileStore@4b862408 has only X MiB available, but X MiB is needed
{quote}
 

The above is from the unit test, so the error message "{_}Not enough space for compaction, estimated sstables = 1, expected write size = 161228934093{_}" is not there, but would appear after this.

 

This also reminds me, I wanted to highlight a question: do we want to preserve the exact format ("1234567 bytes"), or use "1.2 MiB"?


[jira] [Updated] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-03-24 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo updated CASSANDRA-18260:

Test and Documentation Plan: 
Tested on laptop with ant testclasslist

I'll get myself a circleci account next to run the full testsuite
 Status: Patch Available  (was: In Progress)







[jira] [Commented] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-03-24 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704557#comment-17704557
 ] 

Henrik Ingo commented on CASSANDRA-18260:
-

Hi,

I've addressed a) in a PR that I just sent: [https://github.com/apache/cassandra/pull/2244]

This is my first Cassandra patch, and also the first time in decades that I'm writing Java code professionally, so I'm humbly and eagerly looking forward to all feedback, including trivial stuff I did wrong. I've tested locally with `ant testclasslist`, but that's all. I'll get myself a circleci account now to run the full test suite.

I elected to write a new, separate log message at the point in the code where the free space calculation is done. This way I don't have to pass those variables somewhere else only for the purpose of adding them to the error message.
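
To illustrate, here is a sketch of the shape such a log statement could take. Hedging: the variable names ({{filestore}}, {{available}}, {{writeSize}}) are stand-ins for whatever the surrounding method has in scope, not the exact patch; the message format follows the test output quoted above in this thread.

{code:java}
// Sketch only: log the free-space details right where they are computed,
// instead of threading them through to the exception message.
if (available < writeSize)
    logger.error("FileStore {} has only {} available, but {} is needed",
                 filestore,
                 FBUtilities.prettyPrintMemory(available),
                 FBUtilities.prettyPrintMemory(writeSize));
{code}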

Asserting log messages was a surprisingly difficult experience. If there is some preferred way to deal with this, I will happily change. For example, I believe it's possible to configure Logback into a mode where log messages can be expected in a deterministic order (at least when guarding against other threads with MDC). But I'm concerned such a configuration could impact performance and therefore test turnaround time. Also, generally tests should test whatever the default or production config is; I wouldn't want to use a different config just for tests.

 

 







[jira] [Assigned] (CASSANDRA-18260) Add details to Error message: Not enough space for compaction

2023-03-14 Thread Henrik Ingo (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik Ingo reassigned CASSANDRA-18260:
---

Assignee: Henrik Ingo

> Add details to Error message: Not enough space for compaction 
> --
>
> Key: CASSANDRA-18260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18260
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Logging
>Reporter: Brad Schoening
>Assignee: Henrik Ingo
>Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
>
> When compaction fails, the log message should list a) the free space 
> available on disk at that point in time and b) perhaps the number and/or size 
> of the source sstables being compacted.
> Free space can change from one moment to the next, so when the below 
> compaction failed because it needed 161GB, the server had 184GB free when 
> checked a few minutes later.  Similarly, the error message mentions it was 
> writing out one SSTable on this STCS table, but it's not clear whether it 
> was combining X -> 1 tables or something else.
> [INFO ] [CompactionExecutor:77758] cluster_id=87 ip_address=127.1.1.1  
> CompactionTask.java:241 - Compacted (8a1cffe0-abb5-11ed-b3fc-8d2ac2c52f0d) 1 
> sstables to [...] to level=0.  86.997GiB to 86.997GiB (~99% of original) in 
> 1,508,155ms.  Read Throughput = 59.069MiB/s, Write Throughput = 59.069MiB/s, 
> Row Throughput = ~20,283/s.  21,375 total partitions merged to 21,375.  
> Partition merge counts were {1:21375, }
> [ERROR] [CompactionExecutor:4] cluster_id=87 ip_address=127.1.1.1  
> CassandraDaemon.java:581 - Exception in thread 
> Thread[CompactionExecutor:4,1,main]
> java.lang.RuntimeException: Not enough space for compaction, estimated 
> sstables = 1, expected write size = 161228934093
>     at 
> org.apache.cassandra.db.compaction.CompactionTask.buildCompactionCandidatesForAvailableDiskSpace(CompactionTask.java:386)
>     at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:126)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:77)
>     at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:100)
>     at 
> org.apache.cassandra.db.compaction.CompactionManager$7.execute(CompactionManager.java:613)
>     at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:377)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18193) Provide design and API documentation

2023-01-27 Thread Henrik Ingo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681389#comment-17681389
 ] 

Henrik Ingo commented on CASSANDRA-18193:
-

I spent some hours on Wednesday eyeballing the diff against G* trunk at a high 
level, to form my own opinion of what I see. I might post something about that 
next week, but for now I just wanted to share a by-product I found, which 
according to Benedict could be something you want to uncomment before the 
merge:

https://github.com/apache/cassandra/blob/cep-15-accord/src/java/org/apache/cassandra/dht/Murmur3Partitioner.java#L240-L255

> Provide design and API documentation
> 
>
> Key: CASSANDRA-18193
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18193
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: Jacek Lewandowski
>Priority: Normal
>
> It would be great if we had, at minimum:
> - a white paper in the form of an AsciiDoc or Markdown document somewhere in 
> the project tree
> - API docs for all interfaces and all methods in {{accord.api}}, explaining 
> the requirements for the implementations
> - documentation for enums and their values across the project
> - at least some class-level explanation for interfaces, abstract classes, 
> and classes that do not inherit from anything in the project
> Eventually, it would be really awesome if concepts from the whitepaper were 
> somehow referenced in the code (or vice versa). It would make it much easier 
> to understand the implementation, and I believe it would improve reuse of 
> this project for external applications.
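
As a purely hypothetical illustration of the class-level API docs requested 
above, something along these lines could work (the interface name and 
contract below are invented, not taken from accord.api):

{code:java}
// Invented example, not from accord.api: illustrates the requested
// class-level docs, per-method docs, and a whitepaper cross-reference.

/**
 * Resolves the replicas responsible for a key in a given topology epoch.
 *
 * Implementations must be thread-safe and must return the same answer for
 * the same (key, epoch) pair on every node, since coordinators need to
 * agree on the replica set. The corresponding whitepaper concept (topology
 * and epochs) could be cross-referenced here, as this ticket suggests.
 */
public interface ReplicaResolver
{
    /**
     * @param key   the routing key of the transaction
     * @param epoch the topology epoch to resolve against
     * @return ids of the replicas owning {@code key} in {@code epoch}
     */
    int[] replicasFor(byte[] key, long epoch);
}
{code}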



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-18145) Run entire Cassandra Jenkins in an independent EC2/EKS account

2023-01-11 Thread Henrik Ingo (Jira)
Henrik Ingo created CASSANDRA-18145:
---

 Summary: Run entire Cassandra Jenkins in an independent EC2/EKS 
account
 Key: CASSANDRA-18145
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18145
 Project: Cassandra
  Issue Type: Task
Reporter: Henrik Ingo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org