from:"Benedict Elliott Smith (Jira)"

[jira] [Commented] (CASSANDRA-19297) Accord: RejectBefore must be up-to-date on joining nodes before ready to coordinate

2024-07-19 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17867267#comment-17867267
 ] 

Benedict Elliott Smith commented on CASSANDRA-19297:


Thanks! +1

> Accord: RejectBefore must be up-to-date on joining nodes before ready to 
> coordinate
> ---
>
> Key: CASSANDRA-19297
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19297
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Assignee: Blake Eggleston
>Priority: Normal
>  Labels: pull-request-available
>
> The exclusive sync point used to join the shard will be known by a majority 
> of the existing replicas, but in the event the quorum changes and the new 
> replica has not recorded the exclusive sync point this might in principle 
> lead to failing to reject a TxnId that should be rejected.
> The fix is simple, but we should introduce tests to corroborate this issue, 
> and see if we can reproduce it in the burn test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Created] (CASSANDRA-19758) Accord: CommandsForKey should self-prune

2024-07-08 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19758:
--

 Summary: Accord: CommandsForKey should self-prune
 Key: CASSANDRA-19758
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19758
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


CommandsForKey should periodically self-prune, so as to continue functioning 
well between garbage collections. This is a bit complicated: once we prune, we 
are left with potentially incomplete information, and sometimes have to load 
per-command information from disk. But the payoff is ensuring that 
CommandsForKey objects, which drive the majority of the state machine, are 
kept to a reasonable size.
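
As a rough illustration of the trade-off described above, here is a minimal 
Java sketch. It is not the Accord implementation: the class SelfPruningIndex, 
the maxInMemory limit and the loadFromDisk fallback are all hypothetical names 
chosen for this example, and the prune policy (drop the oldest ids first) is 
an assumption.

```java
import java.util.Optional;
import java.util.TreeMap;
import java.util.function.LongFunction;

/** Hypothetical per-key index that prunes its oldest entries once it grows too large. */
final class SelfPruningIndex {
    private final TreeMap<Long, String> inMemory = new TreeMap<>(); // id -> summary
    private final int maxInMemory;
    private final LongFunction<Optional<String>> loadFromDisk; // assumed fallback store
    private long prunedBefore = Long.MIN_VALUE; // ids below this may exist only on disk

    SelfPruningIndex(int maxInMemory, LongFunction<Optional<String>> loadFromDisk) {
        this.maxInMemory = maxInMemory;
        this.loadFromDisk = loadFromDisk;
    }

    void add(long id, String summary) {
        inMemory.put(id, summary);
        // Self-prune on insert instead of waiting for an external garbage collection pass.
        while (inMemory.size() > maxInMemory) {
            long removed = inMemory.pollFirstEntry().getKey();
            prunedBefore = Math.max(prunedBefore, removed + 1);
        }
    }

    Optional<String> get(long id) {
        String hit = inMemory.get(id);
        if (hit != null) return Optional.of(hit);
        // Below the prune point the in-memory view is incomplete, so consult disk there.
        return id < prunedBefore ? loadFromDisk.apply(id) : Optional.empty();
    }

    int inMemorySize() { return inMemory.size(); }
}
```

The point of the sketch is only that lookups below the prune point can no 
longer be answered from memory alone, which is the complication the ticket 
mentions.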





[jira] [Commented] (CASSANDRA-19288) Accord: Asynchronous reads may be unsafe

2024-06-11 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854202#comment-17854202
 ] 

Benedict Elliott Smith commented on CASSANDRA-19288:


If I remember correctly, when David introduced asynchronous reads into 
accord-core, it threw up problems. It might however have been a validation 
issue rather than a correctness issue. I think I vaguely recall realising after 
filing this that it might be that the merge logic assumes we won't see into the 
future, but we _can_ safely see into the future during the read so long as it 
is discarded, so we might only want to run merge validation logic on the 
coordinator and not the replica.

But I never properly investigated, so it might just be best to enable async 
reads in accord-core so we can begin exercising them again and see what fails?

> Accord: Asynchronous reads may be unsafe
> 
>
> Key: CASSANDRA-19288
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19288
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Assignee: Blake Eggleston
>Priority: Normal
>
> In principle we should invalidate asynchronous reads before they complete if 
> the data they read may be invalid, but this anyway causes faults when we 
> permit them to occur in accord-core. We can and perhaps should simply ensure 
> the reads are issued against an sstable/memtable snapshot taken by the 
> command store, as this is lower cost and more robust. Otherwise we should 
> investigate what issue asynchronous reads cause.
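
The snapshot idea in the description can be sketched in a few lines of Java. 
This is only an illustration under stated assumptions: SnapshotReads is a 
hypothetical stand-in for a command store, a plain map stands in for 
sstable/memtable state, and the snapshot is assumed to be taken on the store's 
own thread before the read is handed off.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a command store whose reads run against a fixed snapshot. */
final class SnapshotReads {
    private final Map<String, String> live = new ConcurrentHashMap<>();

    void write(String key, String value) { live.put(key, value); }

    /** Takes an immutable copy of the current state; later writes cannot change it. */
    Map<String, String> snapshot() { return Map.copyOf(live); }

    /** Runs the read asynchronously, but only ever against the supplied snapshot. */
    static CompletableFuture<String> readAsync(Map<String, String> snapshot, String key) {
        return CompletableFuture.supplyAsync(() -> snapshot.get(key));
    }
}
```

Because the asynchronous read never touches the live state, there is nothing 
to invalidate if a later write arrives while the read is still in flight.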




[jira] [Comment Edited] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-05-29 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850291#comment-17850291
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-19445 at 5/29/24 8:14 AM:
-

I would defer to [~bdeggleston] here, but if you are facing difficulties you 
can immediately supply your own logback config that sets this class' logging to 
WARN.


was (Author: benedict):
I would defer to [~bdeggleston] here, but if you are facing difficulties you 
can immediately supply your own logback config that sets this classes' logging 
to WARN.

> Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"
> --
>
> Key: CASSANDRA-19445
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19445
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions
>Reporter: Zbyszek Z
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: paxos-entry.txt, paxos-multiple.txt
>
>
> Hello,
> On our cluster logs are flooded with: 
> {code:java}
> INFO  [OptionalTasks:1] 2024-02-27 14:27:51,213 
> PaxosCleanupLocalCoordinator.java:185 - Completed 0 uncommitted paxos 
> instances for X on ranges 
> [(9210458530128018597,-9222146739399525061], 
> (-9222146739399525061,-9174246180597321488], 
> (-9174246180597321488,-9155837684527496840], 
> (-9155837684527496840,-9148981328078890812], 
> (-9148981328078890812,-9141853035919151700], 
> (-9141853035919151700,-9138872620588476741], {code}
> I cannot find anything in the docs regarding this log line. Also, these are huge 
> log payloads that heavily flood system.log. 




[jira] [Commented] (CASSANDRA-19445) Cassandra 4.1.4 floods logs with "Completed 0 uncommitted paxos instances for"

2024-05-29 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850291#comment-17850291
 ] 

Benedict Elliott Smith commented on CASSANDRA-19445:


I would defer to [~bdeggleston] here, but if you are facing difficulties you 
can immediately supply your own logback config that sets this class' logging 
to WARN.





[jira] [Commented] (CASSANDRA-19668) SIGSEV origininating in Paxos Scheduled Task

2024-05-29 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850279#comment-17850279
 ] 

Benedict Elliott Smith commented on CASSANDRA-19668:


I suspect the {{repairIterator}} version of this isn't guarded by an 
{{OpOrder}}, so it doesn't prevent the memtable from being flushed and 
reclaimed. That is a bigger problem for off-heap memtables, but a problem for 
regular memtables too. We should probably either take an in-memory copy of the 
relevant data or else flush and read from disk. [~bdeggleston]?

> SIGSEV origininating in Paxos Scheduled Task
> 
>
> Key: CASSANDRA-19668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19668
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Priority: Normal
>
> I haven't gotten to the root cause of this yet. Several 4.1 nodes have 
> crashed in production.  I'm not sure if this is related to Paxos v2 or 
> not, but it is enabled.  offheap_objects is also enabled. 
> I'm not sure if this affects 5.0, yet.
> Most of the crashes don't have a stacktrace - they only reference this
> {noformat}
> Stack: [0x7fabf4c34000,0x7fabf4d34000],  sp=0x7fabf4d31f00,  free 
> space=1015k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> v  ~StubRoutines::jint_disjoint_arraycopy
> {noformat}
> They all are in the {{ScheduledTasks}} thread.
> However, one node does have this in the crash log:
> {noformat}
> ---  T H R E A D  ---
> Current thread (0x78b375eac800):  JavaThread "ScheduledTasks:1" daemon 
> [_thread_in_Java, id=151791, stack(0x78b34b78,0x78b34b88)]
> Stack: [0x78b34b78,0x78b34b88],  sp=0x78b34b87c350,  free 
> space=1008k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> J 29467 c2 
> org.apache.cassandra.db.rows.AbstractCell.clone(Lorg/apache/cassandra/utils/memory/ByteBufferCloner;)Lorg/apache/cassandra/db/rows/Cell;
>  (50 bytes) @ 0x78b3dd40a42f [0x78b3dd409de0+0x064f]
> J 17669 c2 
> org.apache.cassandra.db.rows.Cell.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/ColumnData;
>  (6 bytes) @ 0x78b3dc54edc0 [0x78b3dc54ed40+0x0080]
> J 17816 c2 
> org.apache.cassandra.db.rows.BTreeRow$$Lambda$845.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (12 bytes) @ 0x78b3dbed01a4 [0x78b3dbed0120+0x0084]
> J 17828 c2 
> org.apache.cassandra.utils.btree.BTree.transform([Ljava/lang/Object;Ljava/util/function/Function;)[Ljava/lang/Object;
>  (194 bytes) @ 0x78b3dc5f35f0 [0x78b3dc5f34a0+0x0150]
> J 35096 c2 
> org.apache.cassandra.db.rows.BTreeRow.clone(Lorg/apache/cassandra/utils/memory/Cloner;)Lorg/apache/cassandra/db/rows/Row;
>  (37 bytes) @ 0x78b3dda9111c [0x78b3dda90fe0+0x013c]
> J 30500 c2 
> org.apache.cassandra.utils.memory.EnsureOnHeap$CloneToHeap.applyToRow(Lorg/apache/cassandra/db/rows/Row;)Lorg/apache/cassandra/db/rows/Row;
>  (16 bytes) @ 0x78b3dd59b91c [0x78b3dd59b8c0+0x005c]
> J 26498 c2 org.apache.cassandra.db.transform.BaseRows.hasNext()Z (215 bytes) 
> @ 0x78b3dcf1c454 [0x78b3dcf1c180+0x02d4]
> J 30775 c2 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext()Ljava/lang/Object;
>  (49 bytes) @ 0x78b3dc789020 [0x78b3dc788fc0+0x0060]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 35593 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Lorg/apache/cassandra/service/paxos/uncommitted/PaxosKeyState;
>  (126 bytes) @ 0x78b3dc7ceeec [0x78b3dc7cee20+0x00cc]
> J 35591 c2 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows$PaxosMemtableToKeyStateIterator.computeNext()Ljava/lang/Object;
>  (5 bytes) @ 0x78b3dc7d09e4 [0x78b3dc7d09a0+0x0044]
> J 9082 c2 org.apache.cassandra.utils.AbstractIterator.hasNext()Z (80 bytes) @ 
> 0x78b3dbb3c544 [0x78b3dbb3c440+0x0104]
> J 34146 c2 
> com.google.common.collect.Iterators.addAll(Ljava/util/Collection;Ljava/util/Iterator;)Z
>  (41 bytes) @ 0x78b3dd9197e8 [0x78b3dd919680+0x0168]
> J 38256 c1 
> org.apache.cassandra.service.paxos.uncommitted.PaxosRows.toIterator(Lorg/apache/cassandra/db/partitions/UnfilteredPartitionIterator;Lorg/apache/cassandra/schema/TableId;Z)Lorg/apache/cassandra/utils/CloseableIterator;
>  (49 bytes) @ 0x78b3d6b677ac [0x78b3d6b672e0+0x04cc]
> J 34823 c1 
>

[jira] [Updated] (CASSANDRA-19617) Paxos may re-distribute stale commits that predate a collectable tombstone

2024-05-03 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19617:
---
 Bug Category: Parent values: Correctness(12982), Level 1 values: Recoverable Corruption / Loss(12986)
   Complexity: Byzantine
Discovered By: Diff Testing
Fix Version/s: 4.1.x, 5.0-rc
     Severity: Critical
     Assignee: Benedict Elliott Smith
       Status: Open  (was: Triage Needed)

> Paxos may re-distribute stale commits that predate a collectable tombstone
> --
>
> Key: CASSANDRA-19617
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19617
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.1.x, 5.0-rc
>
>
> Note: this bug only affects {{paxos_state_purging: {gc_grace, repaired}}}, 
> i.e. those introduced alongside Paxos v2.
> There are two problems:
> 1) Purging is applied only on compaction, not on load, which can lead to very 
> old commits being resurfaced in certain circumstances
> 2) PaxosPrepare does not filter commits based on paxos repair low bound
> This permits surprising situations to arise, where some replicas purge a 
> stale commit _and all newer commits_, but due to compaction peculiarities 
> some other replica may purge only the newer commits, leaving a stale commit 
> in some compaction "purgatory"\[1] to be returned to reads indefinitely. 
> So long as there are no newer commits, the paxos coordinator will see this 
> commit is not universally known and redistribute it - no matter how old it 
> is. This can permit an insert to be reapplied after GC grace has elapsed and 
> the tombstone has been collected.
> For proposals this is not a problem, as we correctly filter proposals based 
> on the last paxos repair time. This also does not affect clusters with the 
> legacy (and default) paxos state purging using TTL. Problem (1) only applies 
> also to the new {{gc_grace}} compatibility mode for purging.
> \[1] Compaction purgatory can arise for instance because paxos purging allows 
> whole sstables to be erased quite effectively, and if this is able to 
> ordinarily prevent sstables being promoted to L1, then if for some abnormal 
> reason sstables reach L1 (e.g. repairs being disabled for some time), those 
> that collect may remain uncompacted for an extended period without purging 
> being applied.
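
One way to read the two problems above is that the purge check needs to run on 
every path that surfaces a commit, not only during compaction. The Java sketch 
below shows that shape and nothing more. It is not Cassandra code: 
CommitPurging, Commit, isPurgeable and filterOnLoad are hypothetical names, and 
reducing the purge rule to a single timestamp compared against a repair low 
bound is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model in which a commit is identified only by its ballot timestamp. */
final class CommitPurging {
    static final class Commit {
        final long ballotTimestamp;
        final String description;
        Commit(long ballotTimestamp, String description) {
            this.ballotTimestamp = ballotTimestamp;
            this.description = description;
        }
    }

    /** Assumption: a commit older than the repair low bound is treated as purgeable. */
    static boolean isPurgeable(Commit commit, long repairLowBound) {
        return commit.ballotTimestamp < repairLowBound;
    }

    /** Applies the same check when commits are loaded, so stale ones are never returned. */
    static List<Commit> filterOnLoad(List<Commit> loaded, long repairLowBound) {
        List<Commit> kept = new ArrayList<>();
        for (Commit commit : loaded) {
            if (!isPurgeable(commit, repairLowBound)) kept.add(commit);
        }
        return kept;
    }
}
```

With a filter like this on the load path, a stale commit that survives in an 
uncompacted sstable would be dropped before a coordinator could see it and 
redistribute it.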




[jira] [Comment Edited] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-29 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842098#comment-17842098
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-19597 at 4/29/24 8:02 PM:
-

Yes, exactly. If I remember correctly, this "queue" was originally intended to 
achieve two things:
1) ensure commit log records are invalidated correctly, as it used to only 
support essentially invalidations of a complete prefix;
2) serve as a kind of fsync so that when awaiting the completion of a flush on 
a particular table you can be certain all data written prior has made it to 
sstables

I'm not actually sure if any of this is necessary today though. Pretty sure we 
invalidate explicit ranges now, so the commit log semantics do not require 
this. I'm not sure off the top of my head why (except for non-durable 
tables/writes, or things that might want to read sstables prior to commit log 
replay) you would ever need to know all prior flushes had completed though, 
since the commit log will ensure they are re-written on restart.

But a low risk approach would be to just make this a per table queue.


was (Author: benedict):
Yes, exactly. If I remember correctly, this "queue" was originally intended to 
achieve two things:
1) ensure commit log records are invalidated correctly, as it used to only 
support essentially invalidations of a complete prefix;
2) serve as a kind of fsync so that when awaiting the completion of a flush on 
a particular table you can be certain all data written prior has made it to disk

I'm not actually sure if any of this is necessary today though. Pretty sure we 
invalidate explicit ranges now, so the commit log semantics do not require 
this. I'm not off the top of my head sure why (except for non-durable 
tables/writes) you would ever need to know all prior flushes had completed 
though, since the commit log will ensure they are re-written on restart.

But a low risk approach would be to just make this a per table queue.

> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -
>
> Key: CASSANDRA-19597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> There is a single post flush thread and that thread processes tasks in order 
> and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
> and that memtable flush can be blocked by slow IntervalTree building and 
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to 
> the actual ordering requirement so that problems in one keyspace can’t affect 
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like 
> nodes marking each other down because Gossip can’t process nodes being 
> removed (blocking flush each time in SystemKeyspace.removeNode)
> A very simple fix here might be to queue the post flush task at the same time 
> as the flush in a per CFS queue, and then submit the task only once the flush 
> is completed.
> If flushes complete out of order the queue will still ensure their 
> completions are processed in order.
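
The per CFS queue proposed in the description could look roughly like the Java 
sketch below. It is a minimal illustration, not the Cassandra change: 
PerTablePostFlushQueue and its methods are hypothetical names, and the 
post-flush task is run inline here where a real implementation would 
presumably submit it to an executor.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-table queue: post-flush tasks run in enqueue order, once their flush is done. */
final class PerTablePostFlushQueue {
    private static final class Entry {
        final Runnable postFlush;
        boolean flushDone;
        Entry(Runnable postFlush) { this.postFlush = postFlush; }
    }

    private final Deque<Entry> pending = new ArrayDeque<>();

    /** Call when the flush starts; returns a callback to invoke when that flush completes. */
    synchronized Runnable enqueue(Runnable postFlush) {
        Entry entry = new Entry(postFlush);
        pending.addLast(entry);
        return () -> onFlushComplete(entry);
    }

    private synchronized void onFlushComplete(Entry entry) {
        entry.flushDone = true;
        // Drain from the head only: a flush that finished early waits for earlier ones.
        while (!pending.isEmpty() && pending.peekFirst().flushDone) {
            pending.pollFirst().postFlush.run();
        }
    }
}
```

With one such queue per table, a slow flush in one keyspace only delays 
post-flush work for that table, while completions within a table are still 
processed in order even if the flushes finish out of order.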




[jira] [Commented] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-29 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842098#comment-17842098
 ] 

Benedict Elliott Smith commented on CASSANDRA-19597:


Yes, exactly. If I remember correctly, this "queue" was originally intended to 
achieve two things:
1) ensure commit log records are invalidated correctly, as it used to only 
support essentially invalidations of a complete prefix;
2) serve as a kind of fsync so that when awaiting the completion of a flush on 
a particular table you can be certain all data written prior has made it to disk

I'm not actually sure if any of this is necessary today though. Pretty sure we 
invalidate explicit ranges now, so the commit log semantics do not require 
this. I'm not off the top of my head sure why (except for non-durable 
tables/writes) you would ever need to know all prior flushes had completed 
though, since the commit log will ensure they are re-written on restart.

But a low risk approach would be to just make this a per table queue.





[jira] [Commented] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes and crashes

2024-04-18 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838525#comment-17838525
 ] 

Benedict Elliott Smith commented on CASSANDRA-19564:


The {{isBlocking}} flag is what indicates that you can skip the memtable 
allocator limit checks. The earliest possible {{OpOrder.Group}} (so walking the 
{{prev}} links until there are no more) is the one that will be stopping 
progress.

If you can upload / send a jstack dump while the node is locked up I can 
_probably_ diagnose it.
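
For anyone inspecting a heap dump by hand, the walk described above amounts to 
the following Java sketch. GroupChain and Group are hypothetical, simplified 
stand-ins written for this example; they are not the real {{OpOrder.Group}} 
class, and the only thing assumed about it is that each group holds a 
{{prev}} reference to the one before it.

```java
/** Hypothetical, simplified stand-in for a chain of operation groups linked by prev. */
final class GroupChain {
    static final class Group {
        final long id;
        final Group prev; // the group started immediately before this one, or null
        Group(long id, Group prev) {
            this.id = id;
            this.prev = prev;
        }
    }

    /** Follows prev links until there are no more, returning the earliest group. */
    static Group earliest(Group start) {
        Group current = start;
        while (current.prev != null) {
            current = current.prev;
        }
        return current;
    }
}
```

The group returned by the walk is the one to look at first when working out 
which operation is stopping progress.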

> MemtablePostFlush deadlock leads to stuck nodes and crashes
> ---
>
> Key: CASSANDRA-19564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/Memtable
>Reporter: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: image-2024-04-16-11-55-54-750.png, 
> image-2024-04-16-12-29-15-386.png, image-2024-04-16-13-43-11-064.png, 
> image-2024-04-16-13-53-24-455.png, image-2024-04-17-18-46-29-474.png, 
> image-2024-04-17-19-13-06-769.png, image-2024-04-17-19-14-34-344.png
>
>
> I've run into an issue on a 4.1.4 cluster where an entire node has locked up 
> due to what I believe is a deadlock in memtable flushing. Here's what I know 
> so far.  I've stitched together what happened based on conversations, logs, 
> and some flame graphs.
> *Log reports memtable flushing*
> The last successful flush happens at 12:19. 
> {noformat}
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 
> AbstractAllocatorMemtable.java:286 - Flushing largest CFS(Keyspace='ks', 
> ColumnFamily='version') to free up room. Used total: 0.24/0.33, live: 
> 0.16/0.20, flushing: 0.09/0.13, this: 0.13/0.15
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 ColumnFamilyStore.java:1012 
> - Enqueuing flush of ks.version, Reason: MEMTABLE_LIMIT, Usage: 660.521MiB 
> (13%) on-heap, 790.606MiB (15%) off-heap
> {noformat}
> *MemtablePostFlush appears to be blocked*
> At this point, MemtablePostFlush completed tasks stops incrementing, active 
> stays at 1 and pending starts to rise.
> {noformat}
> MemtablePostFlush   1    1   3446   0   0
> {noformat}
>  
> The flame graph reveals that PostFlush.call is stuck.  I don't have the line 
> number, but I know we're stuck in 
> {{org.apache.cassandra.db.ColumnFamilyStore.PostFlush#call}} given the visual 
> below:
> *!image-2024-04-16-13-43-11-064.png!*
> *Memtable flushing is now blocked.*
> All MemtableFlushWriter threads are Parked waiting on 
> {{OpOrder.Barrier.await}}. A wall clock profile of 30s reveals all time 
> is spent here.  Presumably we're waiting on the single threaded Post Flush.
> !image-2024-04-16-12-29-15-386.png!
> *Memtable allocations start to block*
> Eventually it looks like the NativeAllocator stops successfully allocating 
> memory. I assume it's waiting on memory to be freed, but since memtable 
> flushes are blocked, we wait indefinitely.
> Looking at a wall clock flame graph, all writer threads have reached the 
> allocation failure path of {{MemtableAllocator.allocate()}}.  I believe we're 
> waiting on {{signal.awaitThrowUncheckedOnInterrupt()}}
> {noformat}
>  MutationStage    48    828425      980253369      0    0{noformat}
> !image-2024-04-16-11-55-54-750.png!
>  
> *Compaction Stops*
> Since we write to the compaction history table, and that requires memtables, 
> compactions are now blocked as well.
>  
> !image-2024-04-16-13-53-24-455.png!
>  
> The node is now doing basically nothing and must be restarted.




[jira] [Commented] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes and crashes

2024-04-17 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838376#comment-17838376
 ] 

Benedict Elliott Smith commented on CASSANDRA-19564:


Honestly a jstack output during the issue would probably be enough to spot a 
candidate issue. If you have one feel free to back channel it to me for a quick 
peek, in case I can easily spot something to dig into.

> MemtablePostFlush deadlock leads to stuck nodes and crashes
> ---
>
> Key: CASSANDRA-19564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/Memtable
>Reporter: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: image-2024-04-16-11-55-54-750.png, 
> image-2024-04-16-12-29-15-386.png, image-2024-04-16-13-43-11-064.png, 
> image-2024-04-16-13-53-24-455.png
>
>




[jira] [Commented] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes and crashes

2024-04-17 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838249#comment-17838249
 ] 

Benedict Elliott Smith commented on CASSANDRA-19564:


Is the Flush that is blocked the one that the postFlush is waiting on? You can 
check this from a heap dump.

If it is, the question is why the writeBarrier it has issued doesn't complete: 
any write that is behind such an issued barrier should be clear to complete 
without blocking. In that case we have perhaps introduced some new blocking 
mechanism that sits behind the completion of the barrier but depends on the 
barrier itself finishing. This should also be apparent from a heap dump, from 
which you can find the OpOrder groups that haven't completed, which threads are 
holding a reference to them, and what those threads are blocking on.



> MemtablePostFlush deadlock leads to stuck nodes and crashes
> ---
>
> Key: CASSANDRA-19564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19564
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/Memtable
>Reporter: Jon Haddad
>Priority: Urgent
> Fix For: 4.1.x
>
> Attachments: image-2024-04-16-11-55-54-750.png, 
> image-2024-04-16-12-29-15-386.png, image-2024-04-16-13-43-11-064.png, 
> image-2024-04-16-13-53-24-455.png
>
>
> I've run into an issue on a 4.1.4 cluster where an entire node has locked up 
> due to what I believe is a deadlock in memtable flushing. Here's what I know 
> so far.  I've stitched together what happened based on conversations, logs, 
> and some flame graphs.
> *Log reports memtable flushing*
> The last successful flush happens at 12:19. 
> {noformat}
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 
> AbstractAllocatorMemtable.java:286 - Flushing largest CFS(Keyspace='ks', 
> ColumnFamily='version') to free up room. Used total: 0.24/0.33, live: 
> 0.16/0.20, flushing: 0.09/0.13, this: 0.13/0.15
> INFO  [NativePoolCleaner] 2024-04-16 12:19:53,634 ColumnFamilyStore.java:1012 
> - Enqueuing flush of ks.version, Reason: MEMTABLE_LIMIT, Usage: 660.521MiB 
> (13%) on-heap, 790.606MiB (15%) off-heap
> {noformat}
> *MemtablePostFlush appears to be blocked*
> At this point, MemtablePostFlush completed tasks stops incrementing, active 
> stays at 1 and pending starts to rise.
> {noformat}
> MemtablePostFlush   1    1   3446   0   0
> {noformat}
>  
> The flame graph reveals that PostFlush.call is stuck.  I don't have the line 
> number, but I know we're stuck in 
> {{org.apache.cassandra.db.ColumnFamilyStore.PostFlush#call}} given the visual 
> below:
> *!image-2024-04-16-13-43-11-064.png!*
> *Memtable flushing is now blocked.*
> All MemtableFlushWriter threads are Parked waiting on 
> {{OpOrder.Barrier.await}}. A wall clock profile of 30s reveals all time 
> is spent here.  Presumably we're waiting on the single threaded Post Flush.
> !image-2024-04-16-12-29-15-386.png!
> *Memtable allocations start to block*
> Eventually it looks like the NativeAllocator stops successfully allocating 
> memory. I assume it's waiting on memory to be freed, but since memtable 
> flushes are blocked, we wait indefinitely.
> Looking at a wall clock flame graph, all writer threads have reached the 
> allocation failure path of {{MemtableAllocator.allocate()}}.  I believe we're 
> waiting on {{signal.awaitThrowUncheckedOnInterrupt()}}
> {noformat}
>  MutationStage    48    828425      980253369      0    0{noformat}
> !image-2024-04-16-11-55-54-750.png!
>  
> *Compaction Stops*
> Since we write to the compaction history table, and that requires memtables, 
> compactions are now blocked as well.
>  
> !image-2024-04-16-13-53-24-455.png!
>  
> The node is now doing basically nothing and must be restarted.




[jira] [Commented] (CASSANDRA-19308) Accord: Avoid maintaining separate FULL history; read the system table for mapReduce over command deps

2024-02-28 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821599#comment-17821599
 ] 

Benedict Elliott Smith commented on CASSANDRA-19308:


CASSANDRA-19310 likely makes this unnecessary at least for key transactions, as 
dependencies are now efficiently represented in CommandsForKey, and there is 
likely little to gain.

> Accord: Avoid maintaining separate FULL history; read the system table for 
> mapReduce over command deps
> --
>
> Key: CASSANDRA-19308
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19308
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> The FULL deps history is costly to maintain and to read. It is only used for 
> transaction recovery, and we can implement it by reading the accord system 
> table directly to fetch the deps of each transaction we find in the basic 
> deps history.




[jira] [Assigned] (CASSANDRA-19310) Accord: More efficient CommandsForKey with transitive dependency elision

2024-02-28 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-19310:
--

Assignee: Benedict Elliott Smith

> Accord: More efficient CommandsForKey with transitive dependency elision
> 
>
> Key: CASSANDRA-19310
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19310
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> We currently depend on state GC for dependency pruning, but we can prune 
> dependencies directly.




[jira] [Updated] (CASSANDRA-19310) Accord: More efficient CommandsForKey with transitive dependency elision

2024-02-28 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19310:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Accord: More efficient CommandsForKey with transitive dependency elision
> 
>
> Key: CASSANDRA-19310
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19310
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> We currently depend on state GC for dependency pruning, but we can prune 
> dependencies directly.




[jira] [Updated] (CASSANDRA-19310) Accord: More efficient CommandsForKey and transitive dependency elision

2024-02-28 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19310:
---
Summary: Accord: More efficient CommandsForKey and transitive dependency 
elision  (was: Accord: Dependency pruning)

> Accord: More efficient CommandsForKey and transitive dependency elision
> ---
>
> Key: CASSANDRA-19310
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19310
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> We currently depend on state GC for dependency pruning, but we can prune 
> dependencies directly.




[jira] [Updated] (CASSANDRA-19310) Accord: More efficient CommandsForKey with transitive dependency elision

2024-02-28 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19310:
---
Summary: Accord: More efficient CommandsForKey with transitive dependency 
elision  (was: Accord: More efficient CommandsForKey and transitive dependency 
elision)

> Accord: More efficient CommandsForKey with transitive dependency elision
> 
>
> Key: CASSANDRA-19310
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19310
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> We currently depend on state GC for dependency pruning, but we can prune 
> dependencies directly.




[jira] [Updated] (CASSANDRA-19305) Accord: Fast single-partition reads

2024-02-28 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19305:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Accord: Fast single-partition reads
> ---
>
> Key: CASSANDRA-19305
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19305
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Introduce guaranteed 1RT single-partition reads with no transaction metadata




[jira] [Created] (CASSANDRA-19359) Accord: Never recover read-only transactions; simply invalidate

2024-02-01 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19359:
--

 Summary: Accord: Never recover read-only transactions; simply 
invalidate
 Key: CASSANDRA-19359
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19359
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Read-only transactions do not need to be recovered to supply client responses 
or for other transactions to make progress. The only situation that might 
require a read to be recovered is recovery of a write transaction that 
needs to know whether the read might or might not have witnessed it at a 
specific `executeAt`. This can be special-cased, either to run recovery in this 
circumstance, or to simply compute the necessary recovery information to decide 
whether that is possible.




[jira] [Created] (CASSANDRA-19358) Accord: AccordBootstrapTest hangs because topology fetching appears to stall

2024-02-01 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19358:
--

 Summary: Accord: AccordBootstrapTest hangs because topology 
fetching appears to stall
 Key: CASSANDRA-19358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19358
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


This likely means there is some serious progress issue with topology fetching.




[jira] [Created] (CASSANDRA-19357) Accord: Harden Node.Id handling: graceful restart for left nodes, and ensure don’t cause problems with IP reuse

2024-02-01 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19357:
--

 Summary: Accord: Harden Node.Id handling: graceful restart for 
left nodes, and ensure don’t cause problems with IP reuse
 Key: CASSANDRA-19357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19357
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


We rely on TCM for mapping Node.Id to replicas/IPs, but TCM is not 
Accord-epoch-aware, so it might erase a mapping before Accord has finished with 
it (so that after a reboot Accord may be unable to find it again), and might 
also permit an IP to be reused for a new Node.Id while Accord is still using 
it for an older epoch.




[jira] [Created] (CASSANDRA-19356) Accord: Range transaction state indexing / caching

2024-02-01 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19356:
--

 Summary: Accord: Range transaction state indexing / caching
 Key: CASSANDRA-19356
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19356
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Range transactions are kept entirely in-memory at present. This is fine so long 
as we only use them for book-keeping and they do not exist too long, but runs 
the risk of OOM if cleanup doesn't excise them for some reason.




[jira] [Created] (CASSANDRA-19355) Accord: PreLoadContext must properly and consistently support ranges

2024-02-01 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19355:
--

 Summary: Accord: PreLoadContext must properly and consistently 
support ranges
 Key: CASSANDRA-19355
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19355
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


There are some mechanisms for ensuring range transactions are loaded for range 
transactions, but they do not currently work properly (having several race 
conditions), are potentially costly in terms of memory consumption, and are 
inconsistent with how they work for key transactions.




[jira] [Created] (CASSANDRA-19354) Accord: Integrate speculative retry with Accord’s slow read mechanism

2024-02-01 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19354:
--

 Summary: Accord: Integrate speculative retry with Accord’s slow 
read mechanism
 Key: CASSANDRA-19354
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19354
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith







[jira] [Comment Edited] (CASSANDRA-19349) Timeuuid compare is broken

2024-01-31 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812787#comment-17812787
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-19349 at 1/31/24 4:27 PM:
-

There is something weird going on here but it has been this way for a very long 
time - since UUIDs were introduced in fact (at the CQL layer, I mean), I think.

 

Essentially the storage layer's {{compareCustom}} is not consistent with plain 
object comparison. This doesn't appear to be documented, but I don't think in 
practice this is a problem.


was (Author: benedict):
There is something weird going on here but it has been this way for a very long 
time - since the TimeUUID type (at the CQL layer, I mean) was introduced in 
fact, I think.

 

Essentially the storage layer's {{compareCustom}} is not consistent with plain 
object comparison. This doesn't appear to be documented, but I don't think in 
practice this is a problem.

> Timeuuid compare is broken
> --
>
> Key: CASSANDRA-19349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19349
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Andreas Mager
>Priority: Normal
>
> {{I have stumbled over a weird problem on my PC.}}
> {{When I turn on my Wi-Fi interface, some of my integration tests 
> fail.}}
> {{The MAC part (lsb) of the timeuuids changes in our Uuid 
> implementation.}}
> {{These uuids are used for the Cassandra insertions and queries.}}
>  
> {{Test setup with "broken" Uuids:}}
> {code:java}
> CREATE TABLE object_comment (
>     object timeuuid,
>     comment timeuuid,
>     value blob,
>     PRIMARY KEY (object, comment)
> );
> INSERT INTO object_comment (object, comment, value) VALUES 
> (95278adc-c03f-11ee-ab43-bb35e932d536, cf9e6440-c01e-11ee-847b-34cff6b1be80, 
> 0x01);
> INSERT INTO object_comment (object, comment, value) VALUES 
> (95278adc-c03f-11ee-ab43-bb35e932d536, cf9f75b0-c01e-11ee-847b-34cff6b1be80, 
> 0x02);
> // cf9f75b0-c01e-11ee-847b-34cff6b1be7f is lsb-1 and the same timestamp
> SELECT * FROM object_comment WHERE object = 
> 95278adc-c03f-11ee-ab43-bb35e932d536 AND comment <= 
> cf9f75b0-c01e-11ee-847b-34cff6b1be7f;
>
>  object                               | comment                              | value
> --------------------------------------+--------------------------------------+-------
>  95278adc-c03f-11ee-ab43-bb35e932d536 | cf9e6440-c01e-11ee-847b-34cff6b1be80 |  0x01
>  95278adc-c03f-11ee-ab43-bb35e932d536 | cf9f75b0-c01e-11ee-847b-34cff6b1be80 |  0x02
>
> (2 rows)
> {code}
>  
>  
> The second row must not be present. The only row expected is:
> {code:java}
> 95278adc-c03f-11ee-ab43-bb35e932d536 | cf9e6440-c01e-11ee-847b-34cff6b1be80 | 
>  0x01{code}
>  
> I think I have found the cause of the issue.
> The methods `org.apache.cassandra.utils.TimeUUID#compareTo` and 
> `org.apache.cassandra.db.marshal.TimeUUIDType#compareCustom` return different 
> results.
> Test pseudocode:
> {code:java}
> var id = UUID.fromString("cf9f75b0-c01e-11ee-847b-34cff6b1be80");
> var idDecrementInLsb = 
> UUID.fromString("cf9f75b0-c01e-11ee-847b-34cff6b1be7f");
> // java.util.UUID#compareTo
> assertThat(idDecrementInLsb.compareTo(id)).isEqualTo(-1);
> var timeUuidDec = 
> org.apache.cassandra.utils.TimeUUID.fromUuid(idDecrementInLsb);
> var timeUuidId = org.apache.cassandra.utils.TimeUUID.fromUuid(id);
> // org.apache.cassandra.utils.TimeUUID#compareTo
> assertThat(timeUuidDec.compareTo(timeUuidId)).isEqualTo(-1);
> // org.apache.cassandra.db.marshal.TimeUUIDType.compareCustom
> assertThat(org.apache.cassandra.db.marshal.TimeUUIDType.compareCustom(idDecrementInLsb,
>  id)).isEqualTo(-1); // This fails
>  {code}
>  
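
The report does not say why the two comparators disagree. One way such a disagreement can arise for exactly these two values, shown purely as an illustration, is a comparator that orders the low eight bytes one at a time as signed bytes. Whether `compareCustom` actually works this way is an assumption here, and `LsbCompareSketch` and `compareAsSignedBytes` are hypothetical names, not Cassandra APIs.

```java
import java.util.UUID;

final class LsbCompareSketch {
    /** Compares the 8 low bytes one at a time, treating each as a signed byte (-128..127). */
    static int compareAsSignedBytes(long a, long b) {
        for (int shift = 56; shift >= 0; shift -= 8) {
            byte x = (byte) (a >>> shift);
            byte y = (byte) (b >>> shift);
            if (x != y)
                return x < y ? -1 : 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        long id  = UUID.fromString("cf9f75b0-c01e-11ee-847b-34cff6b1be80").getLeastSignificantBits();
        long dec = UUID.fromString("cf9f75b0-c01e-11ee-847b-34cff6b1be7f").getLeastSignificantBits();

        // As one signed 64-bit value, ...7f sorts before ...80.
        System.out.println(Long.compare(dec, id));         // prints -1
        // Byte by byte, the last byte decides: 0x7f is 127 but 0x80 is -128, so ...7f sorts after ...80.
        System.out.println(compareAsSignedBytes(dec, id)); // prints 1
    }
}
```

Under that assumption the query bound ending in `be7f` would sort after the stored value ending in `be80`, which would be consistent with both rows being returned above.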




[jira] [Comment Edited] (CASSANDRA-19349) Timeuuid compare is broken

2024-01-31 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812787#comment-17812787
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-19349 at 1/31/24 4:27 PM:
-

There is something weird going on here but it has been this way for a very long 
time - since the TimeUUID type (at the CQL layer, I mean) was introduced in 
fact, I think.

 

Essentially the storage layer's {{compareCustom}} is not consistent with plain 
object comparison. This doesn't appear to be documented, but I don't think in 
practice this is a problem.


was (Author: benedict):
There is something weird going on here but it has been this way for a very long 
time - since the TimeUUID type was introduced in fact, I think.

 

Essentially the storage layer's {{compareCustom}} is not consistent with plain 
object comparison. This doesn't appear to be documented, but I don't think in 
practice this is a problem.





[jira] [Commented] (CASSANDRA-19349) Timeuuid compare is broken

2024-01-31 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-19349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812787#comment-17812787
 ] 

Benedict Elliott Smith commented on CASSANDRA-19349:


There is something weird going on here but it has been this way for a very long 
time - since the TimeUUID type was introduced in fact, I think.

 

Essentially the storage layer's {{compareCustom}} is not consistent with plain 
object comparison. This doesn't appear to be documented, but I don't think in 
practice this is a problem.





[jira] [Created] (CASSANDRA-19323) Accord: table configuration

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19323:
--

 Summary: Accord: table configuration
 Key: CASSANDRA-19323
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19323
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


We must be able to enable/disable Accord and specify various Accord settings at 
the table level via schema changes.




[jira] [Created] (CASSANDRA-19322) Accord: Fast path reconfiguration

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19322:
--

 Summary: Accord: Fast path reconfiguration 
 Key: CASSANDRA-19322
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19322
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


We must be able to provide configuration that decides the fast path based on 
the topology, and reconfigure the fast path in the event of outages.




[jira] [Created] (CASSANDRA-19320) Accord: Metrics to detect stalled transactions or other problems

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19320:
--

 Summary: Accord: Metrics to detect stalled transactions or other 
problems
 Key: CASSANDRA-19320
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19320
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


In order to detect faults in the transaction system or other issues, we must 
introduce metrics that expose potential issues promptly, such as stalled or 
failed transactions, failure to coordinate durability and clean up state, etc.




[jira] [Created] (CASSANDRA-19321) Accord: Command to mark replicas as "stale" for decommission

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19321:
--

 Summary: Accord: Command to mark replicas as "stale" for 
decommission
 Key: CASSANDRA-19321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19321
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


So that other replicas may continue to clean up their state, we must have an 
operator command for marking replicas as stale so that the remaining replicas 
do not wait for them to coordinate their durability status.




[jira] [Created] (CASSANDRA-19319) Accord: Developer journal replay debug feature

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19319:
--

 Summary: Accord: Developer journal replay debug feature
 Key: CASSANDRA-19319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19319
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


In order to assist debugging of faults in the transaction system, we must have 
a mechanism for replaying journals locally to understand how a CommandStore 
reached a given state.




[jira] [Created] (CASSANDRA-19318) Accord: Virtual table functionality to modify current state of transactions, trigger various cleanup operations etc

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19318:
--

 Summary: Accord: Virtual table functionality to modify current 
state of transactions, trigger various cleanup operations etc
 Key: CASSANDRA-19318
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19318
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


To assist operators in resolving issues with the transaction system, we must 
offer facilities for injecting state modifications, triggering various internal 
book-keeping operations, etc.




[jira] [Created] (CASSANDRA-19317) Accord: Virtual table to expose current state of transactions

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19317:
--

 Summary: Accord: Virtual table to expose current state of 
transactions
 Key: CASSANDRA-19317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19317
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


To assist operators and debugging of any faults in the transaction system we 
must expose as much internal information as possible.




[jira] [Created] (CASSANDRA-19316) Accord: De-duplicate and timeout reads/WaitingToApply

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19316:
--

 Summary: Accord: De-duplicate and timeout reads/WaitingToApply
 Key: CASSANDRA-19316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19316
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Currently we can have infinitely many copies of read callbacks for the same 
transaction to the same recipient replica. This work can perhaps be merged with 
that to optimise FetchData callbacks, introducing an efficient global read 
callback.
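
The de-duplication described above could look roughly like the following. This is a sketch under stated assumptions and not Accord code: `ReadCallbackRegistry`, its methods, and the use of plain strings for the transaction id and replica are all hypothetical. It keeps at most one pending callback per (transaction, replica) pair.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ReadCallbackRegistry {
    // One entry per (txnId, replica) pair that currently has a callback outstanding.
    private final Set<List<String>> pending = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first registration of (txnId, replica); repeats are dropped. */
    boolean register(String txnId, String replica) {
        return pending.add(List.of(txnId, replica));
    }

    /** Called when the callback fires or times out, so a later read can register again. */
    void complete(String txnId, String replica) {
        pending.remove(List.of(txnId, replica));
    }

    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReadCallbackRegistry registry = new ReadCallbackRegistry();
        System.out.println(registry.register("txn-1", "replica-A")); // true: first callback is kept
        System.out.println(registry.register("txn-1", "replica-A")); // false: duplicate is dropped
        registry.complete("txn-1", "replica-A");
        System.out.println(registry.register("txn-1", "replica-A")); // true: registered again after completion
    }
}
```

Calling `complete` from a timeout handler as well as from the normal response path is what would bound how long an entry can linger.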




[jira] [Created] (CASSANDRA-19315) Accord: CommandStore rebalancing

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19315:
--

 Summary: Accord: CommandStore rebalancing
 Key: CASSANDRA-19315
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19315
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Currently we cannot internally re-shard gracefully within a node, and topology 
changes increase the number of internal shards. We may want to settle for some 
less-than-optimal approach that is easy to implement for now.




[jira] [Created] (CASSANDRA-19314) Accord: SimpleProgressLog: make sure not too simple; probably at least page to system table as necessary

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19314:
--

 Summary: Accord: SimpleProgressLog: make sure not too simple; 
probably at least page to system table as necessary
 Key: CASSANDRA-19314
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19314
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


The SPL is a core system for progress, and was only originally intended to be 
relied on as a reference implementation. However, we can modify it a little to 
make it satisfactory for the intended purpose.




[jira] [Created] (CASSANDRA-19313) Accord: Reduce overhead of NotifyWaitingOn with a WaitingToExecute SaveStatus

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19313:
--

 Summary: Accord: Reduce overhead of NotifyWaitingOn with a 
WaitingToExecute SaveStatus
 Key: CASSANDRA-19313
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19313
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Long execution graphs can do a lot of duplicated work invoking 
{{NotifyWaitingOn}} repeatedly on a transaction that is already waiting on a 
dependent transaction to execute. This can easily be avoided by introducing a 
{{SaveStatus}} that indicates the transaction is actively managing its 
dependencies for execution.
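
A minimal sketch of the idea above, not taken from the Accord codebase: the 
class, the enum constants and the method names are all hypothetical. It only 
illustrates how a dedicated status could short-circuit repeated notifications 
for a transaction that is already tracking its dependencies.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model; names do not correspond to real Accord classes.
final class WaitingCommand {
    enum Status { STABLE, WAITING_TO_EXECUTE, APPLIED }

    private Status status = Status.STABLE;
    private final Set<String> waitingOn = new HashSet<>();
    private int dependencyScans = 0; // counts how often the expensive path runs

    WaitingCommand(Set<String> dependencies) {
        waitingOn.addAll(dependencies);
    }

    /** Called whenever something suggests this command may be blocked. */
    void notifyWaitingOn() {
        if (status != Status.STABLE)
            return; // already managing its dependencies (or done): skip the work
        dependencyScans++; // stands in for the costly walk over dependencies
        status = waitingOn.isEmpty() ? Status.APPLIED : Status.WAITING_TO_EXECUTE;
    }

    /** Called when a dependency has executed. */
    void dependencyExecuted(String txnId) {
        waitingOn.remove(txnId);
        if (status == Status.WAITING_TO_EXECUTE && waitingOn.isEmpty())
            status = Status.APPLIED;
    }

    Status status() { return status; }
    int dependencyScans() { return dependencyScans; }
}
```

In this model, repeated calls to notifyWaitingOn() after the first one return 
immediately, which is the saving the ticket is after.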




[jira] [Created] (CASSANDRA-19312) Accord: Introduce long-lived callbacks for progress to reduce overhead of repeated FetchData calls

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19312:
--

 Summary: Accord: Introduce long-lived callbacks for progress to 
reduce overhead of repeated FetchData calls
 Key: CASSANDRA-19312
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19312
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


We currently poll actively on replicas waiting to hear news of a transaction 
their execution depends upon. We should instead register a long-lived callback 
at most once per peer, and periodically confirm in batches that the callbacks 
for our transactions are still registered. We can simultaneously make our callback 
management much less costly, by having a global callback manager that just 
tracks TxnId->Replica.
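
As a rough illustration of such a global callback manager (a sketch under 
assumptions, not the Accord implementation): TxnId and replica identifiers are 
modelled as plain strings, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; TxnId and replica identifiers are modelled as strings.
final class GlobalCallbackRegistry {
    // TxnId -> replicas on which we hold a long-lived callback for that TxnId
    private final Map<String, Set<String>> replicasByTxn = new HashMap<>();

    /** Returns true only if this (txnId, replica) pair was not yet registered. */
    boolean register(String txnId, String replica) {
        return replicasByTxn.computeIfAbsent(txnId, t -> new TreeSet<>()).add(replica);
    }

    /** Drop all callbacks for a transaction once its outcome is known. */
    void complete(String txnId) {
        replicasByTxn.remove(txnId);
    }

    /** Groups outstanding TxnIds per replica, one batch per peer to confirm. */
    Map<String, Set<String>> confirmationBatches() {
        Map<String, Set<String>> batches = new TreeMap<>();
        replicasByTxn.forEach((txnId, replicas) -> {
            for (String replica : replicas)
                batches.computeIfAbsent(replica, r -> new TreeSet<>()).add(txnId);
        });
        return batches;
    }
}
```

A caller would send a registration message only when register() returns true, 
and would periodically send each peer its batch from confirmationBatches().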




[jira] [Created] (CASSANDRA-19311) Accord: (Resource Consumption) DurableBefore should be shared between shards

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19311:
--

 Summary: Accord: (Resource Consumption) DurableBefore should be 
shared between shards
 Key: CASSANDRA-19311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19311
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict Elliott Smith


{{DurableBefore}} is a fairly large structure, and is a cluster-universal 
concept. So a given node can share it between all of its {{CommandStore}} instances.




[jira] [Created] (CASSANDRA-19309) Accord: General performance investigation/improvement

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19309:
--

 Summary: Accord: General performance investigation/improvement
 Key: CASSANDRA-19309
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19309
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith







[jira] [Created] (CASSANDRA-19310) Accord: Dependency pruning

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19310:
--

 Summary: Accord: Dependency pruning
 Key: CASSANDRA-19310
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19310
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


We currently depend on state GC for dependency pruning, but we can prune 
dependencies directly.




[jira] [Created] (CASSANDRA-19308) Accord: Avoid maintaining separate FULL history; read the system table for mapReduce over command deps

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19308:
--

 Summary: Accord: Avoid maintaining separate FULL history; read the 
system table for mapReduce over command deps
 Key: CASSANDRA-19308
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19308
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


The FULL deps history is costly to maintain and to read. It is only used for 
transaction recovery, and we can implement it by reading the accord system 
table directly to fetch the deps of each transaction we find in the basic deps 
history.




[jira] [Created] (CASSANDRA-19306) Accord: Introduce a "Medium path"

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19306:
--

 Summary: Accord: Introduce a "Medium path"
 Key: CASSANDRA-19306
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19306
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Accord transactions are currently either one or three round-trips. There is a 
_relatively_ simple modification to the protocol that permits two round-trip 
transactions if the coordinator's proposed timestamp is agreed on the slow path.




[jira] [Created] (CASSANDRA-19305) Accord: Fast single-partition reads

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19305:
--

 Summary: Accord: Fast single-partition reads
 Key: CASSANDRA-19305
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19305
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Introduce guaranteed 1RT single-partition reads with no transaction metadata.




[jira] [Created] (CASSANDRA-19304) Accord: General invariant improvements/validation/investigation

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19304:
--

 Summary: Accord: General invariant 
improvements/validation/investigation
 Key: CASSANDRA-19304
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19304
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith







[jira] [Created] (CASSANDRA-19303) Accord: Address or triage all TODOs with priority >= ‘expected’ in cassandra-accord

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19303:
--

 Summary: Accord: Address or triage all TODOs with priority >= 
‘expected’ in cassandra-accord
 Key: CASSANDRA-19303
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19303
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith







[jira] [Created] (CASSANDRA-19302) Accord: Support for dropping keyspaces and table

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19302:
--

 Summary: Accord: Support for dropping keyspaces and table
 Key: CASSANDRA-19302
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19302
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith







[jira] [Created] (CASSANDRA-19301) Accord: Support routing standard and range reads through Accord

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19301:
--

 Summary: Accord: Support routing standard and range reads through 
Accord
 Key: CASSANDRA-19301
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19301
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


For compatibility with non-Accord transactions we should be able to 
transparently upgrade normal reads to reads serviced by Accord. Range reads can 
safely employ the fast 1RT read optimisation since they do not expect 
serializable consistency.




[jira] [Created] (CASSANDRA-19298) Accord: Deps.isEqualOrFuller is incorrect

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19298:
--

 Summary: Accord: Deps.isEqualOrFuller is incorrect
 Key: CASSANDRA-19298
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19298
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Deps may be considered equal or fuller if a Deps only has the same TxnId and 
Keys, when in fact some TxnId may cover different keys. However, any Deps 
associated with a given Commit Ballot that has been sliced correctly would 
satisfy this property safely with only the above checks.
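
To make the failure mode concrete, here is a small sketch under assumptions: 
a Deps is modelled as a map from TxnId to the keys it covers, and both method 
names are hypothetical rather than the real Deps API. The first check looks 
only at the set of TxnIds and the union of keys; the second compares key 
coverage per TxnId.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a Deps is a map from TxnId to the keys it covers.
final class DepsComparison {
    /** Insufficient check: compares only the TxnId set and the union of all keys. */
    static boolean sameTxnIdsAndKeys(Map<String, Set<String>> a, Map<String, Set<String>> b) {
        return a.keySet().equals(b.keySet()) && allKeys(a).equals(allKeys(b));
    }

    /** Stricter check: 'fuller' must cover, per TxnId, every key that 'other' covers. */
    static boolean coversPerTxnId(Map<String, Set<String>> fuller, Map<String, Set<String>> other) {
        for (Map.Entry<String, Set<String>> e : other.entrySet()) {
            Set<String> keys = fuller.get(e.getKey());
            if (keys == null || !keys.containsAll(e.getValue()))
                return false;
        }
        return true;
    }

    private static Set<String> allKeys(Map<String, Set<String>> deps) {
        Set<String> keys = new HashSet<>();
        deps.values().forEach(keys::addAll);
        return keys;
    }
}
```

With a = {t1: [k1], t2: [k2]} and b = {t1: [k2], t2: [k1]}, the first check 
reports a match while the per-TxnId check does not.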




[jira] [Created] (CASSANDRA-19297) Accord: RejectBefore must be up-to-date on joining nodes before ready to coordinate

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19297:
--

 Summary: Accord: RejectBefore must be up-to-date on joining nodes 
before ready to coordinate
 Key: CASSANDRA-19297
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19297
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


The exclusive sync point used to join the shard will be known by a majority of 
the existing replicas, but in the event the quorum changes and the new replica 
has not recorded the exclusive sync point this might in principle lead to 
failing to reject a TxnId that should be rejected.

Simple fix, but introduce tests to corroborate this issue, and see if we can 
reproduce it in a burn test.




[jira] [Created] (CASSANDRA-19296) Accord: Improve and Document CoordinateShardDurable semantics

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19296:
--

 Summary: Accord: Improve and Document CoordinateShardDurable 
semantics
 Key: CASSANDRA-19296
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19296
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


Firstly, CoordinateShardDurable should retry in future epochs if necessary. In 
principle this isn't a problem; the next CoordinateShardDurable should pick up 
where this one left off. But we should consider the logic very carefully, and 
in any case not leave dangling waits.

We should also carefully consider the special case where replicas are 
bootstrapping in the future and we are coordinating the shard durability. Such 
a replica should safely participate in the sync point, waiting for only the 
transactions it requires to be replicated to it. So this should also function 
as expected, but this should be tested and documented carefully.




[jira] [Updated] (CASSANDRA-19294) Accord: Remove concept of non-participating home keys

2024-01-24 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19294:
---
Component/s: Accord

> Accord: Remove concept of non-participating home keys
> -
>
> Key: CASSANDRA-19294
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19294
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> This concept causes a lot more trouble than it is worth, complicating a lot 
> of logic particularly around state GC, and forbids coordinator-only members 
> of the cluster.




[jira] [Created] (CASSANDRA-19295) Accord: Remove concept of covering() for PartialX; assume access to FullRoute for most behaviours

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19295:
--

 Summary: Accord: Remove concept of covering() for PartialX; assume 
access to FullRoute for most behaviours
 Key: CASSANDRA-19295
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19295
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


This is a costly abstraction to compute particularly as topologies grow, and 
only complicates the internal logic.




[jira] [Created] (CASSANDRA-19294) Accord: Remove concept of non-participating home keys

2024-01-24 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19294:
--

 Summary: Accord: Remove concept of non-participating home keys
 Key: CASSANDRA-19294
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19294
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict Elliott Smith


This concept causes a lot more trouble than it is worth, complicating a lot of 
logic particularly around state GC, and forbids coordinator-only members of the 
cluster.




[jira] [Created] (CASSANDRA-19288) Accord: Asynchronous reads may be unsafe

2024-01-23 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19288:
--

 Summary: Accord: Asynchronous reads may be unsafe
 Key: CASSANDRA-19288
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19288
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


In principle we should invalidate asynchronous reads before they complete if 
the data they read may be invalid, but this anyway causes faults when we permit 
them to occur in accord-core. We can and perhaps should simply ensure the reads 
are issued against an sstable/memtable snapshot taken by the command store, as 
this is lower cost and more robust. Otherwise we should investigate what issues 
asynchronous reads cause.




[jira] [Updated] (CASSANDRA-19287) Accord: Ensure no storage timestamp clashes across Accord bootstrap

2024-01-23 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19287:
---
Description: 
At present bootstrap does not propagate the local metadata associated with the 
shard that is being bootstrapped. However, due to the many-to-one relation 
between Accord timestamps and sstable/C* timestamps it is possible for 
transactions with the same sstable timestamp to occur either side of a 
bootstrap for a single key. We can resolve this by either
 # Propagating the timestamp state from Accord system tables alongside bootstrap
 # Making the relationship between timestamps 1:1, by
 ** assigning each replica in the cluster a range of timestamps to allocate for 
Accord transactions
 ** permitting timestamps larger than 8 bytes
 # Preventing timestamp clashes across a SyncPoint

  was:At present bootstrap does not propagate the local metadata associated 
with the shard that is being bootstrapped. However, due to the many-to-one 
relation between Accord timestamps and sstable/C* timestamps it is possible for 
transactions with the same sstable timestamp to occur either side of a 
bootstrap for a single key. We can resolve this by either 


> Accord: Ensure no storage timestamp clashes across Accord bootstrap
> ---
>
> Key: CASSANDRA-19287
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19287
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> At present bootstrap does not propagate the local metadata associated with 
> the shard that is being bootstrapped. However, due to the many-to-one 
> relation between Accord timestamps and sstable/C* timestamps it is possible 
> for transactions with the same sstable timestamp to occur either side of a 
> bootstrap for a single key. We can resolve this by either
>  # Propagating the timestamp state from Accord system tables alongside 
> bootstrap
>  # Making the relationship between timestamps 1:1, by
>  ** assigning each replica in the cluster a range of timestamps to allocate 
> for Accord transactions
>  ** permit timestamps larger than 8 bytes
>  # Prevent timestamp clashes across a SyncPoint
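
One possible reading of the "range of timestamps per replica" option, sketched 
under assumptions: each replica issues storage timestamps only from its own 
residue class modulo the number of replicas, so two replicas can never issue 
the same value. The class name, the slot scheme and the parameters are 
hypothetical and are not taken from Accord or Cassandra.

```java
// Hypothetical sketch: each replica only issues storage timestamps from its own slot.
final class ReplicaTimestampAllocator {
    private final int replicaIndex;  // 0 <= replicaIndex < replicaCount
    private final int replicaCount;
    private long lastIssued = Long.MIN_VALUE;

    ReplicaTimestampAllocator(int replicaIndex, int replicaCount) {
        if (replicaCount <= 0 || replicaIndex < 0 || replicaIndex >= replicaCount)
            throw new IllegalArgumentException("replicaIndex out of range");
        this.replicaIndex = replicaIndex;
        this.replicaCount = replicaCount;
    }

    /**
     * Returns the smallest timestamp that is >= atLeast, greater than any value
     * previously issued here, and congruent to replicaIndex modulo replicaCount.
     */
    long next(long atLeast) {
        long floor = Math.max(atLeast, lastIssued + 1);
        long remainder = Math.floorMod(floor - replicaIndex, (long) replicaCount);
        long candidate = remainder == 0 ? floor : floor + (replicaCount - remainder);
        lastIssued = candidate;
        return candidate;
    }
}
```

The trade-off in this scheme is that usable timestamp resolution per replica 
shrinks as the number of replicas grows, which is presumably part of why larger 
timestamps are listed as an alternative.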




[jira] [Created] (CASSANDRA-19287) Accord: Ensure no storage timestamp clashes across Accord bootstrap

2024-01-23 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19287:
--

 Summary: Accord: Ensure no storage timestamp clashes across Accord 
bootstrap
 Key: CASSANDRA-19287
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19287
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord
Reporter: Benedict Elliott Smith


At present bootstrap does not propagate the local metadata associated with the 
shard that is being bootstrapped. However, due to the many-to-one relation 
between Accord timestamps and sstable/C* timestamps it is possible for 
transactions with the same sstable timestamp to occur either side of a 
bootstrap for a single key. We can resolve this by either 




[jira] [Comment Edited] (CASSANDRA-15464) Inserts to set slow due to AtomicBTreePartition for ComplexColumnData.dataSize

2023-12-20 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798900#comment-17798900
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15464 at 12/20/23 10:18 AM:
---

I think it is likely to have been fixed by CASSANDRA-15511, although 
CASSANDRA-18125 did fix up accounting in this area in follow-up.


was (Author: benedict):
I think it is likely to have been fixed by CASSANDRA-15511

> Inserts to set slow due to AtomicBTreePartition for 
> ComplexColumnData.dataSize
> 
>
> Key: CASSANDRA-15464
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15464
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Eric Jacobsen
>Priority: Normal
>
> Concurrent inserts to set can cause client timeouts and excessive CPU 
> due to compare and swap in AtomicBTreePartition for 
> ComplexColumnData.dataSize. As the length of the set gets longer, the 
> probability of doing the compare decreases.
> The problem we saw in production was with insertions into a set with 
> len(set) hundreds to thousands. Because of the semantics of what we 
> store in the set, we had not anticipated the length being more than about 10. 
> (Almost all rows have length <= 6, the largest observed was 7032. Total 
> number of rows < 4000. 3 machines were used.)
> The bad behavior we saw was all machines went to 100% cpu on all cores, and 
> clients were timing out. Our immediate solution in production was adding more 
> machines (went from 3 machines to 6 machines). The stack included 
> partitions.AtomicBTreePartition.addAllWithSizeDelta … 
> ComplexColumnData.dataSize.
> The AtomicBTreePartition code uses a Compare And Swap approach, yet the time 
> between compares is dependent on the length of the set. When the length of 
> the set is long, with concurrent updates, each loop is unlikely to make 
> forward progress and can be delayed looping.
> Here is one example call stack:
> {noformat}
> "SharedPool-Worker-40" #167 daemon prio=10 os_prio=0 tid=0x7f9bb4032800 
> nid=0x2ee5 runnable [0x7f9b067f4000]
> java.lang.Thread.State: RUNNABLE
> at 
> org.apache.cassandra.db.rows.ComplexColumnData.dataSize(ComplexColumnData.java:114)
> at org.apache.cassandra.db.rows.BTreeRow.dataSize(BTreeRow.java:373)
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:292)
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:235)
> at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:159)
> at org.apache.cassandra.utils.btree.TreeBuilder.update(TreeBuilder.java:73)
> at org.apache.cassandra.utils.btree.BTree.update(BTree.java:181)
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:155)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:254)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1204)
> at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:573)
> at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:384)
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:205)
> at org.apache.cassandra.hints.Hint.applyFuture(Hint.java:99)
> at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:95)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In a test program to repro the problem, we raise the number of concurrent 
> users and lower the think time between queries. Updating elements of 
> low-length sets can occur without errors, and with long-length sets, clients 
> time out with errors and there are periods with all cores at 99.x% CPU, during 
> which jstack shows time going to ComplexColumnData.dataSize.
> Here is the schema. Our long term application solution was to just have the 
> set elements be part of the primary key and avoid using set, thus 
> guaranteeing the code does not go through ComplexColumnData.dataSize
> {noformat}
> CREATE TABLE x.x (
>  x int PRIMARY KEY,
>  y set ) ...
> {noformat}




[jira] [Commented] (CASSANDRA-15464) Inserts to set slow due to AtomicBTreePartition for ComplexColumnData.dataSize

2023-12-20 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798900#comment-17798900
 ] 

Benedict Elliott Smith commented on CASSANDRA-15464:


I think it is likely to have been fixed by CASSANDRA-15511

> Inserts to set slow due to AtomicBTreePartition for 
> ComplexColumnData.dataSize
> 
>
> Key: CASSANDRA-15464
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15464
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Eric Jacobsen
>Priority: Normal
>
> Concurrent inserts to set can cause client timeouts and excessive CPU 
> due to compare and swap in AtomicBTreePartition for 
> ComplexColumnData.dataSize. As the length of the set gets longer, the 
> probability of doing the compare decreases.
> The problem we saw in production was with insertions into a set with 
> len(set) hundreds to thousands. Because of the semantics of what we 
> store in the set, we had not anticipated the length being more than about 10. 
> (Almost all rows have length <= 6, the largest observed was 7032. Total 
> number of rows < 4000. 3 machines were used.)
> The bad behavior we saw was all machines went to 100% cpu on all cores, and 
> clients were timing out. Our immediate solution in production was adding more 
> machines (went from 3 machines to 6 machines). The stack included 
> partitions.AtomicBTreePartition.addAllWithSizeDelta … 
> ComplexColumnData.dataSize.
> The AtomicBTreePartition code uses a Compare And Swap approach, yet the time 
> between compares is dependent on the length of the set. When the length of 
> the set is long, with concurrent updates, each loop is unlikely to make 
> forward progress and can be delayed looping.
> Here is one example call stack:
> {noformat}
> "SharedPool-Worker-40" #167 daemon prio=10 os_prio=0 tid=0x7f9bb4032800 
> nid=0x2ee5 runnable [0x7f9b067f4000]
> java.lang.Thread.State: RUNNABLE
> at 
> org.apache.cassandra.db.rows.ComplexColumnData.dataSize(ComplexColumnData.java:114)
> at org.apache.cassandra.db.rows.BTreeRow.dataSize(BTreeRow.java:373)
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:292)
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:235)
> at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:159)
> at org.apache.cassandra.utils.btree.TreeBuilder.update(TreeBuilder.java:73)
> at org.apache.cassandra.utils.btree.BTree.update(BTree.java:181)
> at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:155)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:254)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1204)
> at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:573)
> at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:384)
> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:205)
> at org.apache.cassandra.hints.Hint.applyFuture(Hint.java:99)
> at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:95)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In a test program to repro the problem, we raise the number of concurrent 
> users and lower the think time between queries. Updating elements of 
> low-length sets can occur without errors, and with long-length sets, clients 
> time out with errors and there are periods with all cores at 99.x% CPU, during 
> which jstack shows time going to ComplexColumnData.dataSize.
> Here is the schema. Our long term application solution was to just have the 
> set elements be part of the primary key and avoid using set, thus 
> guaranteeing the code does not go through ComplexColumnData.dataSize
> {noformat}
> CREATE TABLE x.x (
>  x int PRIMARY KEY,
>  y set ) ...
> {noformat}
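
The contention pattern described in the report can be illustrated with a 
simplified compare-and-swap loop. This is a sketch only, not the 
AtomicBTreePartition code: the class is hypothetical and a copied list stands 
in for the real data structure. The point it shows is that work proportional 
to the collection size happens before each CAS attempt, so a longer collection 
widens the window in which a concurrent writer can win and force a retry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Simplified illustration only; this is not the AtomicBTreePartition code.
final class CasSetColumn {
    private final AtomicReference<List<Integer>> ref =
            new AtomicReference<>(Collections.emptyList());

    /** Adds a value with a compare-and-swap loop; returns the number of attempts made. */
    int add(int value) {
        int attempts = 0;
        while (true) {
            attempts++;
            List<Integer> current = ref.get();
            // O(size) work before the CAS: the larger the collection, the more
            // likely another writer replaces 'current' first and we must retry.
            List<Integer> updated = new ArrayList<>(current.size() + 1);
            updated.addAll(current);
            updated.add(value);
            if (ref.compareAndSet(current, Collections.unmodifiableList(updated)))
                return attempts;
        }
    }

    int size() {
        return ref.get().size();
    }
}
```

With a single writer every call succeeds on its first attempt; with many 
concurrent writers the attempt count would be expected to grow with the 
collection size, in line with the behaviour reported above.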




[jira] [Created] (CASSANDRA-19045) Various Accord protocol fixes and improvements to validation

2023-11-21 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-19045:
--

 Summary: Various Accord protocol fixes and improvements to 
validation
 Key: CASSANDRA-19045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19045
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict Elliott Smith


Improve validation, and address various faults discovered by the improved 
validation.




[jira] [Updated] (CASSANDRA-19045) Various Accord protocol fixes and improvements to validation

2023-11-21 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-19045:
---
Component/s: Accord

> Various Accord protocol fixes and improvements to validation
> 
>
> Key: CASSANDRA-19045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19045
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> Improve validation, and address various faults discovered by the improved 
> validation.




[jira] [Commented] (CASSANDRA-18988) Updating the column of a non-existent row in an Accord transaction results in Atomicity violation

2023-11-01 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781661#comment-17781661
 ] 

Benedict Elliott Smith commented on CASSANDRA-18988:


Thanks. I'll let [~maedhroz] figure out the shape of what we think should 
happen, and perhaps this discussion can be taken on list, since it is API 
impacting and what we do today is correct - but the specifics of how this 
impacts e.g. row markers perhaps warrants discussion. For instance, I might 
expect the result of the first operation to look like this:

 partition | account_id | balance
-----------+------------+---------
   default |          0 |     100
   default |          1 |      90
   default |          3 |    null

> Updating the column of a non-existent row in an Accord transaction results in 
> Atomicity violation
> -
>
> Key: CASSANDRA-18988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18988
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Luis E Fernandez
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> *System configuration and information:*
> Single node Cassandra with Accord transactions enabled running on docker
> Built from commit: 
> [a7cd114435704b988c81f47ef53d0bfd6441f38b|https://github.com/apache/cassandra/commit/a7cd114435704b988c81f47ef53d0bfd6441f38b]
> CQLSH: [cqlsh 6.2.0 | Cassandra 5.0-alpha2-SNAPSHOT | CQL spec 3.4.7 | Native 
> protocol v5]
>  
> *Steps to reproduce in CQLSH:*
> {code:java}
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;{code}
> {code:java}
> CREATE TABLE accord.accounts (
>     partition text,
>     account_id int,
>     balance int,
>     PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC);
> {code}
> {code:java}
> BEGIN TRANSACTION
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 0, 100);
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 1, 100);
> COMMIT TRANSACTION;{code}
> atomicity bug happens after executing the following statement:
> Based on [Cassandra 
> documentation|https://cassandra.apache.org/doc/4.1/cassandra/cql/dml.html#update-statement]
>  regarding the use of UPDATE statements, I expect the result of this 
> transaction to be the insertion of a new account (\{ account_id: 3, balance: 
> 10 }). The total balance across the three (3) accounts should be maintained 
> (200). After executing the below transaction, the total number of accounts 
> remains at two (2) and the total balance drops to 190. Basically, it appears 
> as if only one half of the transaction proceeds.
> {code:java}
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 3;
> COMMIT TRANSACTION;{code}
> Bug / Error:
> ==
> The result of performing a table read after executing the buggy transaction 
> is:
> {code:java}
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
> {code}
> {color:#172b4d}Note that the above transactions are not possible without a 
> transaction block because only counter type columns can be updated with += or 
> -= syntax in normal (non-transactional) cql statements. Using counter type 
> columns also results in a separate, related bug: 
> [CASSANDRA-18987|https://issues.apache.org/jira/browse/CASSANDRA-18987]{color}
> {color:#172b4d}This was found while testing Accord transactions with 
> [~henrik.ingo] and team.{color}




[jira] [Comment Edited] (CASSANDRA-18988) Updating the column of a non-existent row in an Accord transaction results in Atomicity violation

2023-11-01 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781661#comment-17781661
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-18988 at 11/1/23 9:12 AM:
-

Thanks. I'll let [~maedhroz] figure out the shape of what we think should 
happen and how that relates to what happens today, and then perhaps this 
discussion can be taken on list. It is API impacting, and what we do today is 
correct - but the specifics of how this impacts e.g. row markers perhaps 
warrants discussion. For instance, I might expect the result of the first 
operation to look like this:

 partition | account_id | balance
-----------+------------+---------
   default |          0 |     100
   default |          1 |      90
   default |          3 |    null


was (Author: benedict):
Thanks. I'll let [~maedhroz] figure out the shape of what we think should 
happen, and perhaps this discussion can be taken on list, since it is API 
impacting and what we do today is correct - but the specifics of how this 
impacts e.g. row markers perhaps warrants discussion. For instance, I might 
expect the result of the first operation to look like this:

 partition | account_id | balance
-----------+------------+---------
   default |          0 |     100
   default |          1 |      90
   default |          3 |    null

> Updating the column of a non-existent row in an Accord transaction results in 
> Atomicity violation
> -
>
> Key: CASSANDRA-18988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18988
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Luis E Fernandez
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 5.x
>
>
> *System configuration and information:*
> Single node Cassandra with Accord transactions enabled running on docker
> Built from commit: 
> [a7cd114435704b988c81f47ef53d0bfd6441f38b|https://github.com/apache/cassandra/commit/a7cd114435704b988c81f47ef53d0bfd6441f38b]
> CQLSH: [cqlsh 6.2.0 | Cassandra 5.0-alpha2-SNAPSHOT | CQL spec 3.4.7 | Native 
> protocol v5]
>  
> *Steps to reproduce in CQLSH:*
> {code:java}
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;{code}
> {code:java}
> CREATE TABLE accord.accounts (
>     partition text,
>     account_id int,
>     balance int,
>     PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC);
> {code}
> {code:java}
> BEGIN TRANSACTION
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 0, 100);
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 1, 100);
> COMMIT TRANSACTION;{code}
> atomicity bug happens after executing the following statement:
> Based on [Cassandra 
> documentation|https://cassandra.apache.org/doc/4.1/cassandra/cql/dml.html#update-statement]
>  regarding the use of UPDATE statements, I expect the result of this 
> transaction to be the insertion of a new account (\{ account_id: 3, balance: 
> 10 }). The total balance across the three (3) accounts should be maintained 
> (200). After executing the below transaction, the total number of accounts 
> remains at two (2) and the total balance drops to 190. Basically, it appears 
> as if only one half of the transaction proceeds.
> {code:java}
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 3;
> COMMIT TRANSACTION;{code}
> Bug / Error:
> ==
> The result of performing a table read after executing the buggy transaction 
> is:
> {code:java}
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
> {code}
> {color:#172b4d}Note that the above transactions are not possible without a 
> transaction block because only counter type columns can be updated with += or 
> -= syntax in normal (non-transactional) cql statements. Using counter type 
> columns also results in a separate, related bug: 
> [CASSANDRA-18987|https://issues.apache.org/jira/browse/CASSANDRA-18987]{color}
> {color:#172b4d}This was found while testing Accord transactions with 
> [~henrik.ingo] and team.{color}




[jira] [Commented] (CASSANDRA-18988) Updating the column of a non-existent row in an Accord transaction results in Atomicity violation

2023-10-31 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781534#comment-17781534
 ] 

Benedict Elliott Smith commented on CASSANDRA-18988:


Thanks for the report [~antithesis-luis]. [~maedhroz] can you take a look?

I think that technically this outcome is correct: {{null}} + 10 == {{null}}. 
Whether a partition should be inserted for this implicit delete I don't know, 
but the result of this should certainly be {{null}}.

It's worth taking a closer look at the semantics either way.

[~antithesis-luis] can you confirm if you see the behaviour with {{UPDATE set 
balance = 10}}, rather than {{+= 10}}? This would be a more serious problem.
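
To make the {{null}} + 10 == {{null}} reading concrete, here is a minimal Python sketch of null-propagating arithmetic applied to the two halves of the reported transfer. It is a toy model only: the names add_nullable and apply_transfer are hypothetical, a missing row is modelled as None, and keeping the key for account 3 is just a way to make the null visible (whether a row should actually be materialised is the open question above).

```python
from typing import Dict, Optional


def add_nullable(current: Optional[int], delta: int) -> Optional[int]:
    """Null-propagating addition: a null operand yields a null result."""
    return None if current is None else current + delta


def apply_transfer(balances: Dict[int, int], src: int, dst: int,
                   amount: int) -> Dict[int, Optional[int]]:
    """Apply both halves of a transfer to a copy of the balances.

    A missing account reads as None, mirroring a non-existent row.
    """
    result: Dict[int, Optional[int]] = dict(balances)
    result[src] = add_nullable(balances.get(src), -amount)
    result[dst] = add_nullable(balances.get(dst), amount)
    return result


# State after the initial INSERT transaction in the report.
balances = {0: 100, 1: 100}

# The reported transaction: debit account 1, credit non-existent account 3.
after = apply_transfer(balances, src=1, dst=3, amount=10)

# Sum over the non-null balances, as a reader of the table would see it.
visible_total = sum(v for v in after.values() if v is not None)
```

Under this model both halves of the transaction are applied, yet the visible total still drops to 190, because the credit to the missing account evaluates to null rather than 10.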

> Updating the column of a non-existent row in an Accord transaction results in 
> Atomicity violation
> -
>
> Key: CASSANDRA-18988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18988
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Luis E Fernandez
>Priority: Normal
> Fix For: 5.x
>
>
> *System configuration and information:*
> Single node Cassandra with Accord transactions enabled running on docker
> Built from commit: 
> [a7cd114435704b988c81f47ef53d0bfd6441f38b|https://github.com/apache/cassandra/commit/a7cd114435704b988c81f47ef53d0bfd6441f38b]
> CQLSH: [cqlsh 6.2.0 | Cassandra 5.0-alpha2-SNAPSHOT | CQL spec 3.4.7 | Native 
> protocol v5]
>  
> *Steps to reproduce in CQLSH:*
> {code:java}
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;{code}
> {code:java}
> CREATE TABLE accord.accounts (
>     partition text,
>     account_id int,
>     balance int,
>     PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC);
> {code}
> {code:java}
> BEGIN TRANSACTION
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 0, 100);
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 1, 100);
> COMMIT TRANSACTION;{code}
> atomicity bug happens after executing the following statement:
> Based on [Cassandra 
> documentation|https://cassandra.apache.org/doc/4.1/cassandra/cql/dml.html#update-statement]
>  regarding the use of UPDATE statements, I expect the result of this 
> transaction to be the insertion of a new account (\{ account_id: 3, balance: 
> 10 }). The total balance across the three (3) accounts should be maintained 
> (200). After executing the below transaction, the total number of accounts 
> remains at two (2) and the total balance drops to 190. Basically, it appears 
> as if only one half of the transaction proceeds.
> {code:java}
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 3;
> COMMIT TRANSACTION;{code}
> Bug / Error:
> ==
> The result of performing a table read after executing the buggy transaction 
> is:
> {code:java}
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
> {code}
> {color:#172b4d}Note that the above transactions are not possible without a 
> transaction block because only counter type columns can be updated with += or 
> -= syntax in normal (non-transactional) cql statements. Using counter type 
> columns also results in a separate, related bug: 
> [CASSANDRA-18987|https://issues.apache.org/jira/browse/CASSANDRA-18987]{color}
> {color:#172b4d}This was found while testing Accord transactions with 
> [~henrik.ingo] and team.{color}




[jira] [Comment Edited] (CASSANDRA-18987) Using counter column type in Accord transactions leads to Atomicity / Consistency violations

2023-10-31 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781533#comment-17781533
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-18987 at 10/31/23 9:52 PM:
--

Thanks for the report. Counter columns are inherently not transactional, and I 
don't know why they are permitted to be included in transactions. I assume it's 
an oversight. [~maedhroz] can you take a look?


was (Author: benedict):
Thanks for the report. Counter columns are inherently not transactional, and I 
don't know why they are permitted to be included in transactions. [~maedhroz] 
can you take a look?

> Using counter column type in Accord transactions leads to Atomicity / 
> Consistency violations
> 
>
> Key: CASSANDRA-18987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18987
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Luis E Fernandez
>Priority: Normal
> Fix For: 5.x
>
>
> *System configuration and information:*
> Single node Cassandra with Accord transactions enabled running on docker
> Built from commit: 
> [a7cd114435704b988c81f47ef53d0bfd6441f38b|https://github.com/apache/cassandra/commit/a7cd114435704b988c81f47ef53d0bfd6441f38b]
> CQLSH: [cqlsh 6.2.0 | Cassandra 5.0-alpha2-SNAPSHOT | CQL spec 3.4.7 | Native 
> protocol v5]
>  
> *Steps to reproduce in CQLSH:*
> {code:java}
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;{code}
> {code:java}
> CREATE TABLE accord.accounts (
>     partition text,
>     account_id int,
>     balance counter,
>     PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC);
> {code}
> {code:java}
> BEGIN TRANSACTION
>   UPDATE accord.accounts
>     SET balance += 100
>   WHERE
>     partition = 'default'
>     AND account_id = 0;
>   UPDATE accord.accounts
>     SET balance += 100
>   WHERE
>     partition = 'default'
>     AND account_id = 1;
> COMMIT TRANSACTION;{code}
> bug happens after executing the following statement:
> Based on [Cassandra 
> documentation|https://cassandra.apache.org/doc/trunk/cassandra/developing/cql/types.html#counters]
>  regarding the use of counters, I expect the following results:
> Transaction A: subtract 10 from the balance of account 1 (total ending 
> balance of 90) and add 10 to the balance of account 0 (total ending balance 
> of 110)
> {*}Bug A{*}: Neither account's balance is updated and the state of the rows 
> is left unchanged
> {code:java}
> /* Transaction A */
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 0;
> COMMIT TRANSACTION;{code}
> Transaction B: subtract 10 from the balance of account 1 (total ending 
> balance of 90) and add 10 to the balance of a new account 2 (total ending 
> balance of 10)
> {*}Bug B{*}: Only the new account 2 is created. The balance of account 1 is 
> left unchanged
> {code:java}
> /* Transaction B */
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 2;
> COMMIT TRANSACTION;{code}
> Bug / Error:
> ==
> The result of performing a table read after executing each buggy transaction 
> is:
> {code:java}
> /* Transaction / Bug A */
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |     100{code}
> {code:java}
> /* Transaction / Bug B */
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |     100
>    default |          2 |      10 {code}
> Note that performing the above statements without transaction blocks works as 
> expected.
> {color:#172b4d}This was found while testing Accord transactions with 
> [~henrik.ingo] and team.{color}




[jira] [Commented] (CASSANDRA-18987) Using counter column type in Accord transactions leads to Atomicity / Consistency violations

2023-10-31 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781533#comment-17781533
 ] 

Benedict Elliott Smith commented on CASSANDRA-18987:


Thanks for the report. Counter columns are inherently not transactional, and I 
don't know why they are permitted to be included in transactions. [~maedhroz] 
can you take a look?

> Using counter column type in Accord transactions leads to Atomicity / 
> Consistency violations
> 
>
> Key: CASSANDRA-18987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18987
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Luis E Fernandez
>Priority: Normal
> Fix For: 5.x
>
>
> *System configuration and information:*
> Single node Cassandra with Accord transactions enabled running on docker
> Built from commit: 
> [a7cd114435704b988c81f47ef53d0bfd6441f38b|https://github.com/apache/cassandra/commit/a7cd114435704b988c81f47ef53d0bfd6441f38b]
> CQLSH: [cqlsh 6.2.0 | Cassandra 5.0-alpha2-SNAPSHOT | CQL spec 3.4.7 | Native 
> protocol v5]
>  
> *Steps to reproduce in CQLSH:*
> {code:java}
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true;{code}
> {code:java}
> CREATE TABLE accord.accounts (
>     partition text,
>     account_id int,
>     balance counter,
>     PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC);
> {code}
> {code:java}
> BEGIN TRANSACTION
>   UPDATE accord.accounts
>     SET balance += 100
>   WHERE
>     partition = 'default'
>     AND account_id = 0;
>   UPDATE accord.accounts
>     SET balance += 100
>   WHERE
>     partition = 'default'
>     AND account_id = 1;
> COMMIT TRANSACTION;{code}
> bug happens after executing the following statement:
> Based on [Cassandra 
> documentation|https://cassandra.apache.org/doc/trunk/cassandra/developing/cql/types.html#counters]
>  regarding the use of counters, I expect the following results:
> Transaction A: subtract 10 from the balance of account 1 (total ending 
> balance of 90) and add 10 to the balance of account 0 (total ending balance 
> of 110)
> {*}Bug A{*}: Neither account's balance is updated and the state of the rows 
> is left unchanged
> {code:java}
> /* Transaction A */
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 0;
> COMMIT TRANSACTION;{code}
> Transaction B: subtract 10 from the balance of account 1 (total ending 
> balance of 90) and add 10 to the balance of a new account 2 (total ending 
> balance of 10)
> {*}Bug B{*}: Only the new account 2 is created. The balance of account 1 is 
> left unchanged
> {code:java}
> /* Transaction B */
> BEGIN TRANSACTION
> UPDATE accord.accounts
> SET balance -= 10
> WHERE
>   partition = 'default'
>   AND account_id = 1;
> UPDATE accord.accounts
> SET balance += 10
> WHERE
>   partition = 'default'
>   AND account_id = 2;
> COMMIT TRANSACTION;{code}
> Bug / Error:
> ==
> The result of performing a table read after executing each buggy transaction 
> is:
> {code:java}
> /* Transaction / Bug A */
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |     100{code}
> {code:java}
> /* Transaction / Bug B */
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |     100
>    default |          2 |      10 {code}
> Note that performing the above statements without transaction blocks works as 
> expected.
> {color:#172b4d}This was found while testing Accord transactions with 
> [~henrik.ingo] and team.{color}




[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-10-06 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772546#comment-17772546
 ] 

Benedict Elliott Smith commented on CASSANDRA-18798:


Ok, so I have taken a quick look at the code, and I can see the problem. We 
have implemented an {{AccordUpdateParameters}} that 1) sets the ClientState 
timestamp and nowInSec to 42, on the assumption that all updates will be 
computed on the replica side; and 2) does not copy over the logic from 
CASUpdateParameters for ensuring list appends are performed correctly.

What I can say is that the time used for the cell path's TimeUUID definitely 
needs to be set deterministically. This could be set on the replicas using 
CommandsForKey's timestamp bounds, but it must handle the additional complexity 
of List appends a la CASUpdateParameters. If we are currently deriving these on 
the coordinator, we're going to be having a very bad time as the coordinator 
seems to always use a timestamp of {{42}}.

This is another spot where I suspect we really want to update Accord to 
generate unique HLCs, as it would simplify this a great deal.
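
As a rough illustration of why the time behind a list cell's key has to be deterministic and distinct per transaction, here is a toy Python sketch. It is not Cassandra's cell-path encoding: the (timestamp, position) key, the append and read helpers, and the timestamps 1001 and 1002 are all hypothetical, and identical keys simply overwrite one another in this model.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a list cell key: (timestamp, position within the append).
CellKey = Tuple[int, int]


def append(cells: Dict[CellKey, int], timestamp: int, values: List[int]) -> None:
    """Record an append; each element gets a key derived from the timestamp."""
    for position, value in enumerate(values):
        cells[(timestamp, position)] = value


def read(cells: Dict[CellKey, int]) -> List[int]:
    """Return the list contents in key order."""
    return [cells[key] for key in sorted(cells)]


# Two replicas receive the same two appends in different arrival orders.
# The keys come from per-transaction timestamps, so both converge.
replica_a: Dict[CellKey, int] = {}
append(replica_a, 1001, [553])
append(replica_a, 1002, [455])

replica_b: Dict[CellKey, int] = {}
append(replica_b, 1002, [455])
append(replica_b, 1001, [553])

# With one constant timestamp for every transaction, the two appends are
# indistinguishable by key and no order between them can be derived.
constant: Dict[CellKey, int] = {}
append(constant, 42, [553])
append(constant, 42, [455])
```

In this model both replicas read [553, 455] regardless of arrival order, while the constant-timestamp case keeps only the last append. The real effect of a constant timestamp may differ, but the point stands that order cannot be derived from it.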


> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Assignee: Henrik Ingo
>Priority: Normal
> Attachments: image-2023-09-26-20-05-25-846.png
>
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents 
> LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733

[jira] [Comment Edited] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-08-28 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759505#comment-17759505
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-18798 at 8/28/23 8:42 AM:
-

Either way this is a protocol bug, as if the insert by process 6 has a lower 
timestamp than the insert by process 7 then it should occur before, and so the 
read by process 5 should be deferred until the insert has completed.

I won't spend time debugging this as a result, as we have several known 
protocol bugs that could cause this, that we have been deferring fixing until 
now (I plan to address over the next 2-3 weeks). If you have a simulator seed 
that produces this we could perhaps confirm which protocol bug if any might 
have caused this, as it is always nice to know which protocol bugs we have 
reproductions for via which routes. 

It's great to have some further external validation that these bugs can be 
found via this form of testing.


was (Author: benedict):
Either way this is a protocol bug, as if the insert by process 6 has a lower 
timestamp than the insert by process 7 then it should occur before, and so the 
read by process 5 should be deferred until the insert has completed.

I won't spend time debugging this as a result, as we have several known 
protocol bugs that could cause this, that we have been deferring fixing until 
now (I plan to address over the next 2-3 weeks). If you have a simulator seed 
that produces this we could perhaps confirm which protocol bug if any might 
have caused this, as it is always nice to know which protocol bugs we have 
reproductions for via which routes.


> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Priority: Normal
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents 
> LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time

[jira] [Commented] (CASSANDRA-18798) Appending to list in Accord transactions uses insertion timestamp

2023-08-28 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759505#comment-17759505
 ] 

Benedict Elliott Smith commented on CASSANDRA-18798:


Either way this is a protocol bug, as if the insert by process 6 has a lower 
timestamp than the insert by process 7 then it should occur before, and so the 
read by process 5 should be deferred until the insert has completed.

I won't spend time debugging this as a result, as we have several known 
protocol bugs that could cause this, that we have been deferring fixing until 
now (I plan to address over the next 2-3 weeks). If you have a simulator seed 
that produces this we could perhaps confirm which protocol bug if any might 
have caused this, as it is always nice to know which protocol bugs we have 
reproductions for via which routes.
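
For anyone skimming the history below, the anomaly can be checked mechanically: in an append-only list, every earlier read should be a prefix of every later read. The following is only a minimal sketch of that check. The class and method names are hypothetical, and the lists are abbreviated to the last few elements of the reads quoted in the ticket.

```java
import java.util.List;

public class PrefixCheck {
    // True when `earlier` is a prefix of `later`, which is what an
    // append-only list should guarantee for two successive reads.
    public static boolean isPrefix(List<Integer> earlier, List<Integer> later) {
        if (earlier.size() > later.size())
            return false;
        return later.subList(0, earlier.size()).equals(earlier);
    }

    public static void main(String[] args) {
        // Tails of the reads from the excerpt (abbreviated for illustration)
        List<Integer> readByProcess9 = List.of(852, 352);
        List<Integer> readByProcess5 = List.of(852, 352, 455);
        List<Integer> readByProcess1 = List.of(852, 352, 553, 455);

        System.out.println("9 -> 5 consistent: " + isPrefix(readByProcess9, readByProcess5));
        System.out.println("5 -> 1 consistent: " + isPrefix(readByProcess5, readByProcess1));
    }
}
```

The second check fails because 553 appears ahead of 455 in the later read, although the earlier read had already observed 455 without 553.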


> Appending to list in Accord transactions uses insertion timestamp
> -
>
> Key: CASSANDRA-18798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18798
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Jaroslaw Kijanowski
>Priority: Normal
>
> Given the following schema:
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS accord WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE IF NOT EXISTS accord.list_append(id int PRIMARY KEY, contents LIST<int>);
> TRUNCATE accord.list_append;{code}
> And the following two possible queries executed by 10 threads in parallel:
> {code:java}
> BEGIN TRANSACTION
>   LET row = (SELECT * FROM list_append WHERE id = ?);
>   SELECT row.contents;
> COMMIT TRANSACTION;
> BEGIN TRANSACTION
>   UPDATE list_append SET contents += ? WHERE id = ?;
> COMMIT TRANSACTION;
> {code}
> there seems to be an issue with transaction guarantees. Here's an excerpt in 
> the edn format from a test.
> {code:java}
> {:type :invoke    :process 8    :value [[:append 5 352]]    :tid 3    :n 52   
>  :time 1692607285967116627}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 54    
> :time 1692607286078732473}
> {:type :invoke    :process 6    :value [[:append 5 553]]    :tid 5    :n 53   
>  :time 1692607286133833428}
> {:type :invoke    :process 7    :value [[:append 5 455]]    :tid 4    :n 55   
>  :time 1692607286149702511}
> {:type :ok    :process 8    :value [[:append 5 352]]    :tid 3    :n 52    
> :time 1692607286156314099}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 52    
> :time 1692607286167090389}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352]]]    :tid 1    :n 54    :time 1692607286168657534}
> {:type :invoke    :process 1    :value [[:r 5 nil]]    :tid 0    :n 51    
> :time 1692607286201762938}
> {:type :ok    :process 7    :value [[:append 5 455]]    :tid 4    :n 55    
> :time 1692607286245571513}
> {:type :invoke    :process 7    :value [[:r 5 nil]]    :tid 4    :n 56    
> :time 1692607286245655775}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 455]]]    :tid 9    :n 52    :time 1692607286253928906}
> {:type :invoke    :process 5    :value [[:r 5 nil]]    :tid 9    :n 53    
> :time 1692607286254095215}
> {:type :ok    :process 6    :value [[:append 5 553]]    :tid 5    :n 53    
> :time 1692607286266263422}
> {:type :ok    :process 1    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 0    :n 51    :time 1692607286271617955}
> {:type :ok    :process 7    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 4    :n 56    :time 1692607286271816933}
> {:type :ok    :process 5    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19 623 22 425 24 926 25 832 130 733 430 533 29 933 333 
> 537 934 538 740 139 744 938 544 42 646 749 242 546 547 548 753 450 150 349 48 
> 852 352 553 455]]]    :tid 9    :n 53    :time 1692607286281483026}
> {:type :invoke    :process 9    :value [[:r 5 nil]]    :tid 1    :n 56    
> :time 1692607286284097561}
> {:type :ok    :process 9    :value [[:r 5 [303 304 604 6 306 509 909 409 912 
> 411 514 415 719 419 19

[jira] [Commented] (CASSANDRA-18355) CEP-15: Transaction Result Serialization Efficiency

2023-08-15 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754713#comment-17754713
 ] 

Benedict Elliott Smith commented on CASSANDRA-18355:


So, it would also be nice in this patch to ensure we aren't double writing the 
transaction contents. We already persist any constant write values in the 
transaction, and don't need them to reconstruct their portion of the `Writes` - 
which for most cases will be the vast majority of a `Writes`. 

So, really, instead of `Writes` we should be persisting only the data we read 
from replicas that is necessary for computing the `Writes` from the local 
`PartialTxn`. Does that make sense?

> CEP-15: Transaction Result Serialization Efficiency
> ---
>
> Key: CASSANDRA-18355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18355
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There are two things we probably don’t need to serialize and write to the 
> Accord state tables:
>  
> 1.) Internal/external read responses
> 2.) The full result of the transaction




[jira] [Commented] (CASSANDRA-14227) Extend maximum expiration date

2023-05-22 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724840#comment-17724840
 ] 

Benedict Elliott Smith commented on CASSANDRA-14227:


Sorry, the downside of lots of Jira traffic (incl from GitHub comments) is that 
I don't check the email notifications for a high traffic ticket.

I won't have time to look at the code soon, but I trust you to have addressed 
my concerns given what you describe above. Feel free to proceed.

> Extend maximum expiration date
> --
>
> Key: CASSANDRA-14227
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14227
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Paulo Motta (Deprecated)
>Assignee: Berenguer Blasi
>Priority: Urgent
> Fix For: 5.x
>
> Attachments: C14227 Perf check 2023.03.21.pdf, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png, unnamed-1.png
>
>
> The maximum expiration timestamp that can be represented by the storage 
> engine is
> 2038-01-19T03:14:06+00:00 due to the encoding of {{localExpirationTime}} as 
> an int32.
> On CASSANDRA-14092 we added an overflow policy which rejects requests with 
> expiration above the maximum date as a temporary measure, but we should 
> remove this limitation by updating the storage engine to support at least the 
> maximum allowed TTL of 20 years.
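
The arithmetic behind that limit can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes the largest int32 value is reserved as a "no expiration" marker, which would explain why the quoted maximum ends in :06 rather than :07.

```java
import java.time.Instant;

public class MaxExpiration {
    // Assumption: Integer.MAX_VALUE itself is reserved, so the largest
    // usable expiration is one second earlier.
    public static final long MAX_EXPIRATION_SECONDS = Integer.MAX_VALUE - 1L;

    // 20 years expressed in seconds, ignoring leap days.
    public static final int MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60;

    public static Instant maxRepresentable() {
        return Instant.ofEpochSecond(MAX_EXPIRATION_SECONDS);
    }

    // True when now + ttl no longer fits in the int32 expiration field.
    public static boolean overflows(long nowSeconds, int ttlSeconds) {
        return nowSeconds + ttlSeconds > MAX_EXPIRATION_SECONDS;
    }

    public static void main(String[] args) {
        long now = Instant.parse("2023-05-22T00:00:00Z").getEpochSecond();
        System.out.println("max representable: " + maxRepresentable());
        System.out.println("20 year TTL overflows: " + overflows(now, MAX_TTL_SECONDS));
    }
}
```

With a 20 year TTL, any write made after early 2018 already lands past the int32 limit, which is why the overflow policy was only a temporary measure.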




[jira] [Commented] (CASSANDRA-18204) CEP-15: (C*) Add git submodule for Accord

2023-05-16 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723270#comment-17723270
 ] 

Benedict Elliott Smith commented on CASSANDRA-18204:


We have this discussion roughly once per major. If you look back through dev@ 
you'll find the last one a few years back.

I don't recall NA ever being the approved approach, though. ".x" lines are 
target versions, whereas concrete versions are the ones a fix landed in. 
There's always ambiguity over the next release, as it's sort of both. But since 
there is no 5.0 version, only 5.0-alphaN, 5.0-betaN and 5.0.0, perhaps 5.0 is 
the correct label. I forget what we landed upon last time.

> CEP-15: (C*) Add git submodule for Accord
> -
>
> Key: CASSANDRA-18204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18204
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> As discussed in the dev@ thread "Intra-project dependencies", we considered 
> adding git submodules, but a few issues had to be worked out first; this 
> ticket tracks that work.
> Goals
> * when checking out an older commit, or pulling in newer commits, the 
> submodule should also be updated automatically
> * release artifact must include the submodule and must be able to build 
> without issue
> * build.xml must be updated to build the submodule
> * build.xml must be updated to release the submodule jar




[jira] [Comment Edited] (CASSANDRA-18204) CEP-15: (C*) Add git submodule for Accord

2023-05-16 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723270#comment-17723270
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-18204 at 5/16/23 7:59 PM:
-

We have this discussion roughly once per major. If you look back through dev@ 
you'll find the last one a few years back.

I don't recall NA ever being the approved approach, though. ".x" lines are 
target versions, whereas concrete versions are the ones a fix landed in. 
There's always ambiguity over the next release, as it's sort of both. But since 
there is no 5.0 version, only 5.0-alphaN, 5.0-betaN and 5.0.0, perhaps 5.0 is 
the correct label (and makes sense to me). I forget what we landed upon last 
time.

Work that has actually landed should probably be labelled as 5.0-alpha1.


was (Author: benedict):
We have this discussion roughly once per major. If you look back through dev@ 
you'll find the last one a few years back.

I don't recall NA ever being the approved approach, though. ".x" lines are 
target versions, whereas concrete versions are the ones a fix landed in. 
There's always ambiguity over the next release, as it's sort of both. But since 
there is no 5.0 version, only 5.0-alphaN, 5.0-betaN and 5.0.0, perhaps 5.0 is 
the correct label. I forget what we landed upon last time.

> CEP-15: (C*) Add git submodule for Accord
> -
>
> Key: CASSANDRA-18204
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18204
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> As discussed in the dev@ thread "Intra-project dependencies", we considered 
> adding git submodules, but a few issues had to be worked out first; this 
> ticket tracks that work.
> Goals
> * when checking out an older commit, or pulling in newer commits, the 
> submodule should also be updated automatically
> * release artifact must include the submodule and must be able to build 
> without issue
> * build.xml must be updated to build the submodule
> * build.xml must be updated to release the submodule jar




[jira] [Updated] (CASSANDRA-18523) CEP-15: (Accord) Join cluster without full transaction log

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18523:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord) Join cluster without full transaction log
> --
>
> Key: CASSANDRA-18523
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18523
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Joining replicas should not require the full transaction history to 
> successfully start serving queries. This ticket introduces mechanisms for a 
> replica to join (or catch up) with a data snapshot and all transactions that 
> execute after that snapshot. This is a precursor for transaction state GC.




[jira] [Assigned] (CASSANDRA-18523) CEP-15: (Accord) Join cluster without full transaction log

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18523:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord) Join cluster without full transaction log
> --
>
> Key: CASSANDRA-18523
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18523
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Joining replicas should not require the full transaction history to 
> successfully start serving queries. This ticket introduces mechanisms for a 
> replica to join (or catch up) with a data snapshot and all transactions that 
> execute after that snapshot. This is a precursor for transaction state GC.




[jira] [Assigned] (CASSANDRA-18175) CEP-15: (Accord) Introduce ExclusiveSyncPoint transactions

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18175:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord) Introduce ExclusiveSyncPoint transactions
> --
>
> Key: CASSANDRA-18175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18175
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Introduce a mechanism for invalidating older {{TxnId}}, so that a newly 
> bootstrapped node may have a complete log as of a point in time {{TxnId}}.




[jira] [Updated] (CASSANDRA-18175) CEP-15: (Accord) Introduce ExclusiveSyncPoint transactions

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18175:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord) Introduce ExclusiveSyncPoint transactions
> --
>
> Key: CASSANDRA-18175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18175
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Introduce a mechanism for invalidating older {{TxnId}}, so that a newly 
> bootstrapped node may have a complete log as of a point in time {{TxnId}}.




[jira] [Assigned] (CASSANDRA-18524) CEP-15: (Accord) Separate durable and transient listeners

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18524:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord) Separate durable and transient listeners
> -
>
> Key: CASSANDRA-18524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18524
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Transient listeners should be handled differently and, ironically, should be 
> more "persistent" in that they should not disappear when we evict state from 
> cache. This patch separates listeners into `DurableAndIdempotent` and 
> `Transient` with the latter being saved in a shared global register that also 
> more easily permits us to ensure we do not invoke listeners redundantly (and 
> for listeners themselves to know if we have done so). This is also a stepping 
> stone to ensuring listeners survive cache eviction.




[jira] [Updated] (CASSANDRA-18524) CEP-15: (Accord) Separate durable and transient listeners

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18524:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord) Separate durable and transient listeners
> -
>
> Key: CASSANDRA-18524
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18524
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> Transient listeners should be handled differently and, ironically, should be 
> more "persistent" in that they should not disappear when we evict state from 
> cache. This patch separates listeners into `DurableAndIdempotent` and 
> `Transient` with the latter being saved in a shared global register that also 
> more easily permits us to ensure we do not invoke listeners redundantly (and 
> for listeners themselves to know if we have done so). This is also a stepping 
> stone to ensuring listeners survive cache eviction.




[jira] [Created] (CASSANDRA-18524) CEP-15: (Accord) Separate durable and transient listeners

2023-05-12 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-18524:
--

 Summary: CEP-15: (Accord) Separate durable and transient listeners
 Key: CASSANDRA-18524
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18524
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict Elliott Smith


Transient listeners should be handled differently and, ironically, should be 
more "persistent" in that they should not disappear when we evict state from 
cache. This patch separates listeners into `DurableAndIdempotent` and 
`Transient` with the latter being saved in a shared global register that also 
more easily permits us to ensure we do not invoke listeners redundantly (and 
for listeners themselves to know if we have done so). This is also a stepping 
stone to ensuring listeners survive cache eviction.




[jira] [Created] (CASSANDRA-18523) CEP-15: (Accord) Join cluster without full transaction log

2023-05-12 Thread Benedict Elliott Smith (Jira)

Benedict Elliott Smith created CASSANDRA-18523:
--

 Summary: CEP-15: (Accord) Join cluster without full transaction log
 Key: CASSANDRA-18523
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18523
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict Elliott Smith


Joining replicas should not require the full transaction history to 
successfully start serving queries. This ticket introduces mechanisms for a 
replica to join (or catch up) with a data snapshot and all transactions that 
execute after that snapshot. This is a precursor for transaction state GC.




[jira] [Assigned] (CASSANDRA-18171) CEP-15: (Accord) Faster SimpleProgressLog and BurnTest

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18171:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord) Faster SimpleProgressLog and BurnTest
> --
>
> Key: CASSANDRA-18171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18171
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Some general efficiency improvements, most notably affecting 
> `SimpleProgressLog`, to manage the list of transactions we expect progress on 
> rather than polling all transactions.




[jira] [Assigned] (CASSANDRA-18174) CEP-15: (Accord/C*) Introduce range transactions

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18174:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord/C*) Introduce range transactions
> 
>
> Key: CASSANDRA-18174
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18174
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Support range transactions in Accord, to facilitate bootstrap.




[jira] [Updated] (CASSANDRA-18172) CEP-15: (Accord/C*) Refactor Timestamp/TxnId

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18172:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord/C*) Refactor Timestamp/TxnId
> 
>
> Key: CASSANDRA-18172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18172
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> Reduce the amount of storage required for Timestamp and TxnId by compressing 
> epoch to 48 bits, and real/logical to a single 64-bit HLC, while also 
> supporting flag carrier bits for communicating protocol state information.
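
One way to picture the layout described above is as two 64-bit words holding a 48-bit epoch, a 64-bit HLC and 16 flag bits. The sketch below is an assumption made for illustration only: the field order, the 16-bit flag width and all names are hypothetical and are not taken from the patch.

```java
public class PackedTimestamp {
    public static final long MAX_EPOCH = (1L << 48) - 1; // epoch limited to 48 bits
    public static final int MAX_FLAGS = 0xFFFF;          // assumed 16 flag bits

    // High word: 48 bits of epoch followed by the top 16 bits of the HLC.
    public static long packMsb(long epoch, long hlc) {
        if (epoch < 0 || epoch > MAX_EPOCH)
            throw new IllegalArgumentException("epoch does not fit in 48 bits: " + epoch);
        return (epoch << 16) | (hlc >>> 48);
    }

    // Low word: the bottom 48 bits of the HLC followed by 16 flag bits.
    public static long packLsb(long hlc, int flags) {
        if (flags < 0 || flags > MAX_FLAGS)
            throw new IllegalArgumentException("flags do not fit in 16 bits: " + flags);
        return (hlc << 16) | flags;
    }

    public static long epoch(long msb) {
        return msb >>> 16;
    }

    // Reassemble the HLC from the 16 bits in the high word and 48 in the low word.
    public static long hlc(long msb, long lsb) {
        return ((msb & 0xFFFFL) << 48) | (lsb >>> 16);
    }

    public static int flags(long lsb) {
        return (int) (lsb & 0xFFFFL);
    }

    public static void main(String[] args) {
        long msb = packMsb(5, 0x123456789ABCDEF0L);
        long lsb = packLsb(0x123456789ABCDEF0L, 3);
        System.out.printf("epoch=%d hlc=%x flags=%d%n", epoch(msb), hlc(msb, lsb), flags(lsb));
    }
}
```

Under these assumptions the whole value fits in 128 bits, and ordering by the high word first compares epochs before HLC values.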




[jira] [Assigned] (CASSANDRA-18172) CEP-15: (Accord/C*) Refactor Timestamp/TxnId

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18172:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord/C*) Refactor Timestamp/TxnId
> 
>
> Key: CASSANDRA-18172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18172
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> Reduce the amount of storage required for Timestamp and TxnId by compressing 
> epoch to 48 bits, and real/logical to a single 64-bit HLC, while also 
> supporting flag carrier bits for communicating protocol state information.




[jira] [Updated] (CASSANDRA-18171) CEP-15: (Accord) Faster SimpleProgressLog and BurnTest

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18171:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord) Faster SimpleProgressLog and BurnTest
> --
>
> Key: CASSANDRA-18171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18171
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> Some general efficiency improvements, most notably affecting 
> `SimpleProgressLog`, to manage the list of transactions we expect progress on 
> rather than polling all transactions.




[jira] [Updated] (CASSANDRA-18174) CEP-15: (Accord/C*) Introduce range transactions

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18174:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord/C*) Introduce range transactions
> 
>
> Key: CASSANDRA-18174
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18174
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> Support range transactions in Accord, to facilitate bootstrap.




[jira] [Updated] (CASSANDRA-18173) CEP-15: (Accord/C*) Introduce RangeDeps

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18173:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> CEP-15: (Accord/C*) Introduce RangeDeps
> ---
>
> Key: CASSANDRA-18173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18173
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Priority: Normal
>
> In order to support range transactions, we must be able to separately manage 
> dependencies that cover ranges rather than specific keys. This patch splits 
> {{Deps}} into {{KeyDeps}} and {{RangeDeps}}, while introducing a new 
> {{SearchableRangeList}} structure for efficiently looking up range 
> intersections.
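
To illustrate what looking up range intersections efficiently can mean, here is a minimal sketch: ranges are sorted by start, and a running maximum of end bounds allows the backwards scan to stop early. It is not the {{SearchableRangeList}} from the patch; the class name, the half-open long ranges and the algorithm are all assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RangeIndex {
    private final long[][] ranges;   // each entry is [start, end), sorted by start
    private final long[] maxEndUpTo; // maxEndUpTo[i] = largest end among ranges[0..i]

    public RangeIndex(long[][] input) {
        ranges = new long[input.length][];
        for (int i = 0; i < input.length; i++)
            ranges[i] = input[i].clone();
        Arrays.sort(ranges, Comparator.comparingLong((long[] r) -> r[0]));
        maxEndUpTo = new long[ranges.length];
        long max = Long.MIN_VALUE;
        for (int i = 0; i < ranges.length; i++) {
            max = Math.max(max, ranges[i][1]);
            maxEndUpTo[i] = max;
        }
    }

    // Returns every stored range that overlaps [start, end), in descending start order.
    public List<long[]> intersecting(long start, long end) {
        // Binary search for the first range whose start is >= end; nothing
        // from that index onwards can overlap the query.
        int lo = 0, hi = ranges.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ranges[mid][0] < end) lo = mid + 1;
            else hi = mid;
        }
        List<long[]> result = new ArrayList<>();
        // Walk backwards; once no earlier range reaches past `start`, stop.
        for (int i = lo - 1; i >= 0 && maxEndUpTo[i] > start; i--)
            if (ranges[i][1] > start)
                result.add(ranges[i]);
        return result;
    }

    public static void main(String[] args) {
        RangeIndex index = new RangeIndex(new long[][] { {0, 10}, {5, 15}, {20, 30}, {25, 40} });
        for (long[] r : index.intersecting(12, 22))
            System.out.println(Arrays.toString(r));
    }
}
```

The point of the split into key and range dependencies is that key lookups stay exact-match, while range dependencies need an overlap query of this kind.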




[jira] [Assigned] (CASSANDRA-18173) CEP-15: (Accord/C*) Introduce RangeDeps

2023-05-12 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith reassigned CASSANDRA-18173:
--

Assignee: Benedict Elliott Smith

> CEP-15: (Accord/C*) Introduce RangeDeps
> ---
>
> Key: CASSANDRA-18173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18173
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
>
> In order to support range transactions, we must be able to separately manage 
> dependencies that cover ranges rather than specific keys. This patch splits 
> {{Deps}} into {{KeyDeps}} and {{RangeDeps}}, while introducing a new 
> {{SearchableRangeList}} structure for efficiently looking up range 
> intersections.




[jira] [Commented] (CASSANDRA-18484) FunctionCall can throw more specific exceptions

2023-04-27 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17717149#comment-17717149
 ] 

Benedict Elliott Smith commented on CASSANDRA-18484:


{{InvalidRequestException}} isn't a checked exception - it's a special case of 
{{RuntimeException}}
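
To make the distinction concrete, here is a small self-contained sketch. {{InvalidRequest}} below is a hypothetical stand-in, not the Cassandra class: because it extends {{RuntimeException}}, a {{throws}} clause naming it is documentation only, and callers compile without catching or redeclaring it.

```java
public class UncheckedDemo {
    /** Hypothetical stand-in for an unchecked request-validation exception. */
    public static class InvalidRequest extends RuntimeException {
        public InvalidRequest(String message) {
            super(message);
        }
    }

    // The throws clause is allowed but not enforced for unchecked exceptions.
    public static int parsePositive(String text) throws InvalidRequest {
        int value = Integer.parseInt(text);
        if (value <= 0)
            throw new InvalidRequest("expected a positive number, got " + text);
        return value;
    }

    // Compiles with neither a try/catch nor a throws clause.
    public static int caller(String text) {
        return parsePositive(text);
    }

    public static void main(String[] args) {
        System.out.println(caller("3"));
        try {
            caller("-1");
        } catch (RuntimeException e) {
            // Catching the supertype also catches the more specific exception.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```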

> FunctionCall can throw more specific exceptions
> ---
>
> Key: CASSANDRA-18484
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18484
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hao Zhong
>Priority: Normal
>
> FunctionCall has the following code:
> {code:java}
> private static ByteBuffer executeInternal(ProtocolVersion protocolVersion, ScalarFunction fun, List<ByteBuffer> params) throws InvalidRequestException
> {
>     ByteBuffer result = fun.execute(protocolVersion, params);
>     try
>     {
>         // Check the method didn't lie about its declared return type
>         if (result != null)
>             fun.returnType().validate(result);
>         return result;
>     }
>     catch (MarshalException e)
>     {
>         throw new RuntimeException(String.format("Return of function %s (%s) is not a valid value for its declared return type %s",
>                                                  fun, ByteBufferUtil.bytesToHex(result), fun.returnType().asCQL3Type()), e);
>     }
> }
> {code}
> When validate throws MarshalException, it is rethrown as a generic 
> RuntimeException. Other methods throw more specific exceptions. For example, 
> BytesConversionFcts throws InvalidRequestException:
> {code:java}
> public ByteBuffer execute(ProtocolVersion protocolVersion, List<ByteBuffer> parameters)
> {
>     ByteBuffer val = parameters.get(0);
>     if (val != null)
>     {
>         try
>         {
>             toType.getType().validate(val);
>         }
>         catch (MarshalException e)
>         {
>             throw new InvalidRequestException(String.format("In call to function %s, value 0x%s is not a " +
>                                                             "valid binary representation for type %s",
>                                                             name, ByteBufferUtil.bytesToHex(val), toType));
>         }
>     }
>     return val;
> }
>  {code}
> As another example, Validation also rethrows this exception as 
> InvalidRequestException:
> {code:java}
> public static void validateKey(TableMetadata metadata, ByteBuffer key)
> {
>     ...
>     try
>     {
>         metadata.partitionKeyType.validate(key);
>     }
>     catch (MarshalException e)
>     {
>         throw new InvalidRequestException(e.getMessage());
>     }
> }
>    {code}




[jira] [Commented] (CASSANDRA-18470) Average of "decimal" values rounds the average if all inputs are integers

2023-04-21 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714905#comment-17714905
 ] 

Benedict Elliott Smith commented on CASSANDRA-18470:


Oof, that is a pretty serious bug IMO, and probably deserves its own ticket. 
[~ifesdjeen], [~blambov], [~blerer]: this appears to have been introduced by 
CASSANDRA-12417, would any of you like to have a look?

> Average of "decimal" values rounds the average if all inputs are integers
> -
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an 
> arbitrary-precision number which may be an integer or fractional, but it is 
> expected that the average would be, in general, fractional. But it turns out 
> that if all the values are integer *without* a ".0", the aggregator sums them 
> up as integers and the final division returns an integer too instead of the 
> fractional response expected from a "decimal" value.
> For example:
>  # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
>  # AVG of 1.0 and 2 or 1 and 2.0 also returns 1.5.
>  # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the 
> average to be a "decimal", not a "varint", so there is no reason why it 
> should be rounded to an integer.




[jira] [Comment Edited] (CASSANDRA-18470) Average of "decimal" values rounds the average if all inputs are integers

2023-04-20 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714573#comment-17714573
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-18470 at 4/20/23 12:35 PM:
--

I think this is ambiguous to be honest. In general we have very inadequately 
both _considered_ and _documented_ our behaviour for these kinds of features 
and data types. However, it is not immediately obvious this behaviour is 
_incorrect_ since we do not ask the user to specify a level of precision of the 
output, and since we support arbitrary precision we have to make some decision 
based on the inputs, and in this case neither parameter has any fractional 
component, so the result is rounded to the same.

There's an argument to be made that this is really inappropriate for an 
aggregation, as the order in which values occur in the aggregation affects the 
result. But I think the correct solution is probably to permit a precision to 
be provided with the operator. We could plausibly also pick a default precision 
that is non-zero, though this might constrain the precision below an acceptable 
level for some workloads. We could permit the user to configure a default 
precision for this operator, and/or use the default precision as a lower bound 
only.

Probably our implementation is wrong, though, given this behaviour. It seems 
that we assume we have good precision and therefore recompute the average on 
each new datum, as opposed to maintaining a running sum and count. This would 
also solve the problem of the order of provision modifying the output.
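
The difference between the two strategies can be sketched with plain {{java.math.BigDecimal}}. This is an illustration under stated assumptions, not Cassandra's aggregate code: the class and method names are hypothetical, the incremental update rule {{avg += (value - avg) / count}} is an assumed form of "recompute the average on each new datum", and {{MathContext.DECIMAL128}} is an arbitrary choice of output precision.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

class DecimalAverageSketch
{
    // Recomputes the average on every datum: avg += (value - avg) / count.
    // divide(divisor, roundingMode) keeps the scale of the dividend, so
    // inputs with no fractional digits produce quotients with none either.
    static BigDecimal incrementalAverage(List<BigDecimal> values)
    {
        BigDecimal avg = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal value : values)
        {
            count++;
            BigDecimal delta = value.subtract(avg);
            avg = avg.add(delta.divide(BigDecimal.valueOf(count), RoundingMode.HALF_EVEN));
        }
        return avg;
    }

    // Keeps a running sum and count, then divides once with an explicit precision.
    static BigDecimal sumAndCountAverage(List<BigDecimal> values, MathContext precision)
    {
        if (values.isEmpty())
            return BigDecimal.ZERO;
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal value : values)
            sum = sum.add(value);
        return sum.divide(BigDecimal.valueOf(values.size()), precision);
    }

    public static void main(String[] args)
    {
        List<BigDecimal> integers = Arrays.asList(new BigDecimal("1"), new BigDecimal("2"));
        System.out.println(incrementalAverage(integers));                         // prints 1
        System.out.println(sumAndCountAverage(integers, MathContext.DECIMAL128)); // prints 1.5
    }
}
```

In the sum-and-count variant the inputs are only ever added, so the order of provision cannot change the result, and the precision of the output is decided once at the final division.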


was (Author: benedict):
I think this is ambiguous to be honest. In general we have very inadequately 
both _considered_ and _documented_ our behaviour for these kinds of features 
and data types. However, it is not immediately obvious this behaviour is 
_incorrect_ since we do not ask the user to specify a level of precision of the 
output, and since we support arbitrary precision we have to make some decision 
based on the inputs, and in this case neither parameter has any fractional 
component, so the result is rounded to the same.

There's an argument to be made that this is really inappropriate for an 
aggregation, as the order in which values occur in the aggregation affects the 
result. But I think the correct solution is probably to permit a precision to 
be provided with the operator. We could plausibly also pick a default precision 
that is non-zero, though this might constrain the precision below an acceptable 
level for some workloads. We could permit the user to configure a default 
precision for this operator, and/or use the default precision as a lower bound 
only.

> Average of "decimal" values rounds the average if all inputs are integers
> -
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an 
> arbitrary-precision number which may be an integer or fractional, but it is 
> expected that the average would be, in general, fractional. But it turns out 
> that if all the values are integer *without* a ".0", the aggregator sums them 
> up as integers and the final division returns an integer too instead of the 
> fractional response expected from a "decimal" value.
> For example:
>  # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
>  # AVG of 1.0 and 2 or 1 and 2.0 also returns 1.5.
>  # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the 
> average to be a "decimal", not a "varint", so there is no reason why it 
> should be rounded to an integer.




[jira] [Commented] (CASSANDRA-18470) Average of "decimal" values rounds the average if all inputs are integers

2023-04-20 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714573#comment-17714573
 ] 

Benedict Elliott Smith commented on CASSANDRA-18470:


I think this is ambiguous to be honest. In general we have very inadequately 
both _considered_ and _documented_ our behaviour for these kinds of features 
and data types. However, it is not immediately obvious this behaviour is 
_incorrect_ since we do not ask the user to specify a level of precision of the 
output, and since we support arbitrary precision we have to make some decision 
based on the inputs, and in this case neither parameter has any fractional 
component, so the result is rounded to the same.

There's an argument to be made that this is really inappropriate for an 
aggregation, as the order in which values occur in the aggregation affects the 
result. But I think the correct solution is probably to permit a precision to 
be provided with the operator. We could plausibly also pick a default precision 
that is non-zero, though this might constrain the precision below an acceptable 
level for some workloads. We could permit the user to configure a default 
precision for this operator, and/or use the default precision as a lower bound 
only.

> Average of "decimal" values rounds the average if all inputs are integers
> -
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an 
> arbitrary-precision number which may be an integer or fractional, but it is 
> expected that the average would be, in general, fractional. But it turns out 
> that if all the values are integer *without* a ".0", the aggregator sums them 
> up as integers and the final division returns an integer too instead of the 
> fractional response expected from a "decimal" value.
> For example:
>  # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
>  # AVG of 1.0 and 2 or 1 and 2.0 also returns 1.5.
>  # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the 
> average to be a "decimal", not a "varint", so there is no reason why it 
> should be rounded to an integer.




[jira] [Updated] (CASSANDRA-18466) Paxos only repair is treated as an incremental repair

2023-04-20 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18466:
---
Complexity: Low Hanging Fruit  (was: Normal)

> Paxos only repair is treated as an incremental repair
> -
>
> Key: CASSANDRA-18466
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18466
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Andrew
>Priority: Normal
>  Labels: lhf
> Fix For: 4.1.x, 5.x
>
>
> Paxos only repair tries to continue or is treated as an incremental repair. 
> This happened on 4.1.0 and 4.1.1 when trying to run repair in preparation for 
> enabling paxos_state_purging. The repair, while in preparation mode, triggered 
> multiple anti-compactions on the nodes. Running the command with --full 
> behaves in the expected way, i.e. only the paxos data is repaired and it 
> finishes within a few seconds.
> {code:java}
> nodetool repair --paxos-only // This does not behave as expected: it does not complete quickly and seems to be waiting on anticompactions
> {code}
> {code:java}
> nodetool repair --full --paxos-only // Completes within a few seconds as expected
> {code}




[jira] [Updated] (CASSANDRA-18466) Paxos only repair is treated as an incremental repair

2023-04-20 Thread Benedict Elliott Smith (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-18466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict Elliott Smith updated CASSANDRA-18466:
---
Labels: lhf  (was: )

> Paxos only repair is treated as an incremental repair
> -
>
> Key: CASSANDRA-18466
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18466
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Andrew
>Priority: Normal
>  Labels: lhf
> Fix For: 4.1.x, 5.x
>
>
> Paxos only repair tries to continue or is treated as an incremental repair. 
> This happened on 4.1.0 and 4.1.1 when trying to run repair in preparation for 
> enabling paxos_state_purging. The repair, while in preparation mode, triggered 
> multiple anti-compactions on the nodes. Running the command with --full 
> behaves in the expected way, i.e. only the paxos data is repaired and it 
> finishes within a few seconds.
> {code:java}
> nodetool repair --paxos-only // This does not behave as expected: it does not complete quickly and seems to be waiting on anticompactions
> {code}
> {code:java}
> nodetool repair --full --paxos-only // Completes within a few seconds as expected
> {code}




[jira] [Commented] (CASSANDRA-18466) Paxos only repair is treated as an incremental repair

2023-04-20 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714491#comment-17714491
 ] 

Benedict Elliott Smith commented on CASSANDRA-18466:


[~maxwellguo] yes, and for paxos-only repairs this should not really happen - 
since it's not really doing a regular repair at all, and incremental repairs 
bring in a lot of baggage for clusters that haven't run them yet.

> Paxos only repair is treated as an incremental repair
> -
>
> Key: CASSANDRA-18466
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18466
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Andrew
>Priority: Normal
> Fix For: 4.1.x, 5.x
>
>
> Paxos only repair tries to continue or is treated as an incremental repair. 
> This happened on 4.1.0 and 4.1.1 when trying to run repair in preparation for 
> enabling paxos_state_purging. The repair, while in preparation mode, triggered 
> multiple anti-compactions on the nodes. Running the command with --full 
> behaves in the expected way, i.e. only the paxos data is repaired and it 
> finishes within a few seconds.
> {code:java}
> nodetool repair --paxos-only // This does not behave as expected: it does not complete quickly and seems to be waiting on anticompactions
> {code}
> {code:java}
> nodetool repair --full --paxos-only // Completes within a few seconds as expected
> {code}




[jira] [Commented] (CASSANDRA-18465) Add support for multiple condition branches and results in Accord transaction

2023-04-19 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713988#comment-17713988
 ] 

Benedict Elliott Smith commented on CASSANDRA-18465:


This was always intended to be the natural evolution of the syntax, so I fully 
support this, of course.

> Add support for multiple condition branches and results in Accord transaction
> -
>
> Key: CASSANDRA-18465
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18465
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Accord, CQL/Syntax
>Reporter: Jacek Lewandowski
>Priority: Normal
>
> I'd like to propose adding support for multiple branches and result sets for 
> Accord transactions. It could look like this:
> {code:sql}
> BEGIN TRANSACTION
>   LET a = ...
>   LET b = ...
>   IF condition THEN
>     SELECT 'one', a.value
>     UPDATE ...
>   ELSE IF condition2 THEN
>     SELECT 'two', b.value
>     UPDATE ...
>   ELSE
>     SELECT 'three', NULL
>   END IF
> COMMIT TRANSACTION
> {code}
> The existing syntax would remain valid: when a single SELECT is defined, 
> the conditional SELECTs would not be valid. 
> SELECTs would be validated to return columns of the same type. They would be 
> able to return literals as well.
> This would make the result of the transaction more intuitive, as the client 
> would know explicitly whether the updates were applied or not.




[jira] [Comment Edited] (CASSANDRA-18433) Row cache inconsistency issue: A read can put stale data into row cache in a race condition

2023-04-08 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709903#comment-17709903
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-18433 at 4/8/23 4:41 PM:


Hmm, the semantics of {{replace}} _should_ only update if the sentinel is still 
present. However, the {{OHCacheAdapter}} appears to invoke {{addOrReplace}}. It 
appears this bug has existed since it was introduced, and there does not appear 
to be an equivalent {{replace}} method in the underlying implementation. 
Unfortunately, the row cache is not a widely used facility anymore, at least 
amongst the contributor-base, so it has not benefitted from the push for 
improved quality in the project.

I would suggest trying to swap the underlying cache implementation by setting 
{{row_cache_class_name}} in your yaml to 
"org.apache.cassandra.cache.CaffeineCache" - though this will have very 
different heap behaviour, the cache implementation itself is very good. Or, I 
would consider disabling the row cache.

Fixing the existing implementation may take some time, as I don't know if 
OHCache is actively maintained any longer.


was (Author: benedict):
Hmm, the semantics of {{replace}} _should_ only update if the sentinel is still 
present. However, the {{OHCacheAdapter}} appears to invoke {{addOrReplace}}. It 
appears this bug has existed since it was introduced, and there does not appear 
to be an equivalent {{replace}} method in the underlying implementation. 
Unfortunately, the row cache is not a widely used facility anymore, so it has 
not benefitted from the push for improved quality in the project.

I would suggest trying to swap the underlying cache implementation by setting 
{{row_cache_class_name}} in your yaml to 
"org.apache.cassandra.cache.CaffeineCache" - though this will have very 
different heap behaviour, the cache implementation itself is very good. Or, I 
would consider disabling the row cache.

Fixing the existing implementation may take some time, as I don't know if 
OHCache is actively maintained any longer.

> Row cache inconsistency issue: A read can put stale data into row cache in a 
> race condition
> ---
>
> Key: CASSANDRA-18433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18433
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Huapeng Yuan
>Priority: Normal
> Fix For: 3.11.x
>
>
> We found the issue in our production system, which runs version 3.11.6. 
> When we did an update and then read immediately after the update succeeded, we 
> sometimes read stale data. The same issue occurs for writeAll + readOne 
> consistency and for writeQuorum + readQuorum. The issue is gone once we 
> disabled the row cache.
> The config for row cache: 
> caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>  
> After some investigation, we think there is a race condition in the 
> read/write path. Problem:
> When two threads are reading and writing the same partition (for example, two 
> rows with the same partition key) at the same time, the read thread may load 
> stale data into the row cache for the row which is being updated.
> {panel:title=The steps of write-thread inserting a row to partition p}
> W-Step 1: Inserts the value v1 into the memtable.
> W-Step 2: Invalidates the row cache using the partition key.
> {panel}
> {panel:title=The steps of read-thread reading a row from partition p}
> R-Step 1: Checks the row cache to find whether the row is present. If it is 
> not, goes to R-Step 2.
> R-Step 2: Inserts a sentinel (timestamp) as the row value into the row cache 
> to tell other read threads to skip the row cache.
> R-Step 3: Reads from the storage layer and gets value v0, which can be older 
> than v1.
> R-Step 4: Inserts v0 into the row cache for the row, checking whether the row 
> doesn't exist or still has the same sentinel. *The inconsistency is caused by 
> this step. The stale value should not be inserted if the sentinel no longer 
> exists in the row cache.*
> {panel}
> {panel:title=The sequence to reproduce the issue}
> R-Step 1
> R-Step 2
> R-Step 3
> W-Step 1
> W-Step 2
> R-Step 4
> {panel}
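
The sequence above can be replayed deterministically in a single thread. The sketch below is only an illustration: it uses a {{java.util.concurrent.ConcurrentHashMap}} as a stand-in for the row cache, and the class and method names are hypothetical; it does not use the OHCache or Cassandra APIs. It contrasts an unconditional put (the add-or-replace behaviour described in the comment) with a conditional replace that only succeeds while the sentinel is still present.

```java
import java.util.concurrent.ConcurrentHashMap;

class RowCacheRaceSketch
{
    // Replays R1, R2, R3, W1, W2, R4 and returns what ends up cached for the key.
    static String replay(boolean conditionalReplace)
    {
        ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
        String key = "p";
        String storage = "v0";

        // R-Step 1 and 2: cache miss, so the reader installs a sentinel.
        Object sentinel = new Object();
        cache.putIfAbsent(key, sentinel);

        // R-Step 3: the reader reads the current (soon to be stale) value.
        String read = storage;

        // W-Step 1: the writer stores the new value.
        storage = "v1";

        // W-Step 2: the writer invalidates the cache entry, removing the sentinel.
        cache.remove(key);

        // R-Step 4: the reader publishes what it read.
        if (conditionalReplace)
            cache.replace(key, sentinel, read); // no-op: the sentinel is gone
        else
            cache.put(key, read);               // unconditionally caches stale v0

        Object cached = cache.get(key);
        return cached == null ? null : cached.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(replay(false)); // prints v0 (stale entry cached)
        System.out.println(replay(true));  // prints null (nothing cached)
    }
}
```

With the conditional replace the cache simply stays empty after the race, so the next read goes back to storage and sees v1.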




[jira] [Commented] (CASSANDRA-18433) Row cache inconsistency issue: A read can put stale data into row cache in a race condition

2023-04-08 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709903#comment-17709903
 ] 

Benedict Elliott Smith commented on CASSANDRA-18433:


Hmm, the semantics of {{replace}} _should_ only update if the sentinel is still 
present. However, the {{OHCacheAdapter}} appears to invoke {{addOrReplace}}. It 
appears this bug has existed since it was introduced, and there does not appear 
to be an equivalent {{replace}} method in the underlying implementation. 
Unfortunately, the row cache is not a widely used facility anymore, so it has 
not benefitted from the push for improved quality in the project.

I would suggest trying to swap the underlying cache implementation by setting 
{{row_cache_class_name}} in your yaml to 
"org.apache.cassandra.cache.CaffeineCache" - though this will have very 
different heap behaviour, the cache implementation itself is very good. Or, I 
would consider disabling the row cache.

Fixing the existing implementation may take some time, as I don't know if 
OHCache is actively maintained any longer.

> Row cache inconsistency issue: A read can put stale data into row cache in a 
> race condition
> ---
>
> Key: CASSANDRA-18433
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18433
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Huapeng Yuan
>Priority: Normal
> Fix For: 3.11.x
>
>
> We found the issue in our production system, which runs version 3.11.6. 
> When we did an update and then read immediately after the update succeeded, we 
> sometimes read stale data. The same issue occurs for writeAll + readOne 
> consistency and for writeQuorum + readQuorum. The issue is gone once we 
> disabled the row cache.
> The config for row cache: 
> caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>  
> After some investigation, we think there is a race condition in the 
> read/write path. Problem:
> When two threads are reading and writing the same partition (for example, two 
> rows with the same partition key) at the same time, the read thread may load 
> stale data into the row cache for the row which is being updated.
> {panel:title=The steps of write-thread inserting a row to partition p}
> W-Step 1: Inserts the value v1 into the memtable.
> W-Step 2: Invalidates the row cache using the partition key.
> {panel}
> {panel:title=The steps of read-thread reading a row from partition p}
> R-Step 1: Checks the row cache to find whether the row is present. If it is 
> not, goes to R-Step 2.
> R-Step 2: Inserts a sentinel (timestamp) as the row value into the row cache 
> to tell other read threads to skip the row cache.
> R-Step 3: Reads from the storage layer and gets value v0, which can be older 
> than v1.
> R-Step 4: Inserts v0 into the row cache for the row, checking whether the row 
> doesn't exist or still has the same sentinel. *The inconsistency is caused by 
> this step. The stale value should not be inserted if the sentinel no longer 
> exists in the row cache.*
> {panel}
> {panel:title=The sequence to reproduce the issue}
> R-Step 1
> R-Step 2
> R-Step 3
> W-Step 1
> W-Step 2
> R-Step 4
> {panel}



