[jira] [Commented] (CASSANDRA-18580) Baseline Metrics for Accord Transactions

Jacek Lewandowski (Jira) Mon, 07 Aug 2023 09:56:09 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751746#comment-17751746
 ]


Jacek Lewandowski commented on CASSANDRA-18580:
-----------------------------------------------

Thanks [~maedhroz], so far I have these:

measured on a replica:
- time to commit
- time to execute
- time to apply
- application time (how long the writes took)
- partial deps histogram
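
A rough sketch of what recording a partial deps histogram could look like, assuming a hypothetical `DepsHistogram` class with power-of-two buckets and a crude halving decay. The class name, bucket layout, and decay scheme are illustrative assumptions, not Cassandra's or Accord's actual metrics code:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical illustration only: not an existing Cassandra/Accord class.
final class DepsHistogram {
    // Bucket i counts values whose bit length is i: 0, 1, 2-3, 4-7, 8-15, ...
    private final AtomicLongArray buckets = new AtomicLongArray(32);

    // Record the number of dependencies observed for one (partial) transaction.
    void update(int depsCount) {
        if (depsCount < 0)
            throw new IllegalArgumentException("negative dependency count");
        int bucket = 32 - Integer.numberOfLeadingZeros(depsCount);
        buckets.incrementAndGet(bucket);
    }

    // Crude decay: halve every bucket so older samples gradually lose weight.
    void decay() {
        for (int i = 0; i < buckets.length(); i++)
            buckets.updateAndGet(i, v -> v / 2);
    }

    // Number of samples currently held in the given bucket.
    long count(int bucket) {
        return buckets.get(bucket);
    }

    // Total number of samples across all buckets.
    long total() {
        long sum = 0;
        for (int i = 0; i < buckets.length(); i++)
            sum += buckets.get(i);
        return sum;
    }
}
```

Calling `decay()` periodically would make the histogram favour recent samples, which fits a directional metric where the trend matters more than exact values.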

measured on a coordinator:
- dependencies histogram
- fast path meter
- slow path meter
- preempted transactions meter
- timeouts meter
- time to recover (a timer + meter, so it also gives the number of recoveries)
- the current timestamp measured after recovery minus the ballot interpreted 
as a timestamp (though I need to investigate whether this makes any sense)

All of these are tracked separately for read-only and read-write transactions 
(based on flags in TxnId).
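
A minimal sketch of keeping coordinator-side counters separately per transaction kind. `TxnKind`, `CoordinatorCounters`, and `CoordinatorMetrics` are hypothetical names for illustration, and plain `LongAdder` counters stand in for real meters; how the kind is actually derived from the flags in TxnId is not shown:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical transaction kinds; in practice the kind would come from TxnId flags.
enum TxnKind { READ_ONLY, READ_WRITE }

// One set of coordinator-side counters (stand-ins for meters).
final class CoordinatorCounters {
    final LongAdder fastPath = new LongAdder();
    final LongAdder slowPath = new LongAdder();
    final LongAdder preempted = new LongAdder();
    final LongAdder timeouts = new LongAdder();

    // Fraction of fast-path transactions among fast + slow; 0.0 if none recorded.
    double fastPathRatio() {
        long fast = fastPath.sum();
        long total = fast + slowPath.sum();
        return total == 0 ? 0.0 : (double) fast / total;
    }
}

// Holds an independent counter set for each transaction kind.
final class CoordinatorMetrics {
    private final Map<TxnKind, CoordinatorCounters> byKind = new EnumMap<>(TxnKind.class);

    CoordinatorMetrics() {
        for (TxnKind kind : TxnKind.values())
            byKind.put(kind, new CoordinatorCounters());
    }

    CoordinatorCounters forKind(TxnKind kind) {
        return byKind.get(kind);
    }
}
```

With this layout, a coordinator would call something like `metrics.forKind(kind).fastPath.increment()` when a transaction completes on the fast path, so read-only and read-write rates never mix.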

I'm happy to add the missing ones: size of the progress log and size of 
commands for key when loaded.





> Baseline Metrics for Accord Transactions
> ----------------------------------------
>
>                 Key: CASSANDRA-18580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18580
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Accord, Observability/JMX, Observability/Metrics
>            Reporter: Caleb Rackliffe
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Based on some conversations w/ [~benedict] and [~dcapwell], this is the 
> initial set of metrics that seem both feasible to implement and useful as we 
> monitor the health of a cluster performing Accord transactions:
> 1.) Basic latency metrics for transactions up to the point of COMMIT and rate 
> metrics for preemption, failure, and timeouts at the coordinator.
> This has already been implemented and split into read and write-specific 
> metrics. Our position for now is that metrics around preemption should be 
> useful in place of a more difficult-to-define metric around how many 
> transactions are completed via recovery.
> 2.) Global cache stats/metrics (i.e. aggregated for all command stores)
> We could, at some point, build metrics scoped to a specific {{CommandStore}}, 
> but they might be awkward in MBean/JMX space, as command stores would have to 
> be identified by ID or key range…the latter possibly being able to change 
> across epochs. (An alternative would be just publishing command 
> store-specific stats on-demand to a virtual table instead.)
> 3.) Something like a decaying histogram of the number of dependencies per 
> transaction (or per partial transaction).
> If this is getting worse over time, it could be useful to know/be a way for 
> us to detect that contention is increasing. We should be able to hook this up 
> to {{ProgressLog}} notifications. Recording for PartialDeps/PartialTxn (which 
> ProgressLog gives us at pre-accept) seems acceptable, given this is a 
> directional metric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

