[ 
https://issues.apache.org/jira/browse/CASSANDRA-21321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084333#comment-18084333
 ] 

Dmitry Konstantinov commented on CASSANDRA-21321:
-------------------------------------------------

Yes, with batches a single PartitionUpdate may contain multiple rows, so the 
cost of one write operation can vary considerably. It therefore makes sense to 
count the number of rows in addition to the number of operations.
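
To illustrate the difference between counting operations and counting rows, 
here is a minimal sketch. The class TableWriteCounters and its recordWrite 
method are hypothetical names made up for this example; they are not 
Cassandra's TableMetrics or PartitionUpdate API, and the row count per update 
is assumed to be known to the caller.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for per-table write counters; not Cassandra's TableMetrics.
public class TableWriteCounters {
    private final LongAdder writeOperations = new LongAdder();
    private final LongAdder rowsMutated = new LongAdder();

    // rowsInUpdate: number of rows carried by one partition update
    // (assumed to be supplied by the caller in this sketch).
    public void recordWrite(int rowsInUpdate) {
        writeOperations.increment();   // one operation, regardless of size
        rowsMutated.add(rowsInUpdate); // actual number of rows touched
    }

    public long writeOperations() { return writeOperations.sum(); }

    public long rowsMutated() { return rowsMutated.sum(); }

    public static void main(String[] args) {
        TableWriteCounters counters = new TableWriteCounters();
        counters.recordWrite(1);   // single-row write
        counters.recordWrite(250); // batch touching 250 rows in one partition
        System.out.println(counters.writeOperations() + " operations, "
                           + counters.rowsMutated() + " rows");
    }
}
```

Both writes count as one operation each, yet the second one touches 250 times 
as many rows; only the row counter makes that visible.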

> Add RowsRead and RowsMutated counters to TableMetrics for accurate per-table 
> row throughput tracking
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21321
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Observability/Metrics
>            Reporter: Piotr Walczak
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 7.x
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Cassandra already exposes {{LiveScannedHistogram}} (rows returned per read) 
> and {{RowsMutatedPerWriteHistogram}} (rows touched per write - 
> https://issues.apache.org/jira/browse/CASSANDRA-21320) at the table level. 
> These histograms are valuable for understanding the *distribution* of rows 
> per operation, but they are *insufficient for measuring total row throughput* 
> over a time window — which is essential for capacity planning and predictions.
> Specifically:
>  * A histogram records _how many rows a single operation touched_ (e.g., 
> "this query returned 47 rows").
>  * To derive total rows from a histogram you would need to multiply bucket 
> midpoints by counts — an approximation that becomes increasingly inaccurate 
> for skewed distributions.
>  * Histograms also do not give you a simple delta between two scrape 
> timestamps, making windowed rate calculations fragile.
> A *monotonically-increasing counter* solves all of these problems: scrape the 
> value at {{t1}} and {{t2}}, subtract, and divide by the interval to get exact 
> rows/sec with no approximation.
>  
> Changes can look similar to: 
> [https://github.com/apache/cassandra/compare/trunk...pwalczak:cassandra:pwalczak/CASSANDRA-21321?expand=1]
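
The arithmetic in the description above can be sketched as follows. This is 
an illustration only: the class RowRate, its method names, the sample counter 
values and the bucket boundaries are all assumptions made up for the example, 
and the bucket layout is not Cassandra's actual histogram layout.

```java
public class RowRate {
    // Exact rows/sec between two scrapes of a monotonically increasing counter.
    static double ratePerSecond(long valueAtT1, long valueAtT2, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (valueAtT2 - valueAtT1) / intervalSeconds;
    }

    // Approximate total rows from a histogram: sum of (bucket midpoint * count).
    static double approximateTotal(long[] bucketLower, long[] bucketUpper, long[] counts) {
        double total = 0;
        for (int i = 0; i < counts.length; i++) {
            double midpoint = (bucketLower[i] + bucketUpper[i]) / 2.0;
            total += midpoint * counts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up counter samples taken 60 seconds apart.
        System.out.println(ratePerSecond(1_000_000L, 1_090_000L, 60.0));

        // Made-up buckets [1,10] and [11,100] holding 3 and 2 operations.
        // If the real per-operation values were 1, 1, 1, 100, 100 (203 rows),
        // the midpoint estimate would still be 127.5.
        System.out.println(approximateTotal(new long[] {1, 11},
                                            new long[] {10, 100},
                                            new long[] {3, 2}));
    }
}
```

The counter delta needs only two samples and a subtraction, while the 
histogram estimate depends on where values fall inside each bucket.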



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
