[
https://issues.apache.org/jira/browse/CASSANDRA-21321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083479#comment-18083479
]
Stefan Miklosovic commented on CASSANDRA-21321:
-----------------------------------------------
There is already a similar metric, {{writeLatency}}. It is updated in
{{ColumnFamilyStore.apply}} like this:
{code}
metric.writeLatency.addNano(nanoTime() - start);
{code}
This is then used in {{localWriteCount}} in {{TableStatsHolder}}, which calls
{{getCount()}} on {{WriteLatency}}. It is basically what {{Local read count: }}
and {{Local write count: }} in {{nodetool tablestats}} return for a specific
table. That metric also operates on {{PartitionUpdate}}, but not at as granular
a level as the newly proposed metric. The new one calls
{{PartitionUpdate.affectedRowCount}}, whereas {{getCount()}} on the existing one
seems to be just a way to get _how many times we did a write_ instead of _how
many rows a particular PU is affecting_, which are two different things.
I think this patch makes sense, if I read the situation correctly, but I wonder
whether we should also add this new metric to {{nodetool tablestats}}.
That might come as a follow-up, though.
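To make the distinction concrete, here is a minimal, self-contained sketch. It is not Cassandra code: {{WriteMetricsSketch}}, {{FakePartitionUpdate}}, and {{ratePerSecond}} are hypothetical names invented for illustration, and only the idea of "count once per write" versus "add the affected row count" is taken from the discussion above. The rate helper mirrors the scrape-and-subtract calculation from the issue description.

```java
import java.util.concurrent.atomic.LongAdder;

public class WriteMetricsSketch {
    // Hypothetical stand-in for a partition update; only its row count matters here.
    static final class FakePartitionUpdate {
        final int affectedRowCount;

        FakePartitionUpdate(int affectedRowCount) {
            this.affectedRowCount = affectedRowCount;
        }
    }

    // Incremented once per apply(): "how many times we did a write".
    final LongAdder writeCount = new LongAdder();

    // Incremented by the rows each update touches: "how many rows were affected".
    final LongAdder rowsMutated = new LongAdder();

    void apply(FakePartitionUpdate update) {
        writeCount.increment();
        rowsMutated.add(update.affectedRowCount);
    }

    // Rate between two scrapes of a monotonically increasing counter.
    static double ratePerSecond(long valueAtT1, long valueAtT2, double intervalSeconds) {
        return (valueAtT2 - valueAtT1) / intervalSeconds;
    }

    public static void main(String[] args) {
        WriteMetricsSketch metrics = new WriteMetricsSketch();
        metrics.apply(new FakePartitionUpdate(1));
        metrics.apply(new FakePartitionUpdate(47));
        metrics.apply(new FakePartitionUpdate(2));

        // Three writes touched fifty rows in total, so the two counters diverge.
        System.out.println("writes=" + metrics.writeCount.sum()
                + " rows=" + metrics.rowsMutated.sum());
    }
}
```

With three updates touching 1, 47, and 2 rows, the write count is 3 while the row counter is 50, which is why the two metrics cannot stand in for each other.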
> Add RowsRead and RowsMutated counters to TableMetrics for accurate per-table
> row throughput tracking
> ----------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21321
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Observability/Metrics
> Reporter: Piotr Walczak
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 7.x
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Cassandra already exposes {{LiveScannedHistogram}} (rows returned per read)
> and {{RowsMutatedPerWriteHistogram}} (rows touched per write -
> https://issues.apache.org/jira/browse/CASSANDRA-21320) at the table level.
> These histograms are valuable for understanding the *distribution* of rows
> per operation, but they are *insufficient for measuring total row throughput*
> over a time window — which is essential for capacity planning and predictions.
> Specifically:
> * A histogram records _how many rows a single operation touched_ (e.g.,
> "this query returned 47 rows").
> * To derive total rows from a histogram you would need to multiply bucket
> midpoints by counts — an approximation that becomes increasingly inaccurate
> for skewed distributions.
> * Histograms also do not give you a simple delta between two scrape
> timestamps, making windowed rate calculations fragile.
> A *monotonically-increasing counter* solves all of these problems: scrape the
> value at {{t1}} and {{t2}}, subtract, divide by the interval: exact
> rows/sec with no approximation.
>
> Changes can look similar to:
> [https://github.com/apache/cassandra/compare/trunk...pwalczak:cassandra:pwalczak/CASSANDRA-21321?expand=1]