[ 
https://issues.apache.org/jira/browse/CASSANDRA-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswin Karthik updated CASSANDRA-19770:
--------------------------------------
    Description: 
Cassandra version: 4.1.5

Since [CASSANDRA-16760|https://issues.apache.org/jira/browse/CASSANDRA-16760] 
and [these 
changes|https://github.com/apache/cassandra/pull/1091/files#diff-07f330b65d5335967ea96f80674b25415c70994d99b97795ed4db696c92b3ff5L532],
the metrics reporter divides microsecond-valued metrics by 10^6 while labeling 
the result as milliseconds (it should divide by 10^3). The extra division by 
10^3 makes the reported latencies 1000 times too small.
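
To make the arithmetic concrete, here is a minimal Python sketch of the two divisors. It is illustrative only and is not Cassandra's reporter code; the function names and the 250,000 microsecond sample value are made up for the example.

```python
# Illustrative arithmetic only; this is not Cassandra's reporter code.
MICROS_PER_MILLI = 10**3  # correct divisor: microseconds -> milliseconds
WRONG_DIVISOR = 10**6     # divisor described in this report


def micros_to_millis(value_micros: float) -> float:
    """Convert a duration in microseconds to milliseconds."""
    return value_micros / MICROS_PER_MILLI


def as_reported(value_micros: float) -> float:
    """Hypothetical model of the faulty output: divide by 10^6, label as ms."""
    return value_micros / WRONG_DIVISOR


sample_micros = 250_000.0  # made-up sample: 250 ms expressed in microseconds
expected_ms = micros_to_millis(sample_micros)
reported_ms = as_reported(sample_micros)

print(f"expected: {expected_ms} ms")
print(f"reported: {reported_ms} ms")
print(f"shortfall factor: {expected_ms / reported_ms:.0f}")
```

Under this model, every latency comes out 1000 times smaller than it should.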

Neither the sample configuration nor the documentation explains how to 
configure the metrics reporter to report these values correctly.

Steps to reproduce:

Contents of metrics-reporter-config-sample.yaml
{noformat}
console:
  -
    outfile: '/tmp/metrics.out'
    period: 10
    timeunit: 'SECONDS'
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.apache.cassandra.metrics.ClientRequest.+" # includes ClientRequestMetrics
{noformat}

Cassandra started with flag
{noformat}
-Dcassandra.metricsReporterConfigFile=metrics-reporter-config-sample.yaml
{noformat}

Run cassandra-stress to generate load
{noformat}
tools/bin/cassandra-stress write duration=1m cl=ONE -rate threads=1000
{noformat}

After that, check the metric via nodetool:
{noformat}
bin/nodetool sjk mxdump -q org.apache.cassandra.metrics:type=ClientRequest,scope=Write-ONE,name=Latency

{
  "beans" : [ {
    "name" : "org.apache.cassandra.metrics:type=ClientRequest,scope=Write-ONE,name=Latency",
    "modelerType" : "org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxTimer",
    "Max" : 654949.0,
    "999thPercentile" : 11864.0,
    "DurationUnit" : "microseconds",
    ....
  } ]
}

{noformat}

The max is 654949.0 microseconds, which is about 654.9 milliseconds.

However, the metrics reporter emits 0.65 milliseconds because of the 
additional division by 10^3:

{noformat}
❯ tail -n100 /tmp/metrics.out | grep -A 20 Latency.Write-ONE
org.apache.cassandra.metrics.ClientRequest.Latency.Write-ONE
            count = 17053398
            max = 0.65 milliseconds
            99.9% <= 0.01 milliseconds
            ...
{noformat}
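
As a cross-check, the following Python sketch estimates which power of ten relates the JMX values to the reporter values quoted above. The dictionary keys and the helper name are made up for the example. Because the reporter output is rounded to two decimals, this only recovers the order of magnitude.

```python
import math

# Values copied from the JMX dump (microseconds) and from the reporter
# output (labeled milliseconds) quoted in this report.
jmx_micros = {"max": 654949.0, "p999": 11864.0}
reporter_ms = {"max": 0.65, "p999": 0.01}


def implied_divisor_exponent(micros: float, reported: float) -> int:
    """Nearest power of ten that maps the JMX value onto the reported one."""
    return round(math.log10(micros / reported))


for name in jmx_micros:
    exponent = implied_divisor_exponent(jmx_micros[name], reporter_ms[name])
    correct_ms = jmx_micros[name] / 10**3
    print(f"{name}: implied divisor 10^{exponent}, expected {correct_ms:.2f} ms")
```

Both samples point to a divisor of 10^6 rather than 10^3, which is consistent with the description above.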



> Incorrect latency metrics reported by metric-reporter
> -----------------------------------------------------
>
>                 Key: CASSANDRA-19770
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19770
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Aswin Karthik
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
