[jira] [Comment Edited] (CASSANDRA-19015) Nodetool 'tablestats' formatting uses inconsistent significant digits

Brad Schoening (Jira) Wed, 15 Nov 2023 10:39:13 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786428#comment-17786428
 ]


Brad Schoening edited comment on CASSANDRA-19015 at 11/15/23 6:37 PM:
----------------------------------------------------------------------

The problem with the current output is not just too many decimal places; the 
number of decimal places is also inconsistent. The result is neither human- nor 
machine-friendly.

For example, 

 
{noformat}
    Read Latency: 8.894851803649942 ms
    Local read latency: 6.857 ms
{noformat}
 

What should be corrected here is:
 * *inconsistent display and rounding* for values with the same unit of measure 
and the same scale (i.e., latency)
 * *precision* - reporting milliseconds to 15 decimal places implies precision 
that does not exist, because (most) CPUs cannot measure intervals finer than a 
few nanoseconds.
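To put the precision point in numbers: each decimal place of a millisecond value is a factor of 10 in resolution, so 3 places is microseconds, 6 places is nanoseconds, and the 15 places printed today reach 1e-15 ms (1e-18 s). A minimal sketch using the Read Latency value from the example above; the class and method names are hypothetical and not part of nodetool:

{code:java}
import java.util.Locale;

public class LatencyPrecision {
    // 3 decimal places of a millisecond value = microsecond resolution.
    static String atMicroseconds(double ms) {
        return String.format(Locale.US, "%.3f", ms);
    }

    // 6 decimal places of a millisecond value = nanosecond resolution.
    static String atNanoseconds(double ms) {
        return String.format(Locale.US, "%.6f", ms);
    }

    public static void main(String[] args) {
        double ms = 8.894851803649942; // "Read Latency" from the example output
        // Default conversion (what string concatenation uses): every digit needed
        // to identify the double uniquely, i.e. 15 decimal places here.
        System.out.println(ms + " ms");
        System.out.println(atMicroseconds(ms) + " ms"); // 8.895 ms
        System.out.println(atNanoseconds(ms) + " ms");  // 8.894852 ms
    }
}
{code}

Even the nanosecond variant is already at the limit of what typical timers resolve, which is why 3 decimal places is a reasonable default for ms.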

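The display length issues come from printing IEEE 754 doubles with Java's default 
string conversion (Double.toString), which emits as many digits as are needed to 
uniquely identify the value. I can see that StatsKeyspace.java and StatsTable.java 
store latency in ms as doubles: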
{code:java}
// StatsKeyspace.java
private double totalReadTime;
private double totalWriteTime;

// StatsTable.java
public double localReadLatencyMs;
public double localWriteLatencyMs;
{code}
but TableStatsPrinter.java applies the *%01.3f* format to these doubles inconsistently:
{code:java}
// TableStatsPrinter.java

// Missing format on these doubles in ms:
out.println("\tRead Latency: " + keyspace.readLatency() + " ms");
out.println("\tWrite Latency: " + keyspace.writeLatency() + " ms");

// But correctly formats these doubles in ms:
out.printf(indent + "Local read latency: %01.3f ms%n", table.localReadLatencyMs);
out.printf(indent + "Local write latency: %01.3f ms%n", table.localWriteLatencyMs);
{code}
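One way to make this consistent is to route every latency through a single formatting helper. A minimal sketch, assuming a hypothetical formatMs() helper (not an existing TableStatsPrinter method) and %.3f, which produces the same output as %01.3f because a minimum width of 1 is always met. Locale.US is a design choice here: PrintStream.printf otherwise uses the JVM's default locale, which may print a comma as the decimal separator.

{code:java}
import java.io.PrintStream;
import java.util.Locale;

public class LatencyFormat {
    // Hypothetical helper: the single place that decides how a latency in ms is shown.
    // Locale.US pins '.' as the decimal separator whatever the JVM default locale is.
    static String formatMs(double ms) {
        return String.format(Locale.US, "%.3f ms", ms);
    }

    // Keyspace-level and table-level lines go through the same helper.
    static void printLatencies(PrintStream out, String indent,
                               double keyspaceReadMs, double tableReadMs) {
        out.println("\tRead Latency: " + formatMs(keyspaceReadMs));
        out.println(indent + "Local read latency: " + formatMs(tableReadMs));
    }

    public static void main(String[] args) {
        // Values from the example output: prints "8.895 ms" and "6.857 ms".
        printLatencies(System.out, "\t\t", 8.894851803649942, 6.857);
    }
}
{code}

Using the same helper for the Write Latency lines would cover the remaining unformatted output.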
 

We may want to leave the thousands separator for a separate Jira and use this 
one to focus on displaying floating point numbers with an appropriate and 
consistent number of decimal places. I also see now that there is a -H option 
that displays bytes in KiB, MiB, etc., and a -F option that outputs results in 
machine-readable JSON or YAML.


> Nodetool 'tablestats' formatting uses inconsistent significant digits
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-19015
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19015
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tool/nodetool
>            Reporter: Brad Schoening
>            Assignee: Leo
>            Priority: Low
>
> Nodetool reports milliseconds (ms) with anywhere from 3 to 15 decimal places. 
> Ratios use five or sixteen decimal places. Averages use 1 or 13 decimal places.
>  * milliseconds should use 3 decimal places
>  * ratios should use 3 decimal places (tenths of a percent)
>  * averages should use 1 or 2 decimal places
> For readability, it would be helpful if large integers had comma separators, 
> e.g., space used as 1,463,210,998,523 and/or in GiB/MiB/KiB. It's unclear 
> whether the exact disk size is useful, as it may change minute-by-minute; if 
> not, rounding would be best, or displaying both:   Space used (live): 
> 1,463,210,998,523 (1,363 GiB)
> Total number of tables: 83
> ----------------
> Keyspace : X
>     Read Count: 1007337271
>     Read Latency: 8.485891803649942 ms
>     Write Count: 67550181
>     Write Latency: 0.02556443163342523 ms
>     Pending Flushes: 0
>         Table: Y
>         SSTable count: 7183
>         Old SSTable count: 0
>         SSTables in each level: [0, 9, 92, 754, 6328, 0, 0, 0, 0]
>         Space used (live): 1463210998523
>         Space used (total): 1463210998523
>         Space used by snapshots (total): 0
>         Off heap memory used (total): 607419608
>         SSTable Compression Ratio: 0.3146620992793412
>         Number of partitions (estimate): 24784137
>         Memtable cell count: 106067
>         Memtable data size: 248539982
>         Memtable off heap memory used: 0
>         Memtable switch count: 256
>         Local read count: 865440924
>         Local read latency: 6.857 ms
>         Local write count: 13881409
>         Local write latency: 0.037 ms
>         Pending flushes: 0
>         Percent repaired: 0.0
>         Bytes repaired: 0.000KiB
>         Bytes unrepaired: 4315.386GiB
>         Bytes pending repair: 0.000KiB
>         Bloom filter false positives: 11027855
>         Bloom filter false ratio: 0.01099
>         Bloom filter space used: 33590024
>         Bloom filter off heap memory used: 33532560
>         Index summary off heap memory used: 8174024
>         Compression metadata off heap memory used: 565713024
>         Compacted partition minimum bytes: 36
>         Compacted partition maximum bytes: 17797419593
>         Compacted partition mean bytes: 189740
>         Average live cells per slice (last five minutes): 1443.2146104466253
>         Maximum live cells per slice (last five minutes): 105778
>         Average tombstones per slice (last five minutes): 1.0
>         Maximum tombstones per slice (last five minutes): 1
>         Dropped Mutations: 0
>         Droppable tombstone ratio: 0.00000
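The formatting rules proposed in the description can be sketched as a small set of formatters. This is a hypothetical illustration, not nodetool code: the class and method names are made up, 2 decimal places is picked for averages (the description allows 1 or 2), GiB is computed as bytes / 1024^3, and the inputs are taken from the sample output above.

{code:java}
import java.util.Locale;

public class StatsFormatRules {
    // Locale.US: '.' as decimal separator, ',' as thousands separator.
    static String ms(double v) {
        return String.format(Locale.US, "%.3f ms", v);   // milliseconds: 3 decimal places
    }

    static String ratio(double v) {
        return String.format(Locale.US, "%.3f", v);      // ratios: 3 decimal places
    }

    static String average(double v) {
        return String.format(Locale.US, "%.2f", v);      // averages: 2 decimal places
    }

    // Exact byte count with thousands separators, plus a rounded GiB figure.
    static String bytes(long v) {
        double gib = v / (1024.0 * 1024.0 * 1024.0);
        return String.format(Locale.US, "%,d (%,.0f GiB)", v, gib);
    }

    public static void main(String[] args) {
        System.out.println(ms(8.485891803649942));        // 8.486 ms
        System.out.println(ratio(0.3146620992793412));    // 0.315
        System.out.println(average(1443.2146104466253));  // 1443.21
        System.out.println(bytes(1463210998523L));        // 1,463,210,998,523 (1,363 GiB)
    }
}
{code}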



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
