Collecting Latency Metrics
Hi All,

I'm creating a dashboard that should collect read/write latency metrics on C* 3.x. In older versions (e.g. 2.0), I used to divide the total read latency in microseconds by the read count.

Is there a metric attribute that shows read/write latency without the need to do the math, such as the "Local read latency" in the nodetool tablestats output? I saw there's a Mean attribute in org.apache.cassandra.metrics.ReadLatency, but I'm not sure this is the right one.

I'd really appreciate your help on this one. Thanks!
Re: Collecting Latency Metrics
To answer your question: org.apache.cassandra.metrics:type=Table,name=ReadTotalLatency gives you the total local read latency in microseconds, and you can get the count from the ReadLatency metric. If you go that route, be sure to work on the delta from the previous query (new - last) for both the total latency and the counter. Otherwise you will slowly converge to a global average that almost never changes, as the sheer quantity of reads simply washes out the outliers.

The Mean attribute of the Latency metric you mentioned will actually give you an approximation of this, as it takes the total/count of a decaying histogram of the latencies. However, it will be even less accurate than using the deltas, since the bounds of the decay won't necessarily line up with your reading intervals, and the histogram introduces a worst-case 20% round-up.

Even when using deltas, though, this will hide outliers: you could end up with really bad queries that don't even show up as a tick on your graph (although *generally* they will).

Chris

On Wed, May 29, 2019 at 9:32 AM shalom sagges wrote:
> Is there a metric attribute that shows read/write latency without the need to do the math, such as in nodetool tablestats "Local read latency" output?
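Chris's delta approach can be sketched in Python. This is a minimal illustration, assuming you already poll two cumulative counters per table: the ReadTotalLatency count (total microseconds) and the ReadLatency count (number of reads). The `Sample` type and `interval_mean_latency_us` helper are hypothetical names for this sketch, not part of Cassandra or any exporter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One poll of two cumulative counters (hypothetical structure)."""

    total_latency_us: int  # ReadTotalLatency count: total microseconds since start
    op_count: int  # ReadLatency count: number of reads since start


def interval_mean_latency_us(prev: Sample, curr: Sample) -> Optional[float]:
    """Mean latency of the reads that happened between two polls.

    Works on deltas (new - last), so the result reflects only this interval
    instead of a lifetime average. Returns None if there were no reads or a
    counter went backwards (for example after a node restart).
    """
    d_latency = curr.total_latency_us - prev.total_latency_us
    d_count = curr.op_count - prev.op_count
    if d_count <= 0 or d_latency < 0:
        return None
    return d_latency / d_count


if __name__ == "__main__":
    last = Sample(total_latency_us=5_000_000, op_count=10_000)
    now = Sample(total_latency_us=5_900_000, op_count=10_300)
    # 900,000 us over 300 reads: 3000.0 us per read in this interval
    print(interval_mean_latency_us(last, now))
    # The lifetime average is only about 572.8 us and barely moves
    print(now.total_latency_us / now.op_count)
```

With these made-up numbers, an interval averaging 3,000 us per read barely shifts the lifetime figure, which is the convergence problem Chris describes.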
Re: Collecting Latency Metrics
Hello,

This metric is indeed available. Most of the available metrics are documented here: http://cassandra.apache.org/doc/latest/operating/metrics.html

For client requests (latency from the coordinator's perspective): http://cassandra.apache.org/doc/latest/operating/metrics.html#client-request-metrics

For local requests (per table/host latency, measured locally, with no network communication included): http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics

> Latency: Special type that tracks latency (in microseconds) with a Timer plus a Counter that tracks the total latency accrued since starting. The former is useful if you track the change in total latency since the last check. Each metric name of this type will have ‘Latency’ and ‘TotalLatency’ appended to it.

You need 'Latency', not 'TotalLatency'. I would guess that's the issue, because latencies have been available for as long as I can remember (including C* 2.0, and 1.2 for sure :)).

Also, be aware that quite a few things changed in the metric structure between C* 2.1 and C* 2.2 (and C* 3.0 is similar to C* 2.2). Examples of changes:

- ColumnFamily --> Table
- 99percentile --> p99
- 1MinuteRate --> m1_rate
- the metric name now comes before the KS and Table names, and some other changes of this kind
- because of this ^, aggregations / aliases and indexes changed, breaking most of the charts (in my case at least)
- ‘.value’ is no longer appended to the metric name for gauges; nothing is appended instead

For example (Grafana / Graphite), from:

```
aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.ColumnFamily.$ks.$table.ReadLatency.95percentile, 2, 3), 1, 7, 8, 9)
```

to:

```
aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.Table.ReadLatency.$ks.$table.p95, 2, 3), 1, 8, 9, 10)
```

Another tip is to use ccm locally (https://github.com/riptano/ccm), for example, and run 'jconsole $cassandra_pid'.
I use this:
jconsole $(ccm node1 show | grep pid | awk -F= '{print $2}')

Once you're in, you can explore the available MBeans and find the metrics under 'org.apache.cassandra.[...]'. It's not ideal, as you have to search 'manually', but it has allowed me to find some metrics in the past, or to work around issues in the doc above.

Out of curiosity, may I ask what backend you use for your monitoring?

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
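A hypothetical helper that applies the 2.1 to 2.2/3.x renames Alain lists (ColumnFamily to Table, Npercentile to pN, NMinuteRate to mN_rate, metric name moved before the keyspace and table, no trailing '.value' for gauges) to a per-table Graphite path. It is a sketch covering only those renames: `translate_path` is not part of any Cassandra or Graphite tooling, other renames may exist, and it assumes the old path has the `ColumnFamily.<ks>.<table>.<metric>` shape.

```python
import re

# Renames listed in the thread for Graphite paths, C* 2.1 -> 2.2/3.x.
_PERCENTILE = re.compile(r"^(\d+)percentile$")  # 95percentile -> p95
_MINUTE_RATE = re.compile(r"^(\d+)MinuteRate$")  # 1MinuteRate -> m1_rate


def rename_attribute(attr: str) -> str:
    """Map an old attribute suffix to its new name; unknown suffixes pass through."""
    m = _PERCENTILE.match(attr)
    if m:
        return f"p{m.group(1)}"
    m = _MINUTE_RATE.match(attr)
    if m:
        return f"m{m.group(1)}_rate"
    return attr


def translate_path(old_path: str) -> str:
    """Rewrite a per-table 2.1-style path into the 2.2/3.x layout (hypothetical helper).

    Old: <prefix>.ColumnFamily.<ks>.<table>.<metric>[.<attr>]
    New: <prefix>.Table.<metric>.<ks>.<table>[.<attr>]
    """
    parts = old_path.split(".")
    if "ColumnFamily" not in parts:
        return old_path  # not a per-table path; leave it alone
    i = parts.index("ColumnFamily")
    if len(parts) < i + 4:
        return old_path  # too short to contain <ks>.<table>.<metric>
    prefix = parts[:i]
    ks, table, metric = parts[i + 1], parts[i + 2], parts[i + 3]
    attrs = parts[i + 4:]
    if attrs == ["value"]:
        attrs = []  # gauges no longer get a trailing '.value'
    attrs = [rename_attribute(a) for a in attrs]
    return ".".join(prefix + ["Table", metric, ks, table] + attrs)


if __name__ == "__main__":
    old = ("cassandra.$env.$dc.$host.org.apache.cassandra.metrics."
           "ColumnFamily.$ks.$table.ReadLatency.95percentile")
    print(translate_path(old))
```

Run on the 'from' path in Alain's example, this produces his 'to' path. The `aliasByNode` indexes (1, 7, 8, 9 becoming 1, 8, 9, 10) still have to be adjusted by hand.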
Re: Collecting Latency Metrics
There are various attributes under org.apache.cassandra.metrics.ClientRequest.Latency.Read; these measure the latency in milliseconds.

Thanks

Paul
www.redshots.com

To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Collecting Latency Metrics
If I only send ReadTotalLatency to Graphite/Grafana, can I run an average on it and use "scale to seconds=1"? Will that do the trick?

Thanks!
Re: Collecting Latency Metrics
Thanks for your replies, guys. I really appreciate it.

@Alain, I use Graphite as the backend, with Grafana on top. But the goal is to move from Graphite to Prometheus eventually.

I tried to find a direct way of getting a specific average Latency metric, and as Chris pointed out, the Mean value isn't that accurate. I don't want to use the percentile metrics either, but rather a single latency metric like the *"Local read latency"* output in nodetool tablestats. Looking at the code of nodetool tablestats, it seems that C* also divides *ReadTotalLatency.Count* by *ReadLatency.Count* to get the latency result.

So I guess I have no choice but to run the calculation on my own via Graphite:

divideSeries(averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count))),averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadLatency.Count))))

Does this seem right to you?

Thanks!

On Thu, May 30, 2019 at 12:34 AM Paul Chandler wrote:
> There are various attributes under org.apache.cassandra.metrics.ClientRequest.Latency.Read these measure the latency in milliseconds
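One way to sanity-check the divideSeries expression above: it divides the host-averaged rate of ReadTotalLatency by the host-averaged rate of ReadLatency. Because both averages divide by the same number of hosts, this equals total latency over total reads, i.e. a read-weighted mean, which is not the same as averaging each host's own mean. The sketch below uses hypothetical per-host deltas with illustrative numbers, and assumes every host reports both series for the interval; if a host is missing from one series, the two averages use different host counts and the ratio skews.

```python
from typing import Dict, Tuple

# Per-host deltas for one interval: (delta ReadTotalLatency in us, delta read count).
# Host names and numbers are illustrative only.
HostDeltas = Dict[str, Tuple[int, int]]


def ratio_of_averages(deltas: HostDeltas) -> float:
    """Mirrors divideSeries(averageSeries(total), averageSeries(count))."""
    n = len(deltas)
    avg_total = sum(total for total, _ in deltas.values()) / n
    avg_count = sum(count for _, count in deltas.values()) / n
    return avg_total / avg_count  # equals sum(total) / sum(count): read-weighted


def average_of_ratios(deltas: HostDeltas) -> float:
    """Averages each host's own mean latency, weighting every host equally."""
    per_host = [total / count for total, count in deltas.values() if count > 0]
    return sum(per_host) / len(per_host)


if __name__ == "__main__":
    deltas = {
        "busy-host": (500_000, 1_000),  # 1,000 reads at 500 us each
        "quiet-host": (50_000, 10),  # 10 reads at 5,000 us each
    }
    print(ratio_of_averages(deltas))  # about 544.6 us per read
    print(average_of_ratios(deltas))  # 2750.0 us
```

The read-weighted figure matches what a per-read average across the cluster should be; the per-host average lets a nearly idle but slow host dominate.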
Re: Collecting Latency Metrics
Sorry for the duplicated emails, but I just want to make sure I'm doing it correctly. To summarize, are both ways accurate, or is one better than the other?

divideSeries(averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count))),averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadLatency.Count))))

OR

alias(scaleToSeconds(averageSeriesWithWildcards(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count),7,8,9),1),'test')

WDYT?

On Thu, May 30, 2019 at 2:29 PM shalom sagges wrote:
> So I guess I will have no choice but to run the calculation on my own via Graphite:
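The two expressions measure different things. The first divides the change in ReadTotalLatency by the change in the ReadLatency count, giving microseconds per read. The second only uses ReadTotalLatency, so scaleToSeconds(..., 1) yields microseconds of read latency accumulated per second, which rises with throughput even when each read is just as fast. A minimal single-host sketch with made-up numbers, assuming a 60-second Graphite step; the function names are hypothetical and averaging across hosts is left out.

```python
def mean_latency_us(d_total_us: float, d_count: float) -> float:
    """First expression: change in ReadTotalLatency / change in read count."""
    return d_total_us / d_count


def latency_accrued_per_second(d_total_us: float, step_s: float) -> float:
    """Second expression: change in ReadTotalLatency scaled to a per-second value."""
    return d_total_us / step_s


if __name__ == "__main__":
    step_s = 60  # assumed Graphite step for this sketch
    # Same 500 us per read, but the busy interval has twice the reads.
    quiet = (6_000 * 500, 6_000)  # (delta total us, delta reads)
    busy = (12_000 * 500, 12_000)
    for name, (d_total, d_count) in (("quiet", quiet), ("busy", busy)):
        print(name,
              mean_latency_us(d_total, d_count),  # 500.0 in both cases
              latency_accrued_per_second(d_total, step_s))  # 50000.0 vs 100000.0
```

Only the first value stays at 500 us when traffic doubles, so it is the one that corresponds to tablestats' "Local read latency".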
Re: Collecting Latency Metrics
> org.apache.cassandra.metrics.ClientRequest.Latency.Read these measure the latency in milliseconds

It's actually in microseconds, unless you call the values() operation, which gives the histogram in nanoseconds.
Re: Collecting Latency Metrics
For what it's worth, I would generally recommend just using the mean rather than calculating it yourself. It's a lot easier, and averages are meaningless for anything besides trending anyway (which is really what this is useful for: finding issues on the larger scale), especially with high-volume clusters, so the loss in accuracy is kind of moot. Your average for local reads/writes will almost always be sub-millisecond, but you might end up having 500 millisecond requests or worse that the mean will hide.

Chris
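To make Chris's point concrete, the sketch below counts how many slow reads can sit in an interval before the mean reaches 1 ms. The function name and the numbers (0.2 ms fast reads, 500 ms slow reads) are illustrative assumptions, not measurements.

```python
def max_hidden_slow_reads(total_reads: int, fast_ms: float, slow_ms: float,
                          threshold_ms: float = 1.0) -> int:
    """Largest number of slow reads that keeps the interval mean below threshold_ms.

    Returns 0 when not even one slow read fits (or when fast reads alone
    already reach the threshold).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    hidden = 0
    for k in range(total_reads + 1):
        mean_ms = ((total_reads - k) * fast_ms + k * slow_ms) / total_reads
        if mean_ms >= threshold_ms:
            break
        hidden = k
    return hidden


if __name__ == "__main__":
    # Illustrative numbers: 0.2 ms fast reads, 500 ms slow reads.
    print(max_hidden_slow_reads(10_000, 0.2, 500.0))  # 16
    print(max_hidden_slow_reads(1_000_000, 0.2, 500.0))  # 1600
```

Sixteen half-second reads out of 10,000 still leave a sub-millisecond mean, and the number that can hide grows with the read volume.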
Re: Collecting Latency Metrics
Yep. I would *never* use the mean to make any sort of decision when it comes to performance. I prefer to graph all the p99 latencies as well as the max.

Some good reading on the topic: https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/

On Thu, May 30, 2019 at 7:35 AM Chris Lohfink wrote:
> For what it is worth, generally I would recommend just using the mean vs calculating it yourself.
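A minimal sketch of the per-interval p99 and max that Jon suggests graphing, computed from raw samples with the nearest-rank method. This is an assumption for illustration: Cassandra derives its own percentiles from a decaying, bucketed histogram, so its p99 will not match this exactly (and can round up, as Chris noted). The helper names and sample data are hypothetical.

```python
import math
from typing import Dict, List


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Smallest sample with at least pct% of all samples at or below it."""
    if not values:
        raise ValueError("no samples in this interval")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def interval_summary(latencies_ms: List[float]) -> Dict[str, float]:
    """Mean, p99 and max for one interval of raw latency samples."""
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p99": percentile_nearest_rank(latencies_ms, 99),
        "max": max(latencies_ms),
    }


if __name__ == "__main__":
    # Illustrative interval: mostly fast reads, a slower tail, a few very slow reads.
    samples = [0.2] * 9_850 + [20.0] * 140 + [500.0] * 10
    print(interval_summary(samples))
```

Here the mean stays just under 1 ms and p99 exposes the 20 ms tail, while only the max reveals the ten 500 ms reads, which is why graphing both is useful.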
Re: Collecting Latency Metrics
Thanks a lot for your comments. This mailing list is truly *the* definitive guide to Cassandra. The knowledge transferred here is invaluable, so I just wanted to give a big shout-out to everyone who is helping out here.

Regards,

On Thu, May 30, 2019 at 6:10 PM Jon Haddad wrote:
> Yep. I would *never* use mean when it comes to performance to make any sort of decisions.