[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-03-24 Thread Konstantin Shalygin
Hi Matthias,

The Prometheus exporter already has all of these metrics; you can set up
Grafana panels as you want.
Also, apply latency is a pre-bluestore metric, i.e. it comes from filestore.
For Bluestore, apply latency is the same as commit latency; you can check this
via the `ceph osd perf` command.
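
A minimal sketch of reading the two values programmatically, to compare them per OSD. It assumes `ceph osd perf -f json` returns a list named `osd_perf_infos` (possibly nested under `osdstats`, depending on the release) whose entries carry `commit_latency_ms` and `apply_latency_ms` under `perf_stats`. These field names are assumptions to check against your own cluster's output, and the sample data is made up.

```python
import json


def osd_perf_latencies(raw_json):
    """Return {osd_id: (commit_ms, apply_ms)} from `ceph osd perf -f json` text."""
    data = json.loads(raw_json)
    # Assumption: some releases nest the list under "osdstats", others do not.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    result = {}
    for info in infos:
        stats = info["perf_stats"]
        result[info["id"]] = (stats["commit_latency_ms"], stats["apply_latency_ms"])
    return result


# Made-up sample for illustration only; real input would come from the CLI.
SAMPLE = json.dumps({"osdstats": {"osd_perf_infos": [
    {"id": 0, "perf_stats": {"commit_latency_ms": 3, "apply_latency_ms": 3}},
    {"id": 1, "perf_stats": {"commit_latency_ms": 5, "apply_latency_ms": 5}},
]}})

latencies = osd_perf_latencies(SAMPLE)
for osd_id, (commit_ms, apply_ms) in sorted(latencies.items()):
    print(f"osd.{osd_id}: commit={commit_ms}ms apply={apply_ms}ms")
```

On a Bluestore OSD the two numbers in each pair would be expected to match, as described above.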




k

> On 25 Mar 2023, at 00:02, Matthias Ferdinand  wrote:
> 
> Hi,
> 
> I would like to understand how the per-OSD data from "ceph osd perf"
> (i.e.  apply_latency, commit_latency) is generated. So far I couldn't
> find documentation on this. "ceph osd perf" output is nice for a quick
> glimpse, but is not very well suited for graphing. Output values are
> from the most recent 5s-averages apparently.
> 
> With "ceph daemon osd.X perf dump" OTOH, you get quite a lot of latency
> metrics, while it is just not obvious to me how they aggregate into
> apply_latency and commit_latency. Or some comparably easy read latency
> metric (something that is missing completely in "ceph osd perf").
> 
> Can somebody shed some light on this?
> 
> 
> Regards
> Matthias

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-03-25 Thread Matthias Ferdinand
On Sat, Mar 25, 2023 at 11:09:58AM +0700, Konstantin Shalygin wrote:
> Hi Matthias,
> 
> The Prometheus exporter already has all of these metrics; you can set up
> Grafana panels as you want.
> Also, apply latency is a pre-bluestore metric, i.e. it comes from filestore.
> For Bluestore, apply latency is the same as commit latency; you can check this
> via the `ceph osd perf` command.


Thanks Konstantin,

do I guess right that the metrics shown in your screenshot correspond to
values

  "bluestore.txc_commit_lat.description": "Average commit latency",
  "bluestore.txc_throttle_lat.description": "Average submit throttle latency",
  "bluestore.txc_submit_lat.description": "Average submit latency",
  "bluestore.read_lat.description": "Average read latency",

from "ceph daemon osd.X perf dump"?


And "ceph osd perf" output would correspond to
  "bluestore.txc_commit_lat.description": "Average commit latency",
or
  "filestore.apply_latency.description": "Apply latency",
  "filestore.journal_latency.description": "Average journal queue completing latency",
depending on OSD format?

It looks like "read_lat" is Bluestore only, and there is no comparable
value for Filestore.

There are other, format-agnostic OSD latency values:
  "osd.op_r_latency.description": "Latency of read operation (including queue time)",
  "osd.op_w_latency.description": "Latency of write operation (including queue time)",
  "osd.op_rw_latency.description": "Latency of read-modify-write operation (including queue time)",


More guesswork:
  - is osd.op_X_latency about client->OSD command timing?
  - are bluestore/filestore values about OSD->storage op timing?

Please bear with me :-) I am just trying to get a rough understanding of
what the numbers to be collected and graphed actually mean and how they
are related to each other.
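
For graphing, one common approach is to turn the cumulative latency counters into per-interval averages. The sketch below assumes each latency counter in the `perf dump` output is a pair of cumulative values, `avgcount` (number of ops) and `sum` (total time in seconds); that layout is an assumption to verify on your release, and the sample numbers are made up.

```python
def interval_avg_latency(prev, curr):
    """Average latency in seconds for ops completed between two samples.

    Each sample is a dict such as {"avgcount": 1000, "sum": 2.0}, taken
    from the same counter at two points in time.
    """
    ops = curr["avgcount"] - prev["avgcount"]
    if ops <= 0:
        # No ops completed in the interval, or the counters were reset
        # (for example after an OSD restart).
        return None
    return (curr["sum"] - prev["sum"]) / ops


# Made-up samples of one counter (e.g. bluestore.txc_commit_lat),
# taken some seconds apart.
prev_sample = {"avgcount": 1000, "sum": 2.0}
curr_sample = {"avgcount": 1500, "sum": 3.5}

avg = interval_avg_latency(prev_sample, curr_sample)
print(f"average over interval: {avg * 1000:.1f} ms")
```

Dividing the deltas rather than the raw totals avoids graphing an average over the whole OSD lifetime, which would hide short latency spikes.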


Regards
Matthias



[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-03-30 Thread Konstantin Shalygin
Hi,

> On 25 Mar 2023, at 23:15, Matthias Ferdinand  wrote:
> 
> from "ceph daemon osd.X perf dump"?


No, from ceph-mgr prometheus exporter
You can enable it via `ceph mgr module enable prometheus`
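
A rough sketch of picking the per-OSD latency gauges out of the exporter's text output. It assumes the module exports `ceph_osd_commit_latency_ms` and `ceph_osd_apply_latency_ms` with a `ceph_daemon` label, and that the exporter is reachable at `http://<mgr-host>:9283/metrics`; check both against your deployment. The sample text is made up and no network request is made here.

```python
import re

# Matches lines such as: ceph_osd_commit_latency_ms{ceph_daemon="osd.0"} 3.0
LINE_RE = re.compile(r'^(ceph_osd_(?:apply|commit)_latency_ms)\{([^}]*)\}\s+(\S+)$')
DAEMON_RE = re.compile(r'ceph_daemon="([^"]+)"')


def parse_osd_latencies(text):
    """Return {daemon: {metric_name: value}} for the two OSD latency gauges."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comments, other metrics, blank lines
        name, labels, value = match.groups()
        daemon = DAEMON_RE.search(labels)
        if daemon:
            result.setdefault(daemon.group(1), {})[name] = float(value)
    return result


# Made-up sample of exporter output, for illustration only.
SAMPLE = """\
# TYPE ceph_osd_commit_latency_ms gauge
ceph_osd_commit_latency_ms{ceph_daemon="osd.0"} 3.0
ceph_osd_apply_latency_ms{ceph_daemon="osd.0"} 3.0
ceph_osd_commit_latency_ms{ceph_daemon="osd.1"} 7.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
"""

parsed = parse_osd_latencies(SAMPLE)
for daemon, metrics in sorted(parsed.items()):
    print(daemon, metrics)
```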

> Please bear with me :-) I am just trying to get a rough understanding of
> what the numbers to be collected and graphed actually mean and how they
> are related to each other.

I think you can find the metric descriptions in the source of the official
Grafana dashboard [1]


[1] https://github.com/ceph/ceph/blob/main/monitoring/ceph-mixin/dashboards_out/osds-overview.json
k


[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-04-02 Thread Matthias Ferdinand
On Thu, Mar 30, 2023 at 08:56:06PM +0400, Konstantin Shalygin wrote:
> Hi,
> 
> > On 25 Mar 2023, at 23:15, Matthias Ferdinand  wrote:
> > 
> > from "ceph daemon osd.X perf dump"?
> 
> 
> No, from ceph-mgr prometheus exporter
> You can enable it via `ceph mgr module enable prometheus`

Hi Konstantin,

thanks :-)
I understand that grafana graphs are generated from prometheus metrics.
I just wanted to know which OSD daemon-perf values feed these prometheus
metrics (or if they are generated in some other way).


Output for "ceph daemon osd.X perf dump" is quite large; most of the
time I am just looking for some kind of latency indicator, or checking
if there are "slow" bytes in bluestore OSDs. Most of the output lines
get filtered away immediately by the next grep/jq. Can somebody tell me
if asking often (like every second) for full perf dump output could slow
down the OSD?
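
The filtering step described above could look roughly like the sketch below, which keeps only a chosen set of counters from a parsed perf dump. It says nothing about how much load frequent polling puts on the OSD. The counter names `txc_commit_lat`, `read_lat`, `op_r_latency` and `op_w_latency` come from earlier in this thread; `bluefs.slow_used_bytes` is an assumed name for the "slow" bytes counter, and the sample dump is made up.

```python
def pick_counters(perf_dump, wanted):
    """Keep only the wanted counters from a parsed `perf dump` dict.

    wanted maps a section name to a list of counter names, e.g.
    {"bluestore": ["txc_commit_lat"]}. Missing entries are skipped.
    """
    picked = {}
    for section, names in wanted.items():
        counters = perf_dump.get(section, {})
        for name in names:
            if name in counters:
                picked[f"{section}.{name}"] = counters[name]
    return picked


# Hypothetical selection; adjust to the counters you actually graph.
WANTED = {
    "bluestore": ["txc_commit_lat", "read_lat"],
    "bluefs": ["slow_used_bytes"],  # assumed counter name
    "osd": ["op_r_latency", "op_w_latency"],
}

# Made-up, heavily shortened stand-in for real perf dump output.
SAMPLE_DUMP = {
    "bluestore": {
        "txc_commit_lat": {"avgcount": 10, "sum": 0.05},
        "read_lat": {"avgcount": 4, "sum": 0.01},
        "unrelated_counter": 1,
    },
    "bluefs": {"slow_used_bytes": 0, "db_used_bytes": 123},
    "osd": {"op_r_latency": {"avgcount": 2, "sum": 0.004}, "op": 99},
}

picked = pick_counters(SAMPLE_DUMP, WANTED)
for key, value in sorted(picked.items()):
    print(key, value)
```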


Regards
Matthias


[ceph-users] Re: monitoring apply_latency / commit_latency ?

2023-04-02 Thread Konstantin Shalygin

Hi,

> On 2 Apr 2023, at 23:14, Matthias Ferdinand  wrote:
> 
> I understand that grafana graphs are generated from prometheus metrics.
> I just wanted to know which OSD daemon-perf values feed these prometheus
> metrics (or if they are generated in some other way).

Yep, these perf metrics are generated in some way 🙂
You can consult the ceph-mgr prometheus module source code [1]


[1] https://github.com/ceph/ceph/blob/main/src/pybind/mgr/prometheus/module.py#L1656-L1676
k