[ceph-users] Re: 18.2.2 dashboard really messed up.

2024-03-14 Thread Nizamudeen A
Yup, that does look like a huge difference.

@Pedro Gonzalez Gomez @Aashish Sharma @Ankush Behl Could you guys help
here? Did we miss any fixes for 18.2.2?

Regards,

On Thu, Mar 14, 2024 at 2:17 AM Harry G Coin  wrote:

> Thanks!  Oddly, all the dashboard checks you suggest appear normal, yet
> the result remains broken.
>
> Before I applied your instruction about the dashboard, I had this result:
>
> root@noc3:~# ceph dashboard get-prometheus-api-host
> http://noc3.1.quietfountain.com:9095
> root@noc3:~# netstat -6nlp | grep 9095
> tcp6       0      0 :::9095                 :::*                    LISTEN      80963/prometheus
> root@noc3:~#
>
> To check it, I tried setting it to something random; the browser aimed at
> the dashboard site then reported no connection.  The error message ended when I
> restored the above.  But the graphs remain empty, with only the numbers 1 and 0.5 on
> each.
>
> Regarding the used storage, notice the overall usage is 43.6 of 111
> TiB.  That seems quite a distance from the warning trigger points of 85 and
> 95%?  The default values are in use.  All the OSDs are between 37% and 42%
> usage.  What am I missing?
>
> Thanks!
>
>
>
> On 3/12/24 02:07, Nizamudeen A wrote:
>
> Hi,
>
> The warning and danger indicators in the capacity chart point to the
> nearfull and full ratios set on the cluster, and
> the default values for them are 85% and 95% respectively. You can do a
> `ceph osd dump | grep ratio` and see those.
>
> When this got introduced, there was a blog post
> explaining how this is mapped in the chart. When your used storage
> crosses that 85% mark, the chart is colored yellow to alert the
> user, and when it crosses 95% (or the full ratio) the
> chart is colored red. That doesn't mean the cluster
> is in bad shape; it's a visual indicator to tell you
> you are running out of storage.
>
> Regarding the Cluster Utilization chart, it gets its metrics directly from
> Prometheus so that it can show time-series
> data in the UI rather than the metrics at the current point in time (which was
> used before). So if you have Prometheus configured for the
> dashboard and its URL is provided in the dashboard settings (`ceph
> dashboard set-prometheus-api-host <url>`),
> then you should be able to see the metrics.
>
> In case you need to read more about the new page, you can check here.
>
> Regards,
> Nizam
>
>
>
> On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin  wrote:
>
>> Looking at ceph -s, all is well.  Looking at the dashboard, 85% of my
>> capacity is 'warned', and 95% is 'in danger'.   There is no hint given
>> as to the nature of the danger or reason for the warning.  Though
>> apparently with merely 5% of my ceph world 'normal', the cluster reports
>> 'ok'.  Which, you know, seems contradictory.  I've used just under 40%
>> of capacity.
>>
>> Further down the dashboard, all the subsections of 'Cluster Utilization'
>> are '1' and '0.5' with nothing whatever in the graphics area.
>>
>> Previous versions of ceph presented a normal dashboard.
>>
>> It's just a little half rack, 5 hosts, a few physical drives each, been
>> running ceph for a couple years now.  Orchestrator is cephadm.  It's
>> just about as 'plain vanilla' as it gets.  I've had to mute one alert,
>> because cephadm refresh aborts when it finds drives on any host that
>> have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.
>> Seems unrelated to a totally messed up dashboard.  (The tracker for that
>> is here: https://tracker.ceph.com/issues/63502 ).
>>
>> Any idea what the steps are to get useful stuff back on the dashboard?
>> Any idea where I can learn what my 85% danger and 95% warning is
>> 'about'?  (You'd think 'danger' (the volcano is blowing up now!) would
>> be worse than 'warning' (the volcano might blow up soon), so how can
>> warning+danger > 100%, or if not additive how can warning < danger?)
>>
>>   Here's a bit of detail:
>>
>> root@noc1:~# ceph -s
>>   cluster:
>> id: 4067126d-01cb-40af-824a-881c130140f8
>> health: HEALTH_OK
>> (muted: CEPHADM_REFRESH_FAILED)
>>
>>   services:
>> mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
>> mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
>> noc3.sybsfb, noc1.jtteqg
>> mds: 1/1 daemons up, 3 standby
>> osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>>
>>   data:
>> volumes: 1/1 healthy
>> pools:   16 pools, 1809 pgs
>> objects: 12.29M objects, 17 TiB
>> usage:   44 TiB used, 67 TiB / 111 TiB avail
>> pgs: 1793 active+clean
>>  9    active+clean+scrubbing
>>  7    active+clean+scrubbing+deep
>>
>>   io:
>> client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>>

[ceph-users] Re: 18.2.2 dashboard really messed up.

2024-03-13 Thread Harry G Coin
Thanks!  Oddly, all the dashboard checks you suggest appear normal, yet 
the result remains broken.


Before I applied your instruction about the dashboard, I had this result:

root@noc3:~# ceph dashboard get-prometheus-api-host
http://noc3.1.quietfountain.com:9095
root@noc3:~# netstat -6nlp | grep 9095
tcp6       0      0 :::9095                 :::*                    LISTEN      80963/prometheus

root@noc3:~#

To check it, I tried setting it to something random; the browser aimed
at the dashboard site then reported no connection.  The error message ended
when I restored the above.  But the graphs remain empty, with only the numbers 1
and 0.5 on each.
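
If it helps, I can also query the Prometheus HTTP API directly from the mgr host (this is just the generic 'up' query via curl, nothing Ceph-specific assumed; a working instance should answer with a JSON body whose "status" is "success"):

root@noc3:~# curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=up'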


Regarding the used storage, notice the overall usage is 43.6 of 111
TiB.  That seems quite a distance from the warning trigger points of 85 and
95%?  The default values are in use.  All the OSDs are between 37% and 42%
usage.  What am I missing?


Thanks!



On 3/12/24 02:07, Nizamudeen A wrote:

Hi,

The warning and danger indicators in the capacity chart point to the
nearfull and full ratios set on the cluster, and
the default values for them are 85% and 95% respectively. You can do a
`ceph osd dump | grep ratio` and see those.


When this got introduced, there was a blog post
explaining how this is mapped in the chart. When your used storage
crosses that 85% mark, the chart is colored yellow to alert
the user, and when it crosses 95% (or the full ratio) the
chart is colored red. That doesn't mean the
cluster is in bad shape; it's a visual indicator to tell you
you are running out of storage.

Regarding the Cluster Utilization chart, it gets its metrics directly from
Prometheus so that it can show time-series
data in the UI rather than the metrics at the current point in time (which was
used before). So if you have Prometheus configured for the
dashboard and its URL is provided in the dashboard settings (`ceph
dashboard set-prometheus-api-host <url>`),
then you should be able to see the metrics.

In case you need to read more about the new page, you can check here.


Regards,
Nizam



On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin  wrote:

Looking at ceph -s, all is well.  Looking at the dashboard, 85% of my
capacity is 'warned', and 95% is 'in danger'.  There is no hint given
as to the nature of the danger or reason for the warning.  Though
apparently with merely 5% of my ceph world 'normal', the cluster reports
'ok'.  Which, you know, seems contradictory.  I've used just under 40%
of capacity.

Further down the dashboard, all the subsections of 'Cluster Utilization'
are '1' and '0.5' with nothing whatever in the graphics area.

Previous versions of ceph presented a normal dashboard.

It's just a little half rack, 5 hosts, a few physical drives each, been
running ceph for a couple years now.  Orchestrator is cephadm.  It's
just about as 'plain vanilla' as it gets.  I've had to mute one alert,
because cephadm refresh aborts when it finds drives on any host that
have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.
Seems unrelated to a totally messed up dashboard.  (The tracker for that
is here: https://tracker.ceph.com/issues/63502 ).

Any idea what the steps are to get useful stuff back on the dashboard?
Any idea where I can learn what my 85% danger and 95% warning is
'about'?  (You'd think 'danger' (the volcano is blowing up now!) would
be worse than 'warning' (the volcano might blow up soon), so how can
warning+danger > 100%, or if not additive how can warning < danger?)

  Here's a bit of detail:

root@noc1:~# ceph -s
  cluster:
id: 4067126d-01cb-40af-824a-881c130140f8
health: HEALTH_OK
(muted: CEPHADM_REFRESH_FAILED)

  services:
mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
noc3.sybsfb, noc1.jtteqg
mds: 1/1 daemons up, 3 standby
osd: 27 osds: 27 up (since 20m), 27 in (since 2d)

  data:
volumes: 1/1 healthy
pools:   16 pools, 1809 pgs
objects: 12.29M objects, 17 TiB
usage:   44 TiB used, 67 TiB / 111 TiB avail
pgs: 1793 active+clean
 9    active+clean+scrubbing
 7    active+clean+scrubbing+deep

  io:
client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 18.2.2 dashboard really messed up.

2024-03-12 Thread Nizamudeen A
Hi,

The warning and danger indicators in the capacity chart point to the
nearfull and full ratios set on the cluster, and
the default values for them are 85% and 95% respectively. You can do a
`ceph osd dump | grep ratio` and see those.
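For reference, on a cluster that still has the defaults, the output should look roughly like this (illustrative values, not taken from your cluster; the backfillfull ratio shows up there too):

  $ ceph osd dump | grep ratio
  full_ratio 0.95
  backfillfull_ratio 0.9
  nearfull_ratio 0.85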

When this got introduced, there was a blog post
explaining how this is mapped in the chart. When your used storage
crosses that 85% mark, the chart is colored yellow to alert the
user, and when it crosses 95% (or the full ratio) the
chart is colored red. That doesn't mean the cluster
is in bad shape; it's a visual indicator to tell you
you are running out of storage.

Regarding the Cluster Utilization chart, it gets its metrics directly from
Prometheus so that it can show time-series
data in the UI rather than the metrics at the current point in time (which was used
before). So if you have Prometheus configured for the
dashboard and its URL is provided in the dashboard settings (`ceph dashboard
set-prometheus-api-host <url>`),
then you should be able to see the metrics.
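As a quick sanity check, something along these lines should be enough (the hostname below is just a placeholder; point it at wherever your Prometheus instance listens, which is port 9095 by default in a cephadm deployment):

  $ ceph dashboard get-prometheus-api-host
  $ ceph dashboard set-prometheus-api-host http://<prometheus-host>:9095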

In case you need to read more about the new page, you can check here.

Regards,
Nizam



On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin  wrote:

> Looking at ceph -s, all is well.  Looking at the dashboard, 85% of my
> capacity is 'warned', and 95% is 'in danger'.   There is no hint given
> as to the nature of the danger or reason for the warning.  Though
> apparently with merely 5% of my ceph world 'normal', the cluster reports
> 'ok'.  Which, you know, seems contradictory.  I've used just under 40%
> of capacity.
>
> Further down the dashboard, all the subsections of 'Cluster Utilization'
> are '1' and '0.5' with nothing whatever in the graphics area.
>
> Previous versions of ceph presented a normal dashboard.
>
> It's just a little half rack, 5 hosts, a few physical drives each, been
> running ceph for a couple years now.  Orchestrator is cephadm.  It's
> just about as 'plain vanilla' as it gets.  I've had to mute one alert,
> because cephadm refresh aborts when it finds drives on any host that
> have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.
> Seems unrelated to a totally messed up dashboard.  (The tracker for that
> is here: https://tracker.ceph.com/issues/63502 ).
>
> Any idea what the steps are to get useful stuff back on the dashboard?
> Any idea where I can learn what my 85% danger and 95% warning is
> 'about'?  (You'd think 'danger' (the volcano is blowing up now!) would
> be worse than 'warning' (the volcano might blow up soon), so how can
> warning+danger > 100%, or if not additive how can warning < danger?)
>
>   Here's a bit of detail:
>
> root@noc1:~# ceph -s
>   cluster:
> id: 4067126d-01cb-40af-824a-881c130140f8
> health: HEALTH_OK
> (muted: CEPHADM_REFRESH_FAILED)
>
>   services:
> mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
> mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
> noc3.sybsfb, noc1.jtteqg
> mds: 1/1 daemons up, 3 standby
> osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>
>   data:
> volumes: 1/1 healthy
> pools:   16 pools, 1809 pgs
> objects: 12.29M objects, 17 TiB
> usage:   44 TiB used, 67 TiB / 111 TiB avail
> pgs: 1793 active+clean
>  9    active+clean+scrubbing
>  7    active+clean+scrubbing+deep
>
>   io:
> client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io