[ceph-users] Re: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

2022-05-20 Thread Denis Polom

Hi,

No, the pool is EC.

Primary affinity works in Octopus on a replicated pool.

A Nautilus EC pool works.


On 5/20/22 19:25, denispo...@gmail.com wrote:

Hi,

No, the pool is EC.

20. 5. 2022 18:19:22 Dan van der Ster :

Hi,

Just a curiosity... It looks like you're comparing an EC pool in
octopus to a replicated pool in nautilus. Does primary affinity
work for you in octopus on a replicated pool? And does a nautilus
EC pool work?

.. Dan



On Fri., May 20, 2022, 13:53 Denis Polom wrote:

Hi

I have been observing high latencies and hanging mount points while
draining an OSD ever since the Octopus release, and the problem is
still present on the latest Pacific.

Cluster setup:

Ceph Pacific 16.2.7

Cephfs with EC data pool

EC profile setup:

crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Description:

When a drive fails, we remove it from the Ceph cluster by draining
it first, i.e. setting its CRUSH weight to 0:

ceph osd crush reweight osd.1 0

On Nautilus this never affected clients. But after upgrading to
Octopus (and from Octopus up to the current Pacific release) I
observe very high IO latencies on clients (10 s and higher) while
the OSD is being drained.

While debugging I found that the drained OSD is still listed as the
acting primary, and that this happens only on EC pools and only
since Octopus. To be sure, I tested again on Nautilus, where the
behavior is correct and the drained OSD is no longer listed in the
UP or ACTING sets of the PGs.

Even setting the primary-affinity of the given OSD to 0 has no
effect on the EC pool.

Below are my debug outputs:

Buggy behavior on Octopus and Pacific:

Before draining osd.70:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED UNFOUND
BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG
STATE  STATE_STAMP VERSION
REPORTED   UP UP_PRIMARY ACTING
ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB
DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
16.1fff 2269   0 0  0 0
8955297727    0   0  2449 2449
active+clean 2022-05-19T08:41:55.241734+0200 19403690'275685
19407588:19607199    [70,206,216,375,307,57]  70
[70,206,216,375,307,57]  70    19384365'275621
2022-05-19T08:41:55.241493+0200    19384365'275621
2022-05-19T08:41:55.241493+0200  0
dumped pgs


after setting osd.70 crush weight to 0 (osd.70 is still acting
primary):

  UP UP_PRIMARY ACTING
ACTING_PRIMARY  LAST_SCRUB SCRUB_STAMP
LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
16.1fff 2269   0 0   2269 0
8955297727    0   0  2449  2449
active+remapped+backfill_wait 2022-05-20T08:51:54.249071+0200
19403690'275685  19407668:19607289
[71,206,216,375,307,57]  71
[70,206,216,375,307,57]  70    19384365'275621
2022-05-19T08:41:55.241493+0200    19384365'275621
2022-05-19T08:41:55.241493+0200  0
dumped pgs


Correct behavior on Nautilus:

Before draining osd.10:

PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND
BYTES
OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP
VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY
LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
SNAPTRIMQ_LEN
2.4e  2  0    0 0   0
8388608   0  0   2    2 active+clean
2022-05-20
02:13:47.432104    61'2    75:40   [10,0,7] 10 [10,0,7]
10    0'0 2022-05-20 01:44:36.217286 0'0
2022-05-20
01:44:36.217286 0

after setting osd.10 crush weight to 0 (behavior is correct,
osd.10 is
not listed, not used):


root@nautilus1:~# ceph pg dump pgs | head -2
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND
BYTES
OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE
STATE_STAMP    VERSION REPORTED UP UP_PRIMARY
ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP
LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
2.4e 14  0    0 0   0
58720256   0  0  18   18 active+clean
2022-05-20
02:18:59.414812   

[ceph-users] Re: prometheus retention

2022-05-20 Thread Eugen Block

Hi,

I found this request [1] targeting version 18; it seems that's not
easily possible at the moment.


[1] https://tracker.ceph.com/issues/54308
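
For context, the retention period is ultimately just a flag on the prometheus
daemon itself (--storage.tsdb.retention.time); what [1] asks for is a way to set
it through cephadm, e.g. via the prometheus service spec. A rough sketch of how
such a spec might look, purely as an illustration of the request (not something
the releases discussed here accept):

service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "1y"    # would map to --storage.tsdb.retention.time
  retention_size: "1GB"   # would map to --storage.tsdb.retention.size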

Zitat von Vladimir Brik :


Hello

Is it possible to increase the retention period of the prometheus
service deployed with cephadm?



Thanks

Vlad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

2022-05-20 Thread denispolom
Hi,

No, the pool is EC.

20. 5. 2022 18:19:22 Dan van der Ster :

> Hi,
> 
> Just a curiosity... It looks like you're comparing an EC pool in octopus to a 
> replicated pool in nautilus. Does primary affinity work for you in octopus on 
> a replicated pool? And does a nautilus EC pool work?
> 
> .. Dan
> 
> 
> 
> On Fri., May 20, 2022, 13:53 Denis Polom,  wrote:
>> Hi
>> 
>> I have been observing high latencies and hanging mount points while draining an
>> OSD ever since the Octopus release, and the problem is still present on the latest Pacific.
>> 
>> Cluster setup:
>> 
>> Ceph Pacific 16.2.7
>> 
>> Cephfs with EC data pool
>> 
>> EC profile setup:
>> 
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=10
>> m=2
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>> 
>> Description:
>> 
>> When a drive fails, we remove it from the Ceph cluster by draining it first,
>> i.e. setting its CRUSH weight to 0:
>> 
>> ceph osd crush reweight osd.1 0
>> 
>> On Nautilus this never affected clients. But after upgrading to Octopus (and
>> from Octopus up to the current Pacific release) I observe very high IO
>> latencies on clients (10 s and higher) while the OSD is being drained.
>> 
>> While debugging I found that the drained OSD is still listed as the acting
>> primary, and that this happens only on EC pools and only since Octopus. To be
>> sure, I tested again on Nautilus, where the behavior is correct and the
>> drained OSD is no longer listed in the UP or ACTING sets of the PGs.
>> 
>> Even setting the primary-affinity of the given OSD to 0 has no effect on the
>> EC pool.
>> 
>> Below are my debug outputs:
>> 
>> Buggy behavior on Octopus and Pacific:
>> 
>> Before draining osd.70:
>> 
>> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED UNFOUND 
>> BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG
>> STATE  STATE_STAMP VERSION   
>> REPORTED   UP UP_PRIMARY  ACTING
>> ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB   
>> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269   0 0  0 0 
>> 8955297727    0   0  2449 2449  
>> active+clean 2022-05-19T08:41:55.241734+0200    19403690'275685
>> 19407588:19607199    [70,206,216,375,307,57]  70
>> [70,206,216,375,307,57]  70    19384365'275621
>> 2022-05-19T08:41:55.241493+0200    19384365'275621
>> 2022-05-19T08:41:55.241493+0200  0
>> dumped pgs
>> 
>> 
>> after setting osd.70 crush weight to 0 (osd.70 is still acting primary):
>> 
>>   UP UP_PRIMARY ACTING
>> ACTING_PRIMARY  LAST_SCRUB SCRUB_STAMP 
>> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269   0 0   2269 0 
>> 8955297727    0   0  2449  2449
>> active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200
>> 19403690'275685  19407668:19607289 [71,206,216,375,307,57]  71
>> [70,206,216,375,307,57]  70    19384365'275621
>> 2022-05-19T08:41:55.241493+0200    19384365'275621
>> 2022-05-19T08:41:55.241493+0200  0
>> dumped pgs
>> 
>> 
>> Correct behavior on Nautilus:
>> 
>> Before draining osd.10:
>> 
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES   
>> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP   
>> VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY
>> LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP  
>> SNAPTRIMQ_LEN
>> 2.4e  2  0    0 0   0
>> 8388608   0  0   2    2 active+clean 2022-05-20
>> 02:13:47.432104    61'2    75:40   [10,0,7] 10   [10,0,7]
>> 10    0'0 2022-05-20 01:44:36.217286 0'0 2022-05-20
>> 01:44:36.217286 0
>> 
>> after setting osd.10 crush weight to 0 (behavior is correct, osd.10 is
>> not listed, not used):
>> 
>> 
>> root@nautilus1:~# ceph pg dump pgs | head -2
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
>> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE
>> STATE_STAMP    VERSION REPORTED UP UP_PRIMARY
>> ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP   
>> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
>> 2.4e 14  0    0 0   0
>> 58720256   0  0  18   18 active+clean 2022-05-20
>> 02:18:59.414812   75'18    80:43 [22,0,7] 22  
>> [22,0,7] 22    0'0 2022-05-20
>> 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>> 
>> 
>> Now question is if is it some implemented feature?
>> 
>> Or is it a bug?
>> 
>> Thank you!
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To 

[ceph-users] Re: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

2022-05-20 Thread Dan van der Ster
Hi,

Just a curiosity... It looks like you're comparing an EC pool in octopus to
a replicated pool in nautilus. Does primary affinity work for you in
octopus on a replicated pool? And does a nautilus EC pool work?

.. Dan



On Fri., May 20, 2022, 13:53 Denis Polom,  wrote:

> Hi
>
> I have been observing high latencies and hanging mount points while draining an
> OSD ever since the Octopus release, and the problem is still present on the latest Pacific.
>
> Cluster setup:
>
> Ceph Pacific 16.2.7
>
> Cephfs with EC data pool
>
> EC profile setup:
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=10
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Description:
>
> When a drive fails, we remove it from the Ceph cluster by draining it first,
> i.e. setting its CRUSH weight to 0:
>
> ceph osd crush reweight osd.1 0
>
> On Nautilus this never affected clients. But after upgrading to Octopus (and
> from Octopus up to the current Pacific release) I observe very high IO
> latencies on clients (10 s and higher) while the OSD is being drained.
>
> While debugging I found that the drained OSD is still listed as the acting
> primary, and that this happens only on EC pools and only since Octopus. To be
> sure, I tested again on Nautilus, where the behavior is correct and the
> drained OSD is no longer listed in the UP or ACTING sets of the PGs.
>
> Even setting the primary-affinity of the given OSD to 0 has no effect on the
> EC pool.
>
> Below are my debug outputs:
>
> Buggy behavior on Octopus and Pacific:
>
> Before draining osd.70:
>
> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED UNFOUND
> BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG
> STATE  STATE_STAMP VERSION
> REPORTED   UP UP_PRIMARY  ACTING
> ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB
> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> 16.1fff 2269   0 0  0 0
> 89552977270   0  2449 2449
> active+clean 2022-05-19T08:41:55.241734+020019403690'275685
> 19407588:19607199[70,206,216,375,307,57]  70
> [70,206,216,375,307,57]  7019384365'275621
> 2022-05-19T08:41:55.241493+020019384365'275621
> 2022-05-19T08:41:55.241493+0200  0
> dumped pgs
>
>
> after setting osd.70 crush weight to 0 (osd.70 is still acting primary):
>
>   UP UP_PRIMARY ACTING
> ACTING_PRIMARY  LAST_SCRUB SCRUB_STAMP
> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> 16.1fff 2269   0 0   2269 0
> 89552977270   0  2449  2449
> active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200
> 19403690'275685  19407668:19607289 [71,206,216,375,307,57]  71
> [70,206,216,375,307,57]  7019384365'275621
> 2022-05-19T08:41:55.241493+020019384365'275621
> 2022-05-19T08:41:55.241493+0200  0
> dumped pgs
>
>
> Correct behavior on Nautilus:
>
> Before draining osd.10:
>
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP
> VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY
> LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
> SNAPTRIMQ_LEN
> 2.4e  2  00 0   0
> 8388608   0  0   22 active+clean 2022-05-20
> 02:13:47.43210461'275:40   [10,0,7] 10   [10,0,7]
> 100'0 2022-05-20 01:44:36.217286 0'0 2022-05-20
> 01:44:36.217286 0
>
> after setting osd.10 crush weight to 0 (behavior is correct, osd.10 is
> not listed, not used):
>
>
> root@nautilus1:~# ceph pg dump pgs | head -2
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE
> STATE_STAMPVERSION REPORTED UP UP_PRIMARY
> ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP
> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
> 2.4e 14  00 0   0
> 58720256   0  0  18   18 active+clean 2022-05-20
> 02:18:59.414812   75'1880:43 [22,0,7] 22
> [22,0,7] 220'0 2022-05-20
> 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>
>
> Now question is if is it some implemented feature?
>
> Or is it a bug?
>
> Thank you!
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

2022-05-20 Thread denispolom
Hi,
yes, I had to change the procedure as well:
1. Stop the OSD daemon
2. Mark the OSD out in the CRUSH map
(a rough sketch of the commands is below)

But as you write, that makes PGs degraded.

However, it still looks like a bug to me.
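
For reference, a minimal sketch of that procedure, using osd.70 from the example
below (the systemctl form assumes a non-cephadm deployment; with cephadm it
would be `ceph orch daemon stop osd.70` instead):

# 1. stop the failing OSD's daemon so it can no longer be chosen as acting primary
systemctl stop ceph-osd@70
# 2. mark it out so its PGs backfill to other OSDs (they show as degraded in the meantime)
ceph osd out 70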

20. 5. 2022 17:25:47 Wesley Dillingham :

> This sounds similar to an inquiry I submitted a couple of years ago [1], in which
> I discovered that the choose_acting function does not consider primary
> affinity when choosing the primary OSD. I had assumed it would when developing
> my procedure for replacing failing disks. After that discovery I changed my
> process to stopping the daemon of the failing OSD (which leaves PGs degraded)
> to ensure it no longer participates in its PGs. Not sure if any of the
> relevant code has changed since that initial report, but what
> you describe here seems similar.
> 
>  [1] https://tracker.ceph.com/issues/44400
> 
> Respectfully,
> 
> *Wes Dillingham*
> w...@wesdillingham.com
> *LinkedIn[http://www.linkedin.com/in/wesleydillingham]*
> 
> 
> On Fri, May 20, 2022 at 7:53 AM Denis Polom  wrote:
>> Hi
>> 
>> I have been observing high latencies and hanging mount points while draining an
>> OSD ever since the Octopus release, and the problem is still present on the latest Pacific.
>> 
>> Cluster setup:
>> 
>> Ceph Pacific 16.2.7
>> 
>> Cephfs with EC data pool
>> 
>> EC profile setup:
>> 
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=10
>> m=2
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>> 
>> Description:
>> 
>> When a drive fails, we remove it from the Ceph cluster by draining it first,
>> i.e. setting its CRUSH weight to 0:
>> 
>> ceph osd crush reweight osd.1 0
>> 
>> On Nautilus this never affected clients. But after upgrading to Octopus (and
>> from Octopus up to the current Pacific release) I observe very high IO
>> latencies on clients (10 s and higher) while the OSD is being drained.
>> 
>> While debugging I found that the drained OSD is still listed as the acting
>> primary, and that this happens only on EC pools and only since Octopus. To be
>> sure, I tested again on Nautilus, where the behavior is correct and the
>> drained OSD is no longer listed in the UP or ACTING sets of the PGs.
>> 
>> Even setting the primary-affinity of the given OSD to 0 has no effect on the
>> EC pool.
>> 
>> Below are my debug outputs:
>> 
>> Buggy behavior on Octopus and Pacific:
>> 
>> Before draining osd.70:
>> 
>> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED UNFOUND 
>> BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG
>> STATE  STATE_STAMP VERSION   
>> REPORTED   UP UP_PRIMARY  ACTING
>> ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB   
>> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269   0 0  0 0 
>> 8955297727    0   0  2449 2449  
>> active+clean 2022-05-19T08:41:55.241734+0200    19403690'275685
>> 19407588:19607199    [70,206,216,375,307,57]  70
>> [70,206,216,375,307,57]  70    19384365'275621
>> 2022-05-19T08:41:55.241493+0200    19384365'275621
>> 2022-05-19T08:41:55.241493+0200  0
>> dumped pgs
>> 
>> 
>> after setting osd.70 crush weight to 0 (osd.70 is still acting primary):
>> 
>>   UP UP_PRIMARY ACTING
>> ACTING_PRIMARY  LAST_SCRUB SCRUB_STAMP 
>> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 16.1fff 2269   0 0   2269 0 
>> 8955297727    0   0  2449  2449
>> active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200
>> 19403690'275685  19407668:19607289 [71,206,216,375,307,57]  71
>> [70,206,216,375,307,57]  70    19384365'275621
>> 2022-05-19T08:41:55.241493+0200    19384365'275621
>> 2022-05-19T08:41:55.241493+0200  0
>> dumped pgs
>> 
>> 
>> Correct behavior on Nautilus:
>> 
>> Before draining osd.10:
>> 
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES   
>> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP   
>> VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY
>> LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP  
>> SNAPTRIMQ_LEN
>> 2.4e  2  0    0 0   0
>> 8388608   0  0   2    2 active+clean 2022-05-20
>> 02:13:47.432104    61'2    75:40   [10,0,7] 10   [10,0,7]
>> 10    0'0 2022-05-20 01:44:36.217286 0'0 2022-05-20
>> 01:44:36.217286 0
>> 
>> after setting osd.10 crush weight to 0 (behavior is correct, osd.10 is
>> not listed, not used):
>> 
>> 
>> root@nautilus1:~# ceph pg dump pgs | head -2
>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
>> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE
>> STATE_STAMP    VERSION REPORTED UP UP_PRIMARY
>> 

[ceph-users] Re: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

2022-05-20 Thread Wesley Dillingham
This sounds similar to an inquiry I submitted a couple of years ago [1], in
which I discovered that the choose_acting function does not consider primary
affinity when choosing the primary OSD. I had assumed it would when developing
my procedure for replacing failing disks. After that discovery I changed my
process to stopping the daemon of the failing OSD (which leaves PGs degraded)
to ensure it no longer participates in its PGs. Not sure if any of the relevant
code has changed since that initial report, but what you describe here seems
similar.

 [1] https://tracker.ceph.com/issues/44400

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, May 20, 2022 at 7:53 AM Denis Polom  wrote:

> Hi
>
> I have been observing high latencies and hanging mount points while draining an
> OSD ever since the Octopus release, and the problem is still present on the latest Pacific.
>
> Cluster setup:
>
> Ceph Pacific 16.2.7
>
> Cephfs with EC data pool
>
> EC profile setup:
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=10
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Description:
>
> When a drive fails, we remove it from the Ceph cluster by draining it first,
> i.e. setting its CRUSH weight to 0:
>
> ceph osd crush reweight osd.1 0
>
> On Nautilus this never affected clients. But after upgrading to Octopus (and
> from Octopus up to the current Pacific release) I observe very high IO
> latencies on clients (10 s and higher) while the OSD is being drained.
>
> While debugging I found that the drained OSD is still listed as the acting
> primary, and that this happens only on EC pools and only since Octopus. To be
> sure, I tested again on Nautilus, where the behavior is correct and the
> drained OSD is no longer listed in the UP or ACTING sets of the PGs.
>
> Even setting the primary-affinity of the given OSD to 0 has no effect on the
> EC pool.
>
> Below are my debug outputs:
>
> Buggy behavior on Octopus and Pacific:
>
> Before draining osd.70:
>
> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED UNFOUND
> BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG
> STATE  STATE_STAMP VERSION
> REPORTED   UP UP_PRIMARY  ACTING
> ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB
> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> 16.1fff 2269   0 0  0 0
> 89552977270   0  2449 2449
> active+clean 2022-05-19T08:41:55.241734+020019403690'275685
> 19407588:19607199[70,206,216,375,307,57]  70
> [70,206,216,375,307,57]  7019384365'275621
> 2022-05-19T08:41:55.241493+020019384365'275621
> 2022-05-19T08:41:55.241493+0200  0
> dumped pgs
>
>
> after setting osd.70 crush weight to 0 (osd.70 is still acting primary):
>
>   UP UP_PRIMARY ACTING
> ACTING_PRIMARY  LAST_SCRUB SCRUB_STAMP
> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> 16.1fff 2269   0 0   2269 0
> 89552977270   0  2449  2449
> active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200
> 19403690'275685  19407668:19607289 [71,206,216,375,307,57]  71
> [70,206,216,375,307,57]  7019384365'275621
> 2022-05-19T08:41:55.241493+020019384365'275621
> 2022-05-19T08:41:55.241493+0200  0
> dumped pgs
>
>
> Correct behavior on Nautilus:
>
> Before draining osd.10:
>
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP
> VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY
> LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
> SNAPTRIMQ_LEN
> 2.4e  2  00 0   0
> 8388608   0  0   22 active+clean 2022-05-20
> 02:13:47.43210461'275:40   [10,0,7] 10   [10,0,7]
> 100'0 2022-05-20 01:44:36.217286 0'0 2022-05-20
> 01:44:36.217286 0
>
> after setting osd.10 crush weight to 0 (behavior is correct, osd.10 is
> not listed, not used):
>
>
> root@nautilus1:~# ceph pg dump pgs | head -2
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE
> STATE_STAMPVERSION REPORTED UP UP_PRIMARY
> ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP
> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
> 2.4e 14  00 0   0
> 58720256   0  0  18   18 active+clean 2022-05-20
> 02:18:59.414812   75'1880:43 [22,0,7] 22
> [22,0,7] 220'0 2022-05-20
> 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>
>
> Now question is if is it some implemented feature?
>
> Or is it a bug?
>
> Thank you!
>
> ___

[ceph-users] Re: [ext] Re: Rename / change host names set with `ceph orch host add`

2022-05-20 Thread Adam King
To clarify a bit, "ceph orch host rm  --force" won't actually
touch any of the daemons on the host. It just stops cephadm from managing
the host. I.e. it won't add/remove daemons on the host. If you remove the
host then re-add it with the new host name nothing should actually happen
to the daemons there. The only possible exception is if you have services
whose placement uses count and one of the daemons from that service is on
the host being temporarily removed. It's possible it could try to deploy
that daemon on another host in the interim. However, OSDs are never like
that, so there would never be any need for flags like noout or nobackfill.
The worst case would be it moving a mon or mgr around. If you make sure all
the important services are deployed by label, explicit hosts etc. (just not
count) then there should be no risk of any daemons moving at all and this
is a pretty safe operation.
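
To make that concrete, a minimal sketch of the sequence for one host, using the
names from this thread (check `ceph orch host ls` and the CEPHADM_STRAY_HOST
warning afterwards):

# remove the FQDN entry; daemons on the host keep running, cephadm just stops managing it
# (--force is only needed on releases that otherwise ask you to drain the host first)
ceph orch host rm osd-mirror-1.our.domain.org --force
# re-add it under the bare host name, keeping its labels
ceph orch host add osd-mirror-1 172.16.62.22 --labels rgw --labels osd
# verify the new entry
ceph orch host ls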

On Fri, May 20, 2022 at 3:36 AM Kuhring, Mathias <
mathias.kuhr...@bih-charite.de> wrote:

> Hey Adam,
>
> thanks for your fast reply.
>
> That's a bit more invasive and risky than I was hoping for.
> But if this is the only way, I guess we need to do this.
>
> Would it be advisable to put some maintenance flags like noout,
> nobackfill, norebalance?
> And maybe stop the ceph target on the host I'm re-adding to pause all
> daemons?
>
> Best, Mathias
> On 5/19/2022 8:14 PM, Adam King wrote:
>
> cephadm just takes the hostname given in the "ceph orch host add" commands
> and assumes it won't change. The FQDN names (or whatever "ceph orch host
> ls" shows in any scenario) are from whatever input was given in those
> commands. Cephadm will even try to verify the hostname matches what is
> given when adding the host. As for where it is stored, we keep that info in
> the mon key store and it isn't meant to be manually updated (ceph
> config-key get mgr/cephadm/inventory). Although, there have occasionally
> been people running into issues related to a mismatch between an FQDN and a
> shortname. There's no built-in command for changing a hostname because of
> the expectation that it won't change. However, you should be able to fix
> this by removing and re-adding the host. E.g. "ceph orch host rm
> osd-mirror-1.our.domain.org" followed by "ceph orch host add osd-mirror-1
> 172.16.62.22 --labels rgw --labels osd". If you're on a late enough version
> that it requests you drain the host before we'll remove it (it was some
> pacific dot release, don't remember which one) you can pass --force to the
> host rm command. Generally it's not a good idea to remove hosts from
> cephadm's control while there are still cephadm deployed daemons on it like
> that but this is a special case. Anyway, removing and re-adding the host is
> the only (reasonable) way to change what it has stored for the hostname
> that I can remember.
>
> Let me know if that doesn't work,
>  - Adam King
>
> On Thu, May 19, 2022 at 1:41 PM Kuhring, Mathias <
> mathias.kuhr...@bih-charite.de> wrote:
>
>> Dear ceph users,
>>
>> one of our cluster is complaining about plenty of stray hosts and
>> daemons. Pretty much all of them.
>>
>> [WRN] CEPHADM_STRAY_HOST: 6 stray host(s) with 280 daemon(s) not managed
>> by cephadm
>>  stray host osd-mirror-1 has 47 stray daemons:
>> ['mgr.osd-mirror-1.ltmyyh', 'mon.osd-mirror-1', 'osd.1', ...]
>>  stray host osd-mirror-2 has 46 stray daemons: ['mon.osd-mirror-2',
>> 'osd.0', ...]
>>  stray host osd-mirror-3 has 48 stray daemons:
>> ['cephfs-mirror.osd-mirror-3.qzcuvv', 'mgr.osd-mirror-3',
>> 'mon.osd-mirror-3', 'osd.101', ...]
>>  stray host osd-mirror-4 has 47 stray daemons:
>> ['mds.cephfs.osd-mirror-4.omjlxu', 'mgr.osd-mirror-4', 'osd.103', ...]
>>  stray host osd-mirror-5 has 46 stray daemons: ['mgr.osd-mirror-5',
>> 'osd.139', ...]
>>  stray host osd-mirror-6 has 46 stray daemons:
>> ['mds.cephfs.osd-mirror-6.hobjsy', 'osd.141', ...]
>>
>> It all seems to boil down to host names from `ceph orch host ls` not
>> matching with other configurations.
>>
>> ceph orch host ls
>> HOST                         ADDR          LABELS       STATUS
>> osd-mirror-1.our.domain.org  172.16.62.22  rgw osd
>> osd-mirror-2.our.domain.org  172.16.62.23  rgw osd
>> osd-mirror-3.our.domain.org  172.16.62.24  rgw osd
>> osd-mirror-4.our.domain.org  172.16.62.25  rgw mds osd
>> osd-mirror-5.our.domain.org  172.16.62.32  rgw osd
>> osd-mirror-6.our.domain.org  172.16.62.33  rgw mds osd
>>
>> hostname
>> osd-mirror-6
>>
>> hostname -f
>> osd-mirror-6.our.domain.org
>>
>> 0|0[root@osd-mirror-6 ~]# ceph mon metadata | grep "\"hostname\""
>>  "hostname": "osd-mirror-1",
>>  "hostname": "osd-mirror-3",
>>  "hostname": "osd-mirror-2",
>>
>> 0|1[root@osd-mirror-6 ~]# ceph mgr metadata | grep "\"hostname\""
>>  "hostname": "osd-mirror-1",
>>  "hostname": "osd-mirror-3",
>>  "hostname": "osd-mirror-4",
>>  "hostname": "osd-mirror-5",
>>
>>
>> The documentation states, that 

[ceph-users] Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

2022-05-20 Thread Denis Polom

Hi

I have been observing high latencies and hanging mount points while draining an
OSD ever since the Octopus release, and the problem is still present on the
latest Pacific.


Cluster setup:

Ceph Pacific 16.2.7

Cephfs with EC data pool

EC profile setup:

crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=2
plugin=jerasure
technique=reed_sol_van
w=8
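
For reference, a profile like the one above would be created roughly as follows
(a sketch; the profile and pool names are placeholders, not the ones used on
this cluster):

ceph osd erasure-code-profile set ec-k10-m2 \
    k=10 m=2 plugin=jerasure technique=reed_sol_van \
    crush-failure-domain=host crush-root=default
# EC data pool for CephFS (overwrites must be enabled for CephFS/RBD on EC pools)
ceph osd pool create cephfs_data_ec 4096 4096 erasure ec-k10-m2
ceph osd pool set cephfs_data_ec allow_ec_overwrites true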

Description:

When a drive fails, we remove it from the Ceph cluster by draining it first,
i.e. setting its CRUSH weight to 0:


ceph osd crush reweight osd.1 0

On Nautilus this never affected clients. But after upgrading to Octopus (and
from Octopus up to the current Pacific release) I observe very high IO
latencies on clients (10 s and higher) while the OSD is being drained.
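
For reference, a minimal way to watch this on a single PG (a sketch, using
osd.70 and PG 16.1fff from the dumps below):

# drain the failing OSD by zeroing its CRUSH weight
ceph osd crush reweight osd.70 0
# then check the UP/ACTING sets and the acting primary of one affected PG
ceph pg map 16.1fff
ceph pg dump pgs | grep ^16.1fff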


While debugging I found that the drained OSD is still listed as the acting
primary, and that this happens only on EC pools and only since Octopus. To be
sure, I tested again on Nautilus, where the behavior is correct and the drained
OSD is no longer listed in the UP or ACTING sets of the PGs.


Even setting the primary-affinity of the given OSD to 0 has no effect on the
EC pool.
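
For completeness, this is the primary-affinity setting referred to above (a
sketch; on a replicated pool the acting primary moves away from osd.70 after
this, but on the EC pool described here it does not):

# tell the cluster not to pick osd.70 as primary
ceph osd primary-affinity osd.70 0
# re-check the acting primary of an affected PG
ceph pg map 16.1fff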


Below are my debug outputs:

Buggy behavior on Octopus and Pacific:

Before draining osd.70:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
16.1fff  2269  0  0  0  0  8955297727  0  0  2449  2449  active+clean  2022-05-19T08:41:55.241734+0200  19403690'275685  19407588:19607199  [70,206,216,375,307,57]  70  [70,206,216,375,307,57]  70  19384365'275621  2022-05-19T08:41:55.241493+0200  19384365'275621  2022-05-19T08:41:55.241493+0200  0

dumped pgs


after setting osd.70 crush weight to 0 (osd.70 is still acting primary):

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
16.1fff  2269  0  0  2269  0  8955297727  0  0  2449  2449  active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200  19403690'275685  19407668:19607289  [71,206,216,375,307,57]  71  [70,206,216,375,307,57]  70  19384365'275621  2022-05-19T08:41:55.241493+0200  19384365'275621  2022-05-19T08:41:55.241493+0200  0

dumped pgs


Correct behavior on Nautilus:

Before draining osd.10:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
2.4e  2  0  0  0  0  8388608  0  0  2  2  active+clean  2022-05-20 02:13:47.432104  61'2  75:40  [10,0,7]  10  [10,0,7]  10  0'0  2022-05-20 01:44:36.217286  0'0  2022-05-20 01:44:36.217286  0


after setting osd.10 crush weight to 0 (behavior is correct, osd.10 is 
not listed, not used):



root@nautilus1:~# ceph pg dump pgs | head -2
PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
2.4e  14  0  0  0  0  58720256  0  0  18  18  active+clean  2022-05-20 02:18:59.414812  75'18  80:43  [22,0,7]  22  [22,0,7]  22  0'0  2022-05-20 01:44:36.217286  0'0  2022-05-20 01:44:36.217286  0



Now the question is: is this an intended feature?

Or is it a bug?

Thank you!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ext] Re: Rename / change host names set with `ceph orch host add`

2022-05-20 Thread Kuhring, Mathias
Hey Adam,

thanks for your fast reply.

That's a bit more invasive and risky than I was hoping for.
But if this is the only way, I guess we need to do this.

Would it be advisable to put some maintenance flags like noout, nobackfill, 
norebalance?
And maybe stop the ceph target on the host I'm re-adding to pause all daemons?

Best, Mathias

On 5/19/2022 8:14 PM, Adam King wrote:
cephadm just takes the hostname given in the "ceph orch host add" commands and 
assumes it won't change. The FQDN names (or whatever "ceph orch host ls" shows 
in any scenario) are from whatever input was given in those commands. Cephadm 
will even try to verify the hostname matches what is given when adding the 
host. As for where it is stored, we keep that info in the mon key store and it 
isn't meant to be manually updated (ceph config-key get mgr/cephadm/inventory). 
Although, there have occasionally been people running into issues related to a 
mismatch between an FQDN and a shortname. There's no built-in command for 
changing a hostname because of the expectation that it won't change. However, 
you should be able to fix this by removing and re-adding the host. E.g. "ceph 
orch host rm osd-mirror-1.our.domain.org" 
followed by "ceph orch host add osd-mirror-1 172.16.62.22 --labels rgw --labels 
osd". If you're on a late enough version that it requests you drain the host 
before we'll remove it (it was some pacific dot release, don't remember which 
one) you can pass --force to the host rm command. Generally it's not a good 
idea to remove hosts from cephadm's control while there are still cephadm 
deployed daemons on it like that but this is a special case. Anyway, removing 
and re-adding the host is the only (reasonable) way to change what it has 
stored for the hostname that I can remember.

Let me know if that doesn't work,
 - Adam King

On Thu, May 19, 2022 at 1:41 PM Kuhring, Mathias <mathias.kuhr...@bih-charite.de> wrote:
Dear ceph users,

one of our cluster is complaining about plenty of stray hosts and
daemons. Pretty much all of them.

[WRN] CEPHADM_STRAY_HOST: 6 stray host(s) with 280 daemon(s) not managed
by cephadm
 stray host osd-mirror-1 has 47 stray daemons:
['mgr.osd-mirror-1.ltmyyh', 'mon.osd-mirror-1', 'osd.1', ...]
 stray host osd-mirror-2 has 46 stray daemons: ['mon.osd-mirror-2',
'osd.0', ...]
 stray host osd-mirror-3 has 48 stray daemons:
['cephfs-mirror.osd-mirror-3.qzcuvv', 'mgr.osd-mirror-3',
'mon.osd-mirror-3', 'osd.101', ...]
 stray host osd-mirror-4 has 47 stray daemons:
['mds.cephfs.osd-mirror-4.omjlxu', 'mgr.osd-mirror-4', 'osd.103', ...]
 stray host osd-mirror-5 has 46 stray daemons: ['mgr.osd-mirror-5',
'osd.139', ...]
 stray host osd-mirror-6 has 46 stray daemons:
['mds.cephfs.osd-mirror-6.hobjsy', 'osd.141', ...]

It all seems to boil down to host names from `ceph orch host ls` not
matching with other configurations.

ceph orch host ls
HOST                         ADDR          LABELS       STATUS
osd-mirror-1.our.domain.org  172.16.62.22  rgw osd
osd-mirror-2.our.domain.org  172.16.62.23  rgw osd
osd-mirror-3.our.domain.org  172.16.62.24  rgw osd
osd-mirror-4.our.domain.org  172.16.62.25  rgw mds osd
osd-mirror-5.our.domain.org  172.16.62.32  rgw osd
osd-mirror-6.our.domain.org  172.16.62.33  rgw mds osd

hostname
osd-mirror-6

hostname -f
osd-mirror-6.our.domain.org

0|0[root@osd-mirror-6 ~]# ceph mon metadata | grep "\"hostname\""
 "hostname": "osd-mirror-1",
 "hostname": "osd-mirror-3",
 "hostname": "osd-mirror-2",

0|1[root@osd-mirror-6 ~]# ceph mgr metadata | grep "\"hostname\""
 "hostname": "osd-mirror-1",
 "hostname": "osd-mirror-3",
 "hostname": "osd-mirror-4",
 "hostname": "osd-mirror-5",


The documentation states, that "cephadm demands that the name of host
given via `ceph orch host add` equals the output of `hostname` on remote
hosts.".

https://docs.ceph.com/en/latest/cephadm/host-management/#fully-qualified-domain-names-vs-bare-host-names

https://docs.ceph.com/en/octopus/cephadm/concepts/?#fully-qualified-domain-names-vs-bare-host-names

But it seems our cluster wasn't setup like this.

How can I now change the host names which were assigend when adding the
hosts with `ceph orch host add HOSTNAME`?

I can't seem to find any documentation on changing the host names which
are listed by `ceph orch host ls`.
All I can find is related to changing the actual name of the host in the
system.
The crush map also just contains the bare host names.
So, where are these FQDN names actually registered?

Thank you for help.

Best regards,
Mathias
___
ceph-users mailing 

[ceph-users] Re: Ceph RBD pool copy?

2022-05-20 Thread Janne Johansson
On Thu, 19 May 2022 at 21:00, Eugen Block wrote:
> I haven’t dealt with this for some time, it used to be a problem in
> earlier releases. But can’t you just change the ruleset of the glance
> pool to use the „better“ OSDs?

Yes, changing the crush rule to one that uses ssds will suffice and
the cluster will do all the work for you.
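
For example, something along these lines (a sketch; the rule name is arbitrary
and "glance" stands for whatever the images pool is actually called):

# create a replicated rule restricted to the ssd device class
ceph osd crush rule create-replicated replicated-ssd default host ssd
# switch the existing pool to the new rule; data migrates in place,
# so RBD snapshots in the pool are unaffected
ceph osd pool set glance crush_rule replicated-ssd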

> >
> > We have a ceph cluster with integration to Openstack. We are thinking about
> > migrating the glance (images) pool to a new pool with better SSD disks. I
> > see there is a "rados cppool" command. Will that work with snapshots in
> > this rbd pool?
> >



-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io