[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-23 Thread Frédéric Nass

Considering 
https://github.com/ceph/ceph/blob/f6edcef6efe209e8947887752bd2b833d0ca13b7/src/osd/OSD.cc#L10086,
 the OSD:

- always sets and updates its per-OSD osd_mclock_max_capacity_iops_{hdd,ssd} 
when the benchmark runs and the measured IOPS is below or equal to 
osd_mclock_iops_capacity_threshold_{hdd,ssd}
but
- doesn't remove the per-OSD osd_mclock_max_capacity_iops_{hdd,ssd} when the 
measured IOPS exceeds osd_mclock_iops_capacity_threshold_{hdd,ssd} (500 for 
HDD and 80,000 for SSD) and the current value of 
osd_mclock_max_capacity_iops_{hdd,ssd} is set below its default (315 for HDD 
and 21500 for SSD).

As a result, a per-OSD osd_mclock_max_capacity_iops_hdd can end up set as low 
as 0.145327 (as in Michel's post) and is never updated afterwards, leading to 
performance issues.
The idea of a minimum value below which 
osd_mclock_max_capacity_iops_{hdd,ssd} should not be set seems relevant.
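For anyone wanting to check whether they are affected, a rough sketch of the 
kind of commands to run (osd.29 from Michel's post is used as an example id):

  ceph config dump | grep osd_mclock_max_capacity_iops
  ceph config get osd.29 osd_mclock_max_capacity_iops_hdd

Any per-OSD entry far below the 315/21500 defaults is a candidate for the 
situation described above.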

CC'ing Sridhar to have his thoughts.

Cheers,
Frédéric.

- On 22 Mar 24, at 19:37, Kai Stian Olstad ceph+l...@olstad.com wrote:

> On Fri, Mar 22, 2024 at 06:51:44PM +0100, Frédéric Nass wrote:
>>
>>> The OSD run bench and update osd_mclock_max_capacity_iops_{hdd,ssd} every 
>>> time
>>> the OSD is started.
>>> If you check the OSD log you'll see it does the bench.
>> 
>>Are you sure about the update on every start? Does the update happen only if 
>>the
>>benchmark result is < 500 iops?
>> 
>>Looks like the OSD does not remove any set configuration when the benchmark
>>result is > 500 iops. Otherwise, the extremely low value that Michel reported
>>earlier (less than 1 iops) would have been updated over time.
>>I guess.
> 
> I'm not completely sure, it's a couple a month since I used mclock, have 
> switch
> back to wpq because of a nasty bug in mclock that can freeze cluster I/O.
> 
> It could be because I was testing osd_mclock_force_run_benchmark_on_init.
> The OSD had DB on SSD and data on HDD, so the measured to about 1700 IOPS and
> was ignored because of the 500 limit.
> So only the SSD got the osd_mclock_max_capacity_iops_ssd set.
> 
> --
> Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Kai Stian Olstad

On Fri, Mar 22, 2024 at 06:51:44PM +0100, Frédéric Nass wrote:



The OSD run bench and update osd_mclock_max_capacity_iops_{hdd,ssd} every time 
the OSD is started.
If you check the OSD log you'll see it does the bench.

 
Are you sure about the update on every start? Does the update happen only if the 
benchmark result is < 500 iops?
 
Looks like the OSD does not remove any set configuration when the benchmark result 
is > 500 iops. Otherwise, the extremely low value that Michel reported earlier 
(less than 1 iops) would have been updated over time.
I guess.


I'm not completely sure; it's been a couple of months since I used mclock. I switched
back to wpq because of a nasty bug in mclock that can freeze cluster I/O.

It could be because I was testing osd_mclock_force_run_benchmark_on_init.
The OSD had its DB on SSD and data on HDD, so the HDD measured about 1700 IOPS
and the result was ignored because of the 500 limit.
So only the SSD got the osd_mclock_max_capacity_iops_ssd set.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
 
 
 
  
 
> The OSD run bench and update osd_mclock_max_capacity_iops_{hdd,ssd} every 
> time the OSD is started. 
> If you check the OSD log you'll see it does the bench.  
  
Are you sure about the update on every start? Does the update happen only if 
the benchmark result is < 500 iops? 
  
Looks like the OSD does not remove any set configuration when the benchmark 
result is > 500 iops. Otherwise, the extremely low value that Michel reported 
earlier (less than 1 iops) would have been updated over time. 
I guess. 
  
 
 
Frédéric.  

 
 
 
 

-Original Message-

From: Kai 
To: Frédéric 
Cc: Michel ; Pierre ; 
ceph-users 
Sent: Friday, 22 March 2024 18:32 CET
Subject: Re: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed 
for 1 month

On Fri, Mar 22, 2024 at 04:29:21PM +0100, Frédéric Nass wrote: 
>A/ these incredibly low values were calculated a while back with an unmature 
>version of the code or under some specific hardware conditions and you can 
>hope this won't happen again 

The OSD run bench and update osd_mclock_max_capacity_iops_{hdd,ssd} every time 
the OSD is started. 
If you check the OSD log you'll see it does the bench. 

-- 
Kai Stian Olstad 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
 
 
 
 
 
Michel, 
  
Log says that osd.29 is providing 2792 '4k' iops at 10.910 MiB/s. These figures 
suggest that a controller write-back cache is in use along the IO path. Is that 
right? 
  
Since 2792 is above 500, osd_mclock_max_capacity_iops_hdd falls back to 315 and 
the OSD suggests running a benchmark and setting 
osd_mclock_max_capacity_iops_[hdd|ssd] accordingly. 
Removing any per-OSD osd_mclock_max_capacity_iops_hdd setting, restarting all 
concerned OSDs, and checking that no osd_mclock_max_capacity_iops_hdd is set 
anymore should be enough for the time being. 
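Something along these lines should do it (a sketch, assuming a cephadm-managed 
cluster and osd.29 as one of the concerned OSDs):

  ceph config rm osd.29 osd_mclock_max_capacity_iops_hdd
  ceph orch daemon restart osd.29
  ceph config dump | grep osd_mclock_max_capacity_iops_hdd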
  
Not sure why these OSDs showed such bad performance in the past. Maybe a 
controller firmware issue at the time. 
  
Regarding the write-back cache, be careful not to set 
osd_mclock_max_capacity_iops_hdd too high, as OSDs may not always benefit from 
the controller's write-back cache, especially during large IO workloads that 
fill up the cache, or if the cache gets disabled because the controller's 
battery becomes defective. 
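If you go the Fio route suggested by the log message Michel quoted, a sketch of 
the kind of 4k random write run to use (destructive: only run it against a 
device that does not hold OSD data, and adjust the device path):

  fio --name=osd-iops --filename=/dev/sdX --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=64 --numjobs=1 --runtime=60 --time_based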
  
I'll be interested in what you decide for osd_mclock_max_capacity_iops_hdd in 
such a configuration. 
  
Cheers, 
Frédéric.

 
 
 
 

-Original Message-

From: Michel 
To: ceph-users 
Sent: Friday, 22 March 2024 17:20 CET
Subject: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 
month

Hi, 

The attempt to rerun the bench was not really a success. I got the 
following messages: 

- 

Mar 22 14:48:36 idr-osd2 ceph-osd[326854]: osd.29 83873 
maybe_override_max_osd_capacity_for_qos osd bench result - bandwidth 
(MiB/sec): 10.910 iops: 2792.876 elapsed_sec: 1.074 
Mar 22 14:48:36 idr-osd2 ceph-osd[326854]: log_channel(cluster) log 
[WRN] : OSD bench result of 2792.876456 IOPS exceeded the threshold 
limit of 500.00 IOPS for osd.29. IOPS capacity is unchanged at 
0.00 IOPS. The recommendation is to establish the osd's IOPS 
capacity using other benchmark tools (e.g. Fio) and then override 
osd_mclock_max_capacity_iops_[hdd|ssd]. 
- 

I decided as a first step to raise osd_mclock_max_capacity_iops_hdd 
for the suspect OSD to 50. It was magic! 16 out of 17 pending 
scrubs/deep scrubs have already run and the last one is in progress. 
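Concretely, that was something along the lines of (with osd.29 being the 
suspect OSD from the log above):

  ceph config set osd.29 osd_mclock_max_capacity_iops_hdd 50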

I now have to understand why this OSD had such bad performance that 
osd_mclock_max_capacity_iops_hdd was set to such a low value... I have 
12 OSDs with an entry for their osd_mclock_max_capacity_iops_hdd and 
they are mostly on one server (with 2 OSDs on another one). I suspect 
there was a problem on these servers at some point. It is unclear why 
just rerunning the benchmark is not enough and why such a crazy value 
is found for an HDD... 

Best regards, 

Michel 

On 22/03/2024 at 14:44, Michel Jouvin wrote: 
> Hi Frédéric, 
> 
> I think you raise the right point, sorry if I misunderstood Pierre's 
> suggestion to look at OSD performances. Just before reading your 
> email, I was implementing Pierre's suggestion for max_osd_scrubs and I 
> saw the osd_mclock_max_capacity_iops_hdd for a few OSDs (I guess those 
> with a value different from the default). For the suspect OSD, the 
> value is very low, 0.145327, and I suspect it is the cause of the 
> problem. A few others have a value ~5 which I find also very low (all 
> OSDs are using the same recent HW/HDD). 
> 
> Thanks for these informations. I'll follow your suggestions to rerun 
> the benchmark and report if it improved the situation. 
> 
> Best regards, 
> 
> Michel 
> 
> Le 22/03/2024 à 12:18, Frédéric Nass a écrit : 
>> Hello Michel, 
>> 
>> Pierre also suggested checking the performance of this OSD's 
>> device(s) which can be done by running a ceph tell osd.x bench. 
>> 
>> One thing I can think of is how the scrubbing speed of this very OSD 
>> could be influenced by mclock scheduling, would the max iops capacity 
>> calculated by this OSD during its initialization be significantly 
>> lower than other OSDs's. 
>> 
>> What I would do is check (from this OSD's log) the calculated value 
>> for max iops capacity and compare it to other OSDs. Eventually force 
>> a recalculation by setting 'ceph config set osd.x 
>> osd_mclock_force_run_benchmark_on_init true' and restart this OSD. 
>> 
>> Also I would: 
>> 
>> - compare running OSD's mclock values (cephadm shell ceph daemon 
>> osd.x config show | grep mclock) to other OSDs's. 
>> - compare ceph tell osd.x bench to other OSDs's benchmarks. 
>> - compare the rotational status of this OSD's db and data devices to 
>> other OSDs, to make sure things are in order. 
>> 
>> Bests, 
>> Frédéric. 
>> 
>> PS: If mclock is the culprit here, then setting osd_op_queue back to 
>> wpq for this only OSD would probably rev

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Kai Stian Olstad

On Fri, Mar 22, 2024 at 04:29:21PM +0100, Frédéric Nass wrote:

A/ these incredibly low values were calculated a while back with an unmature 
version of the code or under some specific hardware conditions and you can hope 
this won't happen again


The OSD runs a bench and updates osd_mclock_max_capacity_iops_{hdd,ssd} every 
time the OSD is started.
If you check the OSD log you'll see it does the bench.
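For example, something like this shows the bench result on the host running 
the OSD (a sketch; the exact systemd unit name depends on how the cluster was 
deployed):

  journalctl -u ceph-osd@<id> | grep -i 'osd bench result'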

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Anthony D'Atri
Perhaps emitting an extremely low value could be useful for identifying a 
compromised drive?

> On Mar 22, 2024, at 12:49, Michel Jouvin  
> wrote:
> 
> Frédéric,
> 
> We arrived at the same conclusions! I agree that an insane low value would be 
> a good addition: the idea would be that the benchmark emits a warning about 
> the value but the it will not put a value lower than the minimum defined. I 
> don't have a precise idea of the possible bad side effects of such an 
> approach...
> 
> Thanks for your help.
> 
> Michel
> 
> Le 22/03/2024 à 16:29, Frédéric Nass a écrit :
>> Michel,
>> Glad to know that was it.
>> I was wondering when would per OSD osd_mclock_max_capacity_iops_hdd value be 
>> set in cluster's config database since I don't have any set in my lab.
>> Turns out the per OSD osd_mclock_max_capacity_iops_hdd is only set when the 
>> calculated value is below osd_mclock_iops_capacity_threshold_hdd, otherwise 
>> the OSD uses the default value of 315.
>> Probably to rule out any insanely high calculated values. Would have been 
>> nice to also rule out any insanely low measured values. :-)
>> Now either:
>> A/ these incredibly low values were calculated a while back with an unmature 
>> version of the code or under some specific hardware conditions and you can 
>> hope this won't happen again
>> OR
>> B/ you don't want to rely on hope to much and you'll prefer to disable 
>> automatic calculation (osd_mclock_skip_benchmark = true) and set 
>> osd_mclock_max_capacity_iops_[hdd,ssd] by yourself (globally or using a 
>> rack/host mask) after a precise evaluation of the performance of your OSDs.
>> B/ would be more deterministic :-)
>> Cheers,
>> Frédéric.
>> 
>>----------------------------
>>*De: *Michel 
>>*à: *Frédéric 
>>*Cc: *Pierre ; ceph-users 
>>*Envoyé: *vendredi 22 mars 2024 14:44 CET
>>*Sujet : *Re: [ceph-users] Re: Reef (18.2): Some PG not
>>scrubbed/deep scrubbed for 1 month
>> 
>>Hi Frédéric,
>> 
>>I think you raise the right point, sorry if I misunderstood Pierre's
>>suggestion to look at OSD performances. Just before reading your
>>email,
>>I was implementing Pierre's suggestion for max_osd_scrubs and I
>>saw the
>>osd_mclock_max_capacity_iops_hdd for a few OSDs (I guess those with a
>>value different from the default). For the suspect OSD, the value is
>>very low, 0.145327, and I suspect it is the cause of the problem.
>>A few
>>others have a value ~5 which I find also very low (all OSDs are using
>>the same recent HW/HDD).
>> 
>>Thanks for these informations. I'll follow your suggestions to
>>rerun the
>>benchmark and report if it improved the situation.
>> 
>>Best regards,
>> 
>>Michel
>> 
>>Le 22/03/2024 à 12:18, Frédéric Nass a écrit :
>>> Hello Michel,
>>>
>>> Pierre also suggested checking the performance of this OSD's
>>device(s) which can be done by running a ceph tell osd.x bench.
>>>
>>> One think I can think of is how the scrubbing speed of this very
>>OSD could be influenced by mclock sheduling, would the max iops
>>capacity calculated by this OSD during its initialization be
>>significantly lower than other OSDs's.
>>>
>>> What I would do is check (from this OSD's log) the calculated
>>value for max iops capacity and compare it to other OSDs.
>>Eventually force a recalculation by setting 'ceph config set osd.x
>>osd_mclock_force_run_benchmark_on_init true' and restart this OSD.
>>>
>>> Also I would:
>>>
>>> - compare running OSD's mclock values (cephadm shell ceph daemon
>>osd.x config show | grep mclock) to other OSDs's.
>>> - compare ceph tell osd.x bench to other OSDs's benchmarks.
>>> - compare the rotational status of this OSD's db and data
>>devices to other OSDs, to make sure things are in order.
>>>
>>> Bests,
>>> Frédéric.
>>>
>>> PS: If mclock is the culprit here, then setting osd_op_queue
>>back to mpq for this only OSD would probably reveal it. Not sure
>>about the implication of having a signel OSD running a different
>>scheduler in the cluster though.
>>>
>>>
>>> - Le 22 

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Michel Jouvin

Frédéric,

We arrived at the same conclusions! I agree that a minimum for insanely 
low values would be a good addition: the idea would be that the benchmark 
emits a warning about the value but does not set a value lower than the 
defined minimum. I don't have a precise idea of the possible bad side 
effects of such an approach...


Thanks for your help.

Michel

On 22/03/2024 at 16:29, Frédéric Nass wrote:

Michel,
Glad to know that was it.
I was wondering when would per OSD osd_mclock_max_capacity_iops_hdd 
value be set in cluster's config database since I don't have any set 
in my lab.
Turns out the per OSD osd_mclock_max_capacity_iops_hdd is only set 
when the calculated value is below 
osd_mclock_iops_capacity_threshold_hdd, otherwise the OSD uses the 
default value of 315.
Probably to rule out any insanely high calculated values. Would have 
been nice to also rule out any insanely low measured values. :-)

Now either:
A/ these incredibly low values were calculated a while back with an 
unmature version of the code or under some specific hardware 
conditions and you can hope this won't happen again

OR
B/ you don't want to rely on hope to much and you'll prefer to disable 
automatic calculation (osd_mclock_skip_benchmark = true) and set 
osd_mclock_max_capacity_iops_[hdd,ssd] by yourself (globally or using 
a rack/host mask) after a precise evaluation of the performance of 
your OSDs.

B/ would be more deterministic :-)
Cheers,
Frédéric.


*De: *Michel 
*à: *Frédéric 
*Cc: *Pierre ; ceph-users 
*Envoyé: *vendredi 22 mars 2024 14:44 CET
    *Sujet : *Re: [ceph-users] Re: Reef (18.2): Some PG not
    scrubbed/deep scrubbed for 1 month

Hi Frédéric,

I think you raise the right point, sorry if I misunderstood Pierre's
suggestion to look at OSD performances. Just before reading your
email,
I was implementing Pierre's suggestion for max_osd_scrubs and I
saw the
osd_mclock_max_capacity_iops_hdd for a few OSDs (I guess those with a
value different from the default). For the suspect OSD, the value is
very low, 0.145327, and I suspect it is the cause of the problem.
A few
others have a value ~5 which I find also very low (all OSDs are using
the same recent HW/HDD).

Thanks for these informations. I'll follow your suggestions to
rerun the
benchmark and report if it improved the situation.

Best regards,

Michel

Le 22/03/2024 à 12:18, Frédéric Nass a écrit :
> Hello Michel,
>
> Pierre also suggested checking the performance of this OSD's
device(s) which can be done by running a ceph tell osd.x bench.
>
> One think I can think of is how the scrubbing speed of this very
OSD could be influenced by mclock sheduling, would the max iops
capacity calculated by this OSD during its initialization be
significantly lower than other OSDs's.
>
> What I would do is check (from this OSD's log) the calculated
value for max iops capacity and compare it to other OSDs.
Eventually force a recalculation by setting 'ceph config set osd.x
osd_mclock_force_run_benchmark_on_init true' and restart this OSD.
>
> Also I would:
>
> - compare running OSD's mclock values (cephadm shell ceph daemon
osd.x config show | grep mclock) to other OSDs's.
> - compare ceph tell osd.x bench to other OSDs's benchmarks.
> - compare the rotational status of this OSD's db and data
devices to other OSDs, to make sure things are in order.
>
> Bests,
> Frédéric.
>
> PS: If mclock is the culprit here, then setting osd_op_queue
back to mpq for this only OSD would probably reveal it. Not sure
about the implication of having a signel OSD running a different
scheduler in the cluster though.
>
>
> - Le 22 Mar 24, à 10:11, Michel Jouvin
michel.jou...@ijclab.in2p3.fr a écrit :
>
>> Pierre,
>>
>> Yes, as mentioned in my initial email, I checked the OSD state
and found
>> nothing wrong either in the OSD logs or in the system logs
(SMART errors).
>>
>> Thanks for the advice of increasing osd_max_scrubs, I may try
it, but I
>> doubt it is a contention problem because it really only affects
a fixed
>> set of PGs (no new PGS have a "stucked scrub") and there is a
>> significant scrubbing activity going on continuously (~10K PGs
in the
>> cluster).
>>
>> Again, it is not a problem for me to try to kick out the
suspect OSD and
>> see it fixes the issue but as this cluster is pretty simple/low
in terms
>> of activity and I see nothing that may ex

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Michel Jouvin
log is still flooded with "scrub starts" and i have no
     clue
     > why these OSDs are causing the problems.
 > Will investigate further.
 >
 > Best regards,
 > Gunnar
 >
 > ===
 >
 >  Gunnar Bandelow
 >  Universitätsrechenzentrum (URZ)
 >  Universität Greifswald
 >  Felix-Hausdorff-Straße 18
 >  17489 Greifswald
 >  Germany
 >
 >  Tel.: +49 3834 420 1450
 >
 >
 > --- Original Nachricht ---
 > *Betreff: *[ceph-users] Re: Reef (18.2): Some PG not 
scrubbed/deep

 > scrubbed for 1 month
 > *Von: *"Michel Jouvin"  <mailto:michel.jou...@ijclab.in2p3.fr>>
 > *An: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
 > *Datum: *21-03-2024 23:40
 >
 >
 >
 >     Hi,
 >
 >     Today we decided to upgrade from 18.2.0 to 18.2.2. No real
 hope of a
 >     direct impact (nothing in the change log related to 
something

 >     similar)
 >     but at least all daemons were restarted so we thought that
 may be
 >     this
 >     will clear the problem at least temporarily. Unfortunately
 it has not
 >     been the case. The same pages are still stuck, despite
 continuous
 >     activity of scrubbing/deep scrubbing in the cluster...
 >
 >     I'm happy to provide more information if somebody tells me
 what to
 >     look
 >     at...
 >
 >     Cheers,
 >
 >     Michel
 >
 >     Le 21/03/2024 à 14:40, Bernhard Krieger a écrit :
 >     > Hi,
 >     >
 >     > i have the same issues.
 >     > Deep scrub havent finished the jobs on some PGs.
 >     >
 >     > Using ceph 18.2.2.
 >     > Initial installed version was 18.0.0
 >     >
 >     >
 >     > In the logs i see a lot of scrub/deep-scrub starts
 >     >
 >     > Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 13.b deep-scrubstarts
 >     > Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 13.1a deep-scrubstarts
 >     > Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 13.1c deep-scrubstarts
 >     > Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 11.1 scrubstarts
 >     > Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 14.6 scrubstarts
 >     > Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 10.c deep-scrubstarts
 >     > Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 12.3 deep-scrubstarts
 >     > Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 6.0 scrubstarts
 >     > Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 8.5 deep-scrubstarts
 >     > Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 5.66 deep-scrubstarts
 >     > Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 5.30 deep-scrubstarts
 >     > Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 13.b deep-scrubstarts
 >     > Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 13.1a deep-scrubstarts
 >     > Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 13.1c deep-scrubstarts
 >     > Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 11.1 scrubstarts
 >     > Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 14.6 scrubstarts
 >     > Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 10.c deep-scrubstarts
 >     > Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 12.3 scrubstarts
 >     > Mar 21 14:22:13 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 6.0 scrubstarts
 >     > Mar 21 14:22:15 ceph-node10 ceph-osd[3804193]:
 log_channel(cluster)
 >     > log [DBG] : 8.5 deep-scrubstarts
 >     > Mar 

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
 
 
 
Michel, 
  
Glad to know that was it. 
  
I was wondering when the per-OSD osd_mclock_max_capacity_iops_hdd value would 
be set in the cluster's config database, since I don't have any set in my lab. 
It turns out the per-OSD osd_mclock_max_capacity_iops_hdd is only set when the 
calculated value is below osd_mclock_iops_capacity_threshold_hdd; otherwise the 
OSD uses the default value of 315. 
  
Probably to rule out any insanely high calculated values. Would have been nice 
to also rule out any insanely low measured values. :-) 
  
Now either: 
  
A/ these incredibly low values were calculated a while back with an immature 
version of the code or under some specific hardware conditions, and you can 
hope this won't happen again 
  
OR 
  
B/ you don't want to rely on hope too much and you'll prefer to disable 
automatic calculation (osd_mclock_skip_benchmark = true) and set 
osd_mclock_max_capacity_iops_[hdd,ssd] yourself (globally or using a 
rack/host mask) after a precise evaluation of the performance of your OSDs. 
  
B/ would be more deterministic :-) 
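For B/, that could look something like this (a sketch; 315 and 21500 are just 
the defaults mentioned above, to be replaced by values you measure yourself, 
e.g. with Fio):

  ceph config set osd osd_mclock_skip_benchmark true
  ceph config set osd/class:hdd osd_mclock_max_capacity_iops_hdd 315
  ceph config set osd/class:ssd osd_mclock_max_capacity_iops_ssd 21500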
  
Cheers, 
Frédéric.   
 
 
 
 
 

-Original Message-

From: Michel 
To: Frédéric 
Cc: Pierre ; ceph-users 
Sent: Friday, 22 March 2024 14:44 CET
Subject: Re: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed 
for 1 month

Hi Frédéric, 

I think you raise the right point, sorry if I misunderstood Pierre's 
suggestion to look at OSD performances. Just before reading your email, 
I was implementing Pierre's suggestion for max_osd_scrubs and I saw the 
osd_mclock_max_capacity_iops_hdd for a few OSDs (I guess those with a 
value different from the default). For the suspect OSD, the value is 
very low, 0.145327, and I suspect it is the cause of the problem. A few 
others have a value ~5 which I find also very low (all OSDs are using 
the same recent HW/HDD). 

Thanks for these informations. I'll follow your suggestions to rerun the 
benchmark and report if it improved the situation. 

Best regards, 

Michel 

Le 22/03/2024 à 12:18, Frédéric Nass a écrit : 
> Hello Michel, 
> 
> Pierre also suggested checking the performance of this OSD's device(s) which 
> can be done by running a ceph tell osd.x bench. 
> 
> One thing I can think of is how the scrubbing speed of this very OSD could be 
> influenced by mclock scheduling, would the max iops capacity calculated by 
> this OSD during its initialization be significantly lower than other OSDs's. 
> 
> What I would do is check (from this OSD's log) the calculated value for max 
> iops capacity and compare it to other OSDs. Eventually force a recalculation 
> by setting 'ceph config set osd.x osd_mclock_force_run_benchmark_on_init 
> true' and restart this OSD. 
> 
> Also I would: 
> 
> - compare running OSD's mclock values (cephadm shell ceph daemon osd.x config 
> show | grep mclock) to other OSDs's. 
> - compare ceph tell osd.x bench to other OSDs's benchmarks. 
> - compare the rotational status of this OSD's db and data devices to other 
> OSDs, to make sure things are in order. 
> 
> Bests, 
> Frédéric. 
> 
> PS: If mclock is the culprit here, then setting osd_op_queue back to wpq for 
> this only OSD would probably reveal it. Not sure about the implication of 
> having a single OSD running a different scheduler in the cluster though. 
> 
> 
> - Le 22 Mar 24, à 10:11, Michel Jouvin michel.jou...@ijclab.in2p3.fr a 
> écrit : 
> 
>> Pierre, 
>> 
>> Yes, as mentioned in my initial email, I checked the OSD state and found 
>> nothing wrong either in the OSD logs or in the system logs (SMART errors). 
>> 
>> Thanks for the advice of increasing osd_max_scrubs, I may try it, but I 
>> doubt it is a contention problem because it really only affects a fixed 
>> set of PGs (no new PGS have a "stucked scrub") and there is a 
>> significant scrubbing activity going on continuously (~10K PGs in the 
>> cluster). 
>> 
>> Again, it is not a problem for me to try to kick out the suspect OSD and 
>> see it fixes the issue but as this cluster is pretty simple/low in terms 
>> of activity and I see nothing that may explain why we have this 
>> situation on a pretty new cluster (9 months, created in Quincy) and not 
>> on our 2 other production clusters, much more used, one of them being 
>> the backend storage of a significant OpenStack clouds, a cluster created 
>> 10 years ago with Infernetis and upgraded since then, a better candidate 
>> for this kind of problems! So, I'm happy to contribute to 
>> troubleshooting a potential issue in Reef if somebody finds it useful 
>> and can help. Else I'll try the approach that worked for Gunnar. 
>> 
>> Best regards, 
>

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Michel Jouvin
18.6   15501   0 0  0 0
 63959444676    0   0   2068  3000 2068
 active+clean+scrubbing+deep 2024-03-22T02:29:24.508889+
 81688'663900  83812:1272160 [187,29,211] 187
 [187,29,211] 187 52735'663878
 2024-03-06T16:36:32.080259+ 52735'663878
 2024-03-06T16:36:32.080259+  0 684445  deep scrubbing
 for 20373s 449    0
 16.15  0   0 0  0 0
 0    0   0  0 0 0
 active+clean 2024-03-21T18:20:29.632554+   0'0
 83812:104893    [29,165,85]  29 [29,165,85]
 29   0'0 2024-02-17T06:54:06.370647+  0'0
 2024-02-17T06:54:06.370647+  0 28  queued for deep
 scrub
 0    0
 25.45  0   0 0  0 0
 0    0   0  0  1036 0
 active+clean 2024-03-21T18:10:24.125134+ 39159'561
 83812:93649 [29,13,58]  29 [29,13,58]
 29 39159'512 2024-02-27T12:27:35.728176+ 39159'512
 2024-02-27T12:27:35.728176+  0 1  queued for deep
 scrub
 0    0
 29.249   260   0 0  0 0
 1090519040    0   0   1970   500
 1970 active+clean 2024-03-21T18:29:22.588805+
 39202'2470    83812:96016 [29,191,18,143]  29
 [29,191,18,143]  29 39202'2470
 2024-02-17T13:32:42.910335+   39202'2470
 2024-02-17T13:32:42.910335+  0 1  queued for deep
 scrub
 0    0
 29.25a   248   0 0  0 0
 1040187392    0   0   1952   600
 1952 active+clean 2024-03-21T18:20:29.623422+
 39202'2552    83812:99157 [29,200,85,164]  29
 [29,200,85,164]  29 39202'2552
 2024-02-17T08:33:14.326087+   39202'2552
 2024-02-17T08:33:14.326087+  0 1  queued for deep
 scrub
 0    0
 25.3cf 0   0 0  0 0
 0    0   0  0  1343 0
 active+clean 2024-03-21T18:16:00.933375+ 46253'598
 83812:91659    [29,75,175]  29 [29,75,175]
 29 46253'598 2024-02-17T11:48:51.840600+ 46253'598
 2024-02-17T11:48:51.840600+  0 28  queued for deep
 scrub
 0    0
 29.4ec   243   0 0  0 0
 1019215872    0   0   1933   500
 1933 active+clean 2024-03-21T18:15:35.389598+
 39202'2433   83812:101501 [29,206,63,17]  29
 [29,206,63,17]  29 39202'2433
 2024-02-17T15:10:41.027755+   39202'2433
 2024-02-17T15:10:41.027755+  0 3  queued for deep
 scrub
 0    0


 On 22/03/2024 at 08:16, Bandelow, Gunnar wrote:
 > Hi Michael,
 >
 > i think yesterday i found the culprit in my case.
 >
 > After inspecting "ceph pg dump" and especially the column
 > "last_scrub_duration". I found, that any PG without proper
 scrubbing
 > was located on one of three OSDs (and all these OSDs share the same
 > SSD for their DB). I put them on "out" and now after backfill and
 > remapping everything seems to be fine.
 >
 > Only the log is still flooded with "scrub starts" and i have no
 clue
     > why these OSDs are causing the problems.
 > Will investigate further.
 >
 > Best regards,
 > Gunnar
 >
 > ===
 >
 >  Gunnar Bandelow
 >  Universitätsrechenzentrum (URZ)
 >  Universität Greifswald
 >  Felix-Hausdorff-Straße 18
 >  17489 Greifswald
 >  Germany
 >
 >  Tel.: +49 3834 420 1450
 >
 >
 > --- Original Nachricht ---
 > *Betreff: *[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep
 > scrubbed for 1 month
 > *Von: *"Michel Jouvin"  <mailto:michel.jou...@ijclab.in2p3.fr>>
 > *An: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
 > *Datum: *21-03-2024 23:40
 >
 >
 >
 >     Hi,
 >
 >     Today we decided to upgrade from 18.2.0 to 18.2.2. No real
 hope of a
 >     direct impact (nothing in the change log related to something
 >     similar)
 >     but at least all daemons were restarted so we thought that
 may be
 >     this
 >     will

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
 0 0  0 0
>> 1073741824    0   0   1986   600 1986
>> active+clean+scrubbing+deep 2024-03-22T08:09:12.849868+
>> 39202'2586   83812:603625 [22,150,29,56]  22
>> [22,150,29,56]  22 39202'2586
>> 2024-03-07T18:53:22.952868+   39202'2586
>> 2024-03-07T18:53:22.952868+  0 1  queued for deep
>> scrub
>> 0    0
>> 18.6   15501   0 0  0 0
>> 63959444676    0   0   2068  3000 2068
>> active+clean+scrubbing+deep 2024-03-22T02:29:24.508889+
>> 81688'663900  83812:1272160 [187,29,211] 187
>> [187,29,211] 187 52735'663878
>> 2024-03-06T16:36:32.080259+ 52735'663878
>> 2024-03-06T16:36:32.080259+  0 684445  deep scrubbing
>> for 20373s 449    0
>> 16.15  0   0 0  0 0
>> 0    0   0  0 0 0
>> active+clean 2024-03-21T18:20:29.632554+   0'0
>> 83812:104893    [29,165,85]  29 [29,165,85]
>> 29   0'0 2024-02-17T06:54:06.370647+  0'0
>> 2024-02-17T06:54:06.370647+  0 28  queued for deep
>> scrub
>> 0    0
>> 25.45  0   0 0  0 0
>> 0    0   0  0  1036 0
>> active+clean 2024-03-21T18:10:24.125134+ 39159'561
>> 83812:93649 [29,13,58]  29 [29,13,58]
>> 29 39159'512 2024-02-27T12:27:35.728176+ 39159'512
>> 2024-02-27T12:27:35.728176+  0 1  queued for deep
>> scrub
>> 0    0
>> 29.249   260   0 0  0 0
>> 1090519040    0   0   1970   500
>> 1970 active+clean 2024-03-21T18:29:22.588805+
>> 39202'2470    83812:96016 [29,191,18,143]  29
>> [29,191,18,143]  29 39202'2470
>> 2024-02-17T13:32:42.910335+   39202'2470
>> 2024-02-17T13:32:42.910335+  0 1  queued for deep
>> scrub
>> 0    0
>> 29.25a   248   0 0  0 0
>> 1040187392    0   0   1952   600
>> 1952 active+clean 2024-03-21T18:20:29.623422+
>> 39202'2552    83812:99157 [29,200,85,164]  29
>> [29,200,85,164]  29 39202'2552
>> 2024-02-17T08:33:14.326087+   39202'2552
>> 2024-02-17T08:33:14.326087+  0 1  queued for deep
>> scrub
>> 0    0
>> 25.3cf 0   0 0  0 0
>> 0    0   0  0  1343 0
>> active+clean 2024-03-21T18:16:00.933375+ 46253'598
>> 83812:91659    [29,75,175]  29 [29,75,175]
>> 29 46253'598 2024-02-17T11:48:51.840600+ 46253'598
>> 2024-02-17T11:48:51.840600+  0 28  queued for deep
>> scrub
>> 0    0
>>     29.4ec   243           0     0      0 0
>> 1019215872    0   0   1933   500
>> 1933 active+clean 2024-03-21T18:15:35.389598+
>> 39202'2433   83812:101501 [29,206,63,17]  29
>> [29,206,63,17]  29 39202'2433
>> 2024-02-17T15:10:41.027755+   39202'2433
>> 2024-02-17T15:10:41.027755+  0 3  queued for deep
>> scrub
>> 0    0
>>
>>
>> Le 22/03/2024 à 08:16, Bandelow, Gunnar a écrit :
>> > Hi Michael,
>> >
>> > i think yesterday i found the culprit in my case.
>> >
>> > After inspecting "ceph pg dump" and especially the column
>> > "last_scrub_duration". I found, that any PG without proper
>> scrubbing
>> > was located on one of three OSDs (and all these OSDs share the same
>> > SSD for their DB). I put them on "out" and now after backfill and
>> > remapping everything seems to be fine.
>> >
>> > Only the log is still flooded with "scrub starts" and i have no
>> clue
>> > why thes

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Michel Jouvin
  0 1  queued for deep
scrub
0    0
25.3cf 0   0 0  0 0
0    0   0  0  1343 0
active+clean 2024-03-21T18:16:00.933375+ 46253'598
83812:91659    [29,75,175]  29 [29,75,175]
29 46253'598 2024-02-17T11:48:51.840600+ 46253'598
2024-02-17T11:48:51.840600+  0 28  queued for deep
scrub
0    0
29.4ec   243   0 0  0 0
1019215872    0   0   1933   500
1933 active+clean 2024-03-21T18:15:35.389598+
39202'2433   83812:101501 [29,206,63,17]  29
[29,206,63,17]  29 39202'2433
2024-02-17T15:10:41.027755+   39202'2433
2024-02-17T15:10:41.027755+  0 3  queued for deep
scrub
0    0


On 22/03/2024 at 08:16, Bandelow, Gunnar wrote:
> Hi Michael,
>
> i think yesterday i found the culprit in my case.
>
> After inspecting "ceph pg dump" and especially the column
> "last_scrub_duration". I found, that any PG without proper
scrubbing
> was located on one of three OSDs (and all these OSDs share the same
> SSD for their DB). I put them on "out" and now after backfill and
> remapping everything seems to be fine.
>
> Only the log is still flooded with "scrub starts" and i have no
clue
> why these OSDs are causing the problems.
    > Will investigate further.
    >
> Best regards,
    > Gunnar
>
> ===
>
>  Gunnar Bandelow
>  Universitätsrechenzentrum (URZ)
>  Universität Greifswald
>  Felix-Hausdorff-Straße 18
>  17489 Greifswald
>  Germany
>
>  Tel.: +49 3834 420 1450
>
>
> --- Original Nachricht ---
> *Betreff: *[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep
> scrubbed for 1 month
> *Von: *"Michel Jouvin"  <mailto:michel.jou...@ijclab.in2p3.fr>>
> *An: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
> *Datum: *21-03-2024 23:40
>
>
>
>     Hi,
>
>     Today we decided to upgrade from 18.2.0 to 18.2.2. No real
hope of a
>     direct impact (nothing in the change log related to something
>     similar)
>     but at least all daemons were restarted so we thought that
may be
>     this
>     will clear the problem at least temporarily. Unfortunately
it has not
>     been the case. The same pages are still stuck, despite
continuous
>     activity of scrubbing/deep scrubbing in the cluster...
>
>     I'm happy to provide more information if somebody tells me
what to
>     look
>     at...
>
>     Cheers,
>
>     Michel
>
>     Le 21/03/2024 à 14:40, Bernhard Krieger a écrit :
>     > Hi,
>     >
>     > i have the same issues.
>     > Deep scrub havent finished the jobs on some PGs.
>     >
>     > Using ceph 18.2.2.
>     > Initial installed version was 18.0.0
>     >
>     >
>     > In the logs i see a lot of scrub/deep-scrub starts
>     >
>     > Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 13.b deep-scrubstarts
>     > Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 13.1a deep-scrubstarts
>     > Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 13.1c deep-scrubstarts
>     > Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 11.1 scrubstarts
>     > Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 14.6 scrubstarts
>     > Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 10.c deep-scrubstarts
>     > Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 12.3 deep-scrubstarts
>     > Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 6.0 scrubstarts
>     > Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 8.5 deep-scrubstarts
>     > Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]:
log_channel(cluster)
>     > log [DBG] : 5.66 deep-scrubstarts

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Pierre Riteau
 think yesterday i found the culprit in my case.
> >
> > After inspecting "ceph pg dump" and especially the column
> > "last_scrub_duration". I found, that any PG without proper scrubbing
> > was located on one of three OSDs (and all these OSDs share the same
> > SSD for their DB). I put them on "out" and now after backfill and
> > remapping everything seems to be fine.
> >
> > Only the log is still flooded with "scrub starts" and i have no clue
> > why these OSDs are causing the problems.
> > Will investigate further.
> >
> > Best regards,
> > Gunnar
> >
> > ===
> >
> >  Gunnar Bandelow
> >  Universitätsrechenzentrum (URZ)
> >  Universität Greifswald
> >  Felix-Hausdorff-Straße 18
> >  17489 Greifswald
> >  Germany
> >
> >  Tel.: +49 3834 420 1450
> >
> >
> > --- Original Nachricht ---
> > *Betreff: *[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep
> > scrubbed for 1 month
> > *Von: *"Michel Jouvin"  > <mailto:michel.jou...@ijclab.in2p3.fr>>
> > *An: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
> > *Datum: *21-03-2024 23:40
> >
> >
> >
> > Hi,
> >
> > Today we decided to upgrade from 18.2.0 to 18.2.2. No real hope of a
> > direct impact (nothing in the change log related to something
> > similar)
> > but at least all daemons were restarted so we thought that may be
> > this
> > will clear the problem at least temporarily. Unfortunately it has not
> > been the case. The same pages are still stuck, despite continuous
> > activity of scrubbing/deep scrubbing in the cluster...
> >
> > I'm happy to provide more information if somebody tells me what to
> > look
> > at...
> >
> > Cheers,
> >
> > Michel
> >
> > Le 21/03/2024 à 14:40, Bernhard Krieger a écrit :
> > > Hi,
> > >
> > > i have the same issues.
> > > Deep scrub havent finished the jobs on some PGs.
> > >
> > > Using ceph 18.2.2.
> > > Initial installed version was 18.0.0
> > >
> > >
> > > In the logs i see a lot of scrub/deep-scrub starts
> > >
> > > Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 13.b deep-scrubstarts
> > > Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 13.1a deep-scrubstarts
> > > Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 13.1c deep-scrubstarts
> > > Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 11.1 scrubstarts
> > > Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 14.6 scrubstarts
> > > Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 10.c deep-scrubstarts
> > > Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 12.3 deep-scrubstarts
> > > Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 6.0 scrubstarts
> > > Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 8.5 deep-scrubstarts
> > > Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 5.66 deep-scrubstarts
> > > Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 5.30 deep-scrubstarts
> > > Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 13.b deep-scrubstarts
> > > Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 13.1a deep-scrubstarts
> > > Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 13.1c deep-scrubstarts
> > > Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 11.1 scrubstarts
> > > Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 14.6 scrubstarts
> > > Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]: log_channel(cluster)
> > > log [DBG] : 10.c deep-scrubstarts
> > > Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]: log_ch

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Michel Jouvin
.881545+    46253'545 
2024-03-07T11:12:45.881545+  0 28  queued for deep scrub 
0    0
25.55a 0   0 0  0 0    
0    0   0  0  1022 0 
active+clean 2024-03-21T18:10:24.124914+ 46253'565 
83812:89876    [29,58,195]  29 [29,58,195]  
29 46253'561 2024-02-17T06:54:35.320454+    46253'561 
2024-02-17T06:54:35.320454+  0 28  queued for deep scrub 
0    0
29.c0    256   0 0  0 0   
1073741824    0   0   1986   600 1986  
active+clean+scrubbing+deep 2024-03-22T08:09:12.849868+    
39202'2586   83812:603625 [22,150,29,56]  22 
[22,150,29,56]  22 39202'2586  
2024-03-07T18:53:22.952868+   39202'2586 
2024-03-07T18:53:22.952868+  0 1  queued for deep scrub 
0    0
18.6   15501   0 0  0 0  
63959444676    0   0   2068  3000 2068  
active+clean+scrubbing+deep 2024-03-22T02:29:24.508889+  
81688'663900  83812:1272160 [187,29,211] 187   
[187,29,211] 187 52735'663878  
2024-03-06T16:36:32.080259+ 52735'663878 
2024-03-06T16:36:32.080259+  0 684445  deep scrubbing 
for 20373s 449    0
16.15  0   0 0  0 0    
0    0   0  0 0 0 
active+clean 2024-03-21T18:20:29.632554+   0'0 
83812:104893    [29,165,85]  29 [29,165,85]  
29   0'0 2024-02-17T06:54:06.370647+  0'0 
2024-02-17T06:54:06.370647+  0 28  queued for deep scrub 
0    0
25.45  0   0 0  0 0    
0    0   0  0  1036 0 
active+clean 2024-03-21T18:10:24.125134+ 39159'561 
83812:93649 [29,13,58]  29 [29,13,58]  
29 39159'512 2024-02-27T12:27:35.728176+    39159'512 
2024-02-27T12:27:35.728176+  0 1  queued for deep scrub 
0    0
29.249   260   0 0  0 0   
1090519040    0   0   1970   500 
1970 active+clean 2024-03-21T18:29:22.588805+    
39202'2470    83812:96016 [29,191,18,143]  29    
[29,191,18,143]  29 39202'2470  
2024-02-17T13:32:42.910335+   39202'2470 
2024-02-17T13:32:42.910335+  0 1  queued for deep scrub 
0    0
29.25a   248   0 0  0 0   
1040187392    0   0   1952   600 
1952 active+clean 2024-03-21T18:20:29.623422+    
39202'2552    83812:99157 [29,200,85,164]  29    
[29,200,85,164]  29 39202'2552  
2024-02-17T08:33:14.326087+   39202'2552 
2024-02-17T08:33:14.326087+  0 1  queued for deep scrub 
0    0
25.3cf 0   0 0  0 0    
0    0   0  0  1343 0 
active+clean 2024-03-21T18:16:00.933375+ 46253'598 
83812:91659    [29,75,175]  29 [29,75,175]  
29 46253'598 2024-02-17T11:48:51.840600+    46253'598 
2024-02-17T11:48:51.840600+  0 28  queued for deep scrub 
0    0
29.4ec   243   0 0  0 0   
1019215872    0   0   1933   500 
1933 active+clean 2024-03-21T18:15:35.389598+    
39202'2433   83812:101501 [29,206,63,17]  29 
[29,206,63,17]  29 39202'2433  
2024-02-17T15:10:41.027755+   39202'2433 
2024-02-17T15:10:41.027755+  0 3  queued for deep scrub 
0    0



On 22/03/2024 at 08:16, Bandelow, Gunnar wrote:

Hi Michael,

i think yesterday i found the culprit in my case.

After inspecting "ceph pg dump" and especially the column 
"last_scrub_duration". I found, that any PG without proper scrubbing 
was located on one of three OSDs (and all these OSDs share the same 
SSD for their DB). I put them on "out" and now after backfill and 
remapping everything seems to be fine.


Only the log is still flooded with "scrub starts" and i have no clue 
why these OSDs are causing the problems.

Will investigate further.

Best regards,
Gunnar

===

 Gunnar Bandelow
 Universitätsrechenzentrum (URZ)
 Universität Greifswald
 Felix-Hausdorff-Straße 18
 17489 Greifswald
 Germany

 Tel.: +49 3834 420 1450


--- Original Message ---
*Subject: *[ceph-user

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Bandelow, Gunnar
Hi Michael,

I think yesterday I found the culprit in my case.

After inspecting "ceph pg dump", and especially the column
"last_scrub_duration", I found that any PG without proper scrubbing
was located on one of three OSDs (and all these OSDs share the same
SSD for their DB). I marked them "out" and now, after backfill and
remapping, everything seems to be fine. 
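For reference, a quick way to get that kind of view is something like (a 
sketch, grepping for the stuck states quoted elsewhere in this thread):

  ceph pg dump pgs | grep -E 'queued for deep scrub|deep scrubbing for'

and then comparing the acting sets of the listed PGs to see whether the same 
OSDs show up in all of them.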


Only the log is still flooded with "scrub starts" and I have no clue
why these OSDs are causing the problems.
I will investigate further.


Best regards,
Gunnar

===


 Gunnar Bandelow
 Universitätsrechenzentrum (URZ)
 Universität Greifswald
 Felix-Hausdorff-Straße 18
 17489 Greifswald
 Germany


 Tel.: +49 3834 420 1450

--- Original Message ---
Subject: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep
scrubbed for 1 month
From: "Michel Jouvin" 
To: ceph-users@ceph.io
Date: 21-03-2024 23:40






Hi,

Today we decided to upgrade from 18.2.0 to 18.2.2. No real hope of a 
direct impact (nothing in the change log related to something similar)

but at least all daemons were restarted so we thought that may be this

will clear the problem at least temporarily. Unfortunately it has not 
been the case. The same pages are still stuck, despite continuous 
activity of scrubbing/deep scrubbing in the cluster...

I'm happy to provide more information if somebody tells me what to
look 
at...

Cheers,

Michel

Le 21/03/2024 à 14:40, Bernhard Krieger a écrit :
> Hi,
>
> i have the same issues.
> Deep scrub havent finished the jobs on some PGs.
>
> Using ceph 18.2.2.
> Initial installed version was 18.0.0
>
>
> In the logs i see a lot of scrub/deep-scrub starts
>
> Mar 21 14:21:09 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.b deep-scrubstarts
> Mar 21 14:21:10 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.1a deep-scrubstarts
> Mar 21 14:21:17 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.1c deep-scrubstarts
> Mar 21 14:21:19 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 11.1 scrubstarts
> Mar 21 14:21:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 14.6 scrubstarts
> Mar 21 14:21:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 10.c deep-scrubstarts
> Mar 21 14:21:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 12.3 deep-scrubstarts
> Mar 21 14:21:41 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 6.0 scrubstarts
> Mar 21 14:21:44 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 8.5 deep-scrubstarts
> Mar 21 14:21:45 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 5.66 deep-scrubstarts
> Mar 21 14:21:49 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 5.30 deep-scrubstarts
> Mar 21 14:21:50 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.b deep-scrubstarts
> Mar 21 14:21:52 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.1a deep-scrubstarts
> Mar 21 14:21:54 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.1c deep-scrubstarts
> Mar 21 14:21:55 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 11.1 scrubstarts
> Mar 21 14:21:58 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 14.6 scrubstarts
> Mar 21 14:22:01 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 10.c deep-scrubstarts
> Mar 21 14:22:04 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 12.3 scrubstarts
> Mar 21 14:22:13 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 6.0 scrubstarts
> Mar 21 14:22:15 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 8.5 deep-scrubstarts
> Mar 21 14:22:20 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 5.66 deep-scrubstarts
> Mar 21 14:22:27 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 5.30 scrubstarts
> Mar 21 14:22:30 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.b deep-scrubstarts
> Mar 21 14:22:32 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.1a deep-scrubstarts
> Mar 21 14:22:33 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 13.1c deep-scrubstarts
> Mar 21 14:22:35 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 11.1 deep-scrubstarts
> Mar 21 14:22:37 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 14.6 scrubstarts
> Mar 21 14:22:38 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 10.c scrubstarts
> Mar 21 14:22:39 ceph-node10 ceph-osd[3804193]: log_channel(cluster) 
> log [DBG] : 12.3 scrubstarts
> Mar 2

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-21 Thread Michel Jouvin
   53 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
   pgs: 207 active+clean
54 active+clean+scrubbing+deep
28 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
   pgs: 202 active+clean
56 active+clean+scrubbing+deep
31 active+clean+scrubbing
[root@ceph-node10 ~]# ceph -s | grep active+clean
   pgs: 213 active+clean
45 active+clean+scrubbing+deep
31 active+clean+scrubbing

ceph pg dump showing PGs which are not deep scrubbed since january.
Some PGs deep scrubbing  over 70 seconds.

*[ceph: root@ceph-node10 /]#  ceph pg dump pgs | grep -e 'scrubbing f'
5.6e  221223   0 0  0    0 
 927795290112    0   0  4073  3000  4073 
 active+clean+scrubbing+deep  2024-03-20T01:07:21.196293+
  128383'15766927  128383:20517419   [2,4,18,16,14,21]   2 
  [2,4,18,16,14,21]   2  125519'12328877 
 2024-01-23T11:25:35.503811+  124844'11873951  2024-01-21T22:
24:12.620693+  0    5  deep scrubbing 
for 270790s 53772 
   0
5.6c  221317   0 0  0    0 
 928173256704    0   0  6332 0  6332 
 active+clean+scrubbing+deep  2024-03-18T09:29:29.233084+
  128382'15788196  128383:20727318 [6,9,12,14,1,4]   6 
[6,9,12,14,1,4]   6  127180'14709746 
 2024-03-06T12:47:57.741921+  124817'11821502  2024-01-20T20:
59:40.566384+  0    13452  deep scrubbing 
for 273519s    122803 
   0
5.6a  221325   0 0  0    0 
 928184565760    0   0  4649  3000  4649 
 active+clean+scrubbing+deep  2024-03-13T03:48:54.065125+
  128382'16031499  128383:21221685 [13,11,1,2,9,8]  13 
[13,11,1,2,9,8]  13  127181'14915404 
 2024-03-06T13:16:58.635982+  125967'12517899  2024-01-28T09:
13:08.276930+  0    10078  deep scrubbing 
for 726001s    184819 
   0
5.54  221050   0 0  0    0 
 927036203008    0   0  4864  3000  4864 
 active+clean+scrubbing+deep  2024-03-18T00:17:48.086231+
  128383'15584012  128383:20293678  [0,20,18,19,11,12]   0 
 [0,20,18,19,11,12]   0  127195'14651908 
 2024-03-07T09:22:31.078448+  124816'11813857  2024-01-20T16:
43:15.755200+  0 9808  deep scrubbing 
for 306667s    142126 
   0
5.47  220849   0 0  0    0 
 926233448448    0   0  5592 0  5592 
 active+clean+scrubbing+deep  2024-03-12T08:10:39.413186+
  128382'15653864  128383:20403071  [16,15,20,0,13,21]  16 
 [16,15,20,0,13,21]  16  127183'14600433 
 2024-03-06T18:21:03.057165+  124809'11792397  2024-01-20T05:
27:07.617799+  0    13066  deep scrubbing 
for 796697s    209193 
   0

dumped pgs


*


regards
Bernhard






On 20/03/2024 21:12, Bandelow, Gunnar wrote:

Hi,

i just wanted to mention, that i am running a cluster with reef 
18.2.1 with the same issue.


4 PGs start to deepscrub but dont finish since mid february. In the 
pg dump they are shown as scheduled for deep scrub. They sometimes 
change their status from active+clean to active+clean+scrubbing+deep 
and back.


Best regards,
Gunnar

===

Gunnar Bandelow
Universitätsrechenzentrum (URZ)
Universität Greifswald
Felix-Hausdorff-Straße 18
17489 Greifswald
Germany

Tel.: +49 3834 420 1450




--- Original Nachricht ---
*Betreff: *[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep 
scrubbed for 1 month
*Von: *"Michel Jouvin" <mailto:michel.jou...@ijclab.in2p3.fr>>

*An: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
*Datum: *20-03-2024 20:00



    Hi Rafael,

    Good to know I am not alone!

    Additional information ~6h after the OSD restart: over the 20 PGs
    impacted, 2 have been processed successfully... I don't have a clear
    picture on how Ceph prioritize the scrub of one PG over another, I
    had
    thought that the oldest/expired scrubs are taken first but it may
    not be
    the case. Anyway, I have seen a very significant decrese of the 
scrub

    activity this afternoon and the cluster is not loaded at all
    (almost no
    users yet)...

    Michel

    Le 20/03/2024 à 17:55, quag

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-21 Thread Bernhard Krieger
 showing PGs which are not deep scrubbed since january.
Some PGs deep scrubbing  over 70 seconds.

*[ceph: root@ceph-node10 /]#  ceph pg dump pgs | grep -e 'scrubbing f'
5.6e  221223   0 0  0    0 
 927795290112    0   0  4073  3000  4073 
 active+clean+scrubbing+deep  2024-03-20T01:07:21.196293+
  128383'15766927  128383:20517419   [2,4,18,16,14,21]   2 
  [2,4,18,16,14,21]   2  125519'12328877 
 2024-01-23T11:25:35.503811+  124844'11873951  2024-01-21T22:
24:12.620693+  0    5  deep scrubbing 
for 270790s 53772 
   0
5.6c  221317   0 0  0    0 
 928173256704    0   0  6332 0  6332 
 active+clean+scrubbing+deep  2024-03-18T09:29:29.233084+
  128382'15788196  128383:20727318 [6,9,12,14,1,4]   6 
[6,9,12,14,1,4]   6  127180'14709746 
 2024-03-06T12:47:57.741921+  124817'11821502  2024-01-20T20:
59:40.566384+  0    13452  deep scrubbing 
for 273519s    122803 
   0
5.6a  221325   0 0  0    0 
 928184565760    0   0  4649  3000  4649 
 active+clean+scrubbing+deep  2024-03-13T03:48:54.065125+
  128382'16031499  128383:21221685 [13,11,1,2,9,8]  13 
[13,11,1,2,9,8]  13  127181'14915404 
 2024-03-06T13:16:58.635982+  125967'12517899  2024-01-28T09:
13:08.276930+  0    10078  deep scrubbing 
for 726001s    184819 
   0
5.54  221050   0 0  0    0 
 927036203008    0   0  4864  3000  4864 
 active+clean+scrubbing+deep  2024-03-18T00:17:48.086231+
  128383'15584012  128383:20293678  [0,20,18,19,11,12]   0 
 [0,20,18,19,11,12]   0  127195'14651908 
 2024-03-07T09:22:31.078448+  124816'11813857  2024-01-20T16:
43:15.755200+  0 9808  deep scrubbing 
for 306667s    142126 
   0
5.47  220849   0 0  0    0 
 926233448448    0   0  5592 0  5592 
 active+clean+scrubbing+deep  2024-03-12T08:10:39.413186+
  128382'15653864  128383:20403071  [16,15,20,0,13,21]  16 
 [16,15,20,0,13,21]  16  127183'14600433 
 2024-03-06T18:21:03.057165+  124809'11792397  2024-01-20T05:
27:07.617799+  0    13066  deep scrubbing 
for 796697s    209193 
   0

dumped pgs


*


regards
Bernhard






On 20/03/2024 21:12, Bandelow, Gunnar wrote:

Hi,

i just wanted to mention, that i am running a cluster with reef 18.2.1 
with the same issue.


4 PGs start to deepscrub but dont finish since mid february. In the pg 
dump they are shown as scheduled for deep scrub. They sometimes change 
their status from active+clean to active+clean+scrubbing+deep and back.


Best regards,
Gunnar

===

Gunnar Bandelow
Universitätsrechenzentrum (URZ)
Universität Greifswald
Felix-Hausdorff-Straße 18
17489 Greifswald
Germany

Tel.: +49 3834 420 1450




--- Original Nachricht ---
*Betreff: *[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep 
scrubbed for 1 month
*Von: *"Michel Jouvin" <mailto:michel.jou...@ijclab.in2p3.fr>>

*An: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
*Datum: *20-03-2024 20:00



Hi Rafael,

Good to know I am not alone!

Additional information ~6h after the OSD restart: over the 20 PGs
impacted, 2 have been processed successfully... I don't have a clear
picture on how Ceph prioritize the scrub of one PG over another, I
had
thought that the oldest/expired scrubs are taken first but it may
not be
the case. Anyway, I have seen a very significant decrese of the scrub
activity this afternoon and the cluster is not loaded at all
(almost no
users yet)...

Michel

Le 20/03/2024 à 17:55, quag...@bol.com.br
<mailto:quag...@bol.com.br> a écrit :
> Hi,
>      I upgraded a cluster 2 weeks ago here. The situation is the
same
> as Michel.
>      A lot of PGs no scrubbed/deep-scrubed.
>
> Rafael.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
<mailto:ceph-users@ceph.io>
> To unsubscribe send an email to ceph-users-le...@ceph.io
<mailto:ceph-users-le...@ceph.io>

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Bandelow, Gunnar
Hi,

I just wanted to mention that I am running a cluster with Reef 18.2.1
with the same issue.

4 PGs have been starting to deep scrub but not finishing since mid-February.
In the pg dump they are shown as scheduled for deep scrub. They sometimes
change their status from active+clean to active+clean+scrubbing+deep and
back.


Best regards,
Gunnar 


===


Gunnar Bandelow

Universitätsrechenzentrum (URZ)
Universität Greifswald

Felix-Hausdorff-Straße 18
17489 GreifswaldGermany

Tel.: +49 3834 420 1450  



--- Original Message ---
Subject: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep
scrubbed for 1 month
From: "Michel Jouvin" 
To: ceph-users@ceph.io
Date: 20-03-2024 20:00






Hi Rafael,

Good to know I am not alone!

Additional information ~6h after the OSD restart: over the 20 PGs 
impacted, 2 have been processed successfully... I don't have a clear 
picture on how Ceph prioritize the scrub of one PG over another, I had

thought that the oldest/expired scrubs are taken first but it may not
be 
the case. Anyway, I have seen a very significant decrese of the scrub 
activity this afternoon and the cluster is not loaded at all (almost
no 
users yet)...

Michel

Le 20/03/2024 à 17:55, quag...@bol.com.br a écrit :
> Hi,
>      I upgraded a cluster 2 weeks ago here. The situation is the
same 
> as Michel.
>      A lot of PGs no scrubbed/deep-scrubed.
>
> Rafael.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Michel Jouvin

Hi Rafael,

Good to know I am not alone!

Additional information ~6h after the OSD restart: of the 20 PGs 
impacted, 2 have been processed successfully... I don't have a clear 
picture of how Ceph prioritizes the scrub of one PG over another; I had 
thought that the oldest/expired scrubs are taken first but it may not be 
the case. Anyway, I have seen a very significant decrease in the scrub 
activity this afternoon and the cluster is not loaded at all (almost no 
users yet)...


Michel

On 20/03/2024 at 17:55, quag...@bol.com.br wrote:

Hi,
     I upgraded a cluster 2 weeks ago here. The situation is the same 
as Michel.

     A lot of PGs no scrubbed/deep-scrubed.

Rafael.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread quag...@bol.com.br
Hi,
     I upgraded a cluster 2 weeks ago here. The situation is the same as Michel's.
     A lot of PGs are not scrubbed/deep-scrubbed.

Rafael.___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-20 Thread Anthony D'Atri
Suggest issuing an explicit deep scrub against one of the subject PGs, see if 
it takes.
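For example (substitute one of the affected placement group ids):

  ceph pg deep-scrub <pgid>

and then watch ceph -s or the primary OSD's log for the corresponding 
"deep-scrub starts" line.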

> On Mar 20, 2024, at 8:20 AM, Michel Jouvin  
> wrote:
> 
> Hi,
> 
> We have a Reef cluster that started to complain a couple of weeks ago about 
> ~20 PGs (out of 10K) not scrubbed/deep-scrubbed in time. Looking at it for a 
> few days, I saw this affect only those PGs that could not be scrubbed since 
> mid-February. All the other PGs are regularly scrubbed.
> 
> I decided to look if one OSD was present in all these PGs and found one! I 
> restarted this OSD but it had no effect. Looking at the logs for the suspect 
> OSD, I found nothing related to abnormal behaviour (but the log is very 
> verbose at restart time so easy to miss something...). And there is no error 
> associated with the OSD disk.
> 
> Any advice about where to look for some useful information would be 
> appreciated! Should I try to destroy the OSD and re-add it? I'll be more 
> comfortable if I were able to find some diagnostics beforehand...
> 
> Best regards,
> 
> Michel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io