Look closely at your output: the PGs with 0 objects only appear to be “every 
other” one because of how the command happened to order its output.

Note that the empty PGs all have IDs matching “3.*”. The numeric prefix of a PG 
ID is the ID of the pool it belongs to, so I strongly suspect that you have a 
pool with no data.
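
If you want to confirm that, something along these lines should do it (the awk 
column positions match the dump format you pasted; the exact pool-detail line 
format varies a bit by release):

    # which pool has numeric ID 3?
    ceph osd pool ls detail | grep '^pool 3 '

    # per-pool usage; an unused pool will show ~0 stored and 0 objects
    ceph df

    # count zero-object PGs per pool ID (column 2 is OBJECTS, as in your dump)
    ceph pg dump pgs 2>/dev/null | awk '$1 ~ /^[0-9]+\./ && $2 == 0 {print $1}' | cut -d. -f1 | sort | uniq -c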



>> Strangely, ceph pg dump shows every other PG with 0 objects.  An 
>> attempt to perform a deep scrub (or scrub) on one of these PGs does nothing. 
>>   The cluster appears to be running fine, but obviously there’s an issue.   
>> What should my next steps be to troubleshoot ?
>>> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>>> 3.e9b 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 22:49:07.629579 0'0 23686:19820 [28,79] 28 [28,79] 28 0'0 2022-12-31 22:49:07.629508 0'0 2022-12-31 22:49:07.629508 0
>>> 1.e99 60594 0 0 0 0 177433523272 0 0 3046 3046 active+clean 2022-12-21 14:35:08.175858 23686'268137 23686:1732399 [178,115] 178 [178,115] 178 23675'267613 2022-12-21 11:01:10.403525 23675'267613 2022-12-21 11:01:10.403525 0
>>> 3.e9a 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 09:16:48.644619 0'0 23686:22855 [51,140] 51 [51,140] 51 0'0 2022-12-31 09:16:48.644568 0'0 2022-12-30 02:35:23.367344 0
>>> 1.e98 59962 0 0 0 0 177218669411 0 0 3035 3035 active+clean 2022-12-28 14:14:49.908560 23686'265576 23686:1357499 [92,86] 92 [92,86] 92 23686'265445 2022-12-28 14:14:49.908522 23686'265445 2022-12-28 14:14:49.908522 0
>>> 3.e95 0 0 0 0 0 0 0 0 0 0 active+clean 2022-12-31 06:09:39.442932 0'0 23686:22757 [48,83] 48 [48,83] 48 0'0 2022-12-31 06:09:39.442879 0'0 2022-12-18 09:33:47.892142 0


As for your PGs not scrubbed in time, what sort of drives back your OSDs?  Here 
are some thoughts, especially if they’re HDDs.

* If you don’t need that empty pool, delete it, then evaluate how many PGs your 
OSDs hold on average (e.g. `ceph osd df`).  If you have an unusually high 
number of PGs per OSD, maybe, just maybe, you’re running afoul of 
osd_scrub_extended_sleep / osd_scrub_sleep.  In other words, individual scrubs 
on empty PGs may naturally be very fast, but the sheer number of them may 
effectively be DoSing the scrub scheduler because of the effort Ceph makes to 
spread out the impact of scrubs.  (Commands to check these settings are 
sketched after this list.)

* Do you limit scrubs to certain times via osd_scrub_begin_hour, 
osd_scrub_end_hour, osd_scrub_begin_week_day, osd_scrub_end_week_day?  I’ve 
seen operators constrain scrubs to only a few overnight / weekend hours, 
but doing so can hobble Ceph’s ability to get through them all in time.

* Similarly, a value of osd_scrub_load_threshold that’s too low can also result 
in starvation.  The load average statistic can be misleading on modern SMP 
systems with lots of cores.  I’ve witnessed 32c/64t OSD nodes report a load 
average of like 40, but with tools like htop one could see that they were 
barely breaking a sweat.

* If you have osd_scrub_during_recovery disabled and experience a lot of 
backfill / recovery / rebalance traffic, that can starve scrubs too.  IMHO with 
recent releases this should almost always be enabled, ymmv.

* Back when I ran busy (read: under-specced) HDD clusters I had to bump 
osd_deep_scrub_interval to 4x the default because of how slow and seek-bound 
the LFF spinners were.  Of course, the longer one spaces out scrubs, the less 
effective they are at detecting problems before they become impactful.
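
For reference, this is roughly how I’d check (and, if need be, adjust) the 
settings mentioned above, assuming a release with the centralized config 
database (Mimic or later); 2419200 is simply 4x the one-week deep scrub 
default:

    # what an OSD is actually running with (pick any OSD id)
    ceph config show osd.0 | grep -E 'osd_(scrub|deep_scrub)'

    # values set centrally in the mon config database
    ceph config get osd osd_scrub_sleep
    ceph config get osd osd_scrub_begin_hour
    ceph config get osd osd_scrub_end_hour
    ceph config get osd osd_scrub_load_threshold
    ceph config get osd osd_scrub_during_recovery
    ceph config get osd osd_deep_scrub_interval

    # e.g. stretch the deep scrub interval to 4 weeks (value is in seconds)
    ceph config set osd osd_deep_scrub_interval 2419200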



