Are you currently doing IO on the relevant pool? Maybe nearfull isn't reported until some pgstats are reported.
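Something like this might be enough to get fresh stats reported (just a sketch; the pool name is taken from your output below, and keep the bench write tiny given how full that pool already is):

ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'   # confirm the ratios in the OSDMap
rados -p meta_ru1b bench 5 write -b 4096 -t 1 --no-cleanup               # a few tiny writes to the affected pool
rados -p meta_ru1b cleanup                                               # remove the bench objects again
ceph health detail                                                       # see whether OSD_NEARFULL / OSD_BACKFILLFULL shows up now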
Otherwise sorry I haven't seen this.

Dan

On Wed, Apr 21, 2021, 8:05 PM Konstantin Shalygin <k0...@k0ste.ru> wrote:

> Hi,
>
> On the adopted cluster, Prometheus triggered the "osd full > 90%" alert,
> but Ceph itself did not. The OSD is actually draining (see %USE):
>
> root@host# ceph osd df name osd.696
> ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL  %USE  VAR  PGS STATUS
> 696 nvme  0.91199  1.00000 912 GiB 830 GiB 684 GiB   8 KiB 146 GiB 81 GiB 91.09 1.00  47     up
>                      TOTAL 912 GiB 830 GiB 684 GiB 8.1 KiB 146 GiB 81 GiB 91.09
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0
>
> root@host# ceph osd df name osd.696
> ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL  %USE  VAR  PGS STATUS
> 696 nvme  0.91199  1.00000 912 GiB 830 GiB 684 GiB   8 KiB 146 GiB 81 GiB 91.08 1.00  47     up
>                      TOTAL 912 GiB 830 GiB 684 GiB 8.1 KiB 146 GiB 81 GiB 91.08
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0
>
> root@host# ceph osd df name osd.696
> ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL  %USE  VAR  PGS STATUS
> 696 nvme  0.91199  1.00000 912 GiB 830 GiB 684 GiB   8 KiB 146 GiB 81 GiB 91.07 1.00  47     up
>                      TOTAL 912 GiB 830 GiB 684 GiB 8.1 KiB 146 GiB 81 GiB 91.07
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0
>
> Pool 18 is on another device class; the OSDs of that pool trigger the
> warnings as usual, but the OSDs of pool 17 don't:
>
> root@host# ceph health detail
> HEALTH_WARN noout flag(s) set; Some pool(s) have the nodeep-scrub flag(s) set; Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
> OSDMAP_FLAGS noout flag(s) set
> POOL_SCRUB_FLAGS Some pool(s) have the nodeep-scrub flag(s) set
>     Pool meta_ru1b has nodeep-scrub flag
>     Pool data_ru1b has nodeep-scrub flag
> PG_BACKFILL_FULL Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull
>     pg 18.1008 is active+remapped+backfill_wait+backfill_toofull, acting [336,462,580]
>     pg 18.27e0 is active+remapped+backfill_wait+backfill_toofull, acting [401,627,210]
>
> In my experience, when an OSD drains, Ceph warns first at backfillfull_ratio,
> then at nearfull_ratio, until usage drops to 84.99%.
> I don't think it's possible to configure a silence for this.
>
> Current usage:
>
> root@host# ceph df detail
> RAW STORAGE:
>     CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
>     hdd       4.3 PiB     1022 TiB    3.3 PiB      3.3 PiB         76.71
>     nvme      161 TiB       61 TiB     82 TiB      100 TiB         62.30
>     TOTAL     4.4 PiB      1.1 PiB    3.4 PiB      3.4 PiB         76.20
>
> POOLS:
>     POOL          ID     PGS       STORED      OBJECTS     USED        %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR
>     meta_ru1b     17      2048     3.1 TiB     7.15G        82 TiB     92.77       2.1 TiB     N/A               N/A             7.15G     0 B            0 B
>     data_ru1b     18     16384     1.1 PiB     3.07G       3.3 PiB     88.29       148 TiB     N/A               N/A             3.07G     0 B            0 B
>
> Current OSD dump header:
>
> epoch 270540
> fsid ccf2c233-4adf-423c-b734-236220096d4e
> created 2019-02-14 15:30:56.642918
> modified 2021-04-21 20:33:54.481616
> flags noout,sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
> crush_version 7255
> full_ratio 0.95
> backfillfull_ratio 0.9
> nearfull_ratio 0.85
> require_min_compat_client jewel
> min_compat_client jewel
> require_osd_release nautilus
> pool 17 'meta_ru1b' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 240836 lfor 0/0/51990 flags hashpspool,nodeep-scrub stripe_width 0 application metadata
> pool 18 'data_ru1b' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 16384 pgp_num 16384 autoscale_mode warn last_change 270529 lfor 0/0/52038 flags hashpspool,nodeep-scrub stripe_width 0 application data
> max_osd 780
>
> Current versions:
>
> {
>     "mon": {
>         "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 3
>     },
>     "mgr": {
>         "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 3
>     },
>     "osd": {
>         "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 780
>     },
>     "mds": {},
>     "overall": {
>         "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 786
>     }
> }
>
> Dan, do you remember seeing anything like this? My guess is that some
> counter type overflowed.
>
> Thanks,
> k