Are you sure you're not being hit by:
ceph config set osd bluestore_fsck_quick_fix_on_mount false

https://docs.ceph.com/docs/master/releases/octopus/

Have all your OSDs successfully completed the fsck? The reason I ask is
that I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore omap
usage stats". A quick way to check is sketched at the end of this message.

---- On Thu, 09 Apr 2020 02:15:02 +0800 Jack <c...@jack.fr.eu.org> wrote ----

Just to confirm this does not get better:

root@backup1:~# ceph status
  cluster:
    id:     9cd41f0f-936d-4b59-8e5d-9b679dae9140
    health: HEALTH_WARN
            20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            4/50952060 objects unfound (0.000%)
            nobackfill,norecover,noscrub,nodeep-scrub flag(s) set
            1 osds down
            3 nearfull osd(s)
            Reduced data availability: 826 pgs inactive, 616 pgs down, 185 pgs peering, 158 pgs stale
            Low space hindering backfill (add storage if this doesn't resolve itself): 93 pgs backfill_toofull
            Degraded data redundancy: 13285415/101904120 objects degraded (13.037%), 706 pgs degraded, 696 pgs undersized
            989 pgs not deep-scrubbed in time
            378 pgs not scrubbed in time
            10 pool(s) nearfull
            2216 slow ops, oldest one blocked for 13905 sec, daemons [osd.1,osd.11,osd.20,osd.24,osd.25,osd.29,osd.31,osd.37,osd.4,osd.5]... have slow ops.

  services:
    mon: 1 daemons, quorum backup1 (age 8d)
    mgr: backup1(active, since 8d)
    osd: 37 osds: 26 up (since 9m), 27 in (since 2h); 626 remapped pgs
         flags nobackfill,norecover,noscrub,nodeep-scrub
    rgw: 1 daemon active (backup1.odiso.net)

  task status:

  data:
    pools:   10 pools, 2785 pgs
    objects: 50.95M objects, 92 TiB
    usage:   121 TiB used, 39 TiB / 160 TiB avail
    pgs:     29.659% pgs not active
             13285415/101904120 objects degraded (13.037%)
             433992/101904120 objects misplaced (0.426%)
             4/50952060 objects unfound (0.000%)
             840 active+clean+snaptrim_wait
             536 down
             490 active+undersized+degraded+remapped+backfilling
             326 active+clean
             113 peering
              88 active+undersized+degraded
              83 active+undersized+degraded+remapped+backfill_toofull
              79 stale+down
              63 stale+peering
              51 active+clean+snaptrim
              24 activating
              22 active+recovering+degraded
              19 active+remapped+backfilling
              13 stale+active+undersized+degraded
               9 remapped+peering
               9 active+undersized+remapped+backfilling
               9 active+undersized+degraded+remapped+backfill_wait+backfill_toofull
               2 stale+active+clean+snaptrim
               2 active+undersized
               1 stale+active+clean+snaptrim_wait
               1 active+remapped+backfill_toofull
               1 active+clean+snaptrim_wait+laggy
               1 active+recovering+undersized+remapped
               1 down+remapped
               1 activating+undersized+degraded+remapped
               1 active+recovering+laggy

On 4/8/20 3:27 PM, Jack wrote:
> The CPU is used by userspace, not kernelspace
>
> Here is the perf top, see attachment
>
> Rocksdb eats everything :/
>
>
> On 4/8/20 3:14 PM, Paul Emmerich wrote:
>> What's the CPU busy with while spinning at 100%?
>>
>> Check "perf top" for a quick overview
>>
>>
>> Paul

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
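
A minimal sketch of the checks discussed above, assuming the cluster is
mid-upgrade to Octopus. Only the first command comes from the thread; the
grep pattern and the use of pgrep -o to pick a ceph-osd PID are illustrative
additions, not commands the posters ran:

    # Keep freshly restarted OSDs from kicking off the expensive
    # omap-format conversion at mount time:
    ceph config set osd bluestore_fsck_quick_fix_on_mount false

    # Verify the setting is actually in effect:
    ceph config get osd bluestore_fsck_quick_fix_on_mount

    # See which OSDs still report legacy (not per-pool) omap stats:
    ceph health detail | grep -i legacy

    # Per Paul's suggestion, check what a spinning ceph-osd is busy with
    # (pgrep -o just grabs one ceph-osd PID; target the busy daemon):
    perf top -p "$(pgrep -o ceph-osd)"

If RocksDB symbols dominate the perf output, as Jack reports, that is
consistent with the omap conversion or compaction running on those OSDs.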