On Wed, 18 Dec 2019, Bryan Stillwell wrote:
> After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5
> I'm seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H').
> Attaching to the thread with strace shows a lot of mmap and munmap calls.
> Here's the distribution after watching it for a few minutes:
>
> 48.73% - mmap
> 49.48% - munmap
> 1.75% - futex
> 0.05% - madvise
>
> I've upgraded 3 other clusters so far (120 OSDs, 30 OSDs, 200 OSDs), but this
> is the only one which has seen the problem (355 OSDs). Perhaps it has
> something to do with its size?
>
> I was suspecting it might have to do with one of the modules misbehaving, so
> I disabled all of them:
>
> # ceph mgr module ls | jq -r '.enabled_modules'
> []
>
> But that didn't help (I restarted the mgrs after disabling the modules too).
>
> I also tried setting debug_mgr and debug_mgrc to 20, but nothing popped out
> at me as being the cause of the problem.
>
> It only seems to affect the active mgr. If I stop the active mgr the problem
> moves to one of the other mgrs.
>
> Any guesses or tips on what next steps I should take to figure out what's
> going on?

What are the balancer modes on the affected and unaffected cluster(s)?

sage