On Wed, 18 Dec 2019, Bryan Stillwell wrote:
> After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5 
> I'm seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H'). 
> Attaching to the thread with strace shows a lot of mmap and munmap calls.
> Here's the distribution after watching it for a few minutes:
> 
> 48.73% - mmap
> 49.48% - munmap
> 1.75% - futex
> 0.05% - madvise
> 
> I've upgraded 3 other clusters so far (120 OSDs, 30 OSDs, 200 OSDs), but this 
> is the only one which has seen the problem (355 OSDs).  Perhaps it has 
> something to do with its size?
> 
> I was suspecting it might have to do with one of the modules misbehaving, so 
> I disabled all of them:
> 
> # ceph mgr module ls | jq -r '.enabled_modules'
> []
> 
> But that didn't help (I restarted the mgrs after disabling the modules too).
> 
> I also tried setting debug_mgr and debug_mgrc to 20, but nothing popped out 
> at me as being the cause of the problem.
> 
> It only seems to affect the active mgr.  If I stop the active mgr the problem 
> moves to one of the other mgrs.
> 
> Any guesses or tips on what next steps I should take to figure out what's 
> going on?

What are the balancer modes on the affected and unaffected cluster(s)?
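
A minimal sketch of one way to collect that, run against each cluster in turn. `ceph balancer status` prints JSON; the helper below (the name `balancer_mode` is made up for illustration) pulls out the `mode` field, on the assumption that the status output has a top-level "mode" key. It uses jq, as in your module listing, and falls back to sed where jq is not installed.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph balancer status` JSON on stdin and
# print the balancer mode. Assumes a top-level "mode" key in the output.
balancer_mode() {
    if command -v jq >/dev/null 2>&1; then
        jq -r '.mode'
    else
        # Fallback without jq: grab the quoted value that follows "mode":
        sed -n 's/.*"mode": *"\([^"]*\)".*/\1/p'
    fi
}

# On a live cluster (needs an admin keyring), something like:
#   ceph balancer status | balancer_mode
```

It may also be worth checking whether the balancer is listed as an always-on module on 14.2.5, since an empty `enabled_modules` list would not rule that out. This assumes `ceph mgr module ls` reports an `always_on_modules` key on that release, which I have not verified here.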

sage
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io