I'd rather shut the cloud down and copy the pool to a new one than take any 
chances of corruption by using an experimental feature. My guess is that there 
cannot be any I/O to the pool while copying; otherwise you'll lose the changes 
that happen during the copy, correct? 
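
Roughly what I have in mind, based on the post I linked below (just a sketch -- 
the pool names and PG counts here are placeholders, and the exact delete syntax 
may differ by release, so correct me if any step is off): 

ceph osd pool create mypool-new 1024 1024   # new pool with a sensible pg_num/pgp_num 
rados cppool mypool mypool-new              # copy every object into the new pool 
ceph osd pool delete mypool                 # newer releases may also ask for a confirmation flag 
ceph osd pool rename mypool-new mypool      # swap the new pool into place 

As far as I understand, cppool just walks the objects once, so anything written 
behind it would be missed -- hence shutting everything down first. 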

Dave Spano 
Optogenics 
Systems Administrator 



----- Original Message ----- 

From: "Greg Farnum" <g...@inktank.com> 
To: "Sébastien Han" <han.sebast...@gmail.com> 
Cc: "Dave Spano" <dsp...@optogenics.com>, "ceph-devel" 
<ceph-devel@vger.kernel.org>, "Sage Weil" <s...@inktank.com>, "Wido den 
Hollander" <w...@42on.com>, "Sylvain Munaut" <s.mun...@whatever-company.com>, 
"Samuel Just" <sam.j...@inktank.com>, "Vladislav Gorbunov" <vadi...@gmail.com> 
Sent: Tuesday, March 12, 2013 4:20:13 PM 
Subject: Re: OSD memory leaks? 

On Tuesday, March 12, 2013 at 1:10 PM, Sébastien Han wrote: 
> Well, to avoid unnecessary data movement, there is also an 
> _experimental_ feature to change the number of PGs in a pool on the fly. 
> 
> ceph osd pool set <poolname> pg_num <numpgs> --allow-experimental-feature 
Don't do that. We've got a set of 3 patches which fix the bugs we know about, 
but they aren't in bobtail yet, and I'm sure there are more we aren't aware of… 
-Greg 

Software Engineer #42 @ http://inktank.com | http://ceph.com 

> 
> Cheers! 
> -- 
> Regards, 
> Sébastien Han. 
> 
> 
> On Tue, Mar 12, 2013 at 7:09 PM, Dave Spano <dsp...@optogenics.com> wrote: 
> > Disregard my previous question. I found my answer in the post below. 
> > Absolutely brilliant! I thought I was screwed! 
> > 
> > http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/8924 
> > 
> > Dave Spano 
> > Optogenics 
> > Systems Administrator 
> > 
> > 
> > 
> > ----- Original Message ----- 
> > 
> > From: "Dave Spano" <dsp...@optogenics.com (mailto:dsp...@optogenics.com)> 
> > To: "Sébastien Han" <han.sebast...@gmail.com 
> > (mailto:han.sebast...@gmail.com)> 
> > Cc: "Sage Weil" <s...@inktank.com (mailto:s...@inktank.com)>, "Wido den 
> > Hollander" <w...@42on.com (mailto:w...@42on.com)>, "Gregory Farnum" 
> > <g...@inktank.com (mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > <s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com)>, 
> > "ceph-devel" <ceph-devel@vger.kernel.org 
> > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just" <sam.j...@inktank.com 
> > (mailto:sam.j...@inktank.com)>, "Vladislav Gorbunov" <vadi...@gmail.com 
> > (mailto:vadi...@gmail.com)> 
> > Sent: Tuesday, March 12, 2013 1:41:21 PM 
> > Subject: Re: OSD memory leaks? 
> > 
> > 
> > If one were stupid enough to have their pg_num and pgp_num set to 8 on two 
> > of their pools, how could that be fixed? 
> > 
> > 
> > Dave Spano 
> > 
> > 
> > 
> > ----- Original Message ----- 
> > 
> > From: "Sébastien Han" <han.sebast...@gmail.com 
> > (mailto:han.sebast...@gmail.com)> 
> > To: "Vladislav Gorbunov" <vadi...@gmail.com (mailto:vadi...@gmail.com)> 
> > Cc: "Sage Weil" <s...@inktank.com (mailto:s...@inktank.com)>, "Wido den 
> > Hollander" <w...@42on.com (mailto:w...@42on.com)>, "Gregory Farnum" 
> > <g...@inktank.com (mailto:g...@inktank.com)>, "Sylvain Munaut" 
> > <s.mun...@whatever-company.com (mailto:s.mun...@whatever-company.com)>, 
> > "Dave Spano" <dsp...@optogenics.com (mailto:dsp...@optogenics.com)>, 
> > "ceph-devel" <ceph-devel@vger.kernel.org 
> > (mailto:ceph-devel@vger.kernel.org)>, "Samuel Just" <sam.j...@inktank.com 
> > (mailto:sam.j...@inktank.com)> 
> > Sent: Tuesday, March 12, 2013 9:43:44 AM 
> > Subject: Re: OSD memory leaks? 
> > 
> > > Sorry, I meant pg_num and pgp_num on all pools, as shown by 
> > > "ceph osd dump | grep 'rep size'". 
> > 
> > 
> > 
> > Well it's still 450 each... 
> > 
> > > The default pg_num value of 8 is NOT suitable for a big cluster. 
> > 
> > Thanks, I know; I'm not new to Ceph. What's your point here? I 
> > already said that pg_num was 450... 
> > -- 
> > Regards, 
> > Sébastien Han. 
> > 
> > 
> > On Tue, Mar 12, 2013 at 2:00 PM, Vladislav Gorbunov <vadi...@gmail.com> wrote: 
> > > Sorry, I meant pg_num and pgp_num on all pools, as shown by 
> > > "ceph osd dump | grep 'rep size'". 
> > > The default pg_num value of 8 is NOT suitable for a big cluster. 
> > > 
> > > 2013/3/13 Sébastien Han <han.sebast...@gmail.com>: 
> > > > Replica count has been set to 2. 
> > > > 
> > > > Why? 
> > > > -- 
> > > > Regards, 
> > > > Sébastien Han. 
> > > > 
> > > > 
> > > > On Tue, Mar 12, 2013 at 12:45 PM, Vladislav Gorbunov <vadi...@gmail.com> wrote: 
> > > > > > FYI I'm using 450 pgs for my pools. 
> > > > > 
> > > > > 
> > > > > Please, can you show the number of object replicas? 
> > > > > 
> > > > > ceph osd dump | grep 'rep size' 
> > > > > 
> > > > > Vlad Gorbunov 
> > > > > 
> > > > > 2013/3/5 Sébastien Han <han.sebast...@gmail.com>: 
> > > > > > FYI I'm using 450 pgs for my pools. 
> > > > > > 
> > > > > > -- 
> > > > > > Regards, 
> > > > > > Sébastien Han. 
> > > > > > 
> > > > > > 
> > > > > > On Fri, Mar 1, 2013 at 8:10 PM, Sage Weil <s...@inktank.com> wrote: 
> > > > > > > 
> > > > > > > On Fri, 1 Mar 2013, Wido den Hollander wrote: 
> > > > > > > > On 02/23/2013 01:44 AM, Sage Weil wrote: 
> > > > > > > > > On Fri, 22 Feb 2013, Sébastien Han wrote: 
> > > > > > > > > > Hi all, 
> > > > > > > > > > 
> > > > > > > > > > I finally got a core dump. 
> > > > > > > > > > 
> > > > > > > > > > I did it with a kill -SEGV on the OSD process. 
> > > > > > > > > > 
> > > > > > > > > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> > > > > > > > > >  
> > > > > > > > > > 
> > > > > > > > > > Hope we will get something out of it :-). 
> > > > > > > > > 
> > > > > > > > > AHA! We have a theory. The pg log isn't trimmed during scrub 
> > > > > > > > > (because the old scrub code required that), but the new (deep) 
> > > > > > > > > scrub can take a very long time, which means the pg log will eat 
> > > > > > > > > RAM in the meantime... especially under high IOPS. 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Does the number of PGs influence the memory leak? My theory is 
> > > > > > > > that when you have a high number of PGs with a low number of 
> > > > > > > > objects per PG, you don't see the memory leak. 
> > > > > > > > 
> > > > > > > > I saw the memory leak on an RBD system where a pool had just 8 
> > > > > > > > PGs, but after going to 1024 PGs in a new pool it seemed to be 
> > > > > > > > resolved. 
> > > > > > > > 
> > > > > > > > I've asked somebody else to try your patch since he's still 
> > > > > > > > seeing it on his 
> > > > > > > > systems. Hopefully that gives us some results. 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > The PGs were active+clean when you saw the leak? There is a 
> > > > > > > problem (that 
> > > > > > > we just fixed in master) where pg logs aren't trimmed for 
> > > > > > > degraded PGs. 
> > > > > > > 
> > > > > > > sage 
> > > > > > > 
> > > > > > > > 
> > > > > > > > Wido 
> > > > > > > > 
> > > > > > > > > Can you try wip-osd-log-trim (which is bobtail + a simple 
> > > > > > > > > patch) and see if that seems to work? Note that that patch 
> > > > > > > > > shouldn't be run in a mixed argonaut+bobtail cluster, since it 
> > > > > > > > > isn't properly checking whether the scrub is classic or 
> > > > > > > > > chunky/deep. 
> > > > > > > > > 
> > > > > > > > > Thanks! 
> > > > > > > > > sage 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > -- 
> > > > > > > > > > Regards, 
> > > > > > > > > > Sébastien Han. 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum 
> > > > > > > > > > <g...@inktank.com> wrote: 
> > > > > > > > > > > On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han 
> > > > > > > > > > > <han.sebast...@gmail.com> wrote: 
> > > > > > > > > > > > > Is osd.1 using the heap profiler as well? Keep in mind 
> > > > > > > > > > > > > that active use of the memory profiler will itself cause 
> > > > > > > > > > > > > memory usage to increase -- this sounds a bit like that 
> > > > > > > > > > > > > to me since it's staying stable at a large but finite 
> > > > > > > > > > > > > portion of total memory. 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Well, the memory consumption was already high before the 
> > > > > > > > > > > > profiler was started. So yes, with the memory profiler 
> > > > > > > > > > > > enabled an OSD might consume more memory, but that isn't 
> > > > > > > > > > > > what causes the memory leaks. 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > My concern is that maybe you saw a leak but when you 
> > > > > > > > > > > restarted with 
> > > > > > > > > > > the memory profiling you lost whatever conditions caused 
> > > > > > > > > > > it. 
> > > > > > > > > > > 
> > > > > > > > > > > > Any ideas? Nothing to say about my scrubbing theory? 
> > > > > > > > > > > I like it, but Sam indicates that without some heap dumps 
> > > > > > > > > > > that capture the actual leak, the scrub code is too large to 
> > > > > > > > > > > review effectively for leaks. :( 
> > > > > > > > > > > -Greg 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Wido den Hollander 
> > > > > > > > 42on B.V. 
> > > > > > > > 
> > > > > > > > Phone: +31 (0)20 700 9902 
> > > > > > > > Skype: contact42on 
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > 
>
