Re: maintenance on osd host

2013-03-01 Thread John Wilkins
thanks. I've updated the docs accordingly. The change should be up in
a few minutes.

On Fri, Mar 1, 2013 at 12:38 PM, Sylvain Munaut
 wrote:
> Hi,
>
>> I have it documented here:
>>
>> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing
>
> That looks wrong to me.
>
> AFAIU it should be 'noout'. You want it marked down ASAP.
>
> Cheers,
>
> Sylvain



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: maintenance on osd host

2013-03-01 Thread Sylvain Munaut
Hi,

> I have it documented here:
>
> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing

That looks wrong to me.

AFAIU it should be 'noout'. You want it marked down ASAP.

Cheers,

Sylvain
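
For readers landing on this thread later, the 'noout' maintenance workflow under discussion looks roughly like the following sketch (the osd.1 id and the sysvinit-style service invocation are illustrative; exact service commands vary by distribution and Ceph release):

```shell
# Tell the monitors not to mark stopped OSDs "out" of the CRUSH map,
# which is what would otherwise trigger data rebalancing:
ceph osd set noout

# Stop the OSD daemon(s) on the host being serviced:
service ceph stop osd.1

# ... perform the host maintenance; affected PGs show as degraded,
# but no data migration starts ...

# Bring the daemon back and restore normal behaviour:
service ceph start osd.1
ceph osd unset noout
```

With noout set, a stopped OSD is still marked down (so I/O is redirected to its peers) but never marked out, so CRUSH does not begin re-replicating its data.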


Re: maintenance on osd host

2013-03-01 Thread John Wilkins
I have it documented here:

http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing

Let me know if this works for you.

On Thu, Feb 28, 2013 at 8:14 AM, Gregory Farnum  wrote:
> On Tue, Feb 26, 2013 at 11:37 PM, Stefan Priebe - Profihost AG
>  wrote:
>> Hi Greg,
>>   Hi Sage,
>>
>> Am 26.02.2013 21:27, schrieb Gregory Farnum:
>>> On Tue, Feb 26, 2013 at 11:44 AM, Stefan Priebe  
>>> wrote:
>>> "out" and "down" are quite different — are you sure you tried "down"
>>> and not "out"? (You reference out in your first email, rather than
>>> down.)
>>> -Greg
>>
>> Sorry, that's it; I misread down/out. Wouldn't it make sense to mark
>> the OSD down automatically when shutting it down via the init script?
>> It doesn't seem to make sense to rely on automatic detection when
>> somebody uses the init script.
>
> Yes, yes it would. http://tracker.ceph.com/issues/4267 :)
> -Greg
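
Until the init script does this itself, a manual workaround along the lines discussed above is to tell the monitors explicitly after stopping the daemon (a sketch; osd.1 and the id 1 are placeholders):

```shell
# Stop the daemon via the init script:
service ceph stop osd.1

# Immediately mark it down so peers and clients do not have to wait
# for the heartbeat grace period to expire before I/O fails over:
ceph osd down 1
```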



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599


Re: The Ceph Census

2013-03-01 Thread Ross David Turk

Hi!  The results of our first Census have been posted:

http://ceph.com/community/results-from-the-ceph-census/

In total, we received 81 responses representing 21 clusters in production. The 
blog post contains some high-level analysis, and the raw data is available for 
those who might be interested in digging deeper.

Thank you to all those who participated!  Let's do it again soon, ya?

Cheers,
Ross

On Feb 15, 2013, at 9:33 AM, Ross David Turk  wrote:
> 
> Hey folks.  We've gotten nearly 50 responses so far, and the data is proving 
> to be quite interesting!  I will share it on the blog early next week.
> 
> The survey will be open until next Monday so that everyone has an opportunity 
> to participate.  If you haven't gotten around to adding your cluster, you 
> still can - it's a pretty short list of questions, shouldn't take more than a 
> minute or two.
> 
> http://ceph.com/census
> 
> Thanks,
> Ross


--
Ross Turk
Community, Inktank

@rossturk @inktank @ceph



Re: OSD memory leaks?

2013-03-01 Thread Sage Weil
On Fri, 1 Mar 2013, Wido den Hollander wrote:
> On 02/23/2013 01:44 AM, Sage Weil wrote:
> On Fri, 22 Feb 2013, Sébastien Han wrote:
> > > Hi all,
> > > 
> > > I finally got a core dump.
> > > 
> > > I did it with a kill -SEGV on the OSD process.
> > > 
> > > https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
> > > 
> > > Hope we will get something out of it :-).
> > 
> > AHA!  We have a theory.  The pg log isn't trimmed during scrub (because the
> > old scrub code required that), but the new (deep) scrub can take a very
> > long time, which means the pg log will eat RAM in the meantime,
> > especially under high IOPS.
> > 
> 
> Does the number of PGs influence the memory leak? So my theory is that when
> you have a high number of PGs with a low number of objects per PG you don't
> see the memory leak.
> 
> I saw the memory leak on an RBD system where a pool had just 8 PGs, but after
> going to 1024 PGs in a new pool it seemed to be resolved.
> 
> I've asked somebody else to try your patch since he's still seeing it on his
> systems. Hopefully that gives us some results.

The PGs were active+clean when you saw the leak?  There is a problem (that 
we just fixed in master) where pg logs aren't trimmed for degraded PGs.

sage
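
For anyone trying to reproduce this, the tcmalloc heap profiler referenced throughout the thread can be driven per daemon; a sketch (osd.1 is a placeholder, and the exact tell syntax varies slightly between releases):

```shell
# Start heap profiling inside a running OSD:
ceph tell osd.1 heap start_profiler

# After reproducing the memory growth, capture state for analysis:
ceph tell osd.1 heap dump
ceph tell osd.1 heap stats

# Stop profiling when finished:
ceph tell osd.1 heap stop_profiler
```

The resulting heap dumps are written alongside the daemon's logs and can then be inspected with the pprof tool.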

> 
> Wido
> 
> > Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
> > if that seems to work?  Note that that patch shouldn't be run in a mixed
> > argonaut+bobtail cluster, since it isn't properly checking if the scrub is
> > classic or chunky/deep.
> > 
> > Thanks!
> > sage
> > 
> > 
> > > --
> > > Regards,
> > > Sébastien Han.
> > > 
> > > 
> > > On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
> > > > On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han 
> > > > wrote:
> > > > > > Is osd.1 using the heap profiler as well? Keep in mind that active
> > > > > > use of the memory profiler will itself cause memory usage to
> > > > > > increase -- this sounds a bit like that to me since it's staying
> > > > > > stable at a large but finite portion of total memory.
> > > > > 
> > > > > Well, the memory consumption was already high before the profiler was
> > > > > started. So yes, with the memory profiler enabled an OSD might consume
> > > > > more memory, but this doesn't cause the memory leaks.
> > > > 
> > > > My concern is that maybe you saw a leak but when you restarted with
> > > > the memory profiling you lost whatever conditions caused it.
> > > > 
> > > > > Any ideas? Nothing to say about my scrubbing theory?
> > > > I like it, but Sam indicates that without some heap dumps which
> > > > capture the actual leak then scrub is too large to effectively code
> > > > review for leaks. :(
> > > > -Greg
> > 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> 
> 


Re: OSD memory leaks?

2013-03-01 Thread Samuel Just
That pattern would seem to support the log trimming theory of the leak.
-Sam

On Fri, Mar 1, 2013 at 7:51 AM, Wido den Hollander  wrote:
> On 02/23/2013 01:44 AM, Sage Weil wrote:
>>
>> On Fri, 22 Feb 2013, Sébastien Han wrote:
>>>
>>> Hi all,
>>>
>>> I finally got a core dump.
>>>
>>> I did it with a kill -SEGV on the OSD process.
>>>
>>>
>>> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>>
>>> Hope we will get something out of it :-).
>>
>>
>> AHA!  We have a theory.  The pg log isn't trimmed during scrub (because the
>> old scrub code required that), but the new (deep) scrub can take a very
>> long time, which means the pg log will eat RAM in the meantime,
>> especially under high IOPS.
>>
>
> Does the number of PGs influence the memory leak? So my theory is that when
> you have a high number of PGs with a low number of objects per PG you don't
> see the memory leak.
>
> I saw the memory leak on an RBD system where a pool had just 8 PGs, but after
> going to 1024 PGs in a new pool it seemed to be resolved.
>
> I've asked somebody else to try your patch since he's still seeing it on his
> systems. Hopefully that gives us some results.
>
> Wido
>
>
>> Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
>> if that seems to work?  Note that that patch shouldn't be run in a mixed
>> argonaut+bobtail cluster, since it isn't properly checking if the scrub is
>> classic or chunky/deep.
>>
>> Thanks!
>> sage
>>
>>
>>> --
>>> Regards,
>>> Sébastien Han.
>>>
>>>
>>> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
>>>>
>>>> On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han 
>>>> wrote:
>>>>>>
>>>>>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>>>>>> of the memory profiler will itself cause memory usage to increase --
>>>>>> this sounds a bit like that to me since it's staying stable at a large
>>>>>> but finite portion of total memory.
>>>>>
>>>>> Well, the memory consumption was already high before the profiler was
>>>>> started. So yes, with the memory profiler enabled an OSD might consume
>>>>> more memory, but this doesn't cause the memory leaks.
>>>>
>>>> My concern is that maybe you saw a leak but when you restarted with
>>>> the memory profiling you lost whatever conditions caused it.
>>>>
>>>>> Any ideas? Nothing to say about my scrubbing theory?
>>>>
>>>> I like it, but Sam indicates that without some heap dumps which
>>>> capture the actual leak then scrub is too large to effectively code
>>>> review for leaks. :(
>>>> -Greg
>>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on


Re: OSD memory leaks?

2013-03-01 Thread Wido den Hollander

On 02/23/2013 01:44 AM, Sage Weil wrote:
> On Fri, 22 Feb 2013, Sébastien Han wrote:
>> Hi all,
>>
>> I finally got a core dump.
>>
>> I did it with a kill -SEGV on the OSD process.
>>
>> https://www.dropbox.com/s/ahv6hm0ipnak5rf/core-ceph-osd-11-0-0-20100-1361539008
>>
>> Hope we will get something out of it :-).
>
> AHA!  We have a theory.  The pg log isn't trimmed during scrub (because the
> old scrub code required that), but the new (deep) scrub can take a very
> long time, which means the pg log will eat RAM in the meantime,
> especially under high IOPS.

Does the number of PGs influence the memory leak? My theory is that 
when you have a high number of PGs with a low number of objects per PG 
you don't see the memory leak.

I saw the memory leak on an RBD system where a pool had just 8 PGs, but 
after going to 1024 PGs in a new pool it seemed to be resolved.

I've asked somebody else to try your patch since he's still seeing it on 
his systems. Hopefully that gives us some results.

Wido

> Can you try wip-osd-log-trim (which is bobtail + a simple patch) and see
> if that seems to work?  Note that that patch shouldn't be run in a mixed
> argonaut+bobtail cluster, since it isn't properly checking if the scrub is
> classic or chunky/deep.
>
> Thanks!
> sage
>
>> --
>> Regards,
>> Sébastien Han.
>>
>>
>> On Fri, Jan 11, 2013 at 7:13 PM, Gregory Farnum  wrote:
>>> On Fri, Jan 11, 2013 at 6:57 AM, Sébastien Han  wrote:
>>>>> Is osd.1 using the heap profiler as well? Keep in mind that active use
>>>>> of the memory profiler will itself cause memory usage to increase --
>>>>> this sounds a bit like that to me since it's staying stable at a large
>>>>> but finite portion of total memory.
>>>>
>>>> Well, the memory consumption was already high before the profiler was
>>>> started. So yes, with the memory profiler enabled an OSD might consume
>>>> more memory, but this doesn't cause the memory leaks.
>>>
>>> My concern is that maybe you saw a leak but when you restarted with
>>> the memory profiling you lost whatever conditions caused it.
>>>
>>>> Any ideas? Nothing to say about my scrubbing theory?
>>>
>>> I like it, but Sam indicates that without some heap dumps which
>>> capture the actual leak then scrub is too large to effectively code
>>> review for leaks. :(
>>> -Greg




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
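
Wido's PG-count observation is straightforward to test by recreating the pool with a larger pg_num, so that each PG holds (and logs) proportionally fewer objects; a sketch with illustrative pool name and counts:

```shell
# Create a new pool with many more placement groups (pg_num, pgp_num):
ceph osd pool create rbd-new 1024 1024

# Confirm the setting:
ceph osd pool get rbd-new pg_num
```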