Re: [ceph-users] OSD maintenance (ceph osd set noout)

2018-02-28 Thread Andre Goree

On 2018/02/27 4:23 pm, John Spray wrote:
> On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree wrote:
> > Is it still considered best practice to set 'noout' for OSDs that will be
> > going under maintenance, e.g., rebooting an OSD node for a kernel update?
> >
> > I ask, because I've set this twice now during times which the OSDs would
> > only momentarily be 'out', however each time I've done this, the OSDs have
> > become unusable and I've had to rebuild them.
>
> Can you be more specific about "unusable"?  Marking an OSD noout is of
> course not meant to harm it!
>
> John



Sorry, I should've been more specific. I believe I ran into an issue 
where the journal for a given OSD was corrupt, which prevented the OSD 
from booting.


Earlier today I did find a way to flush the journal of an OSD. I hadn't 
actually done much troubleshooting or looked into getting the OSDs back, 
as I should have. Had I flushed the journal, I probably wouldn't have 
had to re-deploy anything, lol.


In any case, if I run into issues the next time I need to try this, I'll 
make my way back to this thread.  For now there is no issue, and surely 
my ignorance of Ceph is showing, haha.


Thanks for the replies.

--
Andre Goree
-=-=-=-=-=-
Email - andre at drenet.net
Website   - http://blog.drenet.net
PGP key   - http://www.drenet.net/pubkey.html
-=-=-=-=-=-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD maintenance (ceph osd set noout)

2018-02-28 Thread David Turner
As John says, noout prevents an OSD from being marked out of the cluster. It
does not stop the OSD from being marked down and then back up, which is the
desired behavior when restarting a server. What exactly are you seeing when
your OSDs become unusable and need to be rebuilt?
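
For readers following along, the usual pattern is to set the flag before the maintenance window and clear it afterwards. Below is a minimal sketch of that pattern, not anyone's actual tooling. It assumes it runs on a host with a working `ceph` CLI and admin keyring. The `noout` helper, `plan_reboot`, and the `ssh ... sudo reboot` step are hypothetical names chosen for illustration; only `ceph osd set noout` and `ceph osd unset noout` come from this thread.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def noout(run=subprocess.check_call):
    """Hold the cluster-wide noout flag for the duration of a `with` block."""
    run(["ceph", "osd", "set", "noout"])
    try:
        yield
    finally:
        # Always clear the flag, even if the maintenance step raised,
        # so OSDs that really fail later can be marked out again.
        run(["ceph", "osd", "unset", "noout"])


def plan_reboot(host):
    """Record (rather than execute) the commands for rebooting one OSD node."""
    issued = []
    with noout(run=issued.append):
        # Hypothetical maintenance step; replace with whatever is needed.
        issued.append(["ssh", host, "sudo", "reboot"])
    return issued
```

Calling `plan_reboot("osd-node-1")` only records the command list, which makes it easy to review before running anything. In real use one would wait until the OSDs are reported up again (for example by watching `ceph -s`) before leaving the `with` block.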

Without noout, if a rebooting server takes too long to come back up, its
OSDs get marked out and data starts backfilling to replace the copies
held on the OSDs that are no longer "in" the cluster.

Once those OSDs come back, not only do they need to backfill to catch up on
what they missed while they were down, but the cluster also has to undo
all of the data migration it started in order to recover from them being
marked out.
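
The timing described above can be sketched as a toy model. This is an illustration only, not how the monitors are implemented. It assumes the relevant timer is the `mon_osd_down_out_interval` setting and that it is 600 seconds, which I believe is the Luminous default but which should be checked on your own cluster. The function name `osd_state` is made up here, and the model ignores the short delay before a stopped OSD is noticed as down.

```python
def osd_state(seconds_down, noout=False, down_out_interval=600):
    """Toy model: return (up/down, in/out) for an OSD whose host has been
    offline for `seconds_down` seconds.

    `down_out_interval` stands in for mon_osd_down_out_interval; 600 is an
    assumed default, not a value taken from this thread.
    """
    if seconds_down <= 0:
        return ("up", "in")
    if not noout and seconds_down >= down_out_interval:
        # Marked out: backfill to other OSDs begins, and has to be
        # undone again once the OSD returns.
        return ("down", "out")
    # Still "in": no data is moved elsewhere; the OSD only has to catch
    # up on the writes it missed while it was down.
    return ("down", "in")
```

Under these assumptions a two-minute reboot (`osd_state(120)`) leaves the OSD down but in, with or without noout, which would be consistent with seeing very little recovery I/O after a quick reboot. The flag only changes the outcome when the node stays down longer than the interval.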

On Tue, Feb 27, 2018, 4:24 PM John Spray  wrote:

> On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree  wrote:
> > Is it still considered best practice to set 'noout' for OSDs that will be
> > going under maintenance, e.g., rebooting an OSD node for a kernel update?
> >
> > I ask, because I've set this twice now during times which the OSDs would
> > only momentarily be 'out', however each time I've done this, the OSDs have
> > become unusable and I've had to rebuild them.
>
> Can you be more specific about "unusable"?  Marking an OSD noout is of
> course not meant to harm it!
>
> John
>
> > Also, when I _do not_ set 'noout', it would seem that once the node reboots
> > the OSDs come back online without issue _and_ there is very _little_
> > recovery i/o -- I'd expect to see lots of recovery i/o if a node goes down
> > as the cluster tries to replace the PGs on other OSD nodes.  This further
> > makes me believe that setting 'noout' is no longer necessary.
> >
> > I'm running version 12.2.2-12.2.4 (in the middle of upgrading).
> >
> > Thanks in advance.


Re: [ceph-users] OSD maintenance (ceph osd set noout)

2018-02-27 Thread John Spray
On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree  wrote:
> Is it still considered best practice to set 'noout' for OSDs that will be
> going under maintenance, e.g., rebooting an OSD node for a kernel update?
>
> I ask, because I've set this twice now during times which the OSDs would
> only momentarily be 'out', however each time I've done this, the OSDs have
> become unusable and I've had to rebuild them.

Can you be more specific about "unusable"?  Marking an OSD noout is of
course not meant to harm it!

John

> Also, when I _do not_ set 'noout', it would seem that once the node reboots
> the OSDs come back online without issue _and_ there is very _little_
> recovery i/o -- I'd expect to see lots of recovery i/o if a node goes down
> as the cluster tries to replace the PGs on other OSD nodes.  This further
> makes me believe that setting 'noout' is no longer necessary.
>
> I'm running version 12.2.2-12.2.4 (in the middle of upgrading).
>
> Thanks in advance.


[ceph-users] OSD maintenance (ceph osd set noout)

2018-02-27 Thread Andre Goree
Is it still considered best practice to set 'noout' for OSDs that will 
be going under maintenance, e.g., rebooting an OSD node for a kernel 
update?


I ask, because I've set this twice now during times which the OSDs would 
only momentarily be 'out', however each time I've done this, the OSDs 
have become unusable and I've had to rebuild them.


Also, when I _do not_ set 'noout', it would seem that once the node 
reboots the OSDs come back online without issue _and_ there is very 
_little_ recovery i/o -- I'd expect to see lots of recovery i/o if a 
node goes down as the cluster tries to replace the PGs on other OSD 
nodes.  This further makes me believe that setting 'noout' is no longer 
necessary.


I'm running version 12.2.2-12.2.4 (in the middle of upgrading).

Thanks in advance.

--
Andre Goree
-=-=-=-=-=-
Email - andre at drenet.net
Website   - http://blog.drenet.net
PGP key   - http://www.drenet.net/pubkey.html
-=-=-=-=-=-