Re: [ceph-users] OSD maintenance (ceph osd set noout)
On 2018/02/27 4:23 pm, John Spray wrote:
> On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree wrote:
> > Is it still considered best practice to set 'noout' for OSDs that will be going under maintenance, e.g., rebooting an OSD node for a kernel update?
> >
> > I ask because I've set this twice now during times when the OSDs would only momentarily be 'out'; however, each time I've done this, the OSDs have become unusable and I've had to rebuild them.
>
> Can you be more specific about "unusable"? Marking an OSD noout is of course not meant to harm it!
>
> John

Sorry, I should've been more specific. I believe I run into an issue where the journal for a given OSD is corrupt and thus prevents the OSD from booting. Earlier today I did find a way to flush the journal of an OSD, which I probably should've done at the time; then I wouldn't have had to re-deploy anything, lol. (I hadn't actually troubleshot much and didn't look into getting the OSD back, as I should've.)

In any case, if I run into issues again if/when I need to try this, I'll make my way back to this thread. For right now there is no issue, and surely my ignorance of Ceph is showing, haha.

Thanks for the replies.

--
Andre Goree
-=-=-=-=-=-
Email - andre at drenet.net
Website - http://blog.drenet.net
PGP key - http://www.drenet.net/pubkey.html
-=-=-=-=-=-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
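The message above does not say which command was used to flush the journal. For FileStore OSDs, the `ceph-osd` daemon has a `--flush-journal` option, so the step was presumably something like the sketch below. The function name `flush_osd_journal`, the `RUN` dry-run hook, and the OSD id `3` are illustrative assumptions, as is the use of systemd units named `ceph-osd@<id>`. Whether a flush succeeds on a journal that is actually corrupt is not something this thread establishes.

```shell
# Sketch only, assuming a FileStore OSD managed by systemd.
# RUN is a hypothetical dry-run hook: call with RUN=echo to print the
# commands instead of executing them.
RUN="${RUN:-}"

flush_osd_journal() {
    id="$1"   # numeric OSD id, e.g. 3 (placeholder)
    # The OSD must be stopped before its journal is flushed.
    $RUN systemctl stop "ceph-osd@${id}" &&
    # Write any pending journal entries out to the object store.
    $RUN ceph-osd -i "${id}" --flush-journal &&
    # Bring the OSD back only if the flush succeeded.
    $RUN systemctl start "ceph-osd@${id}"
}
```

Running `RUN=echo flush_osd_journal 3` previews the three commands without touching the cluster.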
Re: [ceph-users] OSD maintenance (ceph osd set noout)
Like John says, noout prevents an OSD from being marked out in the cluster. It does not prevent it from being marked down and back up, which is the desired behavior when restarting a server. What are you seeing when your OSDs become unusable and need to be rebuilt?

Without noout set, if a rebooting server takes too long to come back up, its OSDs will get marked out and data will start backfilling to replace the copies on the OSDs that are no longer "in" the cluster. Once those OSDs come back, not only do they need to backfill to catch up on what they missed while they were down, but the cluster also needs to undo all of the data migration it was doing to recover from them being marked out.

On Tue, Feb 27, 2018, 4:24 PM John Spray wrote:
> On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree wrote:
> > Is it still considered best practice to set 'noout' for OSDs that will be going under maintenance, e.g., rebooting an OSD node for a kernel update?
> >
> > I ask because I've set this twice now during times when the OSDs would only momentarily be 'out'; however, each time I've done this, the OSDs have become unusable and I've had to rebuild them.
>
> Can you be more specific about "unusable"? Marking an OSD noout is of course not meant to harm it!
>
> John
>
> > Also, when I _do not_ set 'noout', it would seem that once the node reboots the OSDs come back online without issue _and_ there is very _little_ recovery i/o -- I'd expect to see lots of recovery i/o if a node goes down as the cluster tries to replace the PGs on other OSD nodes. This further makes me believe that setting 'noout' is no longer necessary.
> >
> > I'm running version 12.2.2-12.2.4 (in the middle of upgrading).
> >
> > Thanks in advance.
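The maintenance flow discussed in this thread can be sketched as two small shell helpers wrapped around the reboot. `ceph osd set noout`, `ceph osd unset noout`, and `ceph -s` are standard Ceph CLI commands; the function names and the `RUN` dry-run hook are illustrative assumptions, not part of Ceph.

```shell
# Sketch only: wrap a node reboot in the cluster-wide noout flag.
# RUN is a hypothetical dry-run hook: call with RUN=echo to print the
# commands instead of executing them.
RUN="${RUN:-}"

begin_maintenance() {
    # OSDs on the rebooting node will still be marked down, but the
    # cluster will not mark them out and start backfilling elsewhere.
    $RUN ceph osd set noout
}

end_maintenance() {
    # Clear the flag once the OSDs are back up, then show cluster
    # status so any remaining recovery is visible.
    $RUN ceph osd unset noout && $RUN ceph -s
}
```

A typical session would call `begin_maintenance`, reboot the node, wait for its OSDs to rejoin, then call `end_maintenance`. Leaving the flag set afterwards would keep OSDs that later fail from ever being marked out, so unsetting it is part of the procedure.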
Re: [ceph-users] OSD maintenance (ceph osd set noout)
On Tue, Feb 27, 2018 at 6:37 PM, Andre Goree wrote:
> Is it still considered best practice to set 'noout' for OSDs that will be going under maintenance, e.g., rebooting an OSD node for a kernel update?
>
> I ask because I've set this twice now during times when the OSDs would only momentarily be 'out'; however, each time I've done this, the OSDs have become unusable and I've had to rebuild them.

Can you be more specific about "unusable"? Marking an OSD noout is of course not meant to harm it!

John

> Also, when I _do not_ set 'noout', it would seem that once the node reboots the OSDs come back online without issue _and_ there is very _little_ recovery i/o -- I'd expect to see lots of recovery i/o if a node goes down as the cluster tries to replace the PGs on other OSD nodes. This further makes me believe that setting 'noout' is no longer necessary.
>
> I'm running version 12.2.2-12.2.4 (in the middle of upgrading).
>
> Thanks in advance.
[ceph-users] OSD maintenance (ceph osd set noout)
Is it still considered best practice to set 'noout' for OSDs that will be going under maintenance, e.g., rebooting an OSD node for a kernel update?

I ask because I've set this twice now during times when the OSDs would only momentarily be 'out'; however, each time I've done this, the OSDs have become unusable and I've had to rebuild them.

Also, when I _do not_ set 'noout', it would seem that once the node reboots the OSDs come back online without issue _and_ there is very _little_ recovery i/o -- I'd expect to see lots of recovery i/o if a node goes down as the cluster tries to replace the PGs on other OSD nodes. This further makes me believe that setting 'noout' is no longer necessary.

I'm running version 12.2.2-12.2.4 (in the middle of upgrading).

Thanks in advance.

--
Andre Goree