Re: [ceph-users] Gracefully reboot OSD node
> On 3 August 2017 at 14:14, Hans van den Bogert wrote:
>
> Thanks for answering even before I asked the questions :)
>
> So bottom line, a HEALTH_ERR state is simply part of taking a (bunch of)
> OSDs down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds?
> For context, the CPUs are a 2609v3 per 4 OSDs. (I know; they're far from
> the fastest CPUs.)

Yes. Prior to Jewel, Ceph wouldn't go to ERR if PGs were inactive, where peering or down is an inactive state. It would just stay in WARN, which implies nothing is really wrong.

You can influence the behavior with mon_pg_min_inactive. It's set to 1 by default and controls how many PGs need to be inactive before health goes to ERR. But raising it merely suppresses the error.

A 1.9GHz CPU isn't the fastest indeed, and most of the peering work is single-threaded, so yes, this behavior is normal. If you had faster CPUs you could reduce this time. Still, 2 to 4 seconds isn't that bad.

Wido

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
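The mon_pg_min_inactive knob Wido mentions can be inspected and raised at runtime. A minimal sketch, assuming a monitor named mon.a (a placeholder; adjust names to your cluster), and keeping in mind Wido's point that this only suppresses the error rather than shortening peering:

```shell
# Show the current value (default: 1) via the monitor's admin socket;
# "mon.a" is a placeholder for one of your monitor names.
ceph daemon mon.a config get mon_pg_min_inactive

# Raise it at runtime so a single inactive PG no longer flips the
# cluster to HEALTH_ERR. The PGs are still inactive while they peer;
# this only changes how the condition is reported.
ceph tell mon.* injectargs '--mon_pg_min_inactive 10'
```

To persist the change across monitor restarts, set `mon_pg_min_inactive = 10` under the `[mon]` section of ceph.conf.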
Re: [ceph-users] Gracefully reboot OSD node
Thanks for answering even before I asked the questions :)

So bottom line, a HEALTH_ERR state is simply part of taking a (bunch of) OSDs down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds? For context, the CPUs are a 2609v3 per 4 OSDs. (I know; they're far from the fastest CPUs.)
Re: [ceph-users] Gracefully reboot OSD node
What are the implications of this? Because I can see a lot of blocked requests piling up when using 'noout' and 'nodown'. That probably makes sense, though.

Another thing: now, when the OSDs come back online, I again see multiple periods of HEALTH_ERR state. Is that to be expected?
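For what it's worth, the blocked requests mentioned above can be inspected while the flags are set. A sketch using the standard Ceph CLI ("osd.0" is a placeholder for a slow OSD's id):

```shell
# Cluster-wide view: which health checks are firing, including
# slow/blocked request warnings and the OSDs involved.
ceph health detail

# Drill into a specific daemon via its admin socket to see the
# operations currently in flight (and how long they've waited).
ceph daemon osd.0 dump_ops_in_flight
```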
Re: [ceph-users] Gracefully reboot OSD node
> On 3 August 2017 at 13:36, linghucongsong wrote:
>
> set the osd noout nodown

While noout is correct and might help in some situations, never set nodown unless you really need it. It will block I/O, since you are taking down OSDs which aren't marked as down.

In Hans's case the 'problem' is that the HEALTH_ERR is correct. Since Jewel, Ceph's health will go to ERR as soon as PGs are not active. When you take down a node, its PGs will re-peer, and during that time no I/O can be performed on those PGs; that is an ERR state.

Peering can be done faster by having higher-clocked CPUs, but there will always be a short moment where I/O will block for a set of PGs.

Wido
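Put together, a planned node reboot following Wido's advice would set only noout. A sketch ("osd-node-1" is a placeholder hostname; the short HEALTH_ERR blip during peering is still expected):

```shell
# Prevent CRUSH from rebalancing data while the node is down.
# Its OSDs will still be marked down, so their PGs re-peer and
# I/O to those PGs pauses briefly -- that part is unavoidable.
ceph osd set noout

# Reboot the OSD node.
ssh osd-node-1 sudo reboot

# Once all OSDs report up again, clear the flag.
ceph osd unset noout

# Confirm the cluster settles back to HEALTH_OK.
ceph health detail
```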
Re: [ceph-users] Gracefully reboot OSD node
Set the OSD 'noout' and 'nodown' flags before the reboot.
[ceph-users] Gracefully reboot OSD node
Hi all,

One thing which has bothered me since I began using Ceph is that a reboot of a single OSD causes a HEALTH_ERR state for the cluster for at least a couple of seconds.

In the case of a planned reboot of an OSD node, should I run some extra commands in order not to go to HEALTH_ERR state?

Thanks,

Hans