Re: [ceph-users] rebooting nodes in a ceph cluster
I think my wording was a bit misleading in my last message. Instead of
"no re-balancing will happen", I should have said that no OSDs will be
marked out of the cluster with the noout flag set.

- Mike

On 12/21/2013 2:06 PM, Mike Dawson wrote:
> It is also useful to mention that you can set the noout flag when the
> length of your maintenance needs to exceed the 'mon osd down out
> interval'.
>
> $ ceph osd set noout
> ** no re-balancing will happen **
>
> $ ceph osd unset noout
> ** normal re-balancing rules will resume **
>
> - Mike Dawson
>
> On 12/19/2013 7:51 PM, Sage Weil wrote:
>> On Thu, 19 Dec 2013, John-Paul Robinson wrote:
>>> What impact does rebooting nodes in a ceph cluster have on the health
>>> of the ceph cluster? Can it trigger rebalancing activities that then
>>> have to be undone once the node comes back up?
>>>
>>> I have a 4-node ceph cluster; each node has 11 osds. There is a
>>> single pool with redundant storage.
>>>
>>> If it takes 15 minutes for one of my servers to reboot, is there a
>>> risk that some sort of needless automatic processing will begin?
>>
>> By default, we start rebalancing data after 5 minutes. You can adjust
>> this (to, say, 15 minutes) with
>>
>>    mon osd down out interval = 900
>>
>> in ceph.conf.
>>
>> sage
>>
>>> I'm assuming that the ceph cluster can go into a "not ok" state, but
>>> that in this particular configuration all the data is protected
>>> against the single node failure and there is no place for the data to
>>> migrate to, so nothing "bad" will happen.
>>>
>>> Thanks for any feedback.
>>>
>>> ~jpr
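For anyone following along, a quick way to confirm the flag state before
and after maintenance (a sketch; the exact output varies by Ceph version)
is to look at the flags line of the OSD map:

    $ ceph osd dump | grep flags
    flags noout

With the flag set, OSDs on a rebooting node are still reported down in
'ceph osd tree'; they just won't be marked out, so no data movement is
triggered while they are offline.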
Re: [ceph-users] rebooting nodes in a ceph cluster
It is also useful to mention that you can set the noout flag when the
length of your maintenance needs to exceed the 'mon osd down out
interval'.

$ ceph osd set noout
** no re-balancing will happen **

$ ceph osd unset noout
** normal re-balancing rules will resume **

- Mike Dawson

On 12/19/2013 7:51 PM, Sage Weil wrote:
> On Thu, 19 Dec 2013, John-Paul Robinson wrote:
>> What impact does rebooting nodes in a ceph cluster have on the health
>> of the ceph cluster? Can it trigger rebalancing activities that then
>> have to be undone once the node comes back up?
>>
>> I have a 4-node ceph cluster; each node has 11 osds. There is a single
>> pool with redundant storage.
>>
>> If it takes 15 minutes for one of my servers to reboot, is there a risk
>> that some sort of needless automatic processing will begin?
>
> By default, we start rebalancing data after 5 minutes. You can adjust
> this (to, say, 15 minutes) with
>
>    mon osd down out interval = 900
>
> in ceph.conf.
>
> sage
>
>> I'm assuming that the ceph cluster can go into a "not ok" state, but
>> that in this particular configuration all the data is protected against
>> the single node failure and there is no place for the data to migrate
>> to, so nothing "bad" will happen.
>>
>> Thanks for any feedback.
>>
>> ~jpr
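As a concrete illustration, a single-node reboot under the noout flag
might look like the sketch below ('node1' is a placeholder hostname;
adapt to your environment):

    $ ceph osd set noout        # down OSDs will not be marked out
    $ ssh node1 reboot          # reboot the node under maintenance
    $ ceph -s                   # watch until the node's OSDs rejoin
    $ ceph osd unset noout      # restore normal down -> out behavior

Note that noout only suppresses the down -> out transition; the affected
OSDs are still marked down while the node is offline, and clients are
served from the remaining replicas in the meantime.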
Re: [ceph-users] rebooting nodes in a ceph cluster
On Fri, 20 Dec 2013, Derek Yarnell wrote:
> On 12/19/13, 7:51 PM, Sage Weil wrote:
>>> If it takes 15 minutes for one of my servers to reboot, is there a
>>> risk that some sort of needless automatic processing will begin?
>>
>> By default, we start rebalancing data after 5 minutes. You can adjust
>> this (to, say, 15 minutes) with
>>
>>    mon osd down out interval = 900
>>
>> in ceph.conf.
>
> Will Ceph detect if the OSDs come back while it is re-balancing and stop?

Yep!

sage
Re: [ceph-users] rebooting nodes in a ceph cluster
On 12/19/13, 7:51 PM, Sage Weil wrote:
>> If it takes 15 minutes for one of my servers to reboot, is there a risk
>> that some sort of needless automatic processing will begin?
>
> By default, we start rebalancing data after 5 minutes. You can adjust
> this (to, say, 15 minutes) with
>
>    mon osd down out interval = 900
>
> in ceph.conf.

Will Ceph detect if the OSDs come back while it is re-balancing and stop?

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
Re: [ceph-users] rebooting nodes in a ceph cluster
David Clarke writes:
> Not directly related to Ceph, but you may want to investigate kexec[0]
> ('kexec-tools' package in Debian derived distributions) in order to
> get your machines rebooting quicker. It essentially re-loads the
> kernel as the last step of the shutdown procedure, skipping over the
> lengthy BIOS/UEFI/controller firmware etc boot stages.
>
> [0]: http://en.wikipedia.org/wiki/Kexec

I'd like to second that recommendation - I only discovered this recently,
and on systems with long BIOS initialization, this cuts down the time to
reboot *dramatically*, like from >5 to <1 minute.

-- 
Simon.
Re: [ceph-users] rebooting nodes in a ceph cluster
So is it recommended to adjust the rebalance timeout to align with the
time it takes to reboot individual nodes? I didn't see this in my pass
through the ops manual, but maybe I'm not looking in the right place.

Thanks,

~jpr

> On Dec 19, 2013, at 6:51 PM, "Sage Weil" wrote:
>
>> On Thu, 19 Dec 2013, John-Paul Robinson wrote:
>> What impact does rebooting nodes in a ceph cluster have on the health
>> of the ceph cluster? Can it trigger rebalancing activities that then
>> have to be undone once the node comes back up?
>>
>> I have a 4-node ceph cluster; each node has 11 osds. There is a single
>> pool with redundant storage.
>>
>> If it takes 15 minutes for one of my servers to reboot, is there a risk
>> that some sort of needless automatic processing will begin?
>
> By default, we start rebalancing data after 5 minutes. You can adjust
> this (to, say, 15 minutes) with
>
>    mon osd down out interval = 900
>
> in ceph.conf.
>
> sage
>
>> I'm assuming that the ceph cluster can go into a "not ok" state, but
>> that in this particular configuration all the data is protected against
>> the single node failure and there is no place for the data to migrate
>> to, so nothing "bad" will happen.
>>
>> Thanks for any feedback.
>>
>> ~jpr
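If you do decide to align the interval with your observed reboot times,
it can apparently also be changed at runtime rather than by editing
ceph.conf and restarting the monitors; something along these lines has
worked historically (hedged - injectargs syntax has varied between
releases):

    $ ceph tell 'mon.*' injectargs '--mon-osd-down-out-interval 900'

A change injected this way does not persist across monitor restarts, so
the ceph.conf setting remains the durable option.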
Re: [ceph-users] rebooting nodes in a ceph cluster
On 20/12/13 13:51, Sage Weil wrote:
> On Thu, 19 Dec 2013, John-Paul Robinson wrote:
>> What impact does rebooting nodes in a ceph cluster have on the health
>> of the ceph cluster? Can it trigger rebalancing activities that then
>> have to be undone once the node comes back up?
>>
>> I have a 4-node ceph cluster; each node has 11 osds. There is a single
>> pool with redundant storage.
>>
>> If it takes 15 minutes for one of my servers to reboot, is there a risk
>> that some sort of needless automatic processing will begin?
>
> By default, we start rebalancing data after 5 minutes. You can adjust
> this (to, say, 15 minutes) with
>
>    mon osd down out interval = 900
>
> in ceph.conf.
>
> sage
>
>> I'm assuming that the ceph cluster can go into a "not ok" state, but
>> that in this particular configuration all the data is protected against
>> the single node failure and there is no place for the data to migrate
>> to, so nothing "bad" will happen.
>>
>> Thanks for any feedback.

Not directly related to Ceph, but you may want to investigate kexec[0]
('kexec-tools' package in Debian-derived distributions) in order to get
your machines rebooting quicker. It essentially re-loads the kernel as
the last step of the shutdown procedure, skipping over the lengthy
BIOS/UEFI/controller firmware etc. boot stages.

[0]: http://en.wikipedia.org/wiki/Kexec

-- 
David Clarke
Systems Architect
Catalyst IT
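For the curious, a minimal kexec cycle looks roughly like this (a sketch;
the kernel/initrd paths are Debian-style assumptions and details vary by
distro):

    # load the running kernel image and reuse the current command line
    $ sudo kexec -l /boot/vmlinuz-$(uname -r) \
        --initrd=/boot/initrd.img-$(uname -r) --reuse-cmdline
    # jump into it, skipping firmware/BIOS initialization
    $ sudo kexec -e

Be aware that 'kexec -e' jumps immediately without running the normal
shutdown sequence; the kexec-tools package on Debian-derived systems
instead arranges for the loaded kernel to be executed at the end of a
clean shutdown, which is what you want on an OSD node.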
Re: [ceph-users] rebooting nodes in a ceph cluster
On Thu, 19 Dec 2013, John-Paul Robinson wrote:
> What impact does rebooting nodes in a ceph cluster have on the health
> of the ceph cluster? Can it trigger rebalancing activities that then
> have to be undone once the node comes back up?
>
> I have a 4-node ceph cluster; each node has 11 osds. There is a single
> pool with redundant storage.
>
> If it takes 15 minutes for one of my servers to reboot, is there a risk
> that some sort of needless automatic processing will begin?

By default, we start rebalancing data after 5 minutes. You can adjust
this (to, say, 15 minutes) with

   mon osd down out interval = 900

in ceph.conf.

sage

> I'm assuming that the ceph cluster can go into a "not ok" state, but
> that in this particular configuration all the data is protected against
> the single node failure and there is no place for the data to migrate
> to, so nothing "bad" will happen.
>
> Thanks for any feedback.
>
> ~jpr
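For reference, in ceph.conf that setting would sit in the [mon] (or
[global]) section; a minimal sketch for a 15-minute window:

    [mon]
    # seconds a down OSD waits before being marked out (default 300)
    mon osd down out interval = 900

The monitors read this at startup, so they need a restart (or a runtime
injection) to pick up a change.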
[ceph-users] rebooting nodes in a ceph cluster
What impact does rebooting nodes in a ceph cluster have on the health of
the ceph cluster? Can it trigger rebalancing activities that then have
to be undone once the node comes back up?

I have a 4-node ceph cluster; each node has 11 osds. There is a single
pool with redundant storage.

If it takes 15 minutes for one of my servers to reboot, is there a risk
that some sort of needless automatic processing will begin?

I'm assuming that the ceph cluster can go into a "not ok" state, but
that in this particular configuration all the data is protected against
the single node failure and there is no place for the data to migrate
to, so nothing "bad" will happen.

Thanks for any feedback.

~jpr
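On the "not ok" state: while a node is down the cluster typically
reports HEALTH_WARN rather than an error. A rough sketch of what
'ceph -s' might show mid-reboot (the numbers are invented for a 4-node,
44-OSD cluster, and the output format varies by version):

    $ ceph -s
      health HEALTH_WARN 512 pgs degraded; 11/44 in osds are down
      ...

Once the node's OSDs rejoin (or, if they stay down past the interval,
once rebalancing finishes), health returns to HEALTH_OK.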