Hi,

On 11/04/14 07:29, Wido den Hollander wrote:
>
>> On 11 April 2014 at 7:13, Greg Poirier <greg.poir...@opower.com> wrote:
>>
>> One thing to note...
>> All of our KVM VMs have to be rebooted. This is something I wasn't
>> expecting. Tried waiting for them to recover on their own, but that's
>> not happening. Rebooting them restores service immediately. :/ Not
>> ideal.
>>
> A reboot isn't really required, though. It could be that the VM itself
> is in trouble, but from a librados/librbd perspective, I/O should simply
> continue as soon as an osdmap has been received without the "full" flag.
>
> It could be that you have to wait some time before the VM continues.
> This can take up to 15 minutes.

With other storage solutions you would have to change the timeout value
for each disk, i.e. changing it from 60 secs to 180 secs, for the VMs to
survive storage problems. Does Ceph handle this differently somehow?
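For what it's worth, on a Linux guest that usually means raising the SCSI
command timeout via sysfs. A minimal sketch, assuming the guest disk shows
up as /dev/sda (the 60/180 second values are just the example above):

    # Check the current per-command timeout (commonly 30 or 60 seconds).
    cat /sys/block/sda/device/timeout

    # Raise it so the guest rides out longer storage stalls.
    echo 180 > /sys/block/sda/device/timeout

    # One common way to make it persistent is a udev rule, e.g. in
    # /etc/udev/rules.d/99-disk-timeout.rules:
    ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", KERNEL=="sd*", ATTR{device/timeout}="180"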
Cheers,
Josef

> Wido
>
>> On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier
>> <greg.poir...@opower.com> wrote:
>>
>>> Going to try increasing the full ratio. Disk utilization wasn't really
>>> growing at an unreasonable pace. I'm going to keep an eye on it for
>>> the next couple of hours and down/out the OSDs if necessary.
>>>
>>> We have four more machines that we're in the process of adding (which
>>> doubles the number of OSDs), but we got held up by some networking
>>> nonsense.
>>>
>>> Thanks for the tips.
>>>
>>> On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil <s...@inktank.com> wrote:
>>>
>>>> On Thu, 10 Apr 2014, Greg Poirier wrote:
>>>>> Hi,
>>>>> I have about 200 VMs with a common RBD volume as their root
>>>>> filesystem and a number of additional filesystems on Ceph.
>>>>>
>>>>> All of them have stopped responding. One of the OSDs in my cluster
>>>>> is marked full. I tried stopping that OSD to force things to
>>>>> rebalance, or at least go into degraded mode, but nothing is
>>>>> responding still.
>>>>>
>>>>> I'm not exactly sure what to do or how to investigate. Suggestions?
>>>>
>>>> Try marking the osd out or partially out (ceph osd reweight N .9) to
>>>> move some data off, and/or adjust the full ratio up (ceph pg
>>>> set_full_ratio .95). Note that this becomes increasingly dangerous as
>>>> OSDs get closer to full; add some disks.
>>>>
>>>> sage
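For reference, Sage's suggestions above spelled out as a command sequence
(the osd id 12 is just a stand-in for whichever OSD is marked full):

    # See which OSD(s) tripped the full/near-full thresholds.
    ceph health detail

    # Partially drain the full OSD by lowering its reweight.
    ceph osd reweight 12 0.9

    # And/or bump the full ratio up a little; this gets increasingly
    # risky as OSDs approach 100%, so add disks as well.
    ceph pg set_full_ratio 0.95

    # Watch data move and the "full" flag clear.
    ceph -w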