Hi,

On 11/04/14 07:29, Wido den Hollander wrote:
>
>> On 11 April 2014 at 7:13, Greg Poirier <greg.poir...@opower.com> wrote:
>>
>> One thing to note...
>> All of our KVM VMs have to be rebooted. This is something I wasn't
>> expecting. Tried waiting for them to recover on their own, but that's
>> not happening. Rebooting them restores service immediately. :/ Not
>> ideal.
>>
> A reboot isn't really required, though. It could be that the VM itself
> is in trouble, but from a librados/librbd perspective, I/O should simply
> continue as soon as an osdmap has been received without the "full" flag.
>
> It could be that you have to wait some time before the VM continues.
> This can take up to 15 minutes.

With other storage solutions you would have to change the timeout value
for each disk, i.e. changing it from 60 secs to 180 secs, for the VMs to
survive storage problems. Does Ceph handle this differently somehow?
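For what it's worth, on a Linux guest that usually means raising the SCSI
command timeout via sysfs. A minimal sketch, assuming the guest disk shows
up as /dev/sda (the 60/180 second values are just the example above):

    # Check the current per-command timeout (commonly 30 or 60 seconds).
    cat /sys/block/sda/device/timeout

    # Raise it so the guest rides out longer storage stalls.
    echo 180 > /sys/block/sda/device/timeout

    # One common way to make it persistent is a udev rule, e.g. in
    # /etc/udev/rules.d/99-disk-timeout.rules:
    ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", KERNEL=="sd*", ATTR{device/timeout}="180"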
Cheers,
Josef

> Wido
>
>> On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier
>> <greg.poir...@opower.com> wrote:
>>
>>> Going to try increasing the full ratio. Disk utilization wasn't really
>>> growing at an unreasonable pace. I'm going to keep an eye on it for
>>> the next couple of hours and down/out the OSDs if necessary.
>>>
>>> We have four more machines that we're in the process of adding (which
>>> doubles the number of OSDs), but we got held up by some networking
>>> nonsense.
>>>
>>> Thanks for the tips.
>>>
>>> On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil <s...@inktank.com> wrote:
>>>
>>>> On Thu, 10 Apr 2014, Greg Poirier wrote:
>>>>> Hi,
>>>>> I have about 200 VMs with a common RBD volume as their root
>>>>> filesystem and a number of additional filesystems on Ceph.
>>>>>
>>>>> All of them have stopped responding. One of the OSDs in my cluster
>>>>> is marked full. I tried stopping that OSD to force things to
>>>>> rebalance, or at least go into degraded mode, but nothing is
>>>>> responding still.
>>>>>
>>>>> I'm not exactly sure what to do or how to investigate. Suggestions?
>>>>
>>>> Try marking the osd out or partially out (ceph osd reweight N .9) to
>>>> move some data off, and/or adjust the full ratio up (ceph pg
>>>> set_full_ratio .95). Note that this becomes increasingly dangerous as
>>>> OSDs get closer to full; add some disks.
>>>>
>>>> sage
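For reference, Sage's suggestions above spelled out as a command sequence
(the osd id 12 is just a stand-in for whichever OSD is marked full):

    # See which OSD(s) tripped the full/near-full thresholds.
    ceph health detail

    # Partially drain the full OSD by lowering its reweight.
    ceph osd reweight 12 0.9

    # And/or bump the full ratio up a little; this gets increasingly
    # risky as OSDs approach 100%, so add disks as well.
    ceph pg set_full_ratio 0.95

    # Watch data move and the "full" flag clear.
    ceph -w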