Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Wido den Hollander

On 04/11/2014 02:45 PM, Greg Poirier wrote:

So... our storage problems persisted for about 45 minutes. I gave an
entire hypervisor's worth of VMs (approx. 30 VMs) time to recover, and
none of them recovered on their own. In the end, we had to stop and
start every VM (easily done, it was just alarming). Once rebooted, the
VMs were of course fine.



So that's interesting. I'm going to try this myself as well since I 
think they should continue I/O at some point.



I marked the two full OSDs as down and out. I am a little concerned that
these two are full while the cluster, in general, is only at 50%
capacity. It appears we may have a hot spot. I'm going to look into that
later today. Also, I'm not sure how it happened, but pgp_num is lower
than pg_num. I had not noticed that until last night. Will address that
as well. This probably happened when I last resized placement groups or
potentially when I set up object storage pools.





Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Greg Poirier
So, setting pgp_num to 2048 to match pg_num had a more serious impact than
I expected: the cluster is rebalancing quite substantially (8.5% of objects
being moved), which makes sense. Disk utilization is evening out
fairly well, which is encouraging.
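For reference, the commands behind this are roughly the following (the pool name "rbd" and the 2048 count are illustrative placeholders, not taken from this cluster):

# check the current split for a pool
ceph osd pool get rbd pg_num
ceph osd pool get rbd pgp_num

# pg_num had already been raised earlier; raising pgp_num to match is what
# actually re-places the existing PGs and triggers the rebalance described above
ceph osd pool set rbd pgp_num 2048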

We are a little stumped as to why a few full OSDs would cause the
entire cluster to stop serving I/O. Is this a configuration issue on our
end?
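Background on why it stalls cluster-wide (a general note on pre-Luminous behaviour, not anything specific to this cluster): once any single OSD crosses the full threshold, the monitors set the cluster-wide "full" flag in the osdmap, and clients pause writes until it clears -- which is enough to stall an RBD-backed VM. A rough way to see this with that era's CLI (commands assumed, adjust to taste):

ceph health detail
ceph osd dump | grep '^flags'    # shows "full" while the flag is set

# the thresholds can be adjusted as a stop-gap (defaults are commonly 0.85 nearfull / 0.95 full)
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.85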

We're slowly recovering:

 health HEALTH_WARN 135 pgs backfill; 187 pgs backfill_toofull; 151 pgs backfilling;
        2 pgs degraded; 369 pgs stuck unclean; 29 requests are blocked > 32 sec;
        recovery 2563902/52390259 objects degraded (4.894%); 4 near full osd(s)
 pgmap v8363400: 5120 pgs, 3 pools, 22635 GB data, 23872 kobjects
        48889 GB used, 45022 GB / 93911 GB avail
        2563902/52390259 objects degraded (4.894%)
            4751 active+clean
              31 active+remapped+wait_backfill
               1 active+backfill_toofull
             103 active+remapped+wait_backfill+backfill_toofull
               1 active+degraded+wait_backfill+backfill_toofull
             150 active+remapped+backfilling
              82 active+remapped+backfill_toofull
               1 active+degraded+remapped+backfilling
 recovery io 362 MB/s, 365 objects/s
 client io 1643 kB/s rd, 6001 kB/s wr, 911 op/s
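To follow a drain like this one, something along these lines works (standard commands, nothing cluster-specific assumed):

ceph -w                                               # live stream of recovery/backfill progress
ceph pg dump_stuck unclean | grep backfill_toofull    # the PGs still waiting on free space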



Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Greg Poirier
So... our storage problems persisted for about 45 minutes. I gave an entire
hypervisor's worth of VMs (approx. 30 VMs) time to recover, and none of them
recovered on their own. In the end, we had to stop and start every VM
(easily done, it was just alarming). Once rebooted, the VMs were of course
fine.

I marked the two full OSDs as down and out. I am a little concerned that
these two are full while the cluster, in general, is only at 50% capacity.
It appears we may have a hot spot. I'm going to look into that later today.
Also, I'm not sure how it happened, but pgp_num is lower than pg_num. I
had not noticed that until last night. Will address that as well. This
probably happened when I last resized placement groups or potentially when
I set up object storage pools.
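In command form, that amounts to something like the following (OSD ids 12 and 13 stand in for whichever OSDs were actually full):

ceph osd out 12
ceph osd out 13      # stop placing data on them; their PGs remap elsewhere
ceph osd down 12
ceph osd down 13     # note: a daemon that is still running will normally mark itself back up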





Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Wido den Hollander

On 04/11/2014 09:23 AM, Josef Johansson wrote:


cat /sys/block/*/device/timeout
(http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465)

This file is non-existent for my Ceph VirtIO drive however, so it seems
RBD handles this.



Well, I don't think it's handled by RBD, but VirtIO simply doesn't have 
the timeout. That's probably only in the SCSI driver.
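To illustrate the distinction (device names and values are only examples): an emulated-SCSI or virtio-scsi disk exposes the SCSI command timeout in sysfs, while a virtio-blk disk does not, which matches the observation above.

cat /sys/block/sda/device/timeout          # SCSI-backed disk: attribute exists (kernel default is 30s)
echo 180 > /sys/block/sda/device/timeout   # raise it, e.g. per the VMware KB linked above
                                           # (use a udev rule or rc.local to make it persistent)
ls /sys/block/vda/device/timeout           # virtio-blk disk: no such attribute, nothing to tune in the guest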







--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Josef Johansson

On 11/04/14 09:07, Wido den Hollander wrote:
> It's not that RBD does it differently. Librados simply blocks the I/O, and thus
> so does librbd, which then causes Qemu to block.
>
> I've seen VMs survive RBD issues for longer periods than 60 seconds. Gave them
> some time and they continued again.
>
> Which exact setting are you talking about? I'm talking about a Qemu/KVM VM
> running with a VirtIO drive.
cat /sys/block/*/device/timeout
(http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465)

This file is non-existent for my Ceph VirtIO drive however, so it seems
RBD handles this.

I have just para-virtualized VMs to compare with right now, and they
don't have it inside the VM, but that's expected. From my understanding
it should have been there if it was an HVM guest. Whenever the timeout was
reached, an error occurred and the disk was set to read-only mode.

Cheers,
Josef


Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-11 Thread Wido den Hollander


> On 11 April 2014 at 8:50, Josef Johansson wrote:
> 
> 
> Hi,
> 
> With other storage solutions you would have to change the timeout value
> for each disk, e.g. from 60 secs to 180 secs, for the VMs to
> survive storage problems.
> Does Ceph handle this differently somehow?
> 

It's not that RBD does it differently. Librados simply blocks the I/O, and thus
so does librbd, which then causes Qemu to block.

I've seen VMs survive RBD issues for longer periods than 60 seconds. Gave them
some time and they continued again.

Which exact setting are you talking about? I'm talking about a Qemu/KVM VM
running with a VirtIO drive.

Wido



Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Josef Johansson
Hi,

On 11/04/14 07:29, Wido den Hollander wrote:
> A reboot isn't really required though. It could be that the VM itself is in
> trouble, but from a librados/librbd perspective I/O should simply continue as
> soon as an osdmap has been received without the "full" flag.
>
> It could be that you have to wait some time before the VM continues. This can
> take up to 15 minutes.
With other storage solutions you would have to change the timeout value
for each disk, e.g. from 60 secs to 180 secs, for the VMs to
survive storage problems.
Does Ceph handle this differently somehow?

Cheers,
Josef


Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Wido den Hollander


> On 11 April 2014 at 7:13, Greg Poirier wrote:
> 
> 
> One thing to note
> All of our kvm VMs have to be rebooted. This is something I wasn't
> expecting.  Tried waiting for them to recover on their own, but that's not
> happening. Rebooting them restores service immediately. :/ Not ideal.
> 

A reboot isn't really required though. It could be that the VM itself is in
trouble, but from a librados/librbd perspective I/O should simply continue as
soon as an osdmap has been received without the "full" flag.

It could be that you have to wait some time before the VM continues. This can
take up to 15 minutes.
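One way to watch for that moment from outside the guest (a sketch with era-appropriate commands, not taken from this thread):

ceph osd dump | grep '^flags'     # "full" disappears from the osdmap flags once usage drops below the ratio
watch -n 5 'ceph health detail'   # blocked/slow requests should drain away shortly after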

Wido



Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Greg Poirier
One thing to note:
All of our KVM VMs have to be rebooted. This is something I wasn't
expecting. Tried waiting for them to recover on their own, but that's not
happening. Rebooting them restores service immediately. :/ Not ideal.




Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Greg Poirier
Going to try increasing the full ratio. Disk utilization wasn't really
growing at an unreasonable pace. I'm going to keep an eye on it for the
next couple of hours and down/out the OSDs if necessary.

We have four more machines that we're in the process of adding (which
doubles the number of OSDs), but got held up by some networking nonsense.

Thanks for the tips.


On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil  wrote:

> On Thu, 10 Apr 2014, Greg Poirier wrote:
> > Hi,
> > I have about 200 VMs with a common RBD volume as their root filesystem and a
> > number of additional filesystems on Ceph.
> >
> > All of them have stopped responding. One of the OSDs in my cluster is marked
> > full. I tried stopping that OSD to force things to rebalance or at least go
> > to degraded mode, but nothing is responding still.
> >
> > I'm not exactly sure what to do or how to investigate. Suggestions?
>
> Try marking the osd out or partially out (ceph osd reweight N .9) to move
> some data off, and/or adjust the full ratio up (ceph pg set_full_ratio
> .95).  Note that this becomes increasingly dangerous as OSDs get closer to
> full; add some disks.
>
> sage


Re: [ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Sage Weil
On Thu, 10 Apr 2014, Greg Poirier wrote:
> Hi,
> I have about 200 VMs with a common RBD volume as their root filesystem and a
> number of additional filesystems on Ceph.
> 
> All of them have stopped responding. One of the OSDs in my cluster is marked
> full. I tried stopping that OSD to force things to rebalance or at least go
> to degraded mode, but nothing is responding still. 
> 
> I'm not exactly sure what to do or how to investigate. Suggestions?

Try marking the osd out or partially out (ceph osd reweight N .9) to move
some data off, and/or adjust the full ratio up (ceph pg set_full_ratio
.95).  Note that this becomes increasingly dangerous as OSDs get closer to
full; add some disks.
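Spelled out with placeholder values (osd.12 standing in for whichever OSD is full):

ceph osd reweight 12 0.9       # push roughly 10% of osd.12's data elsewhere
ceph pg set_full_ratio .95     # and/or raise the hard full threshold, as a stop-gap only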

sage


[ceph-users] OSD full - All RBD Volumes stopped responding

2014-04-10 Thread Greg Poirier
Hi,

I have about 200 VMs with a common RBD volume as their root filesystem and
a number of additional filesystems on Ceph.

All of them have stopped responding. One of the OSDs in my cluster is
marked full. I tried stopping that OSD to force things to rebalance or at
least go to degraded mode, but nothing is responding still.

I'm not exactly sure what to do or how to investigate. Suggestions?
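For anyone landing here with the same symptom, the usual first checks are along these lines (standard commands for this era, nothing cluster-specific assumed):

ceph health detail    # which OSD(s) are full/nearfull and which requests are blocked
ceph df               # per-pool and overall utilisation
ceph osd tree         # where the full OSD sits in the CRUSH hierarchy, and its weight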