Don't discount failing drives. You can have drives in a "ready-to-fail"
state that don't show up in SMART or anywhere else that's easy to track. When
backfilling, the drive is using sectors it may not normally use. I managed
a 1400 osd cluster that would lose 1-3 drives in random nodes whenever I added
new storage, due to the large backfill that took place. We monitored dmesg,
SMART, etc. for disk errors, but failures would suddenly
happen during the large backfill. Several times the osd didn't even have
any SMART errors after it was dead.
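
For what it's worth, this is roughly the kind of sweep we ran on the osd
nodes before and after adding storage (a rough sketch; it assumes plain
/dev/sd* data disks and that smartctl is installed, so adjust to your
hardware):

    # per-disk SMART verdict plus the sector counters that tend to move first
    for dev in /dev/sd?; do
        echo "== $dev =="
        smartctl -H -A "$dev" | egrep -i 'overall-health|Reallocated_Sector|Current_Pending'
    done
    # kernel-level I/O errors that never make it into SMART
    dmesg | egrep -i 'i/o error|medium error|blk_update_request'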

It's easiest to track slow requests while they are happening. `ceph health
detail` will report which osd the request is blocked on and might shed
some light. If a PG is peering for a while, you can also check which
osd it is stuck waiting on.
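
In practice that means something like the following while the backfill is
running (the PG id below is just a placeholder):

    ceph health detail            # lists blocked requests and which osds they are stuck on
    ceph pg dump_stuck unclean    # PGs stuck peering / backfilling
    ceph pg 3.1f query            # example PG id; shows which osds the PG is waiting on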

On Fri, Sep 1, 2017, 12:09 PM Laszlo Budai <las...@componentsoft.eu> wrote:

> Hello,
>
>
> We have checked all the drives, and there is no problem with them. If
> there were a failing drive, I think the slow requests would also appear
> in normal traffic, since the ceph cluster uses all the OSDs as primaries
> for some PGs. But these slow requests appear only during the backfill. I
> will try to dig deeper into the IO operations at the next test.
>
> Kind regards,
> Laszlo
>
>
>
> On 01.09.2017 16:08, David Turner wrote:
> > It is normal to have backfilling because the crush map did change. The
> host and the chassis each have their own crush id and weight, which is the
> sum of the osds under them.  By moving the host into the chassis you
> changed the weight of the chassis, and that affects the PG placement even
> though you didn't change the failure domain.
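>
> If you want a rough idea of how much data a crush change will move before
> you apply it, you can compare mappings offline.  A sketch (the rule
> number, replica count, and the "crush.proposed" file name are just
> placeholders for your own values):
>
>     ceph osd getcrushmap -o crush.before
>     crushtool -i crush.before   --test --show-mappings --rule 0 --num-rep 3 > before.txt
>     crushtool -i crush.proposed --test --show-mappings --rule 0 --num-rep 3 > after.txt
>     diff before.txt after.txt | wc -l    # rough count of mappings that change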
> >
> > Osd_max_backfills = 1 shouldn't impact customer traffic and cause
> blocked requests. Most people find that they can use 3-5 before the disks
> are active enough to come close to impacting customer traffic.  That would
> lead me to think you have a dying drive that you're reading from/writing to
> in sectors that are bad or at least slower.
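>
> If you want to verify or change the throttle at runtime, something along
> these lines should work on hammer (please double check on your version):
>
>     ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
>     # on the node hosting osd.0, confirm what the daemon is actually using
>     ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'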
> >
> >
> > On Fri, Sep 1, 2017, 6:13 AM Laszlo Budai <las...@componentsoft.eu> wrote:
> >
> >     Hi David,
> >
> >     Well, most probably the larger part of our PGs will have to be
> reorganized, as we are moving from 9 hosts to 3 chassis. But I was hoping
> to be able to throttle the backfilling to an extent where it has minimal
> impact on our user traffic. Unfortunately I wasn't able to do it. I saw
> that the newer versions of ceph have the "osd recovery sleep" parameter. I
> think this would help, but unfortunately it's not present in hammer ... :(
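>
> For reference, on newer releases my understanding is that it would be set
> with something like the line below, but since the option is missing in
> hammer we cannot test it:
>
>     ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'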
> >
> >     Also I have another question: Is it normal to have backfill when we
> add a host to a chassis even if we don't change the CRUSH rule? Let me
> explain: We have the hosts directly assigned to the root bucket. Then we
> add chassis to the root, and then we move a host from the root to the
> chassis. In all this time the rule set remains unchanged, with the host
> being the failure domain.
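>
> To make the procedure concrete, the commands we run are roughly the
> following (the bucket names are just examples from our setup):
>
>     ceph osd crush add-bucket chassis1 chassis
>     ceph osd crush move chassis1 root=default
>     ceph osd crush move node01 chassis=chassis1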
> >
> >     Kind regards,
> >     Laszlo
> >
> >
> >     On 31.08.2017 17:56, David Turner wrote:
> >      > How long are you seeing these blocked requests for?  Initially or
> perpetually?  Changing the failure domain causes all PGs to peer at the
> same time.  This would be the cause if it happens really quickly.  There is
> no way to avoid all of them peering while making a change like this.  After
> that, it could easily be caused because a fair majority of your data is
> probably set to move around.  I would check what might be causing the
> blocked requests during this time.  See if there is an OSD that might be
> dying (large backfills have a tendency to find a couple of failing drives),
> which could easily cause things to block.  Also, checking whether your disks
> or journals are maxed out with iostat could shine some light on any
> contributing factor.
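>
> Something as simple as this on each osd node is usually enough to see
> whether the data disks or journals are saturated (assuming the sysstat
> package is installed; device names will differ on your hardware):
>
>     iostat -x 5    # watch %util and await on the osd data disks and journal devices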
> >      >
> >      > On Thu, Aug 31, 2017 at 9:01 AM Laszlo Budai <las...@componentsoft.eu> wrote:
> >      >
> >      >     Dear all!
> >      >
> >      >     In our Hammer cluster we are planning to switch our failure
> domain from host to chassis. We have performed some simulations, and
> regardless of the settings we have used, some slow requests have appeared
> every time.
> >      >
> >      >     we had the following settings:
> >      >
> >      >           "osd_max_backfills": "1",
> >      >           "osd_backfill_full_ratio": "0.85",
> >      >           "osd_backfill_retry_interval": "10",
> >      >           "osd_backfill_scan_min": "1",
> >      >           "osd_backfill_scan_max": "4",
> >      >           "osd_kill_backfill_at": "0",
> >      >           "osd_debug_skip_full_check_in_backfill_reservation": "false",
> >      >           "osd_debug_reject_backfill_probability": "0",
> >      >
> >      >           "osd_min_recovery_priority": "0",
> >      >           "osd_allow_recovery_below_min_size": "true",
> >      >           "osd_recovery_threads": "1",
> >      >           "osd_recovery_thread_timeout": "60",
> >      >           "osd_recovery_thread_suicide_timeout": "300",
> >      >           "osd_recovery_delay_start": "0",
> >      >           "osd_recovery_max_active": "1",
> >      >           "osd_recovery_max_single_start": "1",
> >      >           "osd_recovery_max_chunk": "8388608",
> >      >           "osd_recovery_forget_lost_objects": "false",
> >      >           "osd_recovery_op_priority": "1",
> >      >           "osd_recovery_op_warn_multiple": "16",
> >      >
> >      >
> >      >     we have also tested it with the CFQ IO scheduler on the OSDs
> and the following params:
> >      >           "osd_disk_thread_ioprio_priority": "7"
> >      >           "osd_disk_thread_ioprio_class": "idle"
> >      >
> >      >     and the nodeep-scrub set.
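> >      >
> >      >     These were applied roughly as follows (the device name is
> >      >     just an example, adjust per disk):
> >      >
> >      >        echo cfq > /sys/block/sda/queue/scheduler
> >      >        ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'
> >      >        ceph osd set nodeep-scrub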
> >      >
> >      >     Is there anything else to try? Is there a good way to switch
> from one kind of failure domain to another without slow requests?
> >      >
> >      >     Thank you in advance for any suggestions.
> >      >
> >      >     Kind regards,
> >      >     Laszlo
> >      >
> >      >
> >      >
> >
>