Don't discount failing drives. A drive can be in a "ready-to-fail" state that doesn't show up in SMART or anywhere else that's easy to track. During a backfill the drive touches sectors it may not normally use. I managed a 1400 osd cluster that would lose 1-3 drives on random nodes whenever I added new storage, because of the large backfill that took place. We monitored dmesg, SMART, etc. for disk errors, but the failures would still happen out of nowhere during the large backfill. Several times the osd didn't even show any SMART errors after it was dead.
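
Roughly the sort of per-node check I mean by monitoring dmesg and SMART (just a sketch; the device glob and the attributes you grep for are placeholders, adjust them for your hardware):

    # recent kernel-level disk errors
    dmesg -T | grep -Ei 'i/o error|medium error|sector' | tail -n 20

    # SMART health summary for every drive on the node
    for dev in /dev/sd?; do
        echo "== $dev =="
        smartctl -H -A "$dev" | grep -Ei 'overall-health|reallocated|pending|uncorrect'
    done

Even with checks like these in place, some of our drives only showed their problem once the backfill pushed them into sectors they hadn't touched before.
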
It's easiest to track slow requests while they are happening. `ceph health detail` will report which osd the request is blocked on and might shed some light. If a PG is stuck peering for a while, you can also check which osd it is waiting on (a couple of example commands are at the bottom of this mail).

On Fri, Sep 1, 2017, 12:09 PM Laszlo Budai <las...@componentsoft.eu> wrote:

> Hello,
>
> We have checked all the drives, and there is no problem with them. If there were a failing drive, then I think the slow requests should also appear in the normal traffic, as the ceph cluster is using all the OSDs as primaries for some PGs. But these slow requests are appearing only during the backfill. I will try to dig deeper into the IO operations at the next test.
>
> Kind regards,
> Laszlo
>
> On 01.09.2017 16:08, David Turner wrote:
> > It is normal to have backfilling because the crush map did change. The host and the chassis have crush numbers and their own weight, which is the sum of the osds under them. By moving the host into the chassis you changed the weight of the chassis, and that affects PG placement even though you didn't change the failure domain.
> >
> > osd_max_backfills = 1 shouldn't impact customer traffic and cause blocked requests. Most people find that they can use 3-5 before the disks are active enough to come close to impacting customer traffic. That would lead me to think you have a dying drive that you're reading from/writing to in sectors that are bad or at least slower.
> >
> > On Fri, Sep 1, 2017, 6:13 AM Laszlo Budai <las...@componentsoft.eu> wrote:
> >
> > Hi David,
> >
> > Well, most probably the larger part of our PGs will have to be reorganized, as we are moving from 9 hosts to 3 chassis. But I was hoping to be able to throttle the backfilling to an extent where it has minimal impact on our user traffic. Unfortunately I wasn't able to do it. I saw that the newer versions of ceph have the "osd recovery sleep" parameter. I think this would help, but unfortunately it's not present in hammer ... :(
> >
> > Also I have another question: Is it normal to have backfill when we add a host to a chassis even if we don't change the CRUSH rule? Let me explain: We have the hosts directly assigned to the root bucket. Then we add chassis to the root, and then we move a host from the root to the chassis. In all this time the rule set remains unchanged, with the host being the failure domain.
> >
> > Kind regards,
> > Laszlo
> >
> > On 31.08.2017 17:56, David Turner wrote:
> > > How long are you seeing these blocked requests for? Initially or perpetually? Changing the failure domain causes all PGs to peer at the same time; this would be the cause if it happens really quickly. There is no way to avoid all of them peering while making a change like this. After that, it could easily be caused because a fair majority of your data is probably set to move around. I would check what might be causing the blocked requests during this time. See if there is an OSD that might be dying (large backfills have a tendency to find a couple of failing drives), which could easily cause things to block. Also checking with iostat whether your disks or journals are maxed out could shine some light on any mitigating factor.
> > >
> > > On Thu, Aug 31, 2017 at 9:01 AM Laszlo Budai <las...@componentsoft.eu> wrote:
> > >
> > > Dear all!
> > >
> > > In our Hammer cluster we are planning to switch our failure domain from host to chassis. We have performed some simulations, and regardless of the settings we have used, some slow requests have appeared all the time.
> > >
> > > We had the following settings:
> > >
> > > "osd_max_backfills": "1",
> > > "osd_backfill_full_ratio": "0.85",
> > > "osd_backfill_retry_interval": "10",
> > > "osd_backfill_scan_min": "1",
> > > "osd_backfill_scan_max": "4",
> > > "osd_kill_backfill_at": "0",
> > > "osd_debug_skip_full_check_in_backfill_reservation": "false",
> > > "osd_debug_reject_backfill_probability": "0",
> > >
> > > "osd_min_recovery_priority": "0",
> > > "osd_allow_recovery_below_min_size": "true",
> > > "osd_recovery_threads": "1",
> > > "osd_recovery_thread_timeout": "60",
> > > "osd_recovery_thread_suicide_timeout": "300",
> > > "osd_recovery_delay_start": "0",
> > > "osd_recovery_max_active": "1",
> > > "osd_recovery_max_single_start": "1",
> > > "osd_recovery_max_chunk": "8388608",
> > > "osd_recovery_forget_lost_objects": "false",
> > > "osd_recovery_op_priority": "1",
> > > "osd_recovery_op_warn_multiple": "16",
> > >
> > > We have also tested it with the CFQ IO scheduler on the OSDs and the following params:
> > > "osd_disk_thread_ioprio_priority": "7"
> > > "osd_disk_thread_ioprio_class": "idle"
> > >
> > > and with nodeep-scrub set.
> > >
> > > Is there anything else to try? Is there a good way to switch from one kind of failure domain to another without slow requests?
> > >
> > > Thank you in advance for any suggestions.
> > >
> > > Kind regards,
> > > Laszlo
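
P.S. Roughly what I mean above by tracking slow requests while they are happening (just a sketch; <pgid> is a placeholder and the grep patterns are only examples):

    # shows which osd the slow/blocked requests are sitting on
    ceph health detail | grep -Ei 'slow|blocked'

    # list PGs stuck peering/inactive, then query one to see which osd it is waiting on
    ceph pg dump_stuck inactive
    ceph pg <pgid> query        # look at the recovery_state section

Nothing fancy, but it usually points straight at the osd (and therefore the disk) that is holding things up.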