Thanks to all. We can now bring that duration down to around 25 seconds, which is about the best result we can get.
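For anyone else tuning this: the options mentioned in the thread below can be set in ceph.conf (or via 'ceph config set' on recent releases). The values here are just the defaults as a reference point, not a recommendation - lowering the grace/interval speeds up failure detection but also makes OSDs easier to falsely mark down:

    [osd]
    # how long peers wait without a heartbeat reply before reporting an OSD as failed (default 20s)
    osd_heartbeat_grace = 20
    # how often an OSD pings its heartbeat peers (default 6s)
    osd_heartbeat_interval = 6

    [mon]
    # how many distinct OSDs must report a peer down before the monitor marks it down (default 2)
    mon_osd_min_down_reporters = 2
    # scale the grace period upward for OSDs with a history of being laggy (default true)
    mon_osd_adjust_heartbeat_grace = true

With the defaults, step 1 of the sequence below can already account for roughly osd_heartbeat_grace + osd_heartbeat_interval = ~26 seconds, which is why a total in the 25-40 second range is plausible.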
BR

On Tue, Dec 3, 2019 at 10:30 PM Wido den Hollander <w...@42on.com> wrote:
>
> On 12/3/19 3:07 PM, Aleksey Gutikov wrote:
> >
> >> That is true. When an OSD goes down it will take a few seconds for its
> >> Placement Groups to re-peer with the other OSDs. During that period
> >> writes to those PGs will stall for a couple of seconds.
> >>
> >> I wouldn't say it's 40s, but it can take ~10s.
> >
> > Hello,
> >
> > According to my experience, in case of OSD crashes, kill -9 (any kind of
> > abnormal termination), OSD failure handling consists of the following steps:
> > 1) Failed OSD's peers detect that it does not respond - it can take up
> > to osd_heartbeat_grace + osd_heartbeat_interval seconds
>
> If a 'Connection Refused' is detected the OSD will be marked as down
> right away.
>
> > 2) Peers send reports to the monitor
> > 3) Monitor makes a decision according to options from its own config
> > (mon_osd_adjust_heartbeat_grace, osd_heartbeat_grace,
> > mon_osd_laggy_halflife, mon_osd_min_down_reporters, ...) and finally marks
> > the OSD down in the OSDMap.
>
> True.
>
> > 4) Monitor sends the updated OSDMap to OSDs and clients
> > 5) OSDs start peering
> > 5.1) Peering itself is a complicated process, for example we had
> > experienced PGs stuck in an inactive state due to
> > osd_max_pg_per_osd_hard_ratio.
>
> I would say that 5.1 isn't relevant for most cases. Yes, it can happen,
> but it's rare.
>
> > 6) Peering finishes (PGs' data continues moving) - clients can normally
> > access affected PGs. Clients also have their own timeouts that can
> > affect time to recover.
> >
> > Again, according to my experience, 40s with default settings is possible.
>
> 40s is possible in certain scenarios. But I wouldn't say that's the
> default for all cases.
>
> Wido

--
The modern Unified Communications provider
https://www.portsip.com