Thanks to all. We were able to bring that duration down to around 25 seconds,
which is the best result we could achieve.

BR

On Tue, Dec 3, 2019 at 10:30 PM Wido den Hollander <w...@42on.com> wrote:

>
>
> On 12/3/19 3:07 PM, Aleksey Gutikov wrote:
> >
> >> That is true. When an OSD goes down it will take a few seconds for its
> >> Placement Groups to re-peer with the other OSDs. During that period
> >> writes to those PGs will stall for a couple of seconds.
> >>
> >> I wouldn't say it's 40s, but it can take ~10s.
> >
> > Hello,
> >
> > In my experience, when an OSD crashes or is killed with -9 (any kind of
> > abnormal termination), OSD failure handling consists of the following steps:
> > 1) Failed OSD's peers detect that it does not respond - it can take up
> > to osd_heartbeat_grace + osd_heartbeat_interval seconds
>
> If a 'Connection Refused' is detected the OSD will be marked as down
> right away.
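
For reference, as far as I understand this fast-failure path is controlled by
the osd_fast_fail_on_connection_refused option (enabled by default in recent
releases), while the grace-based path in step 1 depends on the heartbeat
settings. A quick way to check what a given OSD is actually running with -
a sketch only, run on the OSD's host, with osd.0 as a placeholder:

    ceph daemon osd.0 config get osd_fast_fail_on_connection_refused
    ceph daemon osd.0 config get osd_heartbeat_interval
    ceph daemon osd.0 config get osd_heartbeat_grace
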
>
> > 2) Peers send failure reports to the monitor
> > 3) The monitor makes a decision according to options from its own config
> > (mon_osd_adjust_heartbeat_grace, osd_heartbeat_grace,
> > mon_osd_laggy_halflife, mon_osd_min_down_reporters, ...) and finally marks
> > the OSD down in the osdmap.
>
> True.
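
A rough sketch of where those knobs live, in ceph.conf style; the values shown
are only illustrative (roughly the shipped defaults, if I remember correctly),
not recommendations:

    [osd]
    osd heartbeat interval = 6     # how often an OSD pings its peers
    osd heartbeat grace    = 20    # seconds without a reply before reporting it failed

    [mon]
    mon osd min down reporters     = 2     # distinct reporters needed to mark an OSD down
    mon osd adjust heartbeat grace = true  # stretch the grace for historically laggy OSDs
    mon osd laggy halflife         = 3600  # decay of that laggy estimate, in seconds

Lowering the interval/grace shortens detection time but makes false positives
(flapping OSDs) more likely under load.
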
>
> > 4) The monitor sends the updated osdmap to OSDs and clients
> > 5) OSDs start peering
> > 5.1) Peering itself is a complicated process; for example, we have
> > experienced PGs stuck in the inactive state due to
> > osd_max_pg_per_osd_hard_ratio.
>
> I would say that 5.1 isn't relevant for most cases. Yes, it can happen,
> but it's rare.
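
If you want to rule 5.1 out after a failure, something like the following shows
whether any PGs are stuck out of the active state and how many PGs each OSD
carries (the hard cap is roughly mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio,
if I read the docs correctly):

    ceph pg dump_stuck inactive    # PGs that have not finished peering
    ceph osd df                    # the PGS column shows placement groups per OSD
    ceph daemon osd.0 config get osd_max_pg_per_osd_hard_ratio   # on the OSD's host
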
>
> > 6) Peering finishes (PG data keeps moving in the background) - clients can
> > access the affected PGs normally again. Clients also have their own
> > timeouts that can affect the time to recover.
> >
> > Again, in my experience, 40s with default settings is possible.
> >
>
> 40s is possible in certain scenarios. But I wouldn't say that's the
> default for all cases.
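
One way to measure the actual interruption window is to keep a small client
write load running while an OSD is killed abnormally - a rough sketch, with
the pool name and OSD id as placeholders:

    # terminal 1 (client): steady stream of small writes
    rados bench -p testpool 120 write -b 4096 -t 1

    # terminal 2 (the OSD's host): simulate an abnormal termination
    systemctl kill -s SIGKILL ceph-osd@3

    # terminal 3: watch how long it takes for the OSD to be marked down
    # and for the affected PGs to become active again
    ceph -w

In the per-second output of rados bench the stall shows up as a run of seconds
with zero completed operations; that run is the client-visible downtime.
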
>
> Wido
>
> >
>


-- 
The modern Unified Communications provider

https://www.portsip.com
