> On 24 Oct 2016, at 17:29, Nick Fisk <n...@fisk.me.uk> wrote:
>
>> -----Original Message-----
>> From: Yan, Zheng [mailto:uker...@gmail.com]
>> Sent: 24 October 2016 10:19
>> To: Gregory Farnum <gfar...@redhat.com>
>> Cc: Nick Fisk <n...@fisk.me.uk>; Zheng Yan <z...@redhat.com>; Ceph Users
>> <ceph-users@lists.ceph.com>
>> Subject: Re: [ceph-users] Ceph and TCP States
>>
>> On Sat, Oct 22, 2016 at 4:14 AM, Gregory Farnum <gfar...@redhat.com> wrote:
>>> On Fri, Oct 21, 2016 at 7:56 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
>>>>> Behalf Of Haomai Wang
>>>>> Sent: 21 October 2016 15:40
>>>>> To: Nick Fisk <n...@fisk.me.uk>
>>>>> Cc: ceph-users@lists.ceph.com
>>>>> Subject: Re: [ceph-users] Ceph and TCP States
>>>>>
>>>>> On Fri, Oct 21, 2016 at 10:31 PM, Nick Fisk <n...@fisk.me.uk>
>>>>> wrote:
>>>>>> -----Original Message-----
>>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>>>>>> On Behalf Of Haomai Wang
>>>>>> Sent: 21 October 2016 15:28
>>>>>> To: Nick Fisk <n...@fisk.me.uk>
>>>>>> Cc: ceph-users@lists.ceph.com
>>>>>> Subject: Re: [ceph-users] Ceph and TCP States
>>>>>>
>>>>>> On Fri, Oct 21, 2016 at 10:19 PM, Nick Fisk
>>>>>> <n...@fisk.me.uk> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm just testing out using a Ceph client in a DMZ behind a FW from
>>>>>> the main Ceph cluster. One thing I have noticed is that if the
>>>>>> state table on the FW is emptied, maybe by restarting it or just
>>>>>> clearing the state table, etc., then the Ceph client will hang for
>>>>>> a long time, as the TCP session can no longer pass through the FW
>>>>>> and just gets blocked instead.
>>>>>>
>>>>>> Is this "FW" a Linux firewall or a hardware FW?
>>>>>
>>>>> PFSense running on dedicated HW. Eventually they will be in a HA
>>>>> pair so states should persist, but trying to work around this for
>>>>> now. Bit annoying having CephFS lock hard for 15 minutes even though
>>>>> the network connection only went down for a few seconds.
>>>>>
>>>>> Hmm, I'm not familiar with this FW. From my view, whether an RST
>>>>> packet is sent is decided by the FW. But I think you can try
>>>>> "/proc/sys/net/ipv4/tcp_keepalive_time"; if the FW resets the TCP
>>>>> session, TCP keepalive should detect it and send an RST.
>>>>
>>>> Yeah, I think that's where the problem lies. Most firewalls tend to
>>>> silently drop denied packets without sending RSTs, so Ceph
>>>> effectively just thinks that it's experiencing packet loss and will
>>>> never retry until the 15 minute timeout period is up. Am I right in
>>>> thinking I can't tune down this parameter for a CephFS kernel client
>>>> as it doesn't use the ceph.conf file?
>>>
>>> The kernel client has a lot of mount options and can be configured in
>>> a few ways via debugfs et al; I think there's a setting for the
>>> timeout as well. If you can't find it, I'm sure Zheng knows. :) -Greg
>>
>> So far, there is no mount option to control keepalive time for the
>> client-to-mds connection.
>
> I think, although I can't be 100% sure, that most of the problem is
> around client<->mon traffic. I'm pretty sure I saw a timeout to one of
> the mons flash up on the screen just before everything sprang back
> into life.
>
Which kernel version are you using? The kernel client has supported
keepalive2 since the 4.3 kernel. keepalive2 is supposed to detect the
connection issue.

>>>>>> I believe this behaviour can be adjusted by the "ms tcp read
>>>>>> timeout" setting to limit its impact, but wondering if anybody has
>>>>>> any other ideas. I'm also thinking of experimenting with either
>>>>>> stateless FW rules for Ceph or getting the FW to send back RST
>>>>>> packets instead of silently dropping packets.
>>>>>>
>>>>>> hmm, I think it depends on FW
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users@lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
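
For reference, the tcp_keepalive_time sysctl mentioned earlier in the thread only applies to sockets that have SO_KEEPALIVE enabled. Below is a minimal userspace Python sketch of per-socket keepalive tuning. It is not the CephFS kernel client's keepalive2 mechanism, and the function name enable_keepalive and the timer values are assumptions chosen purely for illustration.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Enable TCP keepalive and tune its timers where the platform allows.

    Returns the approximate worst-case time (in seconds) before a peer
    that silently stopped answering is declared dead.
    """
    # Without SO_KEEPALIVE the kernel never sends keepalive probes,
    # regardless of the system-wide tcp_keepalive_* settings.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket overrides of the system-wide timers. These option names
    # are platform-specific, so each one is applied only if it exists.
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tunables:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

    return idle_s + interval_s * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    worst_case = enable_keepalive(s)
    print("dead peer noticed after roughly", worst_case, "seconds")
    s.close()
```

With these example values, a connection whose packets are silently dropped (for instance after the firewall loses its state) would be noticed after roughly a minute of idle time, compared with the Linux default of two hours before the first probe is even sent.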