Hi Venky,

As I said: There are no laggy OSDs. The maximum ping I have for any OSD in ceph osd perf is around 60ms (just a handful, probably aging disks). The vast majority of OSDs have ping times of less than 1ms. Same for the host machines, yet I'm still seeing this message. It seems that the affected hosts are usually the same, but I have absolutely no clue why.

Janek


On 19/09/2023 12:36, Venky Shankar wrote:
Hi Janek,

On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <janek.bevendo...@uni-weimar.de> wrote:

    Thanks! However, I still don't really understand why I am seeing this.


This is due to a change that was merged recently in pacific:

https://github.com/ceph/ceph/pull/52270

Previously, laggy OSDs could prevent cephfs clients from flushing dirty data (during cap revokes by the MDS), so those clients showed up as laggy and were evicted by the MDS. This behaviour was changed: the MDS now does not evict laggy clients if the OSDs report as laggy. You therefore get warnings that some clients are laggy, but they are not evicted because the OSDs are laggy.

    The first time I had this, one of the clients was a remote user
    dialling in via VPN, which could indeed be laggy. But I am also
    seeing it from neighbouring hosts that are on the same physical
    network with reliable ping times way below 1ms. How is that
    considered laggy?

 Are some of your OSDs reporting laggy? This can be checked via `perf dump`:

> ceph tell mds.<> perf dump
(search for op_laggy/osd_laggy)
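
For anyone who would rather not search the dump by eye, here is a minimal sketch that walks the JSON output and lists every counter whose name contains "laggy". It assumes the output of `perf dump` was saved as JSON; the helper name `find_laggy_counters`, the sample values, and the `objecter` section name in the sample are illustrative assumptions, not taken from a real cluster.

```python
import json


def find_laggy_counters(node, path=""):
    """Recursively collect (path, value) pairs for keys containing 'laggy'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            # Report only leaf counters, not whole sections.
            if "laggy" in key and not isinstance(value, (dict, list)):
                hits.append((child, value))
            hits.extend(find_laggy_counters(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_laggy_counters(item, f"{path}[{index}]"))
    return hits


# Hypothetical sample standing in for real `perf dump` output.
sample = json.loads(
    '{"objecter": {"op_active": 5, "op_laggy": 0, "osd_laggy": 2}}'
)

for counter_path, counter_value in find_laggy_counters(sample):
    print(counter_path, counter_value)
```

To use it on real data, redirect the output of the command above into a file and load that file with `json.load` instead of the sample string. Searching recursively avoids depending on which section the counters live in.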


    On 18/09/2023 18:07, Laura Flores wrote:
    Hi Janek,

    There was some documentation added about it here:
    https://docs.ceph.com/en/pacific/cephfs/health-messages/

    There is a description of what it means, and it's tied to an mds
    configurable.

    On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff
    <janek.bevendo...@uni-weimar.de> wrote:

        Hey all,

        Since the upgrade to Ceph 16.2.14, I keep seeing the
        following warning:

        10 client(s) laggy due to laggy OSDs

        ceph health detail shows it as:

        [WRN] MDS_CLIENTS_LAGGY: 10 client(s) laggy due to laggy OSDs
             mds.***(mds.3): Client *** is laggy; not evicted because some OSD(s) is/are laggy
             more of this...

        When I restart the client(s) or the affected MDS daemons, the
        message goes away and then comes back after a while. ceph osd perf
        does not list any laggy OSDs (a few with 10-60ms ping, but
        overwhelmingly < 1ms), so I'm at a total loss as to what this
        even means.

        I have never seen this message before, nor was I able to find
        anything about it. Do you have any idea what this message actually
        means and how I can get rid of it?

        Thanks
        Janek

        _______________________________________________
        ceph-users mailing list -- ceph-users@ceph.io
        To unsubscribe send an email to ceph-users-le...@ceph.io



--
    Laura Flores

    She/Her/Hers

    Software Engineer, Ceph Storage <https://ceph.io>

    Chicago, IL

    lflo...@ibm.com | lflo...@redhat.com <mailto:lflo...@redhat.com>
    M: +17087388804 <tel:+17087388804>



    --
    Bauhaus-Universität Weimar
    Bauhausstr. 9a, R308
    99423 Weimar, Germany

    Phone: +49 3643 58 3577
    www.webis.de  <http://www.webis.de>




--
Cheers,
Venky

--
Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany

Phone: +49 3643 58 3577
www.webis.de
