Hi, can you try to get a stack trace from ganesha (with gdb or from procfs) when it's stuck? Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment with ~30 ganesha 2.6 gateways that are quite stable so far.
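For reference, a minimal way to grab such traces while the daemon is hung (a sketch, assuming the process is called ganesha.nfsd and gdb is installed; adjust names to your setup). The first command dumps the userspace stacks of all threads via gdb, the second the kernel-side stacks from procfs (needs root):
-----------------------------------------------------------
# gdb -p $(pidof ganesha.nfsd) -batch -ex "thread apply all bt" > /tmp/ganesha-backtrace.txt
# for t in /proc/$(pidof ganesha.nfsd)/task/*; do echo "=== ${t##*/}"; cat "$t/stack"; done
-----------------------------------------------------------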
Paul

2018-04-16 17:30 GMT+02:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:
> On 16.04.2018 at 08:58, Oliver Freyermuth wrote:
> > On 16.04.2018 at 02:43, Oliver Freyermuth wrote:
> >> On 15.04.2018 at 23:04, John Spray wrote:
> >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth
> >>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>> Dear Cephalopodians,
> >>>>
> >>>> in our cluster (CentOS 7.4, EC pool, Snappy compression, Luminous 12.2.4),
> >>>> we often have all (~40) clients accessing one file in read-only mode, even with multiple processes per client doing that.
> >>>>
> >>>> Sometimes (I do not yet know when, nor why!) the MDS ends up in a situation like:
> >>>> -----------------------------------------------------------
> >>>> 2018-04-13 18:08:34.378888 7f1ce4472700 0 log_channel(cluster) log [WRN] : 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
> >>>> 2018-04-13 18:08:34.378900 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >>>> 2018-04-13 18:08:34.378904 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >>>> 2018-04-13 18:08:34.378908 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: client_request(client.34720:16487334 lookup #0x1000001fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >>>> 2018-04-13 18:08:34.378913 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 1685.953657 seconds old, received at 2018-04-13 17:40:28.425149: client_request(client.34810:16564864 lookup #0x1000001fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >>>> 2018-04-13 18:08:34.378918 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 1552.743795 seconds old, received at 2018-04-13 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> >>>> -----------------------------------------------------------
> >>>> As you can see (oldest blocked for > 1745.864417 secs), it stays in that situation for quite a while.
> >>>> The number of blocked requests is also not decreasing, but instead slowly increasing whenever a new request is added to the queue.
> >>>>
> >>>> We have a setup of one active MDS, a standby-replay, and a standby.
> >>>> Triggering a failover does not help, it only resets the "oldest blocked" time.
> >>>
> >>> Sounds like a client issue (a client is holding a lock on a file but
> >>> failing to relinquish it for another client's request to be
> >>> processed).
> >>>
> >>> Are these kernel (and what version?) or fuse clients?
> >>
> >> The full cluster is running with FUSE clients, all on 12.2.4.
> >> Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. libcephfs.
> >>
> >> Cheers,
> >> Oliver
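As an aside, regarding the kernel-vs-fuse question: the MDS admin socket can list all client sessions together with the version and metadata they report (a sketch using the mds.mon001 daemon name from further down in this thread; the exact field names are from a Luminous setup and may differ elsewhere):
-----------------------------------------------------------
# ceph daemon mds.mon001 session ls
# ceph daemon mds.mon001 session ls | grep -E '"entity_id"|"ceph_version"'
-----------------------------------------------------------
The client_metadata of each session typically includes the hostname and the reported client version, which helps to tell the ceph-fuse and libcephfs (NFS Ganesha) clients apart from kernel mounts.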
> >
> > Related to that: In case it happens again, which it surely will sooner or later, how can I diagnose
> > which client is holding a lock and not relinquishing it?
> > Is there a way to dump all held locks, ideally with how long they have been held?
> >
> > ceph daemon mds.XXX help
> > does not yield anything obvious.
> >
> > Cheers and thanks,
> > Oliver
>
> It happened again - and I managed to track down the client causing it:
> Dumping all operations in progress on the MDS, and looking for machines without operations which were waiting for an rdlock,
> one NFS Ganesha server was not affected.
> Unmounting the NFS share on the clients did not have any effect - things stayed in "stuck" state.
>
> I then stopped the NFS Ganesha server. This took quite a while, until systemd noticed it was taking too long and force-killed the server.
> Shortly after, even before the NFS Ganesha libcephfs client was evicted, the MDS got unstuck, processed all requests within about 2 seconds, and everything was fine again.
>
> After that, the dead NFS Ganesha client was evicted, and I just restarted the NFS Ganesha server.
>
> Since this appears to be reproducible (even though we still do not know how exactly), this is a rather ugly issue. It seems NFS Ganesha with libcephfs is holding locks
> and not returning them. We are using the packages from:
> https://eu.ceph.com/nfs-ganesha/
> Newer versions are available upstream, but I don't know if they fix this critical issue.
>
> Is there somebody else with experience on that?
>
> As far as I can reproduce, the I/O pattern is:
> 1. A machine with the NFS mount checks whether datafolder/sometarball.tar.gz exists.
> 2. Machines in the cluster (with ceph-fuse) all access datafolder/sometarball.tar.gz (and extract it to a local filesystem).
>
> (1) may happen several times while (2) is going on, and it seems this somehow causes the NFS server to (sometimes?) keep a persistent lock which is never relinquished.
>
> Any help appreciated!
> Oliver
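For the next occurrence, it might save a full NFS Ganesha restart to evict only the suspected session instead of stopping the whole server: the ops dump shows which clients are stuck waiting, the session list shows all connected clients (so the holder can be found by elimination, as described above), and the suspect can then be evicted by its client id. A rough sketch (mds.mon001 as elsewhere in this thread; the evict syntax is the one from the Luminous client-eviction documentation, the id is a placeholder, and evicted clients are blacklisted by default, so use with care):
-----------------------------------------------------------
# ceph daemon mds.mon001 ops | grep -B 20 "failed to rdlock" | grep client_request
# ceph daemon mds.mon001 session ls
# ceph tell mds.0 client evict id=<client id of the suspected Ganesha gateway>
-----------------------------------------------------------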
> >>
> >>> John
> >>>
> >>>> I checked the following things on the active MDS:
> >>>> -----------------------------------------------------------
> >>>> # ceph daemon mds.mon001 objecter_requests
> >>>> {
> >>>>     "ops": [],
> >>>>     "linger_ops": [],
> >>>>     "pool_ops": [],
> >>>>     "pool_stat_ops": [],
> >>>>     "statfs_ops": [],
> >>>>     "command_ops": []
> >>>> }
> >>>> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v "failed to rdlock" | grep -v events
> >>>> => no output, only "initiated" and "rdlock" are in the queue.
> >>>> -----------------------------------------------------------
> >>>>
> >>>> There's also almost no CPU load, almost no other I/O, and Ceph is deep-scrubbing one pg (this also finishes and the next pg is scrubbed fine),
> >>>> and the scrubbing is not even happening in the metadata pool (easy to see in the Luminous dashboard):
> >>>> -----------------------------------------------------------
> >>>> # ceph -s
> >>>>   cluster:
> >>>>     id:     some_funny_hash
> >>>>     health: HEALTH_WARN
> >>>>             1 MDSs report slow requests
> >>>>
> >>>>   services:
> >>>>     mon: 3 daemons, quorum mon003,mon001,mon002
> >>>>     mgr: mon001(active), standbys: mon002, mon003
> >>>>     mds: cephfs_baf-1/1/1 up {0=mon001=up:active}, 1 up:standby-replay, 1 up:standby
> >>>>     osd: 196 osds: 196 up, 196 in
> >>>>
> >>>>   data:
> >>>>     pools:   2 pools, 4224 pgs
> >>>>     objects: 15649k objects, 61761 GB
> >>>>     usage:   114 TB used, 586 TB / 700 TB avail
> >>>>     pgs:     4223 active+clean
> >>>>              1    active+clean+scrubbing+deep
> >>>>
> >>>>   io:
> >>>>     client:   175 kB/s rd, 3 op/s rd, 0 op/s wr
> >>>> -----------------------------------------------------------
> >>>>
> >>>> Does anybody have any idea what's going on here?
> >>>>
> >>>> Yesterday, this also happened, but resolved itself after about 1 hour.
> >>>> Right now, it's going on for about half an hour...
> >>>>
> >>>> Cheers,
> >>>> Oliver

--
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com