2018-04-16 18:24 GMT+02:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:
> Hi Paul,
>
> On 16.04.2018 at 17:51, Paul Emmerich wrote:
> > Hi,
> >
> > can you try to get a stack trace from ganesha (with gdb or from procfs)
> > when it's stuck?
>
> I can try, as soon as it happens again. The problem is that it's not fully
> stuck - only the other clients are stuck when trying to access the
> directory in question, since it seems the Ganesha server holds a lock and
> never gives it back.
> Not sure if a backtrace can shed much light on this.
> The NFS server itself seems completely happy all the time; it only runs
> into problems when I want to stop / restart it.

Oh, I assumed the whole process got stuck because you mentioned it not
terminating on SIGTERM. So it seems it just doesn't handle the stuck
requests gracefully on shutdown.
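When it happens again, something along these lines is usually enough to see
where the threads are hanging - an untested sketch, assuming the daemon
process is named ganesha.nfsd and debug symbols are installed:

    # userspace backtraces of all threads (gdb attaches briefly, then detaches)
    gdb -p "$(pidof ganesha.nfsd)" -batch -ex "thread apply all bt" > /tmp/ganesha-bt.txt

    # alternative without gdb: kernel-side stacks of all threads, from procfs (as root)
    for t in /proc/"$(pidof ganesha.nfsd)"/task/*; do
        echo "=== thread ${t##*/}"; cat "$t/stack"
    done > /tmp/ganesha-kstacks.txt

The gdb variant is the more interesting one here, since the locking you
suspect lives in userspace (Ganesha's FSAL_CEPH / libcephfs).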
> But of course, I will try to get a stacktrace next time.
>
> > Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment
> > with ~30 ganesha 2.6 gateways that are quite stable so far.
>
> That's good news! Do you also have parallel access from ceph-fuse clients
> to the same CephFS? Is this also Luminous?

yes and yes

> We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
> since we would prefer not to build NFS Ganesha against Ceph ourselves, but
> would rather follow upstream to save the maintenance burden. Are you
> building packages yourself, or using a repo maintained upstream?

We are building it ourselves. We plan to publish our own repository for it
soon.

Paul

> I see there have been some locking fixes upstream:
> https://github.com/nfs-ganesha/nfs-ganesha/search?q=ceph+lock&type=Commits
> but I'm not expert enough to deduce whether they apply to our observation.
>
> Cheers and thanks,
> Oliver
>
> > Paul
> >
> > 2018-04-16 17:30 GMT+02:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:
> >
> > On 16.04.2018 at 08:58, Oliver Freyermuth wrote:
> > > On 16.04.2018 at 02:43, Oliver Freyermuth wrote:
> > >> On 15.04.2018 at 23:04, John Spray wrote:
> > >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth
> > >>> <freyerm...@physik.uni-bonn.de> wrote:
> > >>>> Dear Cephalopodians,
> > >>>>
> > >>>> in our cluster (CentOS 7.4, EC pool, Snappy compression, Luminous 12.2.4),
> > >>>> we often have all (~40) clients accessing one file in read-only mode,
> > >>>> even with multiple processes per client doing that.
> > >>>>
> > >>>> Sometimes (I do not yet know when, nor why!) the MDS ends up in
> > >>>> a situation like:
> > >>>> -----------------------------------------------------------
> > >>>> 2018-04-13 18:08:34.378888 7f1ce4472700 0 log_channel(cluster) log [WRN] : 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
> > >>>> 2018-04-13 18:08:34.378900 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378904 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378908 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: client_request(client.34720:16487334 lookup #0x1000001fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378913 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 1685.953657 seconds old, received at 2018-04-13 17:40:28.425149: client_request(client.34810:16564864 lookup #0x1000001fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378918 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 1552.743795 seconds old, received at 2018-04-13 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> -----------------------------------------------------------
> > >>>> As you can see (oldest blocked for > 1745.864417 secs), it stays in
> > >>>> that situation for quite a while.
> > >>>> The number of blocked requests is also not decreasing, but instead
> > >>>> slowly increasing whenever a new request is added to the queue.
> > >>>>
> > >>>> We have a setup of one active MDS, a standby-replay, and a standby.
> > >>>> Triggering a failover does not help; it only resets the "oldest blocked" time.
> > >>>
> > >>> Sounds like a client issue (a client is holding a lock on a file but
> > >>> failing to relinquish it for another client's request to be processed).
> > >>>
> > >>> Are these kernel (and what version?) or fuse clients?
> > >>
> > >> The full cluster is running with FUSE clients, all on 12.2.4.
> > >> Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. libcephfs.
> > >>
> > >> Cheers,
> > >> Oliver
> > >
> > > Related to that: in case it happens again, which it surely will sooner
> > > or later, how can I diagnose which client is holding a lock and not
> > > relinquishing it?
> > > Is there a way to dump all held locks, ideally with how long they have
> > > been held?
> > >
> > > ceph daemon mds.XXX help
> > > does not yield anything obvious.
> > >
> > > Cheers and thanks,
> > > Oliver
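As far as I know there is no single "dump all held locks" command, but
dumping the in-flight ops together with the client sessions usually narrows
it down. A rough sketch - the field names are from memory, so please
double-check them against your 12.2.x output, and it assumes jq is
available:

    # which requests are stuck, and on what
    ceph daemon mds.mon001 ops > /tmp/mds-ops.json
    jq -r '.ops[] | select(.type_data.flag_point | tostring | test("rdlock")) | .description' /tmp/mds-ops.json

    # map the client.<id> numbers from those descriptions to hosts / mounts
    ceph daemon mds.mon001 session ls > /tmp/mds-sessions.json
    jq -r '.[] | "\(.id) \(.client_metadata.hostname // "?") \(.client_metadata.mount_point // "?")"' /tmp/mds-sessions.json

A client that shows up in the session list but has no stuck request of its
own is a good candidate for the one sitting on the lock - which is
essentially what you ended up doing by hand below.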
> >
> > It happened again - and I managed to track down the client causing it:
> > dumping all operations in progress on the MDS and looking for machines
> > without any operations waiting for an rdlock, one NFS Ganesha server was
> > not affected.
> > Unmounting the NFS share on the clients did not have any effect - things
> > stayed in the "stuck" state.
> >
> > I then stopped the NFS Ganesha server. This took quite a while, until
> > systemd noticed it was taking too long and force-killed the process.
> > Shortly after, even before the NFS Ganesha libcephfs client was evicted,
> > the MDS got unstuck, processed all requests within about 2 seconds, and
> > everything was fine again.
> >
> > After that, the dead NFS Ganesha client was evicted, and I simply
> > restarted the NFS Ganesha server.
> >
> > Since this appears to be reproducible (even though we still do not know
> > how exactly), this is a rather ugly issue. It seems NFS Ganesha with
> > libcephfs is holding locks and not returning them. We are using the
> > packages from:
> > https://eu.ceph.com/nfs-ganesha/
> > Newer versions are available upstream, but I don't know whether they fix
> > this critical issue.
> >
> > Is there somebody else with experience of this?
> >
> > As far as I can reproduce it, the I/O pattern is:
> > 1. A machine with the NFS mount checks that datafolder/sometarball.tar.gz exists.
> > 2. Machines in the cluster (with ceph-fuse) all access
> >    datafolder/sometarball.tar.gz (and extract it to a local filesystem).
> >
> > (1) may happen several times while (2) is going on, and it seems this
> > somehow causes the NFS server to (sometimes?) keep a persistent lock
> > which is never relinquished.
> >
> > Any help appreciated!
> > Oliver
> >
> > >>>
> > >>> John
> > >>>
> > >>>> I checked the following things on the active MDS:
> > >>>> -----------------------------------------------------------
> > >>>> # ceph daemon mds.mon001 objecter_requests
> > >>>> {
> > >>>>     "ops": [],
> > >>>>     "linger_ops": [],
> > >>>>     "pool_ops": [],
> > >>>>     "pool_stat_ops": [],
> > >>>>     "statfs_ops": [],
> > >>>>     "command_ops": []
> > >>>> }
> > >>>> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v "failed to rdlock" | grep -v events
> > >>>> => no output, only "initiated" and "rdlock" are in the queue.
> > >>>> -----------------------------------------------------------
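Regarding the access pattern you describe above: if you want to trigger it
deliberately (and get a self-contained test case for a Ganesha / libcephfs
bug report), a crude reproduction sketch could look like this - the mount
points are placeholders for your actual NFS and ceph-fuse mounts:

    # on the NFS client: poll the tarball while the others are reading it (placeholder path)
    while true; do
        stat /mnt/nfs/datafolder/sometarball.tar.gz > /dev/null
        sleep 1
    done

    # in parallel on each ceph-fuse client: read the same file repeatedly (placeholder path)
    for i in $(seq 1 100); do
        tar -xzf /cephfs/datafolder/sometarball.tar.gz -C "$(mktemp -d)"
    done

If the rdlock pile-up shows up reliably with that, it should be much easier
to test against ganesha 2.6 and the locking fixes you linked.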
> > >>>>
> > >>>> There's also almost no CPU load, almost no other I/O, and Ceph is
> > >>>> deep-scrubbing one PG (this also finishes and the next PG is scrubbed
> > >>>> fine), and the scrubbing is not even happening in the metadata pool
> > >>>> (easy to see in the Luminous dashboard):
> > >>>> -----------------------------------------------------------
> > >>>> # ceph -s
> > >>>>   cluster:
> > >>>>     id:     some_funny_hash
> > >>>>     health: HEALTH_WARN
> > >>>>             1 MDSs report slow requests
> > >>>>
> > >>>>   services:
> > >>>>     mon: 3 daemons, quorum mon003,mon001,mon002
> > >>>>     mgr: mon001(active), standbys: mon002, mon003
> > >>>>     mds: cephfs_baf-1/1/1 up {0=mon001=up:active}, 1 up:standby-replay, 1 up:standby
> > >>>>     osd: 196 osds: 196 up, 196 in
> > >>>>
> > >>>>   data:
> > >>>>     pools:   2 pools, 4224 pgs
> > >>>>     objects: 15649k objects, 61761 GB
> > >>>>     usage:   114 TB used, 586 TB / 700 TB avail
> > >>>>     pgs:     4223 active+clean
> > >>>>              1    active+clean+scrubbing+deep
> > >>>>
> > >>>>   io:
> > >>>>     client: 175 kB/s rd, 3 op/s rd, 0 op/s wr
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>> Does anybody have any idea what's going on here?
> > >>>>
> > >>>> Yesterday, this also happened, but resolved itself after about 1 hour.
> > >>>> Right now, it has been going on for about half an hour...
> > >>>>
> > >>>> Cheers,
> > >>>> Oliver


--
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com