2018-04-16 18:24 GMT+02:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:
> Hi Paul,
>
> On 16.04.2018 at 17:51, Paul Emmerich wrote:
> > Hi,
> >
> > can you try to get a stack trace from ganesha (with gdb or from procfs)
> > when it's stuck?
>
> I can try, as soon as it happens again. The problem is that it's not fully
> stuck - only the other clients are stuck when trying to access the
> directory in question, since it seems the Ganesha server holds a lock and
> never gives it back.
> Not sure if a backtrace can shed much light on this.
> The NFS server itself seems completely happy all the time; it only runs
> into problems when I want to stop / restart it.

Oh, I assumed the whole process got stuck because you mentioned it not
terminating on SIGTERM. So it seems it just doesn't handle the stuck
requests gracefully on shutdown.
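When it happens again, something along these lines is usually enough to see
where the threads are hanging - an untested sketch, assuming the daemon
process is named ganesha.nfsd and debug symbols are installed:

    # userspace backtraces of all threads (gdb attaches briefly, then detaches)
    gdb -p "$(pidof ganesha.nfsd)" -batch -ex "thread apply all bt" > /tmp/ganesha-bt.txt

    # alternative without gdb: kernel-side stacks of all threads, from procfs (as root)
    for t in /proc/"$(pidof ganesha.nfsd)"/task/*; do
        echo "=== thread ${t##*/}"; cat "$t/stack"
    done > /tmp/ganesha-kstacks.txt

The gdb variant is the more interesting one here, since the locking you
suspect lives in userspace (Ganesha's FSAL_CEPH / libcephfs).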
> But of course, I will try to get a stacktrace next time.
>
> > Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment
> > with ~30 ganesha 2.6 gateways that are quite stable so far.
>
> That's good news! Do you also have parallel access from ceph-fuse clients
> to the same CephFS? Is this also Luminous?

yes and yes

> We are right now using the packages from https://eu.ceph.com/nfs-ganesha/
> since we would prefer not to build NFS Ganesha against Ceph ourselves, but
> would rather follow upstream to save the maintenance burden. Are you
> building packages yourself, or using a repo maintained upstream?

We are building it ourselves. We plan to publish our own repository for it
soon.

Paul

> I see there have been some locking fixes upstream:
> https://github.com/nfs-ganesha/nfs-ganesha/search?q=ceph+lock&type=Commits
> but I'm not expert enough to deduce whether they apply to our observation.
>
> Cheers and thanks,
> Oliver
>
> > Paul
> >
> > 2018-04-16 17:30 GMT+02:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:
> >
> > On 16.04.2018 at 08:58, Oliver Freyermuth wrote:
> > > On 16.04.2018 at 02:43, Oliver Freyermuth wrote:
> > >> On 15.04.2018 at 23:04, John Spray wrote:
> > >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth
> > >>> <freyerm...@physik.uni-bonn.de> wrote:
> > >>>> Dear Cephalopodians,
> > >>>>
> > >>>> in our cluster (CentOS 7.4, EC pool, Snappy compression, Luminous 12.2.4),
> > >>>> we often have all (~40) clients accessing one file in read-only mode,
> > >>>> even with multiple processes per client doing that.
> > >>>>
> > >>>> Sometimes (I do not yet know when, nor why!) the MDS ends up in
> > >>>> a situation like:
> > >>>> -----------------------------------------------------------
> > >>>> 2018-04-13 18:08:34.378888 7f1ce4472700 0 log_channel(cluster) log [WRN] : 292 slow requests, 5 included below; oldest blocked for > 1745.864417 secs
> > >>>> 2018-04-13 18:08:34.378900 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 960.563534 seconds old, received at 2018-04-13 17:52:33.815273: client_request(client.34720:16487379 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 17:52:33.814904 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378904 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 30.636678 seconds old, received at 2018-04-13 18:08:03.742128: client_request(client.34302:16453640 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 18:08:03.741630 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378908 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 972.648926 seconds old, received at 2018-04-13 17:52:21.729881: client_request(client.34720:16487334 lookup #0x1000001fcab/sometarball.tar.gz 2018-04-13 17:52:21.729450 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378913 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 1685.953657 seconds old, received at 2018-04-13 17:40:28.425149: client_request(client.34810:16564864 lookup #0x1000001fcab/sometarball.tar.gz 2018-04-13 17:40:28.424961 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> 2018-04-13 18:08:34.378918 7f1ce4472700 0 log_channel(cluster) log [WRN] : slow request 1552.743795 seconds old, received at 2018-04-13 17:42:41.635012: client_request(client.34302:16453566 getattr pAsLsXsFs #0x1000009ff6d 2018-04-13 17:42:41.634726 caller_uid=94894, caller_gid=513{513,}) currently failed to rdlock, waiting
> > >>>> -----------------------------------------------------------
> > >>>> As you can see (oldest blocked for > 1745.864417 secs), it stays in
> > >>>> that situation for quite a while.
> > >>>> The number of blocked requests is also not decreasing, but instead
> > >>>> slowly increasing whenever a new request is added to the queue.
> > >>>>
> > >>>> We have a setup of one active MDS, a standby-replay, and a standby.
> > >>>> Triggering a failover does not help; it only resets the "oldest blocked" time.
> > >>>
> > >>> Sounds like a client issue (a client is holding a lock on a file but
> > >>> failing to relinquish it for another client's request to be processed).
> > >>>
> > >>> Are these kernel (and what version?) or fuse clients?
> > >>
> > >> The full cluster is running with FUSE clients, all on 12.2.4.
> > >> Additionally, there are 6 NFS Ganesha servers using FSAL_CEPH, i.e. libcephfs.
> > >>
> > >> Cheers,
> > >> Oliver
> > >
> > > Related to that: in case it happens again, which it surely will sooner
> > > or later, how can I diagnose which client is holding a lock and not
> > > relinquishing it?
> > > Is there a way to dump all held locks, ideally with how long they have
> > > been held?
> > >
> > > ceph daemon mds.XXX help
> > > does not yield anything obvious.
> > >
> > > Cheers and thanks,
> > > Oliver
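As far as I know there is no single "dump all held locks" command, but
dumping the in-flight ops together with the client sessions usually narrows
it down. A rough sketch - the field names are from memory, so please
double-check them against your 12.2.x output, and it assumes jq is
available:

    # which requests are stuck, and on what
    ceph daemon mds.mon001 ops > /tmp/mds-ops.json
    jq -r '.ops[] | select(.type_data.flag_point | tostring | test("rdlock")) | .description' /tmp/mds-ops.json

    # map the client.<id> numbers from those descriptions to hosts / mounts
    ceph daemon mds.mon001 session ls > /tmp/mds-sessions.json
    jq -r '.[] | "\(.id) \(.client_metadata.hostname // "?") \(.client_metadata.mount_point // "?")"' /tmp/mds-sessions.json

A client that shows up in the session list but has no stuck request of its
own is a good candidate for the one sitting on the lock - which is
essentially what you ended up doing by hand below.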
> >
> > It happened again - and I managed to track down the client causing it:
> > dumping all operations in progress on the MDS and looking for machines
> > without any operations waiting for an rdlock, one NFS Ganesha server was
> > not affected.
> > Unmounting the NFS share on the clients did not have any effect - things
> > stayed in the "stuck" state.
> >
> > I then stopped the NFS Ganesha server. This took quite a while, until
> > systemd noticed it was taking too long and force-killed the process.
> > Shortly after, even before the NFS Ganesha libcephfs client was evicted,
> > the MDS got unstuck, processed all requests within about 2 seconds, and
> > everything was fine again.
> >
> > After that, the dead NFS Ganesha client was evicted, and I simply
> > restarted the NFS Ganesha server.
> >
> > Since this appears to be reproducible (even though we still do not know
> > how exactly), this is a rather ugly issue. It seems NFS Ganesha with
> > libcephfs is holding locks and not returning them. We are using the
> > packages from:
> > https://eu.ceph.com/nfs-ganesha/
> > Newer versions are available upstream, but I don't know whether they fix
> > this critical issue.
> >
> > Is there somebody else with experience of this?
> >
> > As far as I can reproduce it, the I/O pattern is:
> > 1. A machine with the NFS mount checks that datafolder/sometarball.tar.gz exists.
> > 2. Machines in the cluster (with ceph-fuse) all access
> >    datafolder/sometarball.tar.gz (and extract it to a local filesystem).
> >
> > (1) may happen several times while (2) is going on, and it seems this
> > somehow causes the NFS server to (sometimes?) keep a persistent lock
> > which is never relinquished.
> >
> > Any help appreciated!
> > Oliver
> >
> > >>>
> > >>> John
> > >>>
> > >>>> I checked the following things on the active MDS:
> > >>>> -----------------------------------------------------------
> > >>>> # ceph daemon mds.mon001 objecter_requests
> > >>>> {
> > >>>>     "ops": [],
> > >>>>     "linger_ops": [],
> > >>>>     "pool_ops": [],
> > >>>>     "pool_stat_ops": [],
> > >>>>     "statfs_ops": [],
> > >>>>     "command_ops": []
> > >>>> }
> > >>>> # ceph daemon mds.mon001 ops | grep event | grep -v "initiated" | grep -v "failed to rdlock" | grep -v events
> > >>>> => no output, only "initiated" and "rdlock" are in the queue.
> > >>>> -----------------------------------------------------------
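Regarding the access pattern you describe above: if you want to trigger it
deliberately (and get a self-contained test case for a Ganesha / libcephfs
bug report), a crude reproduction sketch could look like this - the mount
points are placeholders for your actual NFS and ceph-fuse mounts:

    # on the NFS client: poll the tarball while the others are reading it (placeholder path)
    while true; do
        stat /mnt/nfs/datafolder/sometarball.tar.gz > /dev/null
        sleep 1
    done

    # in parallel on each ceph-fuse client: read the same file repeatedly (placeholder path)
    for i in $(seq 1 100); do
        tar -xzf /cephfs/datafolder/sometarball.tar.gz -C "$(mktemp -d)"
    done

If the rdlock pile-up shows up reliably with that, it should be much easier
to test against ganesha 2.6 and the locking fixes you linked.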
> > >>>>
> > >>>> There's also almost no CPU load, almost no other I/O, and Ceph is
> > >>>> deep-scrubbing one PG (this also finishes and the next PG is scrubbed
> > >>>> fine), and the scrubbing is not even happening in the metadata pool
> > >>>> (easy to see in the Luminous dashboard):
> > >>>> -----------------------------------------------------------
> > >>>> # ceph -s
> > >>>>   cluster:
> > >>>>     id:     some_funny_hash
> > >>>>     health: HEALTH_WARN
> > >>>>             1 MDSs report slow requests
> > >>>>
> > >>>>   services:
> > >>>>     mon: 3 daemons, quorum mon003,mon001,mon002
> > >>>>     mgr: mon001(active), standbys: mon002, mon003
> > >>>>     mds: cephfs_baf-1/1/1 up {0=mon001=up:active}, 1 up:standby-replay, 1 up:standby
> > >>>>     osd: 196 osds: 196 up, 196 in
> > >>>>
> > >>>>   data:
> > >>>>     pools:   2 pools, 4224 pgs
> > >>>>     objects: 15649k objects, 61761 GB
> > >>>>     usage:   114 TB used, 586 TB / 700 TB avail
> > >>>>     pgs:     4223 active+clean
> > >>>>              1    active+clean+scrubbing+deep
> > >>>>
> > >>>>   io:
> > >>>>     client: 175 kB/s rd, 3 op/s rd, 0 op/s wr
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>> Does anybody have any idea what's going on here?
> > >>>>
> > >>>> Yesterday, this also happened, but resolved itself after about 1 hour.
> > >>>> Right now, it has been going on for about half an hour...
> > >>>>
> > >>>> Cheers,
> > >>>> Oliver


--
Paul Emmerich

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com