Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
On Fri, Jun 21, 2019 at 6:10 PM Frank Schilder wrote:
>
> Dear Yan, Zheng,
>
> does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a
> link to the issue tracker?
>

No. https://tracker.ceph.com/issues/39987

> Thanks and best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> From: Yan, Zheng
> Sent: 20 May 2019 13:34
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
>
> On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote:
> >
> > Dear Yan and Stefan,
> >
> > it happened again and there were only very few ops in the queue. I pulled
> > the ops list and the cache. Please find a zip file here:
> > "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . It's a
> > bit more than 100MB.
> >
>
> MDS cache dump shows there is a snapshot-related issue. Please avoid using
> snapshots until we fix the bug.
>
> Regards
> Yan, Zheng
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Dear Yan, Zheng,

does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a
link to the issue tracker?

Thanks and best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Yan, Zheng
Sent: 20 May 2019 13:34
To: Frank Schilder
Cc: Stefan Kooman; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote:
>
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I pulled the
> ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . It's a bit
> more than 100MB.
>

MDS cache dump shows there is a snapshot-related issue. Please avoid using
snapshots until we fix the bug.

Regards
Yan, Zheng
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Dear Yan, thank you for taking care of this. I removed all snapshots and stopped snapshot creation. Please keep me posted. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Yan, Zheng Sent: 20 May 2019 13:34:07 To: Frank Schilder Cc: Stefan Kooman; ceph-users@lists.ceph.com Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote: > > Dear Yan and Stefan, > > it happened again and there were only very few ops in the queue. I pulled the > ops list and the cache. Please find a zip file here: > "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit > more than 100MB. > MSD cache dump shows there is a snapshot related. Please avoid using snapshot until we fix the bug. Regards Yan, Zheng > The active MDS failed over to the standby after or during the dump cache > operation. Is this expected? As a result, the cluster is healthy and I can't > do further diagnostics. In case you need more information, we have to wait > until next time. > > Some further observations: > > There was no load on the system. I start suspecting that this is not a > load-induced event. It is also not cause by excessive atime updates, the FS > is mounted with relatime. Could it have to do with the large level-2 network > (ca. 550 client servers in the same broadcast domain)? I include our kernel > tuning profile below, just in case. The cluster networks (back and front) are > isolated VLANs, no gateways, no routing. > > We run rolling snapshots on the file system. I didn't observe any problems > with this, but am wondering if this might be related. We have currently 30 > snapshots in total. Here is the output of status and pool ls: > > [root@ceph-01 ~]# ceph status # before the MDS failed over > cluster: > id: ### > health: HEALTH_WARN > 1 MDSs report slow requests > > services: > mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 > mgr: ceph-01(active), standbys: ceph-02, ceph-03 > mds: con-fs-1/1/1 up {0=ceph-08=up:active}, 1 up:standby > osd: 192 osds: 192 up, 192 in > > data: > pools: 5 pools, 750 pgs > objects: 6.35 M objects, 5.2 TiB > usage: 5.1 TiB used, 1.3 PiB / 1.3 PiB avail > pgs: 750 active+clean > > [root@ceph-01 ~]# ceph status # after cache dump and the MDS failed over > cluster: > id: ### > health: HEALTH_OK > > services: > mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 > mgr: ceph-01(active), standbys: ceph-02, ceph-03 > mds: con-fs-1/1/1 up {0=ceph-12=up:active}, 1 up:standby > osd: 192 osds: 192 up, 192 in > > data: > pools: 5 pools, 750 pgs > objects: 6.33 M objects, 5.2 TiB > usage: 5.1 TiB used, 1.3 PiB / 1.3 PiB avail > pgs: 749 active+clean > 1 active+clean+scrubbing+deep > > io: > client: 6.3 KiB/s wr, 0 op/s rd, 0 op/s wr > > [root@ceph-01 ~]# ceph osd pool ls detail # after the MDS failed over > pool 1 'sr-rbd-meta-one' replicated size 3 min_size 2 crush_rule 1 > object_hash rjenkins pg_num 80 pgp_num 80 last_change 486 flags > hashpspool,nodelete,selfmanaged_snaps max_bytes 536870912000 stripe_width 0 > application rbd > removed_snaps [1~5] > pool 2 'sr-rbd-data-one' erasure size 8 min_size 6 crush_rule 5 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 1759 flags > hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 274877906944000 > stripe_width 24576 compression_mode aggressive application rbd > removed_snaps [1~3] > pool 3 'sr-rbd-one-stretch' replicated size 4 min_size 2 crush_rule 2 > object_hash rjenkins pg_num 20 pgp_num 20 last_change 500 flags > 
hashpspool,nodelete,selfmanaged_snaps max_bytes 5497558138880 stripe_width 0 > compression_mode aggressive application rbd > removed_snaps [1~7] > pool 4 'con-fs-meta' replicated size 3 min_size 2 crush_rule 3 object_hash > rjenkins pg_num 50 pgp_num 50 last_change 428 flags hashpspool,nodelete > max_bytes 1099511627776 stripe_width 0 application cephfs > pool 5 'con-fs-data' erasure size 10 min_size 8 crush_rule 6 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 2561 flags > hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 21990232200 > stripe_width 32768 compression_mode aggressive application cephfs > removed_snaps > [2~3d,41~2a,6d~2a,99~c,a6~1e,c6~18,df~3,e3~1,e5~3,e9~1,eb~3,ef~1,f1~1,f3~1,f5~3,f9~1,fb~3,ff~1,101~1,103~1,105~1,107~1,109~1,10b~1,10d~1,10f~1,111~1] > > The relevant pools are con-fs-meta and con-fs-data. > > Best regards, >
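Frank mentions above that all snapshots were removed and snapshot creation stopped. For reference, a minimal sketch of how CephFS snapshots are listed and removed through the special .snap directory, assuming the file system is mounted at /mnt/cephfs and the default snapshot directory name is in use (mount point and snapshot name below are placeholders, not values from this thread):

  # List existing snapshots taken at the file system root:
  ls /mnt/cephfs/.snap
  # Remove one snapshot by deleting its entry under .snap:
  rmdir /mnt/cephfs/.snap/daily_2019-05-18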
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote: > > Dear Yan and Stefan, > > it happened again and there were only very few ops in the queue. I pulled the > ops list and the cache. Please find a zip file here: > "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit > more than 100MB. > MSD cache dump shows there is a snapshot related. Please avoid using snapshot until we fix the bug. Regards Yan, Zheng > The active MDS failed over to the standby after or during the dump cache > operation. Is this expected? As a result, the cluster is healthy and I can't > do further diagnostics. In case you need more information, we have to wait > until next time. > > Some further observations: > > There was no load on the system. I start suspecting that this is not a > load-induced event. It is also not cause by excessive atime updates, the FS > is mounted with relatime. Could it have to do with the large level-2 network > (ca. 550 client servers in the same broadcast domain)? I include our kernel > tuning profile below, just in case. The cluster networks (back and front) are > isolated VLANs, no gateways, no routing. > > We run rolling snapshots on the file system. I didn't observe any problems > with this, but am wondering if this might be related. We have currently 30 > snapshots in total. Here is the output of status and pool ls: > > [root@ceph-01 ~]# ceph status # before the MDS failed over > cluster: > id: ### > health: HEALTH_WARN > 1 MDSs report slow requests > > services: > mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 > mgr: ceph-01(active), standbys: ceph-02, ceph-03 > mds: con-fs-1/1/1 up {0=ceph-08=up:active}, 1 up:standby > osd: 192 osds: 192 up, 192 in > > data: > pools: 5 pools, 750 pgs > objects: 6.35 M objects, 5.2 TiB > usage: 5.1 TiB used, 1.3 PiB / 1.3 PiB avail > pgs: 750 active+clean > > [root@ceph-01 ~]# ceph status # after cache dump and the MDS failed over > cluster: > id: ### > health: HEALTH_OK > > services: > mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 > mgr: ceph-01(active), standbys: ceph-02, ceph-03 > mds: con-fs-1/1/1 up {0=ceph-12=up:active}, 1 up:standby > osd: 192 osds: 192 up, 192 in > > data: > pools: 5 pools, 750 pgs > objects: 6.33 M objects, 5.2 TiB > usage: 5.1 TiB used, 1.3 PiB / 1.3 PiB avail > pgs: 749 active+clean > 1 active+clean+scrubbing+deep > > io: > client: 6.3 KiB/s wr, 0 op/s rd, 0 op/s wr > > [root@ceph-01 ~]# ceph osd pool ls detail # after the MDS failed over > pool 1 'sr-rbd-meta-one' replicated size 3 min_size 2 crush_rule 1 > object_hash rjenkins pg_num 80 pgp_num 80 last_change 486 flags > hashpspool,nodelete,selfmanaged_snaps max_bytes 536870912000 stripe_width 0 > application rbd > removed_snaps [1~5] > pool 2 'sr-rbd-data-one' erasure size 8 min_size 6 crush_rule 5 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 1759 flags > hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 274877906944000 > stripe_width 24576 compression_mode aggressive application rbd > removed_snaps [1~3] > pool 3 'sr-rbd-one-stretch' replicated size 4 min_size 2 crush_rule 2 > object_hash rjenkins pg_num 20 pgp_num 20 last_change 500 flags > hashpspool,nodelete,selfmanaged_snaps max_bytes 5497558138880 stripe_width 0 > compression_mode aggressive application rbd > removed_snaps [1~7] > pool 4 'con-fs-meta' replicated size 3 min_size 2 crush_rule 3 object_hash > rjenkins pg_num 50 pgp_num 50 last_change 428 flags hashpspool,nodelete > max_bytes 1099511627776 stripe_width 0 application cephfs > pool 5 'con-fs-data' 
erasure size 10 min_size 8 crush_rule 6 object_hash > rjenkins pg_num 300 pgp_num 300 last_change 2561 flags > hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 21990232200 > stripe_width 32768 compression_mode aggressive application cephfs > removed_snaps > [2~3d,41~2a,6d~2a,99~c,a6~1e,c6~18,df~3,e3~1,e5~3,e9~1,eb~3,ef~1,f1~1,f3~1,f5~3,f9~1,fb~3,ff~1,101~1,103~1,105~1,107~1,109~1,10b~1,10d~1,10f~1,111~1] > > The relevant pools are con-fs-meta and con-fs-data. > > Best regards, > Frank > > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > [root@ceph-08 ~]# cat /etc/tuned/ceph/tuned.conf > [main] > summary=Settings for ceph cluster. Derived from throughput-performance. > include=throughput-performance > > [vm] > transparent_hugepages=never > > [sysctl] > # See also: > # - https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt > # - https://www.kernel.org/doc/Documentation/sysctl/net.txt > # - https://cromwell-intl.com/open-source/performance-tuning/tcp.html > # - https://fatmin.com/2015/08/19/ceph-tcp-performance-tuning/ > # - https://www.spinics.net/lists/ceph-devel/msg21721.html > > # Set available PIDs and open files to maximum possible. >
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Hi Stefan,

thanks for being so thorough. I am aware of that. We are still in a pilot
phase, which is also the reason I'm still relatively relaxed about the
observed issue. I guess you also noticed that our cluster is almost empty too.

I don't have a complete list of storage requirements yet and had to restrict
the allocation of PGs to a reasonable minimum, since with mimic I cannot
reduce the PG count of a pool. With the current values I see imbalance but
still reasonable performance. Once I have more information about which pools I
still need to create, I will aim for the 100 PGs per OSD. I actually plan to
give the cephfs pools a somewhat higher share for performance reasons. It's on
the list.

Thanks again and have a good weekend,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Stefan Kooman
Sent: 18 May 2019 17:41
To: Frank Schilder
Cc: Yan, Zheng; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

Quoting Frank Schilder (fr...@dtu.dk):
>
> [root@ceph-01 ~]# ceph status    # before the MDS failed over
>   cluster:
>     id:     ###
>     health: HEALTH_WARN
>             1 MDSs report slow requests
>
>   services:
>     mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
>     mgr: ceph-01(active), standbys: ceph-02, ceph-03
>     mds: con-fs-1/1/1 up {0=ceph-08=up:active}, 1 up:standby
>     osd: 192 osds: 192 up, 192 in
>
>   data:
>     pools:   5 pools, 750 pgs
>     objects: 6.35 M objects, 5.2 TiB
>     usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
>     pgs:     750 active+clean

How many pools do you plan to use? You have 5 pools and only 750 PGs total?
What hardware do you have for OSDs? If cephfs is your biggest user I would add
up to 6150 PGs to your pool(s). Having around ~100 PGs per OSD is healthy. The
cluster will also be able to balance way better. Math:
((100 (PG/OSD) * 192 (# OSDs)) - 750) / 3 = 6150 for 3-replica pools.

You might have a lot of contention going on on your OSDs; they are probably
under-performing.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Quoting Frank Schilder (fr...@dtu.dk):
>
> [root@ceph-01 ~]# ceph status    # before the MDS failed over
>   cluster:
>     id:     ###
>     health: HEALTH_WARN
>             1 MDSs report slow requests
>
>   services:
>     mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
>     mgr: ceph-01(active), standbys: ceph-02, ceph-03
>     mds: con-fs-1/1/1 up {0=ceph-08=up:active}, 1 up:standby
>     osd: 192 osds: 192 up, 192 in
>
>   data:
>     pools:   5 pools, 750 pgs
>     objects: 6.35 M objects, 5.2 TiB
>     usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
>     pgs:     750 active+clean

How many pools do you plan to use? You have 5 pools and only 750 PGs total?
What hardware do you have for OSDs? If cephfs is your biggest user I would add
up to 6150 PGs to your pool(s). Having around ~100 PGs per OSD is healthy. The
cluster will also be able to balance way better. Math:
((100 (PG/OSD) * 192 (# OSDs)) - 750) / 3 = 6150 for 3-replica pools.

You might have a lot of contention going on on your OSDs; they are probably
under-performing.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
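For reference, a sketch of the arithmetic and of how a pool's PG count can be grown; the pool name and target value are illustrative only, and on mimic a pg_num increase cannot be undone, so any increase should be conservative:

  # Stefan's target: roughly 100 PGs per OSD across the cluster, i.e.
  #   ((100 * 192) - 750) / 3 = (19200 - 750) / 3 = 6150
  # additional PGs if the new pools were all 3-replica.
  #
  # Growing an existing pool (values are placeholders):
  ceph osd pool set con-fs-data pg_num 1024
  ceph osd pool set con-fs-data pgp_num 1024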
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Hi Stefan, cc Yan, thanks for your quick reply. > I am pretty sure you hit bug #26982: https://tracker.ceph.com/issues/26982 > "mds: crash when dumping ops in flight". Everything is fine, the daemon did not crash. The dump cache operation seems to be a blocking operation. It simply blocked the MDS on ceph-08 for too long and the mons decided to flip to the MDS on ceph-12. The MDS on ceph-08 is up for almost 5 days: [root@ceph-mds:ceph-08 /]# ps -e -o pid,etime,cmd PID ELAPSED CMD 1 4-21:03:44 /bin/bash /entrypoint.sh mds 190 4-21:03:43 /usr/bin/ceph-mds --cluster ceph --setuser ceph --setgroup ceph -d -i ceph-08 31344 02:42 /bin/bash 31364 00:00 ps -e -o pid,etime,cmd The relevant section from the syslog is (filtered by 'grep -i mds'): May 18 10:20:45 ceph-08 journal: 2019-05-18 08:20:45.400 7f1c99552700 1 mds.ceph-08 asok_command: dump cache (starting...) May 18 10:20:45 ceph-08 journal: 2019-05-18 08:20:45.400 7f1c99552700 1 mds.0.cache dump_cache to /var/log/ceph/mds-case/cache May 18 10:20:51 ceph-01 journal: cluster 2019-05-18 08:20:44.135690 mds.ceph-08 mds.0 192.168.32.72:6800/314672380 2554 : cluster [WRN] 7 slow requests, 0 included below; oldest blocked for > 1931.724397 secs May 18 10:20:51 ceph-03 journal: cluster 2019-05-18 08:20:44.135690 mds.ceph-08 mds.0 192.168.32.72:6800/314672380 2554 : cluster [WRN] 7 slow requests, 0 included below; oldest blocked for > 1931.724397 secs May 18 10:20:51 ceph-02 journal: cluster 2019-05-18 08:20:44.135690 mds.ceph-08 mds.0 192.168.32.72:6800/314672380 2554 : cluster [WRN] 7 slow requests, 0 included below; oldest blocked for > 1931.724397 secs May 18 10:21:01 ceph-08 journal: 2019-05-18 08:21:01.414 7f1c952c1700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 May 18 10:21:01 ceph-08 journal: 2019-05-18 08:21:01.414 7f1c952c1700 0 mds.beacon.ceph-08 _send skipping beacon, heartbeat map not healthy May 18 10:21:03 ceph-08 journal: 2019-05-18 08:21:03.549 7f1c99d53700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 May 18 10:21:05 ceph-08 journal: 2019-05-18 08:21:05.414 7f1c952c1700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 May 18 10:21:05 ceph-08 journal: 2019-05-18 08:21:05.414 7f1c952c1700 0 mds.beacon.ceph-08 _send skipping beacon, heartbeat map not healthy May 18 10:21:08 ceph-08 journal: 2019-05-18 08:21:08.549 7f1c99d53700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 May 18 10:21:09 ceph-08 journal: 2019-05-18 08:21:09.415 7f1c952c1700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 May 18 10:21:09 ceph-08 journal: 2019-05-18 08:21:09.415 7f1c952c1700 0 mds.beacon.ceph-08 _send skipping beacon, heartbeat map not healthy May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700 1 mon.ceph-01@0(leader).mds e16312 no beacon from mds.0.15942 (gid: 327273 addr: 192.168.32.72:6800/314672380 state: up:active) since 15.6064s May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700 1 mon.ceph-01@0(leader).mds e16312 replacing 327273 192.168.32.72:6800/314672380mds.0.15942 up:active with 457451/ceph-12 192.168.32.76:6800/3202682100 May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700 0 log_channel(cluster) log [WRN] : daemon mds.ceph-08 is not responding, replacing it as rank 0 with standby daemon mds.ceph-12 May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700 1 mon.ceph-01@0(leader).mds e16312 fail_mds_gid 327273 mds.ceph-08 role 0 May 18 10:21:13 ceph-01 journal: debug 2019-05-18 
08:21:13.038 7f38552b8700 0 log_channel(cluster) log [WRN] : Health check failed: insufficient standby MDS daemons available (MDS_INSUFFICIENT_STANDBY) May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.038 7f38552b8700 0 log_channel(cluster) log [INF] : Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests) May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.105 7f384eaab700 0 mon.ceph-01@0(leader).mds e16313 new map May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.105 7f384eaab700 0 mon.ceph-01@0(leader).mds e16313 print_map May 18 10:21:13 ceph-01 journal: compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2} Sorry, I should have checked this first. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
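The failover above happened because the blocking cache dump made the MDS miss its beacons for longer than the 15 s grace ("no beacon from mds.0 ... since 15.6064s"). A possible workaround before taking a long cache dump, sketched under the assumption that mds_beacon_grace can be injected into the monitors at runtime (the 60 s value is illustrative and should be reverted afterwards):

  # Allow the active MDS to miss beacons for up to 60 s before the mons
  # replace it with the standby:
  ceph tell mon.* injectargs '--mds_beacon_grace 60'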
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I
> pulled the ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; .
> It's a bit more than 100MB.
>
> The active MDS failed over to the standby after or during the dump
> cache operation. Is this expected? As a result, the cluster is healthy
> and I can't do further diagnostics. In case you need more information,
> we have to wait until next time.
>
> Some further observations:
>
> There was no load on the system. I start suspecting that this is not a
> load-induced event. It is also not caused by excessive atime updates, the FS
> is mounted with relatime. Could it have to do with the large level-2 network
> (ca. 550 client servers in the same broadcast domain)? I include our kernel
> tuning profile below, just in case. The cluster networks (back and front) are
> isolated VLANs, no gateways, no routing.

I am pretty sure you hit bug #26982: https://tracker.ceph.com/issues/26982
"mds: crash when dumping ops in flight". So, if you need a reason to update to
13.2.5, there you have it. Sorry that I did not realize beforehand that you
could hit this bug, as you're running 13.2.2. I would update to 13.2.5 and try
again.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
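Before and after such an update, the running release of each daemon can be confirmed; a small sketch using standard Ceph CLI and admin-socket calls (the daemon name is the one appearing elsewhere in this thread):

  ceph versions                      # per-daemon-type summary of running releases
  ceph daemon mds.ceph-08 version    # on the MDS host, the exact version of that daemon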
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Dear Yan and Stefan, it happened again and there were only very few ops in the queue. I pulled the ops list and the cache. Please find a zip file here: "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit more than 100MB. The active MDS failed over to the standby after or during the dump cache operation. Is this expected? As a result, the cluster is healthy and I can't do further diagnostics. In case you need more information, we have to wait until next time. Some further observations: There was no load on the system. I start suspecting that this is not a load-induced event. It is also not cause by excessive atime updates, the FS is mounted with relatime. Could it have to do with the large level-2 network (ca. 550 client servers in the same broadcast domain)? I include our kernel tuning profile below, just in case. The cluster networks (back and front) are isolated VLANs, no gateways, no routing. We run rolling snapshots on the file system. I didn't observe any problems with this, but am wondering if this might be related. We have currently 30 snapshots in total. Here is the output of status and pool ls: [root@ceph-01 ~]# ceph status # before the MDS failed over cluster: id: ### health: HEALTH_WARN 1 MDSs report slow requests services: mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 mgr: ceph-01(active), standbys: ceph-02, ceph-03 mds: con-fs-1/1/1 up {0=ceph-08=up:active}, 1 up:standby osd: 192 osds: 192 up, 192 in data: pools: 5 pools, 750 pgs objects: 6.35 M objects, 5.2 TiB usage: 5.1 TiB used, 1.3 PiB / 1.3 PiB avail pgs: 750 active+clean [root@ceph-01 ~]# ceph status # after cache dump and the MDS failed over cluster: id: ### health: HEALTH_OK services: mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 mgr: ceph-01(active), standbys: ceph-02, ceph-03 mds: con-fs-1/1/1 up {0=ceph-12=up:active}, 1 up:standby osd: 192 osds: 192 up, 192 in data: pools: 5 pools, 750 pgs objects: 6.33 M objects, 5.2 TiB usage: 5.1 TiB used, 1.3 PiB / 1.3 PiB avail pgs: 749 active+clean 1 active+clean+scrubbing+deep io: client: 6.3 KiB/s wr, 0 op/s rd, 0 op/s wr [root@ceph-01 ~]# ceph osd pool ls detail # after the MDS failed over pool 1 'sr-rbd-meta-one' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 80 pgp_num 80 last_change 486 flags hashpspool,nodelete,selfmanaged_snaps max_bytes 536870912000 stripe_width 0 application rbd removed_snaps [1~5] pool 2 'sr-rbd-data-one' erasure size 8 min_size 6 crush_rule 5 object_hash rjenkins pg_num 300 pgp_num 300 last_change 1759 flags hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 274877906944000 stripe_width 24576 compression_mode aggressive application rbd removed_snaps [1~3] pool 3 'sr-rbd-one-stretch' replicated size 4 min_size 2 crush_rule 2 object_hash rjenkins pg_num 20 pgp_num 20 last_change 500 flags hashpspool,nodelete,selfmanaged_snaps max_bytes 5497558138880 stripe_width 0 compression_mode aggressive application rbd removed_snaps [1~7] pool 4 'con-fs-meta' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 50 pgp_num 50 last_change 428 flags hashpspool,nodelete max_bytes 1099511627776 stripe_width 0 application cephfs pool 5 'con-fs-data' erasure size 10 min_size 8 crush_rule 6 object_hash rjenkins pg_num 300 pgp_num 300 last_change 2561 flags hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 21990232200 stripe_width 32768 compression_mode aggressive application cephfs removed_snaps 
[2~3d,41~2a,6d~2a,99~c,a6~1e,c6~18,df~3,e3~1,e5~3,e9~1,eb~3,ef~1,f1~1,f3~1,f5~3,f9~1,fb~3,ff~1,101~1,103~1,105~1,107~1,109~1,10b~1,10d~1,10f~1,111~1] The relevant pools are con-fs-meta and con-fs-data. Best regards, Frank = Frank Schilder AIT Risø Campus Bygning 109, rum S14 [root@ceph-08 ~]# cat /etc/tuned/ceph/tuned.conf [main] summary=Settings for ceph cluster. Derived from throughput-performance. include=throughput-performance [vm] transparent_hugepages=never [sysctl] # See also: # - https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt # - https://www.kernel.org/doc/Documentation/sysctl/net.txt # - https://cromwell-intl.com/open-source/performance-tuning/tcp.html # - https://fatmin.com/2015/08/19/ceph-tcp-performance-tuning/ # - https://www.spinics.net/lists/ceph-devel/msg21721.html # Set available PIDs and open files to maximum possible. kernel.pid_max=4194304 fs.file-max=26234859 # Swap options, reduce swappiness. vm.zone_reclaim_mode=0 #vm.dirty_ratio = 20 vm.dirty_bytes = 629145600 vm.dirty_background_bytes = 314572800 vm.swappiness=10 vm.min_free_kbytes=8388608 # Increase ARP cache size to accommodate large level-2 client network. net.ipv4.neigh.default.gc_thresh1 = 1024 net.ipv4.neigh.default.gc_thresh2 = 2048
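For completeness, the commands used elsewhere in this thread to capture state while the slow-request warning is active (daemon name as in the status output above; run the daemon commands on the host of the active MDS):

  ceph health detail                                     # shows which MDS reports slow requests
  ceph daemon mds.ceph-08 dump_ops_in_flight             # the blocked/slow ops list
  ceph daemon mds.ceph-08 dump cache /tmp/cachedump.ceph-08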
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
On Thu, May 16, 2019 at 4:10 PM Frank Schilder wrote:
>
> Dear Yan and Stefan,
>
> thanks for the additional information, it should help reproducing the issue.
>
> The pdsh command executes a bash script that echoes a few values to stdout.
> Access should be read-only; however, we still have the FS mounted with atime
> enabled, so there is probably a metadata write and synchronisation per access.
> The files accessed are the ssh auth-keys in .ssh and the shell script. The
> shell script was located in the home dir of the user and, following your
> explanations, to reproduce the issue I will create a directory with many
> entries and execute a test with the many-clients single-file-read load on it.
>

try setting mds_bal_split_rd and mds_bal_split_wr to a very large value, which
prevents the MDS from splitting hot dirfrags

Regards
Yan, Zheng

> I hope it doesn't take too long.
>
> Thanks for your input!
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> From: Yan, Zheng
> Sent: 16 May 2019 09:35
> To: Frank Schilder
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
>
> On Thu, May 16, 2019 at 2:52 PM Frank Schilder wrote:
> >
> > Dear Yan,
> >
> > OK, I will try to trigger the problem again and dump the information
> > requested. Since it is not easy to get into this situation and I usually
> > need to resolve it fast (it's not a test system), is there anything else
> > worth capturing?
> >
>
> just
>
> ceph daemon mds.x dump_ops_in_flight
> ceph daemon mds.x dump cache /tmp/cachedump.x
>
> > I will get back as soon as it happened again.
> >
> > In the meantime, I would be grateful if you could shed some light on the
> > following questions:
> >
> > - Is there a way to cancel an individual operation in the queue? It is a
> > bit harsh to have to fail an MDS for that.
>
> no
>
> > - What is the fragmentdir operation doing in a single-MDS setup? I thought
> > this was only relevant if multiple MDS daemons are active on a file system.
> >
>
> It splits large directories into smaller parts.
>
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > From: Yan, Zheng
> > Sent: 16 May 2019 05:50
> > To: Frank Schilder
> > Cc: Stefan Kooman; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
> >
> > > [...]
> > > This time I captured the MDS ops list (log output does not really contain
> > > more info than this list). It contains 12 ops and I will include it here
> > > in full length (hope this is acceptable):
> > >
> >
> > Your issues were caused by a stuck internal op fragmentdir. Can you
> > dump the mds cache and send the output to us?
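A sketch of how Zheng's suggestion could be applied; the numeric values are placeholders (he only says "very large"), and the daemon name is the active MDS from this thread:

  # Runtime change on the active MDS via the admin socket:
  ceph daemon mds.ceph-08 config set mds_bal_split_rd 999999999
  ceph daemon mds.ceph-08 config set mds_bal_split_wr 999999999
  # To persist across restarts on mimic, add the same values to the [mds]
  # section of ceph.conf on the MDS hosts.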
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Dear Yan,

it is difficult to push the MDS to err in this special way. Is it advisable to
increase the likelihood and frequency of dirfrag operations by tweaking some of
the parameters mentioned here: http://docs.ceph.com/docs/mimic/cephfs/dirfrags/ ?
If so, what would reasonable values be, keeping in mind that we are already in
a pilot production phase and need to maintain the integrity of user data? Is
there any counter showing whether such operations happened at all?

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Yan, Zheng
Sent: 16 May 2019 09:35
To: Frank Schilder
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

On Thu, May 16, 2019 at 2:52 PM Frank Schilder wrote:
>
> Dear Yan,
>
> OK, I will try to trigger the problem again and dump the information
> requested. Since it is not easy to get into this situation and I usually need
> to resolve it fast (it's not a test system), is there anything else worth
> capturing?
>

just

ceph daemon mds.x dump_ops_in_flight
ceph daemon mds.x dump cache /tmp/cachedump.x

> I will get back as soon as it happened again.
>
> In the meantime, I would be grateful if you could shed some light on the
> following questions:
>
> - Is there a way to cancel an individual operation in the queue? It is a bit
> harsh to have to fail an MDS for that.

no

> - What is the fragmentdir operation doing in a single-MDS setup? I thought
> this was only relevant if multiple MDS daemons are active on a file system.
>

It splits large directories into smaller parts.

> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> From: Yan, Zheng
> Sent: 16 May 2019 05:50
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
>
> > [...]
> > This time I captured the MDS ops list (log output does not really contain
> > more info than this list). It contains 12 ops and I will include it here in
> > full length (hope this is acceptable):
> >
>
> Your issues were caused by a stuck internal op fragmentdir. Can you
> dump the mds cache and send the output to us?
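One way to at least see which dirfrag-related values the running MDS is using (daemon name from this thread; the grep pattern is only a guess at the relevant option prefix):

  ceph daemon mds.ceph-08 config show | grep mds_bal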
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Dear Yan and Stefan,

thanks for the additional information, it should help reproducing the issue.

The pdsh command executes a bash script that echoes a few values to stdout.
Access should be read-only; however, we still have the FS mounted with atime
enabled, so there is probably a metadata write and synchronisation per access.
The files accessed are the ssh auth-keys in .ssh and the shell script. The
shell script was located in the home dir of the user and, following your
explanations, to reproduce the issue I will create a directory with many
entries and execute a test with the many-clients single-file-read load on it.

I hope it doesn't take too long.

Thanks for your input!

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Yan, Zheng
Sent: 16 May 2019 09:35
To: Frank Schilder
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

On Thu, May 16, 2019 at 2:52 PM Frank Schilder wrote:
>
> Dear Yan,
>
> OK, I will try to trigger the problem again and dump the information
> requested. Since it is not easy to get into this situation and I usually need
> to resolve it fast (it's not a test system), is there anything else worth
> capturing?
>

just

ceph daemon mds.x dump_ops_in_flight
ceph daemon mds.x dump cache /tmp/cachedump.x

> I will get back as soon as it happened again.
>
> In the meantime, I would be grateful if you could shed some light on the
> following questions:
>
> - Is there a way to cancel an individual operation in the queue? It is a bit
> harsh to have to fail an MDS for that.

no

> - What is the fragmentdir operation doing in a single-MDS setup? I thought
> this was only relevant if multiple MDS daemons are active on a file system.
>

It splits large directories into smaller parts.

> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> From: Yan, Zheng
> Sent: 16 May 2019 05:50
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
>
> > [...]
> > This time I captured the MDS ops list (log output does not really contain
> > more info than this list). It contains 12 ops and I will include it here in
> > full length (hope this is acceptable):
> >
>
> Your issues were caused by a stuck internal op fragmentdir. Can you
> dump the mds cache and send the output to us?
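A hypothetical sketch of the reproduction Frank describes above: a directory with many entries combined with a many-clients single-file read. Paths, host list and file count are placeholders, not values from this thread; it assumes pdsh is installed and the file system is mounted at /mnt/cephfs on all clients:

  # 1) Create a directory with many entries, making it a dirfrag-split candidate:
  mkdir -p /mnt/cephfs/splittest
  cd /mnt/cephfs/splittest
  seq -f "f%.0f" 1 200000 | xargs touch
  # 2) Read the same small file from ~500 clients at roughly the same time:
  pdsh -w 'node[001-500]' 'cat /mnt/cephfs/splittest/f1 > /dev/null'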
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
>
> thanks for the fast reply. We encountered the problem again, this time in a
> much simpler situation; please see below. However, let me start with your
> questions first:
>
> What bug? -- In a single-active MDS set-up, should there ever occur an
> operation with "op_name": "fragmentdir"?

Yes, see http://docs.ceph.com/docs/mimic/cephfs/dirfrags/. If you had multiple
active MDSs, the load could be shared among them. There are some parameters
that might need to be tuned in your environment. But Zheng Yan is the expert in
this matter, so analysis of the MDS cache dump may reveal the culprit.

> Upgrading: The problem described here is the only issue we observe.
> Unless the problem is fixed upstream, upgrading won't help us and
> would be a bit of a waste of time. If someone can confirm that this
> problem is fixed in a newer version, we will do it. Otherwise, we
> might prefer to wait until it is.

Keeping your systems up to date generally improves stability. It might prevent
you from hitting issues when your workload changes in the future. Testing new
releases on a test system first is recommended, though.

> News on the problem. We encountered it again when one of our users executed a
> command in parallel with pdsh on all our ~500 client nodes. This command
> accesses the same file from all these nodes pretty much simultaneously. We
> did this quite often in the past, but this time the command got stuck and we
> started observing the MDS health problem again. Symptoms:

Does this command incur writes, reads, or a combination of both on files in
this directory? I wonder if you might prevent this from happening by tuning the
"Activity thresholds", especially since you say it is load (# clients)
dependent.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
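A sketch of what tuning the size-based thresholds from the linked dirfrags page could look like at runtime; the option names come from that page, the values are purely illustrative and not recommendations, and the daemon name is the active MDS from this thread:

  # Size-based split threshold and the hard cap on fragment size:
  ceph daemon mds.ceph-08 config set mds_bal_split_size 100000
  ceph daemon mds.ceph-08 config set mds_bal_fragment_size_max 500000
  # The read/write "activity thresholds" Stefan refers to are mds_bal_split_rd
  # and mds_bal_split_wr; see Zheng's suggestion elsewhere in this thread.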
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Dear Yan,

OK, I will try to trigger the problem again and dump the information requested.
Since it is not easy to get into this situation and I usually need to resolve
it fast (it's not a test system), is there anything else worth capturing?

I will get back as soon as it has happened again.

In the meantime, I would be grateful if you could shed some light on the
following questions:

- Is there a way to cancel an individual operation in the queue? It is a bit
  harsh to have to fail an MDS for that.
- What is the fragmentdir operation doing in a single-MDS setup? I thought this
  was only relevant if multiple MDS daemons are active on a file system.

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Yan, Zheng
Sent: 16 May 2019 05:50
To: Frank Schilder
Cc: Stefan Kooman; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

> [...]
> This time I captured the MDS ops list (log output does not really contain
> more info than this list). It contains 12 ops and I will include it here in
> full length (hope this is acceptable):
>

Your issues were caused by a stuck internal op fragmentdir. Can you
dump the mds cache and send the output to us?
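Since there is no per-op cancel (see Zheng's answer elsewhere in this thread), the blunt fallback Frank alludes to is failing the active rank so the standby takes over; expect a short metadata interruption while the standby replays (rank and daemon name are the ones from this thread):

  ceph mds fail 0
  # or, equivalently, name the daemon:
  ceph mds fail ceph-08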
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
}, > { > "time": "2019-05-15 11:22:06.784506", > "event": "dispatched" > }, > { > "time": "2019-05-15 11:22:06.784562", > "event": "failed to authpin, dir is being fragmented" > } > ] > } > }, > { > "description": "client_request(client.386087:12795 lookup > #0x127/file.pdf 2019-05-15 11:37:31.764353 caller_uid=0, > caller_gid=0{})", > "initiated_at": "2019-05-15 11:37:31.765631", > "age": 88.208747, > "duration": 88.209160, > "type_data": { > "flag_point": "failed to authpin, dir is being fragmented", > "reqid": "client.386087:12795", > "op_type": "client_request", > "client_info": { > "client": "client.386087", > "tid": 12795 > }, > "events": [ > { > "time": "2019-05-15 11:37:31.765631", > "event": "initiated" > }, > { > "time": "2019-05-15 11:37:31.765631", > "event": "header_read" > }, > { > "time": "2019-05-15 11:37:31.765633", > "event": "throttled" > }, > { > "time": "2019-05-15 11:37:31.765640", > "event": "all_read" > }, > { > "time": "2019-05-15 11:37:31.765731", > "event": "dispatched" > }, > { > "time": "2019-05-15 11:37:31.765759", > "event": "failed to authpin, dir is being fragmented" > } > ] > } > }, > { > "description": "client_request(client.377552:5446 readdir > #0x13a 2019-05-15 11:43:07.569329 caller_uid=0, caller_gid=0{})", > "initiated_at": "2019-05-15 11:38:36.511381", > "age": 23.462997, > "duration": 23.463467, > "type_data": { > "flag_point": "failed to authpin, dir is being fragmented", > "reqid": "client.377552:5446", > "op_type": "client_request", > "client_info": { > "client": "client.377552", > "tid": 5446 > }, > "events": [ > { > "time": "2019-05-15 11:38:36.511381", > "event": "initiated" > }, > { > "time": "2019-05-15 11:38:36.511381", > "event": "header_read" > }, > { > "time": "2019-05-15 11:38:36.511383", > "event": "throttled" > }, > { > "time": "2019-05-15 11:38:36.511392", > "event": "all_read" > }, > { > "time": "2019-05-15 11:38:36.511561", > "event": "dispatched" > }, > { > "time": "2019-05-15 11:38:36.511604", > "event": "failed to authpin, dir is being fragmented" > } > ] > } > }, > { > "description": "client_request(client.62472:6092368 getattr > pAsLsXsFs #0x138 2019-05-15 11:17:21.633854 caller_uid=105731, > caller_gid=105731{})", > "initiated_at": "2019-05-15 11:17:21.635927", > "age": 1298.338451, > "duration": 1298.338955, > "type_data": { > "flag_point": "failed to authpin, dir is being fragmented", > "reqid": "client.62472:6092368", > "op_type": "client_request", > "client_info": { > "client": "client.62472", > "tid": 6092368 > }, > "events": [ > { > "time": "2019-05-15 11:17:21.635927", > "event": "initiated" > }, > { > "time": "2019-05-15 11:17:21.635927", > "event": "header_read" > }, > { > "time": "2019-05-15 11:17:21.635931", > "event": "throttled" > }, > { > "time": "2019-05-15 11:17:21.635944", > "event": "all_read" > }, > { > "time": "2019-05-15 11:17:21.636081", > "event": "dispatched" > }, > { > "time": "2019-05-15 11:17:21.636118", > "event": "failed to authpin, dir is being fragmented" > } > ] > } > }, > { > "description": "client_request(client.62472:6092400 getattr > pAsLsXsFs #0x138 2019-05-15 11:21:25.909555 caller_uid=105731, > caller_gid=105731{})", > "initiated_at": "2019-05-15 11:21:25.910514", > "age": 1054.063864, > "duration": 1054.064406, > "type_data": { > "flag_point": "failed to authpin, dir is being fragmented", > "reqid": "client.62472:6092400", > "op_type": "client_request", > "client_info": { > "client": "client.62472", > "tid": 6092400 > }, > "events": [ > { > "time": "2019-05-15 11:21:25.910514", > 
"event": "initiated" > }, > { > "time": "2019-05-15 11:21:25.910514", > "event": "header_read" > }, > { > "time": "2019-05-15 11:21:25.910527", > "event": "throttled" > }, > { > "time": "2019-05-15 11:21:25.910537", > "event": "all_read" > }, > { > "time": "2019-05-15 11:21:25.910597", > "event": "dispatched" > }, > { > "time": "2019-05-15 11:21:25.910635", > "event": "failed to authpin, dir is being fragmented" > } > ] > } > } > ], > "num_ops": 12 > } > > = > Frank Schilder > AIT Ris? Campus > Bygning 109, rum S14 > > > From: Stefan Kooman > Sent: 14 May 2019 09:54:05 > To: Frank Schilder > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS > bug?) > > Quoting Frank Schilder (fr...@dtu.dk): > > If at all possible I would: > > Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2). > Use more recent kernels on the clients. > > Below settings for [mds] might help with trimming (you might already > have changed mds_log_max_segments to 128 according to logs): > > [mds] > mds_log_max_expiring = 80 # default 20 > # trim max $value segments in parallel > # Defaults are too conservative. > mds_log_max_segments = 120 # default 30 > > > > 1) Is there a bug with having MDS daemons acting as standby-replay? > I can't tell what bug you are referring to based on info below. It does > seem to work as designed. > > Gr. Stefan > > -- > | BIT BV http://www.bit.nl/Kamer van Koophandel 09090351 > | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
31.765640", "event": "all_read" }, { "time": "2019-05-15 11:37:31.765731", "event": "dispatched" }, { "time": "2019-05-15 11:37:31.765759", "event": "failed to authpin, dir is being fragmented" } ] } }, { "description": "client_request(client.377552:5446 readdir #0x13a 2019-05-15 11:43:07.569329 caller_uid=0, caller_gid=0{})", "initiated_at": "2019-05-15 11:38:36.511381", "age": 23.462997, "duration": 23.463467, "type_data": { "flag_point": "failed to authpin, dir is being fragmented", "reqid": "client.377552:5446", "op_type": "client_request", "client_info": { "client": "client.377552", "tid": 5446 }, "events": [ { "time": "2019-05-15 11:38:36.511381", "event": "initiated" }, { "time": "2019-05-15 11:38:36.511381", "event": "header_read" }, { "time": "2019-05-15 11:38:36.511383", "event": "throttled" }, { "time": "2019-05-15 11:38:36.511392", "event": "all_read" }, { "time": "2019-05-15 11:38:36.511561", "event": "dispatched" }, { "time": "2019-05-15 11:38:36.511604", "event": "failed to authpin, dir is being fragmented" } ] } }, { "description": "client_request(client.62472:6092368 getattr pAsLsXsFs #0x138 2019-05-15 11:17:21.633854 caller_uid=105731, caller_gid=105731{})", "initiated_at": "2019-05-15 11:17:21.635927", "age": 1298.338451, "duration": 1298.338955, "type_data": { "flag_point": "failed to authpin, dir is being fragmented", "reqid": "client.62472:6092368", "op_type": "client_request", "client_info": { "client": "client.62472", "tid": 6092368 }, "events": [ { "time": "2019-05-15 11:17:21.635927", "event": "initiated" }, { "time": "2019-05-15 11:17:21.635927", "event": "header_read" }, { "time": "2019-05-15 11:17:21.635931", "event": "throttled" }, { "time": "2019-05-15 11:17:21.635944", "event": "all_read" }, { "time": "2019-05-15 11:17:21.636081", "event": "dispatched" }, { "time": "2019-05-15 11:17:21.636118", "event": "failed to authpin, dir is being fragmented" } ] } }, { "description": "client_request(client.62472:6092400 getattr pAsLsXsFs #0x138 2019-05-15 11:21:25.909555 caller_uid=105731, caller_gid=105731{})", "initiated_at": "2019-05-15 11:21:25.910514", "age": 1054.063864, "duration": 1054.064406, "type_data": { "flag_point": "failed to authpin, dir is being fragmented", "reqid": "client.62472:6092400", "op_type": "client_request", "client_info": { "client": "client.62472", "tid": 6092400 }, "events": [ { "time": "2019-05-15 11:21:25.910514", "event": "initiated" }, { "time": "2019-05-15 11:21:25.910514", "event": "header_read" }, { "time": "2019-05-15 11:21:25.910527", "event": "throttled" }, { "time": "2019-05-15 11:21:25.910537", "event": "all_read" }, { "time": "2019-05-15 11:21:25.910597", "event": "dispatched" }, { "time": "2019-05-15 11:21:25.910635", "event": "failed to authpin, dir is being fragmented" } ] } } ], "num_ops": 12 } = Frank Schilder AIT Ris? Campus Bygning 109, rum S14 From: Stefan Kooman Sent: 14 May 2019 09:54:05 To: Frank Schilder Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) Quoting Frank Schilder (fr...@dtu.dk): If at all possible I would: Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2). Use more recent kernels on the clients. Below settings for [mds] might help with trimming (you might already have changed mds_log_max_segments to 128 according to logs): [mds] mds_log_max_expiring = 80 # default 20 # trim max $value segments in parallel # Defaults are too conservative. 
mds_log_max_segments = 120 # default 30 > 1) Is there a bug with having MDS daemons acting as standby-replay? I can't tell what bug you are referring to based on info below. It does seem to work as designed. Gr. Stefan -- | BIT BV http://www.bit.nl/Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)
Quoting Frank Schilder (fr...@dtu.dk):

If at all possible I would:

Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2).
Use more recent kernels on the clients.

Below settings for [mds] might help with trimming (you might already have
changed mds_log_max_segments to 128 according to logs):

[mds]
mds_log_max_expiring = 80  # default 20
# trim max $value segments in parallel
# Defaults are too conservative.
mds_log_max_segments = 120  # default 30

> 1) Is there a bug with having MDS daemons acting as standby-replay?

I can't tell what bug you are referring to based on info below. It does seem to
work as designed.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
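A sketch of applying those two settings to a running MDS without a restart, via the admin socket on the MDS host (the daemon name is the one used elsewhere in this thread; values are Stefan's suggestions above):

  ceph daemon mds.ceph-08 config set mds_log_max_segments 120
  ceph daemon mds.ceph-08 config set mds_log_max_expiring 80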