Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-06-21 Thread Yan, Zheng
On Fri, Jun 21, 2019 at 6:10 PM Frank Schilder  wrote:
>
> Dear Yan, Zheng,
>
> does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a 
> link to the issue tracker?
>
no

https://tracker.ceph.com/issues/39987


> Thanks and best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Yan, Zheng 
> Sent: 20 May 2019 13:34
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
> bug?)
>
> On Sat, May 18, 2019 at 5:47 PM Frank Schilder  wrote:
> >
> > Dear Yan and Stefan,
> >
> > it happened again and there were only very few ops in the queue. I pulled 
> > the ops list and the cache. Please find a zip file here: 
> > "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a 
> > bit more than 100MB.
> >
>
> MDS cache dump shows there is a snapshot-related issue. Please avoid using
> snapshots until we fix the bug.
>
> Regards
> Yan, Zheng


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-06-21 Thread Frank Schilder
Dear Yan, Zheng,

does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a 
link to the issue tracker?

Thanks and best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Yan, Zheng 
Sent: 20 May 2019 13:34
To: Frank Schilder
Cc: Stefan Kooman; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

On Sat, May 18, 2019 at 5:47 PM Frank Schilder  wrote:
>
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I pulled the 
> ops list and the cache. Please find a zip file here: 
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit 
> more than 100MB.
>

MDS cache dump shows there is a snapshot-related issue. Please avoid using
snapshots until we fix the bug.

Regards
Yan, Zheng


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-20 Thread Frank Schilder
Dear Yan,

thank you for taking care of this. I removed all snapshots and stopped snapshot 
creation.
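
For reference, a minimal sketch of how snapshot creation can be paused and 
existing snapshots removed; the file system name "con-fs" and the mount path 
/mnt/cephfs are assumptions based on this thread, and the snapshot name is 
hypothetical:

# Stop clients from creating new snapshots on the file system:
ceph fs set con-fs allow_new_snaps false
# List and remove existing snapshots:
ls /mnt/cephfs/.snap
rmdir /mnt/cephfs/.snap/daily_2019-05-20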

Please keep me posted.

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Yan, Zheng 
Sent: 20 May 2019 13:34:07
To: Frank Schilder
Cc: Stefan Kooman; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

On Sat, May 18, 2019 at 5:47 PM Frank Schilder  wrote:
>
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I pulled the 
> ops list and the cache. Please find a zip file here: 
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit 
> more than 100MB.
>

MDS cache dump shows there is a snapshot-related issue. Please avoid using
snapshots until we fix the bug.

Regards
Yan, Zheng

> The active MDS failed over to the standby after or during the dump cache 
> operation. Is this expected? As a result, the cluster is healthy and I can't 
> do further diagnostics. In case you need more information, we have to wait 
> until next time.
>
> Some further observations:
>
> There was no load on the system. I am starting to suspect that this is not a 
> load-induced event. It is also not caused by excessive atime updates; the FS 
> is mounted with relatime. Could it have to do with the large level-2 network 
> (ca. 550 client servers in the same broadcast domain)? I include our kernel 
> tuning profile below, just in case. The cluster networks (back and front) are 
> isolated VLANs, no gateways, no routing.
>
> We run rolling snapshots on the file system. I didn't observe any problems 
> with this, but am wondering if this might be related. We have currently 30 
> snapshots in total. Here is the output of status and pool ls:
>
> [root@ceph-01 ~]# ceph status # before the MDS failed over
>   cluster:
> id: ###
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-02, ceph-03
> mds: con-fs-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby
> osd: 192 osds: 192 up, 192 in
>
>   data:
> pools:   5 pools, 750 pgs
> objects: 6.35 M objects, 5.2 TiB
> usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
> pgs: 750 active+clean
>
> [root@ceph-01 ~]# ceph status # after cache dump and the MDS failed over
>   cluster:
> id: ###
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-02, ceph-03
> mds: con-fs-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby
> osd: 192 osds: 192 up, 192 in
>
>   data:
> pools:   5 pools, 750 pgs
> objects: 6.33 M objects, 5.2 TiB
> usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
> pgs: 749 active+clean
>  1   active+clean+scrubbing+deep
>
>   io:
> client:   6.3 KiB/s wr, 0 op/s rd, 0 op/s wr
>
> [root@ceph-01 ~]# ceph osd pool ls detail # after the MDS failed over
> pool 1 'sr-rbd-meta-one' replicated size 3 min_size 2 crush_rule 1 
> object_hash rjenkins pg_num 80 pgp_num 80 last_change 486 flags 
> hashpspool,nodelete,selfmanaged_snaps max_bytes 536870912000 stripe_width 0 
> application rbd
> removed_snaps [1~5]
> pool 2 'sr-rbd-data-one' erasure size 8 min_size 6 crush_rule 5 object_hash 
> rjenkins pg_num 300 pgp_num 300 last_change 1759 flags 
> hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 274877906944000 
> stripe_width 24576 compression_mode aggressive application rbd
> removed_snaps [1~3]
> pool 3 'sr-rbd-one-stretch' replicated size 4 min_size 2 crush_rule 2 
> object_hash rjenkins pg_num 20 pgp_num 20 last_change 500 flags 
> hashpspool,nodelete,selfmanaged_snaps max_bytes 5497558138880 stripe_width 0 
> compression_mode aggressive application rbd
> removed_snaps [1~7]
> pool 4 'con-fs-meta' replicated size 3 min_size 2 crush_rule 3 object_hash 
> rjenkins pg_num 50 pgp_num 50 last_change 428 flags hashpspool,nodelete 
> max_bytes 1099511627776 stripe_width 0 application cephfs
> pool 5 'con-fs-data' erasure size 10 min_size 8 crush_rule 6 object_hash 
> rjenkins pg_num 300 pgp_num 300 last_change 2561 flags 
> hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 21990232200 
> stripe_width 32768 compression_mode aggressive application cephfs
> removed_snaps 
> [2~3d,41~2a,6d~2a,99~c,a6~1e,c6~18,df~3,e3~1,e5~3,e9~1,eb~3,ef~1,f1~1,f3~1,f5~3,f9~1,fb~3,ff~1,101~1,103~1,105~1,107~1,109~1,10b~1,10d~1,10f~1,111~1]
>
> The relevant pools are con-fs-meta and con-fs-data.
>
> Best regards,
>

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-20 Thread Yan, Zheng
On Sat, May 18, 2019 at 5:47 PM Frank Schilder  wrote:
>
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I pulled the 
> ops list and the cache. Please find a zip file here: 
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit 
> more than 100MB.
>

MDS cache dump shows there is a snapshot-related issue. Please avoid using
snapshots until we fix the bug.

Regards
Yan, Zheng

> The active MDS failed over to the standby after or during the dump cache 
> operation. Is this expected? As a result, the cluster is healthy and I can't 
> do further diagnostics. In case you need more information, we have to wait 
> until next time.
>
> Some further observations:
>
> There was no load on the system. I am starting to suspect that this is not a 
> load-induced event. It is also not caused by excessive atime updates; the FS 
> is mounted with relatime. Could it have to do with the large level-2 network 
> (ca. 550 client servers in the same broadcast domain)? I include our kernel 
> tuning profile below, just in case. The cluster networks (back and front) are 
> isolated VLANs, no gateways, no routing.
>
> We run rolling snapshots on the file system. I didn't observe any problems 
> with this, but am wondering if this might be related. We have currently 30 
> snapshots in total. Here is the output of status and pool ls:
>
> [root@ceph-01 ~]# ceph status # before the MDS failed over
>   cluster:
> id: ###
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-02, ceph-03
> mds: con-fs-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby
> osd: 192 osds: 192 up, 192 in
>
>   data:
> pools:   5 pools, 750 pgs
> objects: 6.35 M objects, 5.2 TiB
> usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
> pgs: 750 active+clean
>
> [root@ceph-01 ~]# ceph status # after cache dump and the MDS failed over
>   cluster:
> id: ###
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-02, ceph-03
> mds: con-fs-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby
> osd: 192 osds: 192 up, 192 in
>
>   data:
> pools:   5 pools, 750 pgs
> objects: 6.33 M objects, 5.2 TiB
> usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
> pgs: 749 active+clean
>  1   active+clean+scrubbing+deep
>
>   io:
> client:   6.3 KiB/s wr, 0 op/s rd, 0 op/s wr
>
> [root@ceph-01 ~]# ceph osd pool ls detail # after the MDS failed over
> pool 1 'sr-rbd-meta-one' replicated size 3 min_size 2 crush_rule 1 
> object_hash rjenkins pg_num 80 pgp_num 80 last_change 486 flags 
> hashpspool,nodelete,selfmanaged_snaps max_bytes 536870912000 stripe_width 0 
> application rbd
> removed_snaps [1~5]
> pool 2 'sr-rbd-data-one' erasure size 8 min_size 6 crush_rule 5 object_hash 
> rjenkins pg_num 300 pgp_num 300 last_change 1759 flags 
> hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 274877906944000 
> stripe_width 24576 compression_mode aggressive application rbd
> removed_snaps [1~3]
> pool 3 'sr-rbd-one-stretch' replicated size 4 min_size 2 crush_rule 2 
> object_hash rjenkins pg_num 20 pgp_num 20 last_change 500 flags 
> hashpspool,nodelete,selfmanaged_snaps max_bytes 5497558138880 stripe_width 0 
> compression_mode aggressive application rbd
> removed_snaps [1~7]
> pool 4 'con-fs-meta' replicated size 3 min_size 2 crush_rule 3 object_hash 
> rjenkins pg_num 50 pgp_num 50 last_change 428 flags hashpspool,nodelete 
> max_bytes 1099511627776 stripe_width 0 application cephfs
> pool 5 'con-fs-data' erasure size 10 min_size 8 crush_rule 6 object_hash 
> rjenkins pg_num 300 pgp_num 300 last_change 2561 flags 
> hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 21990232200 
> stripe_width 32768 compression_mode aggressive application cephfs
> removed_snaps 
> [2~3d,41~2a,6d~2a,99~c,a6~1e,c6~18,df~3,e3~1,e5~3,e9~1,eb~3,ef~1,f1~1,f3~1,f5~3,f9~1,fb~3,ff~1,101~1,103~1,105~1,107~1,109~1,10b~1,10d~1,10f~1,111~1]
>
> The relevant pools are con-fs-meta and con-fs-data.
>
> Best regards,
> Frank
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> [root@ceph-08 ~]# cat /etc/tuned/ceph/tuned.conf
> [main]
> summary=Settings for ceph cluster. Derived from throughput-performance.
> include=throughput-performance
>
> [vm]
> transparent_hugepages=never
>
> [sysctl]
> # See also:
> # - https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
> # - https://www.kernel.org/doc/Documentation/sysctl/net.txt
> # - https://cromwell-intl.com/open-source/performance-tuning/tcp.html
> # - https://fatmin.com/2015/08/19/ceph-tcp-performance-tuning/
> # - https://www.spinics.net/lists/ceph-devel/msg21721.html
>
> # Set available PIDs and open files to maximum possible.
> 

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Frank Schilder
Hi Stefan,

thanks for being so thorough. I am aware of that. We are still in a pilot 
phase, which is also the reason that I'm still relatively relaxed about the 
observed issue. I guess you also noticed that our cluster is almost empty too.

I don't have a complete list of storage requirements yet and had to restrict 
allocation of PGs to a reasonable minimum, since with mimic I cannot reduce the 
PG count of a pool. With the current values I see imbalance but still reasonable 
performance. Once I have more information about what pools I still need to 
create, I will aim for the 100 PGs per OSD. I actually plan to give the CephFS 
a somewhat higher share for performance reasons. It's on the list.

Thanks again and have a good weekend,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: 18 May 2019 17:41
To: Frank Schilder
Cc: Yan, Zheng; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

Quoting Frank Schilder (fr...@dtu.dk):
>
> [root@ceph-01 ~]# ceph status # before the MDS failed over
>   cluster:
> id: ###
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-02, ceph-03
> mds: con-fs-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby
> osd: 192 osds: 192 up, 192 in
>
>   data:
> pools:   5 pools, 750 pgs
> objects: 6.35 M objects, 5.2 TiB
> usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
> pgs: 750 active+clean

How many pools do you plan to use? You have 5 pools and only 750 PGs
total? What hardware do you have for OSDs? If CephFS is your biggest
user I would add up to 6150 PGs to your pool(s). Having around ~100 PGs
per OSD is healthy. The cluster will also be able to balance way better.
Math: ((100 PG/OSD * 192 OSDs) - 750) / 3 = 6150 for 3-replica
pools. You might have a lot of contention going on on your OSDs; they
are probably underperforming.
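
For reference, the same arithmetic as a one-liner (numbers taken from the
figures above; how to split the budget across pools is a separate decision):

# 192 OSDs, ~100 PGs per OSD target, 750 PGs already allocated, size-3 pools:
echo $(( (100 * 192 - 750) / 3 ))   # prints 6150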

Gr. Stefan


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk):
> 
> [root@ceph-01 ~]# ceph status # before the MDS failed over
>   cluster:
> id: ###
> health: HEALTH_WARN
> 1 MDSs report slow requests
>  
>   services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), standbys: ceph-02, ceph-03
> mds: con-fs-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby
> osd: 192 osds: 192 up, 192 in
>  
>   data:
> pools:   5 pools, 750 pgs
> objects: 6.35 M objects, 5.2 TiB
> usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
> pgs: 750 active+clean

How many pools do you plan to use? You have 5 pools and only 750 PGs
total? What hardware do you have for OSDs? If CephFS is your biggest
user I would add up to 6150 PGs to your pool(s). Having around ~100 PGs
per OSD is healthy. The cluster will also be able to balance way better.
Math: ((100 PG/OSD * 192 OSDs) - 750) / 3 = 6150 for 3-replica
pools. You might have a lot of contention going on on your OSDs; they
are probably underperforming.

Gr. Stefan


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Frank Schilder
Hi Stefan, cc Yan,

thanks for your quick reply.

> I am pretty sure you hit bug #26982: https://tracker.ceph.com/issues/26982 
> "mds: crash when dumping ops in flight".

Everything is fine, the daemon did not crash. The dump cache operation seems to 
be a blocking operation. It simply blocked the MDS on ceph-08 for too long and 
the mons decided to fail over to the MDS on ceph-12. The MDS on ceph-08 has been 
up for almost 5 days:

[root@ceph-mds:ceph-08 /]# ps -e -o pid,etime,cmd
PID ELAPSED CMD
  1  4-21:03:44 /bin/bash /entrypoint.sh mds
190  4-21:03:43 /usr/bin/ceph-mds --cluster ceph --setuser ceph --setgroup 
ceph -d -i ceph-08
  31344   02:42 /bin/bash
  31364   00:00 ps -e -o pid,etime,cmd

The relevant section from the syslog is (filtered by 'grep -i mds'):

May 18 10:20:45 ceph-08 journal: 2019-05-18 08:20:45.400 7f1c99552700  1 
mds.ceph-08 asok_command: dump cache (starting...)
May 18 10:20:45 ceph-08 journal: 2019-05-18 08:20:45.400 7f1c99552700  1 
mds.0.cache dump_cache to /var/log/ceph/mds-case/cache
May 18 10:20:51 ceph-01 journal: cluster 2019-05-18 08:20:44.135690 mds.ceph-08 
mds.0 192.168.32.72:6800/314672380 2554 : cluster 
[WRN] 7 slow requests, 0 included below; oldest blocked for > 1931.724397 secs
May 18 10:20:51 ceph-03 journal: cluster 2019-05-18 08:20:44.135690 mds.ceph-08 
mds.0 192.168.32.72:6800/314672380 2554 : cluster 
[WRN] 7 slow requests, 0 included below; oldest blocked for > 1931.724397 secs
May 18 10:20:51 ceph-02 journal: cluster 2019-05-18 08:20:44.135690 mds.ceph-08 
mds.0 192.168.32.72:6800/314672380 2554 : cluster 
[WRN] 7 slow requests, 0 included below; oldest blocked for > 1931.724397 secs
May 18 10:21:01 ceph-08 journal: 2019-05-18 08:21:01.414 7f1c952c1700  1 
heartbeat_map is_healthy 'MDSRank' had timed out after 15
May 18 10:21:01 ceph-08 journal: 2019-05-18 08:21:01.414 7f1c952c1700  0 
mds.beacon.ceph-08 _send skipping beacon, heartbeat map not healthy
May 18 10:21:03 ceph-08 journal: 2019-05-18 08:21:03.549 7f1c99d53700  1 
heartbeat_map is_healthy 'MDSRank' had timed out after 15
May 18 10:21:05 ceph-08 journal: 2019-05-18 08:21:05.414 7f1c952c1700  1 
heartbeat_map is_healthy 'MDSRank' had timed out after 15
May 18 10:21:05 ceph-08 journal: 2019-05-18 08:21:05.414 7f1c952c1700  0 
mds.beacon.ceph-08 _send skipping beacon, heartbeat map not healthy
May 18 10:21:08 ceph-08 journal: 2019-05-18 08:21:08.549 7f1c99d53700  1 
heartbeat_map is_healthy 'MDSRank' had timed out after 15
May 18 10:21:09 ceph-08 journal: 2019-05-18 08:21:09.415 7f1c952c1700  1 
heartbeat_map is_healthy 'MDSRank' had timed out after 15
May 18 10:21:09 ceph-08 journal: 2019-05-18 08:21:09.415 7f1c952c1700  0 
mds.beacon.ceph-08 _send skipping beacon, heartbeat map not healthy
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700  1 
mon.ceph-01@0(leader).mds e16312 no beacon from mds.0.15942 (gid: 327273 addr: 
192.168.32.72:6800/314672380 state: up:active) since 15.6064s
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700  1 
mon.ceph-01@0(leader).mds e16312  replacing 327273 
192.168.32.72:6800/314672380 mds.0.15942 up:active with 457451/ceph-12 
192.168.32.76:6800/3202682100
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700  0 
log_channel(cluster) log [WRN] : daemon mds.ceph-08 is not responding, 
replacing it as rank 0 with standby daemon mds.ceph-12
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.021 7f38552b8700  1 
mon.ceph-01@0(leader).mds e16312 fail_mds_gid 327273 mds.ceph-08 role 0
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.038 7f38552b8700  0 
log_channel(cluster) log [WRN] : Health check failed: insufficient standby MDS 
daemons available (MDS_INSUFFICIENT_STANDBY)
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.038 7f38552b8700  0 
log_channel(cluster) log [INF] : Health check cleared: MDS_SLOW_REQUEST (was: 1 
MDSs report slow requests)
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.105 7f384eaab700  0 
mon.ceph-01@0(leader).mds e16313 new map
May 18 10:21:13 ceph-01 journal: debug 2019-05-18 08:21:13.105 7f384eaab700  0 
mon.ceph-01@0(leader).mds e16313 print_map
May 18 10:21:13 ceph-01 journal: compat: compat={},rocompat={},incompat={1=base 
v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in 
separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no 
anchor table,9=file layout v2,10=snaprealm v2}

Sorry, I should have checked this first.

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
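
A possible way to avoid this kind of failover during a long cache dump is to
raise the beacon grace temporarily; a sketch only, assuming the default
mds_beacon_grace of 15 seconds is in effect and that a value of 300 is
acceptable for the duration of the dump:

# Raise the grace on the mons before the dump, then restore the default.
ceph tell mon.\* injectargs '--mds_beacon_grace=300'
ceph daemon mds.ceph-08 dump cache /var/log/ceph/mds-case/cache
ceph tell mon.\* injectargs '--mds_beacon_grace=15'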


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Yan and Stefan,
> 
> it happened again and there were only very few ops in the queue. I
> pulled the ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; .
> Its a bit more than 100MB.
> 
> The active MDS failed over to the standby after or during the dump
> cache operation. Is this expected? As a result, the cluster is healthy
> and I can't do further diagnostics. In case you need more information,
> we have to wait until next time.


> 
> Some further observations:
> 
> There was no load on the system. I am starting to suspect that this is not a 
> load-induced event. It is also not caused by excessive atime updates; the FS 
> is mounted with relatime. Could it have to do with the large level-2 network 
> (ca. 550 client servers in the same broadcast domain)? I include our kernel 
> tuning profile below, just in case. The cluster networks (back and front) are 
> isolated VLANs, no gateways, no routing.

I am pretty sure you hit bug #26982: https://tracker.ceph.com/issues/26982

"mds: crash when dumping ops in flight".

So, if you need a reason to update to 13.2.5, there you have it. Sorry
that I did not realize beforehand that you could hit this bug, as you're
running 13.2.2.

So I would update to 13.2.5 and try again.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Frank Schilder
Dear Yan and Stefan,

it happened again and there were only very few ops in the queue. I pulled the 
ops list and the cache. Please find a zip file here: 
"https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l; . Its a bit 
more than 100MB.

The active MDS failed over to the standby after or during the dump cache 
operation. Is this expected? As a result, the cluster is healthy and I can't do 
further diagnostics. In case you need more information, we have to wait until 
next time.

Some further observations:

There was no load on the system. I am starting to suspect that this is not a 
load-induced event. It is also not caused by excessive atime updates; the FS is 
mounted with relatime. Could it have to do with the large level-2 network (ca. 
550 client servers in the same broadcast domain)? I include our kernel tuning 
profile below, just in case. The cluster networks (back and front) are isolated 
VLANs, no gateways, no routing.

We run rolling snapshots on the file system. I didn't observe any problems with 
this, but am wondering if this might be related. We have currently 30 snapshots 
in total. Here is the output of status and pool ls:

[root@ceph-01 ~]# ceph status # before the MDS failed over
  cluster:
id: ###
health: HEALTH_WARN
1 MDSs report slow requests
 
  services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
mgr: ceph-01(active), standbys: ceph-02, ceph-03
mds: con-fs-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby
osd: 192 osds: 192 up, 192 in
 
  data:
pools:   5 pools, 750 pgs
objects: 6.35 M objects, 5.2 TiB
usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
pgs: 750 active+clean
 
[root@ceph-01 ~]# ceph status # after cache dump and the MDS failed over
  cluster:
id: ###
health: HEALTH_OK
 
  services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
mgr: ceph-01(active), standbys: ceph-02, ceph-03
mds: con-fs-1/1/1 up  {0=ceph-12=up:active}, 1 up:standby
osd: 192 osds: 192 up, 192 in
 
  data:
pools:   5 pools, 750 pgs
objects: 6.33 M objects, 5.2 TiB
usage:   5.1 TiB used, 1.3 PiB / 1.3 PiB avail
pgs: 749 active+clean
 1   active+clean+scrubbing+deep
 
  io:
client:   6.3 KiB/s wr, 0 op/s rd, 0 op/s wr

[root@ceph-01 ~]# ceph osd pool ls detail # after the MDS failed over
pool 1 'sr-rbd-meta-one' replicated size 3 min_size 2 crush_rule 1 object_hash 
rjenkins pg_num 80 pgp_num 80 last_change 486 flags 
hashpspool,nodelete,selfmanaged_snaps max_bytes 536870912000 stripe_width 0 
application rbd
removed_snaps [1~5]
pool 2 'sr-rbd-data-one' erasure size 8 min_size 6 crush_rule 5 object_hash 
rjenkins pg_num 300 pgp_num 300 last_change 1759 flags 
hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 274877906944000 
stripe_width 24576 compression_mode aggressive application rbd
removed_snaps [1~3]
pool 3 'sr-rbd-one-stretch' replicated size 4 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 20 pgp_num 20 last_change 500 flags 
hashpspool,nodelete,selfmanaged_snaps max_bytes 5497558138880 stripe_width 0 
compression_mode aggressive application rbd
removed_snaps [1~7]
pool 4 'con-fs-meta' replicated size 3 min_size 2 crush_rule 3 object_hash 
rjenkins pg_num 50 pgp_num 50 last_change 428 flags hashpspool,nodelete 
max_bytes 1099511627776 stripe_width 0 application cephfs
pool 5 'con-fs-data' erasure size 10 min_size 8 crush_rule 6 object_hash 
rjenkins pg_num 300 pgp_num 300 last_change 2561 flags 
hashpspool,ec_overwrites,nodelete,selfmanaged_snaps max_bytes 21990232200 
stripe_width 32768 compression_mode aggressive application cephfs
removed_snaps 
[2~3d,41~2a,6d~2a,99~c,a6~1e,c6~18,df~3,e3~1,e5~3,e9~1,eb~3,ef~1,f1~1,f3~1,f5~3,f9~1,fb~3,ff~1,101~1,103~1,105~1,107~1,109~1,10b~1,10d~1,10f~1,111~1]

The relevant pools are con-fs-meta and con-fs-data.

Best regards,
Frank

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


[root@ceph-08 ~]# cat /etc/tuned/ceph/tuned.conf 
[main]
summary=Settings for ceph cluster. Derived from throughput-performance.
include=throughput-performance

[vm]
transparent_hugepages=never

[sysctl]
# See also:
# - https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
# - https://www.kernel.org/doc/Documentation/sysctl/net.txt
# - https://cromwell-intl.com/open-source/performance-tuning/tcp.html
# - https://fatmin.com/2015/08/19/ceph-tcp-performance-tuning/
# - https://www.spinics.net/lists/ceph-devel/msg21721.html

# Set available PIDs and open files to maximum possible.
kernel.pid_max=4194304
fs.file-max=26234859

# Swap options, reduce swappiness.
vm.zone_reclaim_mode=0
#vm.dirty_ratio = 20
vm.dirty_bytes = 629145600
vm.dirty_background_bytes = 314572800
vm.swappiness=10
vm.min_free_kbytes=8388608

# Increase ARP cache size to accommodate large level-2 client network.
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 2048

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Yan, Zheng
On Thu, May 16, 2019 at 4:10 PM Frank Schilder  wrote:
>
> Dear Yan and Stefan,
>
> thanks for the additional information, it should help reproducing the issue.
>
> The pdsh command executes a bash script that echoes a few values to stdout. 
> Access should be read-only, however, we still have the FS mounted with atime 
> enabled, so there is probably meta data write and synchronisation per access. 
> Files accessed are ssh auth-keys in .ssh and the shell script. The shell 
> script was located in the home-dir of the user and, following your 
> explanations, to reproduce the issue I will create a directory with many 
> entries and execute a test with the many-clients single-file-read load on it.
>

Try setting mds_bal_split_rd and mds_bal_split_wr to very large values,
which prevents the MDS from splitting hot dirfrags.
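
A sketch of how this could be applied at runtime via the admin socket; the MDS
name and the value are placeholders, and the change would also need to go into
ceph.conf if it is meant to survive a restart:

# Raise the read/write temperature thresholds that trigger dirfrag splits so
# that hot directories are not fragmented (value chosen arbitrarily high).
ceph daemon mds.ceph-08 config set mds_bal_split_rd 1000000
ceph daemon mds.ceph-08 config set mds_bal_split_wr 1000000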

Regards
Yan, Zheng

> I hope it doesn't take too long.
>
> Thanks for your input!
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Yan, Zheng 
> Sent: 16 May 2019 09:35
> To: Frank Schilder
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
> bug?)
>
> On Thu, May 16, 2019 at 2:52 PM Frank Schilder  wrote:
> >
> > Dear Yan,
> >
> > OK, I will try to trigger the problem again and dump the information 
> > requested. Since it is not easy to get into this situation and I usually 
> > need to resolve it fast (it's not a test system), is there anything else 
> > worth capturing?
> >
>
> just
>
> ceph daemon mds.x dump_ops_in_flight
> ceph daemon mds.x dump cache /tmp/cachedump.x
>
> > I will get back as soon as it happened again.
> >
> > In the meantime, I would be grateful if you could shed some light on the 
> > following questions:
> >
> > - Is there a way to cancel an individual operation in the queue? It is a 
> > bit harsh to have to fail an MDS for that.
>
> no
>
> > - What is the fragmentdir operation doing in a single MDS setup? I thought 
> > this was only relevant if multiple MDS daemons are active on a file system.
> >
>
> It splits large directory to smaller parts.
>
>
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Yan, Zheng 
> > Sent: 16 May 2019 05:50
> > To: Frank Schilder
> > Cc: Stefan Kooman; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops 
> > (MDS bug?)
> >
> > > [...]
> > > This time I captured the MDS ops list (log output does not really contain 
> > > more info than this list). It contains 12 ops and I will include it here 
> > > in full length (hope this is acceptable):
> > >
> >
> > Your issues were caused by stuck internal op fragmentdir.  Can you
> > dump mds cache and send the output to us?


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
Dear Yan,

it is difficult to push the MDS to err in this particular way. Would it be 
advisable to increase the likelihood and frequency of dirfrag operations by 
tweaking some of the parameters mentioned here: 
http://docs.ceph.com/docs/mimic/cephfs/dirfrags/? If so, what would reasonable 
values be, keeping in mind that we are already in a pilot production phase and 
need to maintain the integrity of user data?

Is there any counter showing if such operations happened at all?
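
One place that may answer this is the MDS perf counters; a sketch, with the
counter names (dir_split / dir_merge) being an assumption for this release
rather than something confirmed in the thread:

# Look for dirfrag split/merge counters in the MDS perf dump.
ceph daemon mds.ceph-08 perf dump | grep -E '"dir_(split|merge)"'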

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Yan, Zheng 
Sent: 16 May 2019 09:35
To: Frank Schilder
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

On Thu, May 16, 2019 at 2:52 PM Frank Schilder  wrote:
>
> Dear Yan,
>
> OK, I will try to trigger the problem again and dump the information 
> requested. Since it is not easy to get into this situation and I usually need 
> to resolve it fast (it's not a test system), is there anything else worth 
> capturing?
>

just

ceph daemon mds.x dump_ops_in_flight
ceph daemon mds.x dump cache /tmp/cachedump.x
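
A small wrapper for capturing both outputs next time the situation occurs, so
nothing is lost if the MDS fails over afterwards; the MDS name and target paths
are placeholders:

# Capture ops in flight and the cache with a timestamp before any recovery action.
ts=$(date +%Y%m%d-%H%M%S)
ceph daemon mds.ceph-08 dump_ops_in_flight > /var/log/ceph/ops-in-flight-$ts.json
ceph daemon mds.ceph-08 dump cache /var/log/ceph/cachedump-$ts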

> I will get back as soon as it happened again.
>
> In the meantime, I would be grateful if you could shed some light on the 
> following questions:
>
> - Is there a way to cancel an individual operation in the queue? It is a bit 
> harsh to have to fail an MDS for that.

no

> - What is the fragmentdir operation doing in a single MDS setup? I thought 
> this was only relevant if multiple MDS daemons are active on a file system.
>

It splits a large directory into smaller parts.


> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Yan, Zheng 
> Sent: 16 May 2019 05:50
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
> bug?)
>
> > [...]
> > This time I captured the MDS ops list (log output does not really contain 
> > more info than this list). It contains 12 ops and I will include it here in 
> > full length (hope this is acceptable):
> >
>
> Your issues were caused by stuck internal op fragmentdir.  Can you
> dump mds cache and send the output to us?


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
Dear Yan and Stefan,

thanks for the additional information, it should help reproducing the issue.

The pdsh command executes a bash script that echoes a few values to stdout. 
Access should be read-only; however, we still have the FS mounted with atime 
enabled, so there is probably a metadata write and synchronisation per access. 
The files accessed are the ssh auth-keys in .ssh and the shell script itself. The 
shell script was located in the user's home directory and, following your 
explanations, to reproduce the issue I will create a directory with many entries 
and execute a test with the many-clients single-file-read load on it.
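
A sketch of such a reproduction attempt; the hostnames, paths, and the size of
the test directory are hypothetical:

# Create a directory with many entries, then read one file from it on many
# clients at once to mimic the pdsh workload.
mkdir -p /mnt/cephfs/home/user/testdir
for i in $(seq 1 100000); do touch /mnt/cephfs/home/user/testdir/f$i; done
pdsh -w 'node[001-500]' 'cat /mnt/cephfs/home/user/testdir/f1 > /dev/null'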

I hope it doesn't take too long.

Thanks for your input!

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Yan, Zheng 
Sent: 16 May 2019 09:35
To: Frank Schilder
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

On Thu, May 16, 2019 at 2:52 PM Frank Schilder  wrote:
>
> Dear Yan,
>
> OK, I will try to trigger the problem again and dump the information 
> requested. Since it is not easy to get into this situation and I usually need 
> to resolve it fast (it's not a test system), is there anything else worth 
> capturing?
>

just

ceph daemon mds.x dump_ops_in_flight
ceph daemon mds.x dump cache /tmp/cachedump.x

> I will get back as soon as it happened again.
>
> In the meantime, I would be grateful if you could shed some light on the 
> following questions:
>
> - Is there a way to cancel an individual operation in the queue? It is a bit 
> harsh to have to fail an MDS for that.

no

> - What is the fragmentdir operation doing in a single MDS setup? I thought 
> this was only relevant if multiple MDS daemons are active on a file system.
>

It splits a large directory into smaller parts.


> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Yan, Zheng 
> Sent: 16 May 2019 05:50
> To: Frank Schilder
> Cc: Stefan Kooman; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
> bug?)
>
> > [...]
> > This time I captured the MDS ops list (log output does not really contain 
> > more info than this list). It contains 12 ops and I will include it here in 
> > full length (hope this is acceptable):
> >
>
> Your issues were caused by stuck internal op fragmentdir.  Can you
> dump mds cache and send the output to us?


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
> 
> thanks for the fast reply. We encountered the problem again, this time in a 
> much simpler situation; please see below. However, let me start with your 
> questions first:
> 
> What bug? -- In a single-active MDS set-up, should there ever occur an 
> operation with "op_name": "fragmentdir"?

Yes, see http://docs.ceph.com/docs/mimic/cephfs/dirfrags/. If you had
multiple active MDSs, the load could be shared among them.

There are some parameters that might need to be tuned in your
environment. But Zheng Yan is the expert in this matter, so analysis of
the MDS cache dump may reveal the culprit.

> Upgrading: The problem described here is the only issue we observe.
> Unless the problem is fixed upstream, upgrading won't help us and
> would be a bit of a waste of time. If someone can confirm that this
> problem is fixed in a newer version, we will do it. Otherwise, we
> might prefer to wait until it is.

Keeping your systems up to date generally improves stability. You might
prevent hitting issues when your workload changes in the future. Testing
new releases on a test system first is recommended, though.

> 
> News on the problem. We encountered it again when one of our users executed a 
> command in parallel with pdsh on all our ~500 client nodes. This command 
> accesses the same file from all these nodes pretty much simultaneously. We 
> did this quite often in the past, but this time, the command got stuck and we 
> started observing the MDS health problem again. Symptoms:

Does this command incur writes, reads, or a combination of both on
files in this directory? I wonder if you might prevent this from
happening by tuning the "Activity thresholds", especially since you say
it is load (# of clients) dependent.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
Dear Yan,

OK, I will try to trigger the problem again and dump the information requested. 
Since it is not easy to get into this situation and I usually need to resolve 
it fast (it's not a test system), is there anything else worth capturing?

I will get back as soon as it happened again.

In the meantime, I would be grateful if you could shed some light on the 
following questions:

- Is there a way to cancel an individual operation in the queue? It is a bit 
harsh to have to fail an MDS for that.
- What is the fragmentdir operation doing in a single MDS setup? I thought this 
was only relevant if multiple MDS daemons are active on a file system.

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Yan, Zheng 
Sent: 16 May 2019 05:50
To: Frank Schilder
Cc: Stefan Kooman; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

> [...]
> This time I captured the MDS ops list (log output does not really contain 
> more info than this list). It contains 12 ops and I will include it here in 
> full length (hope this is acceptable):
>

Your issues were caused by a stuck internal fragmentdir op. Can you
dump the MDS cache and send the output to us?


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-15 Thread Yan, Zheng
},
> {
> "time": "2019-05-15 11:22:06.784506",
> "event": "dispatched"
> },
> {
> "time": "2019-05-15 11:22:06.784562",
> "event": "failed to authpin, dir is being fragmented"
> }
> ]
> }
> },
> {
> "description": "client_request(client.386087:12795 lookup 
> #0x127/file.pdf 2019-05-15 11:37:31.764353 caller_uid=0, 
> caller_gid=0{})",
> "initiated_at": "2019-05-15 11:37:31.765631",
> "age": 88.208747,
> "duration": 88.209160,
> "type_data": {
> "flag_point": "failed to authpin, dir is being fragmented",
> "reqid": "client.386087:12795",
> "op_type": "client_request",
> "client_info": {
> "client": "client.386087",
> "tid": 12795
> },
> "events": [
> {
> "time": "2019-05-15 11:37:31.765631",
> "event": "initiated"
> },
> {
> "time": "2019-05-15 11:37:31.765631",
> "event": "header_read"
> },
> {
> "time": "2019-05-15 11:37:31.765633",
> "event": "throttled"
> },
> {
> "time": "2019-05-15 11:37:31.765640",
> "event": "all_read"
> },
> {
> "time": "2019-05-15 11:37:31.765731",
> "event": "dispatched"
> },
> {
> "time": "2019-05-15 11:37:31.765759",
> "event": "failed to authpin, dir is being fragmented"
> }
> ]
> }
> },
> {
> "description": "client_request(client.377552:5446 readdir 
> #0x13a 2019-05-15 11:43:07.569329 caller_uid=0, caller_gid=0{})",
> "initiated_at": "2019-05-15 11:38:36.511381",
> "age": 23.462997,
> "duration": 23.463467,
> "type_data": {
>         "flag_point": "failed to authpin, dir is being fragmented",
> "reqid": "client.377552:5446",
> "op_type": "client_request",
> "client_info": {
> "client": "client.377552",
> "tid": 5446
> },
> "events": [
> {
> "time": "2019-05-15 11:38:36.511381",
> "event": "initiated"
> },
> {
> "time": "2019-05-15 11:38:36.511381",
> "event": "header_read"
> },
> {
> "time": "2019-05-15 11:38:36.511383",
> "event": "throttled"
> },
> {
> "time": "2019-05-15 11:38:36.511392",
> "event": "all_read"
> },
> {
> "time": "2019-05-15 11:38:36.511561",
> "event": "dispatched"
> },
> {
> "time": "2019-05-15 11:38:36.511604",
> "event": "failed to authpin, dir is being fragmented"
> }
> ]
> }
> },
> {
> "description": "client_request(client.62472:6092368 getattr 
> pAsLsXsFs #0x138 2019-05-15 11:17:21.633854 caller_uid=105731, 
> caller_gid=105731{})",
> "initiated_at": "2019-05-15 11:17:21.635927",
> "age": 1298.338451,
> "duration": 1298.338955,
> "type_data": {
> "flag_point": "failed to authpin, dir is being fragmented",
> "reqid": "client.62472:6092368",
> "op_type": "client_request",
> "client_info": {
> "client": "client.62472",
> "tid": 6092368
> },
> "events": [
> {
> "time": "2019-05-15 11:17:21.635927",
> "event": "initiated"
> },
> {
> "time": "2019-05-15 11:17:21.635927",
> "event": "header_read"
> },
> {
> "time": "2019-05-15 11:17:21.635931",
> "event": "throttled"
> },
> {
> "time": "2019-05-15 11:17:21.635944",
> "event": "all_read"
> },
> {
> "time": "2019-05-15 11:17:21.636081",
> "event": "dispatched"
> },
> {
> "time": "2019-05-15 11:17:21.636118",
> "event": "failed to authpin, dir is being fragmented"
> }
> ]
> }
> },
> {
> "description": "client_request(client.62472:6092400 getattr 
> pAsLsXsFs #0x138 2019-05-15 11:21:25.909555 caller_uid=105731, 
> caller_gid=105731{})",
> "initiated_at": "2019-05-15 11:21:25.910514",
> "age": 1054.063864,
> "duration": 1054.064406,
> "type_data": {
> "flag_point": "failed to authpin, dir is being fragmented",
> "reqid": "client.62472:6092400",
> "op_type": "client_request",
> "client_info": {
> "client": "client.62472",
> "tid": 6092400
> },
> "events": [
> {
> "time": "2019-05-15 11:21:25.910514",
> "event": "initiated"
> },
> {
> "time": "2019-05-15 11:21:25.910514",
> "event": "header_read"
> },
> {
> "time": "2019-05-15 11:21:25.910527",
> "event": "throttled"
> },
> {
> "time": "2019-05-15 11:21:25.910537",
> "event": "all_read"
> },
> {
> "time": "2019-05-15 11:21:25.910597",
> "event": "dispatched"
> },
> {
> "time": "2019-05-15 11:21:25.910635",
> "event": "failed to authpin, dir is being fragmented"
> }
> ]
> }
> }
> ],
> "num_ops": 12
> }
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Stefan Kooman 
> Sent: 14 May 2019 09:54:05
> To: Frank Schilder
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
> bug?)
>
> Quoting Frank Schilder (fr...@dtu.dk):
>
> If at all possible I would:
>
> Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2).
> Use more recent kernels on the clients.
>
> Below settings for [mds] might help with trimming (you might already
> have changed mds_log_max_segments to 128 according to logs):
>
> [mds]
> mds_log_max_expiring = 80  # default 20
> # trim max $value segments in parallel
> # Defaults are too conservative.
> mds_log_max_segments = 120  # default 30
>
>
> > 1) Is there a bug with having MDS daemons acting as standby-replay?
> I can't tell what bug you are referring to based on info below. It does
> seem to work as designed.
>
> Gr. Stefan
>
> --
> | BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-15 Thread Frank Schilder
31.765640",
"event": "all_read"
},
{
"time": "2019-05-15 11:37:31.765731",
"event": "dispatched"
},
{
"time": "2019-05-15 11:37:31.765759",
"event": "failed to authpin, dir is being fragmented"
}
]
}
},
{
"description": "client_request(client.377552:5446 readdir 
#0x13a 2019-05-15 11:43:07.569329 caller_uid=0, caller_gid=0{})",
"initiated_at": "2019-05-15 11:38:36.511381",
"age": 23.462997,
"duration": 23.463467,
"type_data": {
"flag_point": "failed to authpin, dir is being fragmented",
"reqid": "client.377552:5446",
"op_type": "client_request",
"client_info": {
"client": "client.377552",
"tid": 5446
},
"events": [
{
"time": "2019-05-15 11:38:36.511381",
"event": "initiated"
},
{
"time": "2019-05-15 11:38:36.511381",
"event": "header_read"
},
{
"time": "2019-05-15 11:38:36.511383",
"event": "throttled"
},
        {
                "time": "2019-05-15 11:38:36.511392",
"event": "all_read"
},
{
"time": "2019-05-15 11:38:36.511561",
"event": "dispatched"
},
{
"time": "2019-05-15 11:38:36.511604",
"event": "failed to authpin, dir is being fragmented"
}
]
}
},
{
"description": "client_request(client.62472:6092368 getattr 
pAsLsXsFs #0x138 2019-05-15 11:17:21.633854 caller_uid=105731, 
caller_gid=105731{})",
"initiated_at": "2019-05-15 11:17:21.635927",
"age": 1298.338451,
"duration": 1298.338955,
"type_data": {
"flag_point": "failed to authpin, dir is being fragmented",
"reqid": "client.62472:6092368",
"op_type": "client_request",
"client_info": {
"client": "client.62472",
"tid": 6092368
},
"events": [
{
"time": "2019-05-15 11:17:21.635927",
"event": "initiated"
},
{
"time": "2019-05-15 11:17:21.635927",
"event": "header_read"
},
{
"time": "2019-05-15 11:17:21.635931",
"event": "throttled"
},
{
"time": "2019-05-15 11:17:21.635944",
"event": "all_read"
},
{
"time": "2019-05-15 11:17:21.636081",
"event": "dispatched"
},
{
"time": "2019-05-15 11:17:21.636118",
"event": "failed to authpin, dir is being fragmented"
}
]
}
},
{
"description": "client_request(client.62472:6092400 getattr 
pAsLsXsFs #0x138 2019-05-15 11:21:25.909555 caller_uid=105731, 
caller_gid=105731{})",
"initiated_at": "2019-05-15 11:21:25.910514",
"age": 1054.063864,
"duration": 1054.064406,
"type_data": {
"flag_point": "failed to authpin, dir is being fragmented",
"reqid": "client.62472:6092400",
"op_type": "client_request",
"client_info": {
"client": "client.62472",
"tid": 6092400
},
"events": [
{
"time": "2019-05-15 11:21:25.910514",
"event": "initiated"
},
{
"time": "2019-05-15 11:21:25.910514",
"event": "header_read"
},
{
"time": "2019-05-15 11:21:25.910527",
"event": "throttled"
},
{
"time": "2019-05-15 11:21:25.910537",
"event": "all_read"
},
{
"time": "2019-05-15 11:21:25.910597",
"event": "dispatched"
},
{
"time": "2019-05-15 11:21:25.910635",
"event": "failed to authpin, dir is being fragmented"
}
]
}
}
],
"num_ops": 12
}

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: 14 May 2019 09:54:05
To: Frank Schilder
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS 
bug?)

Quoting Frank Schilder (fr...@dtu.dk):

If at all possible I would:

Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2).
Use more recent kernels on the clients.

Below settings for [mds] might help with trimming (you might already
have changed mds_log_max_segments to 128 according to logs):

[mds]
mds_log_max_expiring = 80  # default 20
# trim max $value segments in parallel
# Defaults are too conservative.
mds_log_max_segments = 120  # default 30


> 1) Is there a bug with having MDS daemons acting as standby-replay?
I can't tell what bug you are referring to based on info below. It does
seem to work as designed.

Gr. Stefan

--
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-14 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk):

If at all possible I would:

Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2).
Use more recent kernels on the clients.

Below settings for [mds] might help with trimming (you might already
have changed mds_log_max_segments to 128 according to logs):

[mds]
mds_log_max_expiring = 80  # default 20
# trim max $value segments in parallel
# Defaults are too conservative.
mds_log_max_segments = 120  # default 30
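
If restarting the MDS is inconvenient, the same settings can probably be
injected at runtime; a sketch, with option names and values taken from the
snippet above:

# Apply the journal-trimming settings to all MDS daemons without a restart.
ceph tell mds.\* injectargs '--mds_log_max_expiring=80 --mds_log_max_segments=120'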


> 1) Is there a bug with having MDS daemons acting as standby-replay?
I can't tell what bug you are referring to based on info below. It does
seem to work as designed.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl