Re: [ceph-users] mds0: Client X failing to respond to capability release
On Fri, Feb 5, 2016 at 10:19 PM, Michael Metz-Martini | SpeedPartner
GmbH wrote:
> Hi,
>
> On 06.02.2016 at 07:15, Yan, Zheng wrote:
>>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH
>>> wrote:
>>> After stopping load/access to cephfs there are a few requests left:
>>> [...]
>>> osd.87 is near full and currently has some PGs in backfill_toofull -
>>> but can this be the reason for this?
>>
>> Yes, it's likely.
>
> But "why"? I thought that reads/writes are still possible, just not
> replicated / objects are degraded.

As long as all the PGs are "active" they'll still accept reads/writes,
but it's possible that osd 87 is just so busy that the clients are all
stuck waiting for it.
-Greg
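If osd.87 being overloaded is the suspicion, its admin socket can show
what it is actually chewing on (both commands exist in releases of this
era, though the output details vary):

    # run on the host carrying osd.87
    ceph daemon osd.87 dump_ops_in_flight   # ops currently queued or executing
    ceph daemon osd.87 dump_historic_ops    # recent slowest ops with durations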
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 06.02.2016 at 07:15, Yan, Zheng wrote:
>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> After stopping load/access to cephfs there are a few requests left:
>> [...]
>> osd.87 is near full and currently has some PGs in backfill_toofull -
>> but can this be the reason for this?
>
> Yes, it's likely.

But "why"? I thought that reads/writes are still possible, just not
replicated / objects are degraded.

--
Kind regards
Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> sorry for the delay - production system, unfortunately ;-(
>
> On 04.02.2016 at 15:38, Yan, Zheng wrote:
>> That's quite a lot of requests. Could you pick some requests in osdc
>> and check how long these requests last?
> After stopping load/access to cephfs there are a few requests left:
> [...]
> osd.87 is near full and currently has some PGs in backfill_toofull -
> but can this be the reason for this?

Yes, it's likely.

> --
> Kind regards
> Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

sorry for the delay - production system, unfortunately ;-(

On 04.02.2016 at 15:38, Yan, Zheng wrote:
>> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)
> That's quite a lot of requests. Could you pick some requests in osdc
> and check how long these requests last?

After stopping load/access to cephfs there are a few requests left:

330    osd87   5.72c3bf71   100826d5cdc.0002   write
508    osd87   5.569ad068   100826d5d18.       write
668    osd87   5.3db54b00   100826d5d4d.0001   write
799    osd87   5.65f8c4e0   100826d5d79.       write
874    osd87   5.d238da71   100826d5d98.       write
1023   osd87   5.705950e0   100826d5e2d.       write
1277   osd87   5.33673f71   100826d5f2a.       write
1329   osd87   5.e81ab868   100826d5f5e.       write
1392   osd87   5.aea1c771   100826d5f9c.       write

osd.87 is near full and currently has some PGs in backfill_toofull -
but can this be the reason for this?

--
Kind regards
Michael Metz-Martini
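To confirm the near-full theory before blaming osd.87, the usual checks
are (a sketch; all standard commands, though the output details differ
by release):

    ceph health detail            # names the near-full OSDs and stuck PGs
    ceph pg dump_stuck unclean    # includes PGs in backfill_toofull
    ceph osd tree | grep -w 87    # current weight of osd.87
    # shifting data off the full OSD is one way out (use with care):
    # ceph osd reweight 87 0.9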
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> On 04.02.2016 at 09:43, Yan, Zheng wrote:
>> On client with kernel ceph mount. If there is no debugfs, mount
>> debugfs first (mount -t debugfs /sys/kernel/debug /sys/kernel/debug)
> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)

That's quite a lot of requests. Could you pick some requests in osdc
and check how long these requests last?

> By looking around I found caps:
>
> $ cat caps
> total     305975
> avail     2
> used      305973
> reserved  0
> min       1024
>
> Somehow related? avail=2 is low ;-)

This is not a problem.

> --
> Kind regards
> Michael Metz-Martini
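One crude way to measure request lifetime without extra tooling is to
snapshot osdc twice and see which request tids survive (a sketch,
assuming the first column of osdc is the request tid, as in these
kernel versions):

    cd /sys/kernel/debug/ceph/<fsid>.client<id>   # placeholder directory name
    awk '{print $1}' osdc | sort > /tmp/osdc.before
    sleep 30
    awk '{print $1}' osdc | sort > /tmp/osdc.after
    # tids printed here have been in flight for at least 30 seconds
    comm -12 /tmp/osdc.before /tmp/osdc.after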
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 04.02.2016 at 09:43, Yan, Zheng wrote:
> On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner
> GmbH wrote:
>> Where should I check? Client or mds? Do I have to enable something to
>> get these details? The directory /sys/kernel/debug/ceph/ seems to be
>> missing.
> On client with kernel ceph mount. If there is no debugfs, mount
> debugfs first (mount -t debugfs /sys/kernel/debug /sys/kernel/debug)

Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)

By looking around I found caps:

$ cat caps
total     305975
avail     2
used      305973
reserved  0
min       1024

Somehow related? avail=2 is low ;-)

--
Kind regards
Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner
GmbH wrote:
> Hi,
>
> On 03.02.2016 at 15:55, Yan, Zheng wrote:
>> This seems like dirty page writeback is too slow. Is there any hung OSD
>> request in /sys/kernel/debug/ceph/xxx/osdc?
> Where should I check? Client or mds? Do I have to enable something to
> get these details? The directory /sys/kernel/debug/ceph/ seems to be
> missing.

On a client with a kernel ceph mount. If there is no debugfs, mount
debugfs first (mount -t debugfs /sys/kernel/debug /sys/kernel/debug)

Regards
Yan, Zheng

> --
> Kind regards
> Michael Metz-Martini
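Spelled out as a sketch (the per-mount directory name under
/sys/kernel/debug/ceph, <fsid>.client<id>, differs on every client):

    # mount debugfs if it is not mounted yet
    mount -t debugfs none /sys/kernel/debug 2>/dev/null

    # each kernel CephFS mount gets its own directory with osdc, caps, mdsc, ...
    for d in /sys/kernel/debug/ceph/*/; do
        echo "== $d"
        cat "${d}osdc"    # one line per in-flight OSD request; empty output is good
    done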
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 03.02.2016 at 15:55, Yan, Zheng wrote:
>> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> [...]
>> 2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
>> [WRN] : slow request 62.125785 seconds old, received at 2016-02-03
>> 14:41:23.455812: client_request(client.10199855:1313157 getattr
>> pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
>> rdlock, waiting
>
> This seems like dirty page writeback is too slow. Is there any hung OSD
> request in /sys/kernel/debug/ceph/xxx/osdc?

Where should I check? Client or mds? Do I have to enable something to
get these details? The directory /sys/kernel/debug/ceph/ seems to be
missing.

--
Kind regards
Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> On 03.02.2016 at 12:11, Yan, Zheng wrote:
>> The mds log should contain messages like:
>> client.<id> isn't responding to mclientcaps(revoke)
>> please send these messages to us.
> [...]
> 2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
> [WRN] : 7 slow requests, 6 included below; oldest blocked for >
> 62.125785 secs
> 2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
> [WRN] : slow request 62.125785 seconds old, received at 2016-02-03
> 14:41:23.455812: client_request(client.10199855:1313157 getattr
> pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
> rdlock, waiting

This seems like dirty page writeback is too slow. Is there any hung OSD
request in /sys/kernel/debug/ceph/xxx/osdc?

> --
> Kind regards
> Michael
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 03.02.2016 at 12:11, Yan, Zheng wrote:
>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> [...]
>> We're already far from the CentOS dist kernel, but upgrading to 4.4.x
>> for the clients should be possible if that might help.
> The mds log should contain messages like:
>
> client.<id> isn't responding to mclientcaps(revoke)
>
> please send these messages to us.

2016-02-03 14:42:25.568800 7fadfd280700  2 mds.0.cache
check_memory_usage total 17302804, rss 16604996, heap 42916, malloc
-1008738 mmap 0, baseline 39844, buffers 0, max 1048576, 881503 /
388 inodes have caps, 882499 caps, 0.220625 caps per inode
2016-02-03 14:42:25.581494 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bd349 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.127500
seconds ago
2016-02-03 14:42:25.581519 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bf1af pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.085996
seconds ago
2016-02-03 14:42:25.581527 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bf4d3 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.084284
seconds ago
2016-02-03 14:42:25.581534 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815d2500 pending pAsLsXsFsc issued pAsLsXsFscb, sent 61.731320
seconds ago
2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
[WRN] : 7 slow requests, 6 included below; oldest blocked for >
62.125785 secs
2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
[WRN] : slow request 62.125785 seconds old, received at 2016-02-03
14:41:23.455812: client_request(client.10199855:1313157 getattr
pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
rdlock, waiting

--
Kind regards
Michael
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> On 03.02.2016 at 10:26, Gregory Farnum wrote:
>> There are some bugs around this functionality, but I *think* your
>> clients are new enough that it shouldn't be an issue.
>> However, it's entirely possible your clients are actually making use of
>> enough inodes that the MDS server is running into its default limits.
>> If your MDS has memory available, you probably want to increase the
>> cache size from its default of 100k inodes (mds cache size = X).
> mds_cache_size is already 400 and so a lot higher than usual.
> (google said I should increase ...)
>
>> Or maybe your kernels are too old; Zheng would know.
> We're already far from the CentOS dist kernel, but upgrading to 4.4.x
> for the clients should be possible if that might help.

The mds log should contain messages like:

    client.<id> isn't responding to mclientcaps(revoke)

Please send these messages to us.

Regards
Yan, Zheng

> --
> Kind regards
> Michael
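A quick way to collect exactly those lines from the MDS log (the log
path is an assumption; adjust to the local setup):

    grep "isn't responding to mclientcaps(revoke)" /var/log/ceph/ceph-mds.*.log | tail -20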
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 03.02.2016 at 10:26, Gregory Farnum wrote:
> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
> GmbH wrote:
>> Putting some higher load via cephfs on the cluster leads to messages
>> like mds0: Client X failing to respond to capability release after some
>> minutes. Requests from other clients start to block after a while.
>> Rebooting the named client resolves the issue.
> There are some bugs around this functionality, but I *think* your
> clients are new enough that it shouldn't be an issue.
> However, it's entirely possible your clients are actually making use of
> enough inodes that the MDS server is running into its default limits.
> If your MDS has memory available, you probably want to increase the
> cache size from its default of 100k inodes (mds cache size = X).

mds_cache_size is already 400 and so a lot higher than usual.
(google said I should increase ...)

> Or maybe your kernels are too old; Zheng would know.

We're already far from the CentOS dist kernel, but upgrading to 4.4.x
for the clients should be possible if that might help.

--
Kind regards
Michael
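To double-check what value the running MDS actually uses, the admin
socket can be queried (a sketch; the daemon name is taken from the
cluster status earlier in the thread):

    ceph daemon mds.storagemds01 config get mds_cache_size
    # or via the socket file directly:
    ceph --admin-daemon /var/run/ceph/ceph-mds.storagemds01.asok config get mds_cache_size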
Re: [ceph-users] mds0: Client X failing to respond to capability release
On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
GmbH wrote:
> Hi,
>
> we're experiencing some strange issues running ceph 0.87 in our, I
> think, quite large cluster (taking the number of objects as a measurement).
>
> mdsmap e721086: 1/1/1 up {0=storagemds01=up:active}, 2 up:standby
> osdmap e143048: 92 osds: 92 up, 92 in
>        flags noout,noscrub,nodeep-scrub
> pgmap v45790682: 4736 pgs, 6 pools, 109 TB data, 3841 Mobjects
>        255 TB used, 48892 GB / 303 TB avail
>
> Putting some higher load via cephfs on the cluster leads to messages
> like mds0: Client X failing to respond to capability release after some
> minutes. Requests from other clients start to block after a while.
>
> Rebooting the named client resolves the issue.

There are some bugs around this functionality, but I *think* your
clients are new enough that it shouldn't be an issue.

However, it's entirely possible your clients are actually making use of
enough inodes that the MDS server is running into its default limits.
If your MDS has memory available, you probably want to increase the
cache size from its default of 100k inodes (mds cache size = X).

Or maybe your kernels are too old; Zheng would know.
-Greg

> Clients are a mix of CentOS6 & CentOS7 running kernels
> 4.1.4-1.el7.elrepo.x86_64
> 4.1.4-1.el6.elrepo.x86_64
> 4.4.0-2.el6.elrepo.x86_64
> but other releases show the same behavior.
>
> Currently running 3 OSD nodes and 3 combined MDS/MON nodes.
>
> What information do you need to further track down this issue? I'm quite
> unsure, so this is only a rough overview of the setup.
>
> We have another issue with occasionally broken files (bad checksums after
> storage), but I think I will start a new thread for that ;-)
>
> Thanks!
>
> --
> Kind Regards
> Michael
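For reference, the setting Greg mentions goes in the MDS section of
ceph.conf; the value below is purely illustrative and should be sized
to the MDS host's RAM:

    [mds]
        # default is 100000 cached inodes
        mds cache size = 1000000

It can also be changed on a running daemon through the admin socket
(assumed available on the MDS host):

    ceph daemon mds.storagemds01 config set mds_cache_size 1000000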
[ceph-users] mds0: Client X failing to respond to capability release
Hi,

we're experiencing some strange issues running ceph 0.87 in our, I
think, quite large cluster (taking the number of objects as a measurement).

mdsmap e721086: 1/1/1 up {0=storagemds01=up:active}, 2 up:standby
osdmap e143048: 92 osds: 92 up, 92 in
       flags noout,noscrub,nodeep-scrub
pgmap v45790682: 4736 pgs, 6 pools, 109 TB data, 3841 Mobjects
       255 TB used, 48892 GB / 303 TB avail

Putting some higher load via cephfs on the cluster leads to messages
like "mds0: Client X failing to respond to capability release" after some
minutes. Requests from other clients start to block after a while.

Rebooting the named client resolves the issue.

Clients are a mix of CentOS6 & CentOS7 running kernels
4.1.4-1.el7.elrepo.x86_64
4.1.4-1.el6.elrepo.x86_64
4.4.0-2.el6.elrepo.x86_64
but other releases show the same behavior.

Currently running 3 OSD nodes and 3 combined MDS/MON nodes.

What information do you need to further track down this issue? I'm quite
unsure, so this is only a rough overview of the setup.

We have another issue with occasionally broken files (bad checksums after
storage), but I think I will start a new thread for that ;-)

Thanks!

--
Kind Regards
Michael
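For anyone debugging the same warning: a starting point is to map the
client id from the health output to an actual session (a sketch; the
'session ls' admin-socket command may not exist on a release as old as
0.87):

    ceph health detail                       # shows which client id is affected
    ceph daemon mds.storagemds01 session ls  # map the id to a hostname/mount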