Re: [ceph-users] mds0: Client X failing to respond to capability release
On Fri, Feb 5, 2016 at 10:19 PM, Michael Metz-Martini | SpeedPartner
GmbH wrote:
> Hi,
>
> On 06.02.2016 at 07:15, Yan, Zheng wrote:
>>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH
>>> wrote:
>>> After stopping load/access to cephfs there are a few requests left:
>>> [...]
>>> osd.87 is near full and currently has some PGs in backfill_toofull -
>>> but can this be the reason for this?
>>
>> Yes, it's likely.
>
> But "why"? I thought that reads/writes are still possible, just not
> replicated / objects are degraded.

As long as all the PGs are "active" they'll still accept reads/writes,
but it's possible that osd 87 is just so busy that the clients are all
stuck waiting for it.
-Greg
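If osd.87 being overloaded is the suspicion, its admin socket can show
what it is actually chewing on (both commands exist in releases of this
era, though the output details vary):

    # run on the host carrying osd.87
    ceph daemon osd.87 dump_ops_in_flight   # ops currently queued or executing
    ceph daemon osd.87 dump_historic_ops    # recent slowest ops with durations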
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 06.02.2016 at 07:15, Yan, Zheng wrote:
>> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> After stopping load/access to cephfs there are a few requests left:
>> [...]
>> osd.87 is near full and currently has some PGs in backfill_toofull -
>> but can this be the reason for this?
>
> Yes, it's likely.

But "why"? I thought that reads/writes are still possible, just not
replicated / objects are degraded.

--
Kind regards
Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 6, 2016, at 13:41, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> sorry for the delay - production system, unfortunately ;-(
>
> On 04.02.2016 at 15:38, Yan, Zheng wrote:
>> That's quite a lot of requests. Could you pick some requests in osdc
>> and check how long these requests last?
> After stopping load/access to cephfs there are a few requests left:
> [...]
> osd.87 is near full and currently has some PGs in backfill_toofull -
> but can this be the reason for this?

Yes, it's likely.

> --
> Kind regards
> Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

sorry for the delay - production system, unfortunately ;-(

On 04.02.2016 at 15:38, Yan, Zheng wrote:
>> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)
> That's quite a lot of requests. Could you pick some requests in osdc
> and check how long these requests last?

After stopping load/access to cephfs there are a few requests left:

330    osd87   5.72c3bf71   100826d5cdc.0002   write
508    osd87   5.569ad068   100826d5d18.       write
668    osd87   5.3db54b00   100826d5d4d.0001   write
799    osd87   5.65f8c4e0   100826d5d79.       write
874    osd87   5.d238da71   100826d5d98.       write
1023   osd87   5.705950e0   100826d5e2d.       write
1277   osd87   5.33673f71   100826d5f2a.       write
1329   osd87   5.e81ab868   100826d5f5e.       write
1392   osd87   5.aea1c771   100826d5f9c.       write

osd.87 is near full and currently has some PGs in backfill_toofull -
but can this be the reason for this?

--
Kind regards
Michael Metz-Martini
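To confirm the near-full theory before blaming osd.87, the usual checks
are (a sketch; all standard commands, though the output details differ
by release):

    ceph health detail            # names the near-full OSDs and stuck PGs
    ceph pg dump_stuck unclean    # includes PGs in backfill_toofull
    ceph osd tree | grep -w 87    # current weight of osd.87
    # shifting data off the full OSD is one way out (use with care):
    # ceph osd reweight 87 0.9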
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 4, 2016, at 17:00, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> On 04.02.2016 at 09:43, Yan, Zheng wrote:
>> On client with kernel ceph mount. If there is no debugfs, mount
>> debugfs first (mount -t debugfs /sys/kernel/debug /sys/kernel/debug)
> Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)

That's quite a lot of requests. Could you pick some requests in osdc
and check how long these requests last?

> By looking around I found caps:
>
> $ cat caps
> total     305975
> avail     2
> used      305973
> reserved  0
> min       1024
>
> Somehow related? avail=2 is low ;-)

This is not a problem.

> --
> Kind regards
> Michael Metz-Martini
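One crude way to measure request lifetime without extra tooling is to
snapshot osdc twice and see which request tids survive (a sketch,
assuming the first column of osdc is the request tid, as in these
kernel versions):

    cd /sys/kernel/debug/ceph/<fsid>.client<id>   # placeholder directory name
    awk '{print $1}' osdc | sort > /tmp/osdc.before
    sleep 30
    awk '{print $1}' osdc | sort > /tmp/osdc.after
    # tids printed here have been in flight for at least 30 seconds
    comm -12 /tmp/osdc.before /tmp/osdc.after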
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 04.02.2016 at 09:43, Yan, Zheng wrote:
> On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner
> GmbH wrote:
>> Where should I check? Client or mds? Do I have to enable something to
>> get these details? The directory /sys/kernel/debug/ceph/ seems to be
>> missing.
> On client with kernel ceph mount. If there is no debugfs, mount
> debugfs first (mount -t debugfs /sys/kernel/debug /sys/kernel/debug)

Got it. http://www.michael-metz.de/osdc.txt.gz (about 500kb uncompressed)

By looking around I found caps:

$ cat caps
total     305975
avail     2
used      305973
reserved  0
min       1024

Somehow related? avail=2 is low ;-)

--
Kind regards
Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
On Thu, Feb 4, 2016 at 4:36 PM, Michael Metz-Martini | SpeedPartner
GmbH wrote:
> Hi,
>
> On 03.02.2016 at 15:55, Yan, Zheng wrote:
>> This seems like dirty page writeback is too slow. Is there any hung OSD
>> request in /sys/kernel/debug/ceph/xxx/osdc?
> Where should I check? Client or mds? Do I have to enable something to
> get these details? The directory /sys/kernel/debug/ceph/ seems to be
> missing.

On a client with a kernel ceph mount. If there is no debugfs, mount
debugfs first (mount -t debugfs /sys/kernel/debug /sys/kernel/debug)

Regards
Yan, Zheng

> --
> Kind regards
> Michael Metz-Martini
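Spelled out as a sketch (the per-mount directory name under
/sys/kernel/debug/ceph, <fsid>.client<id>, differs on every client):

    # mount debugfs if it is not mounted yet
    mount -t debugfs none /sys/kernel/debug 2>/dev/null

    # each kernel CephFS mount gets its own directory with osdc, caps, mdsc, ...
    for d in /sys/kernel/debug/ceph/*/; do
        echo "== $d"
        cat "${d}osdc"    # one line per in-flight OSD request; empty output is good
    done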
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 03.02.2016 at 15:55, Yan, Zheng wrote:
>> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> [...]
>> 2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
>> [WRN] : slow request 62.125785 seconds old, received at 2016-02-03
>> 14:41:23.455812: client_request(client.10199855:1313157 getattr
>> pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
>> rdlock, waiting
>
> This seems like dirty page writeback is too slow. Is there any hung OSD
> request in /sys/kernel/debug/ceph/xxx/osdc?

Where should I check? Client or mds? Do I have to enable something to
get these details? The directory /sys/kernel/debug/ceph/ seems to be
missing.

--
Kind regards
Michael Metz-Martini
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> On 03.02.2016 at 12:11, Yan, Zheng wrote:
>> The mds log should contain messages like:
>> client.<id> isn't responding to mclientcaps(revoke)
>> please send these messages to us.
> [...]
> 2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
> [WRN] : 7 slow requests, 6 included below; oldest blocked for >
> 62.125785 secs
> 2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
> [WRN] : slow request 62.125785 seconds old, received at 2016-02-03
> 14:41:23.455812: client_request(client.10199855:1313157 getattr
> pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
> rdlock, waiting

This seems like dirty page writeback is too slow. Is there any hung OSD
request in /sys/kernel/debug/ceph/xxx/osdc?

> --
> Kind regards
> Michael
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 03.02.2016 at 12:11, Yan, Zheng wrote:
>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH
>> wrote:
>> [...]
>> We're already far from the CentOS dist kernel, but upgrading to 4.4.x
>> for the clients should be possible if that might help.
> The mds log should contain messages like:
>
> client.<id> isn't responding to mclientcaps(revoke)
>
> please send these messages to us.

2016-02-03 14:42:25.568800 7fadfd280700  2 mds.0.cache
check_memory_usage total 17302804, rss 16604996, heap 42916, malloc
-1008738 mmap 0, baseline 39844, buffers 0, max 1048576, 881503 /
388 inodes have caps, 882499 caps, 0.220625 caps per inode
2016-02-03 14:42:25.581494 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bd349 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.127500
seconds ago
2016-02-03 14:42:25.581519 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bf1af pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.085996
seconds ago
2016-02-03 14:42:25.581527 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bf4d3 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.084284
seconds ago
2016-02-03 14:42:25.581534 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815d2500 pending pAsLsXsFsc issued pAsLsXsFscb, sent 61.731320
seconds ago
2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
[WRN] : 7 slow requests, 6 included below; oldest blocked for >
62.125785 secs
2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
[WRN] : slow request 62.125785 seconds old, received at 2016-02-03
14:41:23.455812: client_request(client.10199855:1313157 getattr
pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
rdlock, waiting

--
Kind regards
Michael
Re: [ceph-users] mds0: Client X failing to respond to capability release
> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH
> wrote:
>
> Hi,
>
> On 03.02.2016 at 10:26, Gregory Farnum wrote:
>> There are some bugs around this functionality, but I *think* your
>> clients are new enough that it shouldn't be an issue.
>> However, it's entirely possible your clients are actually making use of
>> enough inodes that the MDS server is running into its default limits.
>> If your MDS has memory available, you probably want to increase the
>> cache size from its default of 100k inodes (mds cache size = X).
> mds_cache_size is already 400 and so a lot higher than usual.
> (google said I should increase ...)
>
>> Or maybe your kernels are too old; Zheng would know.
> We're already far from the CentOS dist kernel, but upgrading to 4.4.x
> for the clients should be possible if that might help.

The mds log should contain messages like:

    client.<id> isn't responding to mclientcaps(revoke)

Please send these messages to us.

Regards
Yan, Zheng

> --
> Kind regards
> Michael
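A quick way to collect exactly those lines from the MDS log (the log
path is an assumption; adjust to the local setup):

    grep "isn't responding to mclientcaps(revoke)" /var/log/ceph/ceph-mds.*.log | tail -20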
Re: [ceph-users] mds0: Client X failing to respond to capability release
Hi,

On 03.02.2016 at 10:26, Gregory Farnum wrote:
> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
> GmbH wrote:
>> Putting some higher load via cephfs on the cluster leads to messages
>> like mds0: Client X failing to respond to capability release after some
>> minutes. Requests from other clients start to block after a while.
>> Rebooting the named client resolves the issue.
> There are some bugs around this functionality, but I *think* your
> clients are new enough that it shouldn't be an issue.
> However, it's entirely possible your clients are actually making use of
> enough inodes that the MDS server is running into its default limits.
> If your MDS has memory available, you probably want to increase the
> cache size from its default of 100k inodes (mds cache size = X).

mds_cache_size is already 400 and so a lot higher than usual.
(google said I should increase ...)

> Or maybe your kernels are too old; Zheng would know.

We're already far from the CentOS dist kernel, but upgrading to 4.4.x
for the clients should be possible if that might help.

--
Kind regards
Michael
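To double-check what value the running MDS actually uses, the admin
socket can be queried (a sketch; the daemon name is taken from the
cluster status earlier in the thread):

    ceph daemon mds.storagemds01 config get mds_cache_size
    # or via the socket file directly:
    ceph --admin-daemon /var/run/ceph/ceph-mds.storagemds01.asok config get mds_cache_size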
Re: [ceph-users] mds0: Client X failing to respond to capability release
On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
GmbH wrote:
> Hi,
>
> we're experiencing some strange issues running ceph 0.87 in our, I
> think, quite large cluster (taking the number of objects as a measurement).
>
> mdsmap e721086: 1/1/1 up {0=storagemds01=up:active}, 2 up:standby
> osdmap e143048: 92 osds: 92 up, 92 in
>        flags noout,noscrub,nodeep-scrub
> pgmap v45790682: 4736 pgs, 6 pools, 109 TB data, 3841 Mobjects
>        255 TB used, 48892 GB / 303 TB avail
>
> Putting some higher load via cephfs on the cluster leads to messages
> like mds0: Client X failing to respond to capability release after some
> minutes. Requests from other clients start to block after a while.
>
> Rebooting the named client resolves the issue.

There are some bugs around this functionality, but I *think* your
clients are new enough that it shouldn't be an issue.

However, it's entirely possible your clients are actually making use of
enough inodes that the MDS server is running into its default limits.
If your MDS has memory available, you probably want to increase the
cache size from its default of 100k inodes (mds cache size = X).

Or maybe your kernels are too old; Zheng would know.
-Greg

> Clients are a mix of CentOS6 & CentOS7 running kernels
> 4.1.4-1.el7.elrepo.x86_64
> 4.1.4-1.el6.elrepo.x86_64
> 4.4.0-2.el6.elrepo.x86_64
> but other releases show the same behavior.
>
> Currently running 3 OSD nodes and 3 combined MDS/MON nodes.
>
> What information do you need to further track down this issue? I'm quite
> unsure, so this is only a rough overview of the setup.
>
> We have another issue with occasionally broken files (bad checksums after
> storage), but I think I will start a new thread for that ;-)
>
> Thanks!
>
> --
> Kind Regards
> Michael
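For reference, the setting Greg mentions goes in the MDS section of
ceph.conf; the value below is purely illustrative and should be sized
to the MDS host's RAM:

    [mds]
        # default is 100000 cached inodes
        mds cache size = 1000000

It can also be changed on a running daemon through the admin socket
(assumed available on the MDS host):

    ceph daemon mds.storagemds01 config set mds_cache_size 1000000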
[ceph-users] mds0: Client X failing to respond to capability release
Hi,

we're experiencing some strange issues running ceph 0.87 in our, I
think, quite large cluster (taking the number of objects as a measurement).

mdsmap e721086: 1/1/1 up {0=storagemds01=up:active}, 2 up:standby
osdmap e143048: 92 osds: 92 up, 92 in
       flags noout,noscrub,nodeep-scrub
pgmap v45790682: 4736 pgs, 6 pools, 109 TB data, 3841 Mobjects
       255 TB used, 48892 GB / 303 TB avail

Putting some higher load via cephfs on the cluster leads to messages
like "mds0: Client X failing to respond to capability release" after some
minutes. Requests from other clients start to block after a while.

Rebooting the named client resolves the issue.

Clients are a mix of CentOS6 & CentOS7 running kernels
4.1.4-1.el7.elrepo.x86_64
4.1.4-1.el6.elrepo.x86_64
4.4.0-2.el6.elrepo.x86_64
but other releases show the same behavior.

Currently running 3 OSD nodes and 3 combined MDS/MON nodes.

What information do you need to further track down this issue? I'm quite
unsure, so this is only a rough overview of the setup.

We have another issue with occasionally broken files (bad checksums after
storage), but I think I will start a new thread for that ;-)

Thanks!

--
Kind Regards
Michael
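For anyone debugging the same warning: a starting point is to map the
client id from the health output to an actual session (a sketch; the
'session ls' admin-socket command may not exist on a release as old as
0.87):

    ceph health detail                       # shows which client id is affected
    ceph daemon mds.storagemds01 session ls  # map the id to a hostname/mount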