Re: [ceph-users] large memory leak on scrubbing

2013-08-19 Thread Mostowiec Dominik
Hi,
 Is that the only slow request message you see?
No.
Full log: https://www.dropbox.com/s/i3ep5dcimndwvj1/slow_requests.txt.tar.gz 
It starts with:
2013-08-16 09:43:39.662878 mon.0 10.174.81.132:6788/0 4276384 : [DBG] osd.4 
10.174.81.131:6805/31460 reported failed by osd.50 10.174.81.135:6842/26019
2013-08-16 09:43:40.711911 mon.0 10.174.81.132:6788/0 4276386 : [DBG] osd.4 
10.174.81.131:6805/31460 reported failed by osd.14 10.174.81.132:6836/2958
2013-08-16 09:43:41.043016 mon.0 10.174.81.132:6788/0 4276388 : [DBG] osd.4 
10.174.81.131:6805/31460 reported failed by osd.13 10.174.81.132:6830/2482
2013-08-16 09:43:41.043047 mon.0 10.174.81.132:6788/0 4276389 : [INF] osd.4 
10.174.81.131:6805/31460 failed (3 reports from 3 peers after 2013-08-16 
09:43:56.042983 >= grace 20.00)
2013-08-16 09:43:41.122326 mon.0 10.174.81.132:6788/0 4276390 : [INF] osdmap 
e10294: 144 osds: 143 up, 143 in
2013-08-16 09:43:38.798833 osd.4 10.174.81.131:6805/31460 913 : [WRN] 6 slow 
requests, 6 included below; oldest blocked for > 30.190146 secs
2013-08-16 09:43:38.798843 osd.4 10.174.81.131:6805/31460 914 : [WRN] slow 
request 30.190146 seconds old, received at 2013-08-16 09:43:08.585504: 
osd_op(client.22301645.0:48987 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
2013-08-16 09:43:38.798854 osd.4 10.174.81.131:6805/31460 915 : [WRN] slow 
request 30.189643 seconds old, received at 2013-08-16 09:43:08.586007: 
osd_op(client.22301855.0:49374 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
2013-08-16 09:43:38.798859 osd.4 10.174.81.131:6805/31460 916 : [WRN] slow 
request 30.188236 seconds old, received at 2013-08-16 09:43:08.587414: 
osd_op(client.22307596.0:47674 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
2013-08-16 09:43:38.798862 osd.4 10.174.81.131:6805/31460 917 : [WRN] slow 
request 30.187853 seconds old, received at 2013-08-16 09:43:08.587797: 
osd_op(client.22303894.0:51846 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
...
2013-08-16 09:44:18.126318 mon.0 10.174.81.132:6788/0 4276427 : [INF] osd.4 
10.174.81.131:6805/31460 boot
...
2013-08-16 09:44:23.215918 mon.0 10.174.81.132:6788/0 4276437 : [DBG] osd.25 
10.174.81.133:6810/2961 reported failed by osd.83 10.174.81.137:6837/27963
2013-08-16 09:44:23.704769 mon.0 10.174.81.132:6788/0 4276438 : [INF] pgmap 
v17035051: 32424 pgs: 1 stale+active+clean+scrubbing+deep, 2 active, 31965 
active+clean, 7 stale+active+clean, 29 peering, 415 active+degraded, 5 
active+clean+scrubbing; 6630 GB data, 21420 GB used, 371 TB / 392 TB avail; 
246065/61089697 degraded (0.403%)
2013-08-16 09:44:23.711244 mon.0 10.174.81.132:6788/0 4276439 : [DBG] osd.133 
10.174.81.142:6803/21366 reported failed by osd.26 10.174.81.133:6814/3674
2013-08-16 09:44:23.713597 mon.0 10.174.81.132:6788/0 4276440 : [DBG] osd.133 
10.174.81.142:6803/21366 reported failed by osd.17 10.174.81.132:6806/9188
2013-08-16 09:44:23.753952 mon.0 10.174.81.132:6788/0 4276441 : [DBG] osd.133 
10.174.81.142:6803/21366 reported failed by osd.27 10.174.81.133:6822/5389
2013-08-16 09:44:23.753982 mon.0 10.174.81.132:6788/0 4276442 : [INF] osd.133 
10.174.81.142:6803/21366 failed (3 reports from 3 peers after 2013-08-16 
09:44:38.753913 >= grace 20.00)


2013-08-16 09:47:10.229099 mon.0 10.174.81.132:6788/0 4276646 : [INF] pgmap 
v17035216: 32424 pgs: 32424 active+clean; 6630 GB data, 21420 GB used, 371 TB / 
392 TB avail; 0B/s rd, 622KB/s wr, 85op/s

Why are the OSDs 'reported failed' during scrubbing?
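
My guess is that the scrubbing OSD gets so busy (or so deep into swap, given
the 28G process size) that it stops answering heartbeats in time, so its peers
report it down.  As a stopgap until we can upgrade we are thinking about giving
the OSDs more slack, roughly like this (untested, option name from the 0.56
docs):

  [global]
      # default is 20 seconds (the "grace 20" in the log above); this only
      # hides the flapping while scrub runs, it does not fix the leak
      osd heartbeat grace = 60

Does that sound reasonable as a temporary workaround?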

--
Regards 
Dominik 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] large memory leak on scrubbing

2013-08-17 Thread Sage Weil
Hi Dominik,

A bug that caused excessive memory consumption during scrub was fixed a couple 
of months back.  You can upgrade to the latest 'bobtail' branch; see

 http://ceph.com/docs/master/install/debian/#development-testing-packages
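
On Debian/Ubuntu that boils down to roughly the following (the page above has 
the authoritative repo line for your release):

  echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/bobtail $(lsb_release -sc) main \
      | sudo tee /etc/apt/sources.list.d/ceph-bobtail.list
  sudo apt-get update
  sudo apt-get install ceph

and then restart the OSDs to pick up the new code.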

Installing that package should clear this up.
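
In the meantime, if your OSDs are built against tcmalloc you can keep an eye on 
(and sometimes shrink) the heap with:

  ceph tell osd.4 heap stats     # dump tcmalloc heap usage for osd.4
  ceph tell osd.4 heap release   # ask tcmalloc to return free memory to the OS

That only mitigates the symptom; the real fix is the updated code.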

sage


On Fri, 16 Aug 2013, Mostowiec Dominik wrote:

 Hi,
 We noticed some issues on our Ceph/S3 cluster which I think are related to 
 scrubbing: large memory leaks.
 
 Logs 09.xx: 
 https://www.dropbox.com/s/4z1fzg239j43igs/ceph-osd.4.log_09xx.tar.gz
 From 09:30 to 09:44 (14 minutes) the osd.4 process grows to 28 GB. 
 
 I think this part is curious:
 2013-08-16 09:43:48.801331 7f6570d2e700  0 log [WRN] : slow request 32.794125 
 seconds old, received at 2013-08-16 09:43:16.007104: osd_sub_op(unknown.0.0:0 
 16.113d 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[]) v7 
 currently no flag points reached
 
 We have a large rgw index and a lot of large files on this cluster.
 ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
 Setup: 
 - 12 servers x 12 OSDs (144 in total) 
 - 3 mons
 Default scrubbing settings.
 Journal and filestore settings:
 journal aio = true
 filestore flush min = 0
 filestore flusher = false
 filestore fiemap = false
 filestore op threads = 4
 filestore queue max ops = 4096
 filestore queue max bytes = 10485760
 filestore queue committing max bytes = 10485760
 journal max write bytes = 10485760
 journal queue max bytes = 10485760
 ms dispatch throttle bytes = 10485760
 objecter inflight op bytes = 10485760
 
 Is this a known bug in this version?
 (Do you know some workaround to fix this?)
 
 ---
 Regards
 Dominik
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] large memory leak on scrubbing

2013-08-16 Thread Sylvain Munaut
Hi,


 Is this a known bug in this version?

Yes.

 (Do you know some workaround to fix this?)

Upgrade.


Cheers,

Sylvain
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com