Re: [ceph-users] large memory leak on scrubbing

2013-08-19 Thread Mostowiec Dominik
Hi,
> Is that the only slow request message you see?
No.
Full log: https://www.dropbox.com/s/i3ep5dcimndwvj1/slow_requests.txt.tar.gz 
It starts with:
2013-08-16 09:43:39.662878 mon.0 10.174.81.132:6788/0 4276384 : [DBG] osd.4 
10.174.81.131:6805/31460 reported failed by osd.50 10.174.81.135:6842/26019
2013-08-16 09:43:40.711911 mon.0 10.174.81.132:6788/0 4276386 : [DBG] osd.4 
10.174.81.131:6805/31460 reported failed by osd.14 10.174.81.132:6836/2958
2013-08-16 09:43:41.043016 mon.0 10.174.81.132:6788/0 4276388 : [DBG] osd.4 
10.174.81.131:6805/31460 reported failed by osd.13 10.174.81.132:6830/2482
2013-08-16 09:43:41.043047 mon.0 10.174.81.132:6788/0 4276389 : [INF] osd.4 
10.174.81.131:6805/31460 failed (3 reports from 3 peers after 2013-08-16 
09:43:56.042983 >= grace 20.00)
2013-08-16 09:43:41.122326 mon.0 10.174.81.132:6788/0 4276390 : [INF] osdmap 
e10294: 144 osds: 143 up, 143 in
2013-08-16 09:43:38.798833 osd.4 10.174.81.131:6805/31460 913 : [WRN] 6 slow 
requests, 6 included below; oldest blocked for > 30.190146 secs
2013-08-16 09:43:38.798843 osd.4 10.174.81.131:6805/31460 914 : [WRN] slow 
request 30.190146 seconds old, received at 2013-08-16 09:43:08.585504: 
osd_op(client.22301645.0:48987 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
2013-08-16 09:43:38.798854 osd.4 10.174.81.131:6805/31460 915 : [WRN] slow 
request 30.189643 seconds old, received at 2013-08-16 09:43:08.586007: 
osd_op(client.22301855.0:49374 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
2013-08-16 09:43:38.798859 osd.4 10.174.81.131:6805/31460 916 : [WRN] slow 
request 30.188236 seconds old, received at 2013-08-16 09:43:08.587414: 
osd_op(client.22307596.0:47674 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
2013-08-16 09:43:38.798862 osd.4 10.174.81.131:6805/31460 917 : [WRN] slow 
request 30.187853 seconds old, received at 2013-08-16 09:43:08.587797: 
osd_op(client.22303894.0:51846 .dir.1585245.1 [call rgw.bucket_complete_op] 
16.33d5ea80) v4 currently waiting for subops from [25,133]
...
2013-08-16 09:44:18.126318 mon.0 10.174.81.132:6788/0 4276427 : [INF] osd.4 
10.174.81.131:6805/31460 boot
...
2013-08-16 09:44:23.215918 mon.0 10.174.81.132:6788/0 4276437 : [DBG] osd.25 
10.174.81.133:6810/2961 reported failed by osd.83 10.174.81.137:6837/27963
2013-08-16 09:44:23.704769 mon.0 10.174.81.132:6788/0 4276438 : [INF] pgmap 
v17035051: 32424 pgs: 1 stale+active+clean+scrubbing+deep, 2 active, 31965 
active+clean, 7 stale+active+clean, 29 peering, 415 active+degraded, 5 
active+clean+scrubbing; 6630 GB data, 21420 GB used, 371 TB / 392 TB avail; 
246065/61089697 degraded (0.403%)
2013-08-16 09:44:23.711244 mon.0 10.174.81.132:6788/0 4276439 : [DBG] osd.133 
10.174.81.142:6803/21366 reported failed by osd.26 10.174.81.133:6814/3674
2013-08-16 09:44:23.713597 mon.0 10.174.81.132:6788/0 4276440 : [DBG] osd.133 
10.174.81.142:6803/21366 reported failed by osd.17 10.174.81.132:6806/9188
2013-08-16 09:44:23.753952 mon.0 10.174.81.132:6788/0 4276441 : [DBG] osd.133 
10.174.81.142:6803/21366 reported failed by osd.27 10.174.81.133:6822/5389
2013-08-16 09:44:23.753982 mon.0 10.174.81.132:6788/0 4276442 : [INF] osd.133 
10.174.81.142:6803/21366 failed (3 reports from 3 peers after 2013-08-16 
09:44:38.753913 >= grace 20.00)


2013-08-16 09:47:10.229099 mon.0 10.174.81.132:6788/0 4276646 : [INF] pgmap 
v17035216: 32424 pgs: 32424 active+clean; 6630 GB data, 21420 GB used, 371 TB / 
392 TB avail; 0B/s rd, 622KB/s wr, 85op/s

Why are OSDs 'reported failed' during scrubbing?
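
(If it is the bucket index scrub that keeps osd.4 too busy to answer
heartbeats, would bumping the failure-detection thresholds be a reasonable
stopgap while we sort out the index?  A rough ceph.conf sketch of what I
mean; untested, and the option names are only as I understand them for
0.56.x:)

    [global]
        ; assumptions, not verified: grace defaults to 20s, reporters to 1
        osd heartbeat grace = 40
        mon osd min down reporters = 3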

--
Regards 
Dominik 


Re: [ceph-users] large memory leak on scrubbing

2013-08-19 Thread Sage Weil
On Mon, 19 Aug 2013, Mostowiec Dominik wrote:
> Thanks for your response.
> Great.
> 
> Is it also fixed in the latest cuttlefish?
> 
> We have two problems with scrubbing:
> - memory leaks
> - slow requests, and the OSD holding the bucket index being wrongly marked down (when scrubbing)

The slow requests can trigger if you have very large objects (including 
a very large rgw bucket index object).  But the message you quote below is 
for a scrub-reserve operation, which should really be excluded from the op 
warnings entirely.  Is that the only slow request message you see?

> For now we have decided to turn off scrubbing and trigger it in a maintenance
> window.
> I noticed that "ceph osd scrub" or "ceph osd deep-scrub" triggers a scrub on
> the OSD, but not for all of its PGs.
> Is it possible to trigger scrubbing of all PGs on one OSD?

It should trigger a scrub on all PGs that are clean.  If a PG is 
recovering it will be skipped.

sage


> 
> --
> Regards 
> Dominik
> 
> 
> -Original Message-
> From: Sage Weil [mailto:s...@inktank.com] 
> Sent: Saturday, August 17, 2013 5:11 PM
> To: Mostowiec Dominik
> Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com; Studziński 
> Krzysztof; Sydor Bohdan
> Subject: Re: [ceph-users] large memory leak on scrubbing
> 
> Hi Dominik,
> 
> A bug that caused excessive memory consumption during scrub was fixed a 
> couple of months back.  You can upgrade to the latest 'bobtail' branch.  
> See
> 
>  http://ceph.com/docs/master/install/debian/#development-testing-packages
> 
> Installing that package should clear this up.
> 
> sage
> 
> 
> On Fri, 16 Aug 2013, Mostowiec Dominik wrote:
> 
> > Hi,
> > We noticed some issues on our Ceph/S3 cluster which I think are related to 
> > scrubbing: large memory leaks.
> > 
> > Logs 09.xx: 
> > https://www.dropbox.com/s/4z1fzg239j43igs/ceph-osd.4.log_09xx.tar.gz
> > From 09:30 to 09:44 (14 minutes) the osd.4 process grows to 28G. 
> > 
> > I think this is something curious:
> > 2013-08-16 09:43:48.801331 7f6570d2e700  0 log [WRN] : slow request 
> > 32.794125 seconds old, received at 2013-08-16 09:43:16.007104: 
> > osd_sub_op(unknown.0.0:0 16.113d 0//0//-1 [scrub-reserve] v 0'0 
> > snapset=0=[]:[] snapc=0=[]) v7 currently no flag points reached
> > 
> > We have a large rgw index and a lot of large files on this cluster.
> > ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
> > Setup: 
> > - 12 servers x 12 OSD
> > - 3 mons
> > Default scrubbing settings.
> > Journal and filestore settings:
> > journal aio = true
> > filestore flush min = 0
> > filestore flusher = false
> > filestore fiemap = false
> > filestore op threads = 4
> > filestore queue max ops = 4096
> > filestore queue max bytes = 10485760
> > filestore queue committing max bytes = 10485760
> > journal max write bytes = 10485760
> > journal queue max bytes = 10485760
> > ms dispatch throttle bytes = 10485760
> > objecter inflight op bytes = 10485760
> > 
> > Is this a known bug in this version?
> > (Do you know of a workaround for this?)
> > 
> > ---
> > Regards
> > Dominik
> > 
> > 
> > 
> 
> 


Re: [ceph-users] large memory leak on scrubbing

2013-08-19 Thread Mostowiec Dominik
Thanks for your response.
Great.

Is it also fixed in the latest cuttlefish?

We have two problems with scrubbing:
- memory leaks
- slow requests, and the OSD holding the bucket index being wrongly marked down (when scrubbing)

For now we have decided to turn off scrubbing and trigger it in a maintenance 
window.
I noticed that "ceph osd scrub" or "ceph osd deep-scrub" triggers a scrub on the 
OSD, but not for all of its PGs.
Is it possible to trigger scrubbing of all PGs on one OSD?
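
(What we had in mind for turning the automatic scrubs off is simply pushing
the intervals far out in ceph.conf, roughly as below; untested so far, and
the option names should be double-checked against 0.56.x:)

    [osd]
        ; assumed option names, please correct me if these are wrong
        ; 2592000s is ~30 days, i.e. "never" for practical purposes
        osd scrub min interval = 2592000
        osd scrub max interval = 2592000
        osd deep scrub interval = 2592000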

--
Regards 
Dominik


-Original Message-
From: Sage Weil [mailto:s...@inktank.com] 
Sent: Saturday, August 17, 2013 5:11 PM
To: Mostowiec Dominik
Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com; Studziński 
Krzysztof; Sydor Bohdan
Subject: Re: [ceph-users] large memory leak on scrubbing

Hi Dominik,

A bug that caused excessive memory consumption during scrub was fixed a couple 
of months back.  You can upgrade to the latest 'bobtail' branch.  
See

 http://ceph.com/docs/master/install/debian/#development-testing-packages

Installing that package should clear this up.

sage


On Fri, 16 Aug 2013, Mostowiec Dominik wrote:

> Hi,
> We noticed some issues on our Ceph/S3 cluster which I think are related to 
> scrubbing: large memory leaks.
> 
> Logs 09.xx: 
> https://www.dropbox.com/s/4z1fzg239j43igs/ceph-osd.4.log_09xx.tar.gz
> From 09:30 to 09:44 (14 minutes) the osd.4 process grows to 28G. 
> 
> I think this is something curious:
> 2013-08-16 09:43:48.801331 7f6570d2e700  0 log [WRN] : slow request 
> 32.794125 seconds old, received at 2013-08-16 09:43:16.007104: 
> osd_sub_op(unknown.0.0:0 16.113d 0//0//-1 [scrub-reserve] v 0'0 
> snapset=0=[]:[] snapc=0=[]) v7 currently no flag points reached
> 
> We have a large rgw index and a lot of large files on this cluster.
> ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
> Setup: 
> - 12 servers x 12 OSD
> - 3 mons
> Default scrubbing settings.
> Journal and filestore settings:
> journal aio = true
> filestore flush min = 0
> filestore flusher = false
> filestore fiemap = false
> filestore op threads = 4
> filestore queue max ops = 4096
> filestore queue max bytes = 10485760
> filestore queue committing max bytes = 10485760
> journal max write bytes = 10485760
> journal queue max bytes = 10485760
> ms dispatch throttle bytes = 10485760
> objecter inflight op bytes = 10485760
> 
> Is this a known bug in this version?
> (Do you know of a workaround for this?)
> 
> ---
> Regards
> Dominik
> 
> 
> 


Re: [ceph-users] large memory leak on scrubbing

2013-08-17 Thread Sage Weil
Hi Dominik,

A bug that caused excessive memory consumption during scrub was fixed a couple 
of months back.  You can upgrade to the latest 'bobtail' branch.  
See

 http://ceph.com/docs/master/install/debian/#development-testing-packages
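
Roughly (from memory of that page; double-check the key and repository URLs
there before using them, as they may have changed):

    wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc' \
      | sudo apt-key add -
    echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/bobtail \
      $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph-bobtail.list
    sudo apt-get update && sudo apt-get install ceph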

Installing that package should clear this up.

sage


On Fri, 16 Aug 2013, Mostowiec Dominik wrote:

> Hi,
> We noticed some issues on our Ceph/S3 cluster which I think are related to 
> scrubbing: large memory leaks.
> 
> Logs 09.xx: 
> https://www.dropbox.com/s/4z1fzg239j43igs/ceph-osd.4.log_09xx.tar.gz
> From 09:30 to 09:44 (14 minutes) the osd.4 process grows to 28G. 
> 
> I think this is something curious:
> 2013-08-16 09:43:48.801331 7f6570d2e700  0 log [WRN] : slow request 32.794125 
> seconds old, received at 2013-08-16 09:43:16.007104: osd_sub_op(unknown.0.0:0 
> 16.113d 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[]) v7 
> currently no flag points reached
> 
> We have a large rgw index and a lot of large files on this cluster.
> ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
> Setup: 
> - 12 servers x 12 OSD 
> - 3 mons
> Default scrubbing settings.
> Journal and filestore settings:
> journal aio = true
> filestore flush min = 0
> filestore flusher = false
> filestore fiemap = false
> filestore op threads = 4
> filestore queue max ops = 4096
> filestore queue max bytes = 10485760
> filestore queue committing max bytes = 10485760
> journal max write bytes = 10485760
> journal queue max bytes = 10485760
> ms dispatch throttle bytes = 10485760
> objecter inflight op bytes = 10485760
> 
> Is this a known bug in this version?
> (Do you know of a workaround for this?)
> 
> ---
> Regards
> Dominik
> 
> 
> 


Re: [ceph-users] large memory leak on scrubbing

2013-08-16 Thread Sylvain Munaut
Hi,


> Is this a known bug in this version?

Yes.

> (Do you know of a workaround for this?)

Upgrade.


Cheers,

Sylvain