Also, check whether scrubbing has started in the cluster; it can slow the
cluster down considerably.
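
For example, something like the following should show whether scrubs are running
and let you pause them until recovery finishes (a sketch based on Hammer-era
commands; adapt to your setup):

ceph -s                   # active scrubbing/deep-scrubbing shows up in the PG summary
ceph osd set noscrub      # stop new scrubs for now
ceph osd set nodeep-scrub
# once recovery is done:
ceph osd unset noscrub
ceph osd unset nodeep-scrub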

-----Original Message-----
From: Somnath Roy 
Sent: Wednesday, August 19, 2015 1:35 PM
To: 'J-P Methot'; ceph-us...@ceph.com
Subject: RE: [ceph-users] Bad performances in recovery

All the writes will go through the journal.
It may be that your SSDs are not well preconditioned, and after the heavy write
load of recovery their IO has settled at a lower steady-state number. This is
quite common for SSDs.
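
If you want to check whether the journal SSDs have dropped to their steady-state
sync-write performance, the usual quick check is a small-block sync write run
with fio. A rough sketch (the filename below is a placeholder; pointing fio at a
raw device instead will destroy its contents):

fio --name=journal-test --filename=/tmp/fio-journal-test --size=4G \
    --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=300 --time_based --group_reporting

If IOPS fall off sharply over the course of the run, the drive is settling into
its post-preconditioning steady state.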

Thanks & Regards
Somnath

-----Original Message-----
From: J-P Methot [mailto:jpmet...@gtcomm.net]
Sent: Wednesday, August 19, 2015 1:03 PM
To: Somnath Roy; ceph-us...@ceph.com
Subject: Re: [ceph-users] Bad performances in recovery

Hi,

Thank you for the quick reply. However, we already have exactly those recovery
settings and recovery still strongly affects client IO. I have looked at the
various Ceph and OSD logs and nothing is out of the ordinary.
Here's an idea though; please tell me if I am wrong.

We use Intel SSDs for journaling and Samsung SSDs as the actual OSDs. As has
been explained several times on this mailing list, Samsung SSDs perform terribly
in Ceph: they have horrible O_DSYNC write speeds and die easily when used as
journals. That's why we're using Intel SSDs for journaling, so that we didn't
end up putting 96 Samsung SSDs in the trash.

During recovery, though, what is Ceph's behaviour? What kind of writes does it
issue to the OSD SSDs? Does it write directly to the SSDs or through the
journal?
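
For reference, on a default FileStore layout each OSD's journal is a symlink, so
something like the following should confirm whether the journals really point at
the Intel devices (the path assumes the standard /var/lib/ceph layout):

ls -l /var/lib/ceph/osd/ceph-*/journal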

Additionally, we noticed something else: the cluster is MUCH slower after
recovery than before. Clearly there is a bottleneck somewhere, and that
bottleneck does not clear up once recovery is done.
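
One thing I can check on our side is the per-OSD commit/apply latency after
recovery, to see whether a few OSDs stand out, e.g. with:

ceph osd perf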


On 2015-08-19 3:32 PM, Somnath Roy wrote:
> If you are concerned about *client IO performance* during recovery, use these 
> settings:
> 
> osd recovery max active = 1
> osd max backfills = 1
> osd recovery threads = 1
> osd recovery op priority = 1
> 
> If you are concerned about *recovery performance*, you may want to bump these 
> up, but I doubt it will help much over the defaults.
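> 
> These can also be applied at runtime without restarting the OSDs, roughly like 
> this (a sketch; some options may still need an OSD restart to fully take effect):
> 
> ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'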
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of J-P Methot
> Sent: Wednesday, August 19, 2015 12:17 PM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Bad performances in recovery
> 
> Hi,
> 
> Our setup currently comprises 5 OSD nodes with 12 OSDs each, for a total of 60 
> OSDs. All of the OSDs are SSDs, with 4 SSD journals on each node. The Ceph 
> version is Hammer v0.94.1. There is some performance overhead running Ceph on 
> SSDs (I've heard it gets better in Infernalis, but we're not upgrading just 
> yet), but we can reach numbers that I would consider "alright".
> 
> Now, the issue is that when the cluster goes into recovery, it is very fast at 
> first but then slows down to ridiculous levels as it moves forward. It can go 
> from 7% to 2% left to recover in ten minutes, yet take 2 hours to recover the 
> last 2%. While this happens, the attached OpenStack setup becomes incredibly 
> slow, even though only a small fraction of objects (less than 1%) is still 
> recovering. The settings that affect recovery speed are at their low default 
> values, yet they still hurt client IO far more than they should.
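> 
> For what it's worth, the values the OSDs are actually running with can be 
> checked on the OSD node through the admin socket, e.g. (osd.0 here is just an 
> example):
> 
> ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery'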
> 
> Why would Ceph recovery become so slow as it progresses, and why does it affect 
> client IO even though it's recovering at a snail's pace? And by a snail's pace, 
> I mean a few KB/s on 10 Gbps uplinks.
> --
> ======================
> Jean-Philippe Méthot
> Administrateur système / System administrator GloboTech Communications
> Phone: 1-514-907-0050
> Toll Free: 1-(888)-GTCOMM1
> Fax: 1-(514)-907-0750
> jpmet...@gtcomm.net
> http://www.gtcomm.net


--
======================
Jean-Philippe Méthot
Administrateur système / System administrator GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
