Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread J-P Methot
Hi,

First of all, we are sure that returning to the default configuration is
what fixed it. As soon as we restarted just one of the ceph nodes with the
default configuration, recovery sped up tremendously. We had already
restarted with the old conf before and recovery was never that fast.

Regarding the configuration, here's the old one with comments:

[global]
fsid = *
mon_initial_members = cephmon1
mon_host = ***
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true   // lets the filestore keep xattrs in omap for xfs/ext4/btrfs filesystems
osd_pool_default_pgp_num = 450    // default pgp number for new pools
osd_pg_bits = 12                  // placement group (PG) bits per OSD; allows 2^12 PGs
osd_pool_default_size = 3         // default number of copies for new pools
osd_pool_default_pg_num = 450     // default pg number for new pools
public_network = *
cluster_network = ***
osd_pgp_bits = 12                 // placement group for placement (PGP) bits per OSD; allows 2^12 PGPs

[osd]
filestore_queue_max_ops = 5000           // default is 500; maximum number of in-progress operations the file store accepts before blocking on queuing new ones
filestore_fd_cache_random = true         // undocumented; taken from an SSD tuning presentation
journal_queue_max_ops = 100              // default is 500; number of operations allowed in the journal queue
filestore_omap_header_cache_size = 100   // size of the LRU used to cache object omap headers; larger values use more memory but may reduce omap lookups
filestore_fd_cache_size = 100            // not in the ceph documentation, but seems to be a common tweak for SSD clusters
max_open_files = 100                     // has ceph raise the OS max file descriptor limit to prevent running out of file descriptors
osd_journal_size = 1                     // maximum journal size for each OSD

New conf:

[global]
fsid = *
mon_initial_members = cephmon1
mon_host = 
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = **
cluster_network = **

You might notice I have a few undocumented settings in the old
configuration. These are settings I took from a certain OpenStack summit
presentation, and they may have contributed to this whole problem. Here are
the settings I think are the most likely cause of the speed issues:

filestore_fd_cache_random = true
filestore_fd_cache_size = 100

Additionally, my colleague thinks these settings may have contributed:

filestore_queue_max_ops = 5000
journal_queue_max_ops = 100

We will run further tests on these settings once we have our lab ceph test
environment, as we are also curious about exactly what caused this.
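When we do, the rough plan is to dump what an OSD is actually running,
compare it against the defaults, and then flip the suspect settings back
one at a time. Something along these lines should do it (osd.0 and the
value are only examples, and some filestore options may still need an OSD
restart to take effect):

# on the OSD's host, via the admin socket: the values currently in use
ceph daemon osd.0 config show | egrep 'filestore_fd_cache|filestore_queue_max_ops|journal_queue_max_ops'

# change one setting at a time across all OSDs
ceph tell osd.\* injectargs '--filestore_queue_max_ops 500'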


On 2015-08-20 11:43 AM, Alex Gorbachev wrote:

 Just to update the mailing list, we ended up going back to default
 ceph.conf without any additional settings than what is mandatory. We are
 now reaching speeds we never reached before, both in recovery and in
 regular usage. There was definitely something we set in the ceph.conf
 bogging everything down.
 
 Could you please share the old and new ceph.conf, or the section that
 was removed?
 
 Best regards,
 Alex
 


 On 2015-08-20 4:06 AM, Christian Balzer wrote:

 Hello,

 Of all the pertinent points by Somnath, the one about pre-conditioning
 would be pretty high on my list, especially if this slowness persists and
 nothing else (scrub) is going on.

 This might be fixed by doing an fstrim.

 Additionally, the LevelDBs per OSD are of course syncing heavily during
 reconstruction, which might not be the favorite thing for your type of
 SSDs.

 But ultimately situational awareness is very important, as in what is
 actually going on and slowing things down.
 As usual my recommendations would be to use atop, iostat or similar on all
 your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
 maybe just one of them or something else entirely.

 Christian

 On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:

 Also, check if scrubbing started in the cluster or not. That may
 considerably slow down the cluster.

 -Original Message-
 From: Somnath Roy
 Sent: Wednesday, August 19, 2015 1:35 PM
 To: 'J-P Methot'; ceph-us...@ceph.com
 Subject: RE: [ceph-users] Bad performances in recovery

 All the writes will go through the journal.
 It may happen your SSDs are not preconditioned well and after a lot of
 writes during recovery IOs are stabilized to lower number. This is quite
 common for SSDs

Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread Shinobu Kinjo
[Quotes the earlier messages in the thread in full; no additional comments.]

Re: [ceph-users] Bad performances in recovery

2015-08-21 Thread Jan Schermer
[Quotes the earlier messages in the thread in full; no additional comments.]

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Christian Balzer

Hello,

Of all the pertinent points by Somnath, the one about pre-conditioning
would be pretty high on my list, especially if this slowness persists and
nothing else (scrub) is going on.

This might be fixed by doing an fstrim.
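
Something like the following on each OSD node should do it, assuming the
OSDs are mounted in the usual /var/lib/ceph/osd/ceph-* locations and that
TRIM actually reaches the SSDs through your controller:

for d in /var/lib/ceph/osd/ceph-*; do fstrim -v "$d"; done

The -v just reports how much was discarded per mount; best run it outside
peak hours, since a large trim can itself cause latency spikes.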

Additionally, the LevelDBs per OSD are of course syncing heavily during
reconstruction, which might not be the favorite thing for your type of
SSDs.

But ultimately situational awareness is very important, as in what is
actually going on and slowing things down.
As usual my recommendations would be to use atop, iostat or similar on all
your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
maybe just one of them or something else entirely.
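
For example (exact flags are a matter of taste):

iostat -xmt 5     # per-device %util and await, 5 second intervals
atop 5            # disks, CPU and network in one screen

and from an admin node "ceph osd perf" lists the per-OSD commit/apply
latencies, which tends to single out a misbehaving disk rather quickly.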

Christian

On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:

 Also, check if scrubbing started in the cluster or not. That may
 considerably slow down the cluster.
 
 -Original Message-
 From: Somnath Roy 
 Sent: Wednesday, August 19, 2015 1:35 PM
 To: 'J-P Methot'; ceph-us...@ceph.com
 Subject: RE: [ceph-users] Bad performances in recovery
 
 All the writes will go through the journal.
 It may happen your SSDs are not preconditioned well and after a lot of
 writes during recovery IOs are stabilized to lower number. This is quite
 common for SSDs if that is the case.
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: J-P Methot [mailto:jpmet...@gtcomm.net]
 Sent: Wednesday, August 19, 2015 1:03 PM
 To: Somnath Roy; ceph-us...@ceph.com
 Subject: Re: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Thank you for the quick reply. However, we do have those exact settings
 for recovery and it still strongly affects client io. I have looked at
 various ceph logs and osd logs and nothing is out of the ordinary.
 Here's an idea though, please tell me if I am wrong.
 
 We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
 explained several times on this mailing list, Samsung SSDs suck in ceph.
 They have horrible O_dsync speed and die easily, when used as journal.
 That's why we're using Intel ssds for journaling, so that we didn't end
 up putting 96 samsung SSDs in the trash.
 
 In recovery though, what is the ceph behaviour? What kind of write does
 it do on the OSD SSDs? Does it write directly to the SSDs or through the
 journal?
 
 Additionally, something else we notice: the ceph cluster is MUCH slower
 after recovery than before. Clearly there is a bottleneck somewhere and
 that bottleneck does not get cleared up after the recovery is done.
 
 
 On 2015-08-19 3:32 PM, Somnath Roy wrote:
  If you are concerned about *client io performance* during recovery,
  use these settings..
  
  osd recovery max active = 1
  osd max backfills = 1
  osd recovery threads = 1
  osd recovery op priority = 1
  
  If you are concerned about *recovery performance*, you may want to
  bump this up, but I doubt it will help much from default settings..
  
  Thanks & Regards
  Somnath
  
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
  Of J-P Methot
  Sent: Wednesday, August 19, 2015 12:17 PM
  To: ceph-us...@ceph.com
  Subject: [ceph-users] Bad performances in recovery
  
  Hi,
  
  Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
  a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
  The ceph version is hammer v0.94.1 . There is a performance overhead
  because we're using SSDs (I've heard it gets better in infernalis, but
  we're not upgrading just yet) but we can reach numbers that I would
  consider alright.
  
  Now, the issue is, when the cluster goes into recovery it's very fast
  at first, but then slows down to ridiculous levels as it moves
  forward. You can go from 7% to 2% to recover in ten minutes, but it
  may take 2 hours to recover the last 2%. While this happens, the
  attached openstack setup becomes incredibly slow, even though there is
  only a small fraction of objects still recovering (less than 1%). The
  settings that may affect recovery speed are very low, as they are by
  default, yet they still affect client io speed way more than it should.
  
  Why would ceph recovery become so slow as it progress and affect
  client io even though it's recovering at a snail's pace? And by a
  snail's pace, I mean a few kb/second on 10gbps uplinks. --
  == Jean-Philippe Méthot
  Administrateur système / System administrator GloboTech Communications
  Phone: 1-514-907-0050
  Toll Free: 1-(888)-GTCOMM1
  Fax: 1-(514)-907-0750
  jpmet...@gtcomm.net
  http://www.gtcomm.net
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
  
  

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread J-P Methot
Hi,

Just to update the mailing list, we ended up going back to the default
ceph.conf without any additional settings beyond what is mandatory. We are
now reaching speeds we never reached before, both in recovery and in
regular usage. There was definitely something we set in the ceph.conf
bogging everything down.


On 2015-08-20 4:06 AM, Christian Balzer wrote:
 
 Hello,
 
 from all the pertinent points by Somnath, the one about pre-conditioning
 would be pretty high on my list, especially if this slowness persists and
 nothing else (scrub) is going on.
 
 This might be fixed by doing a fstrim.
 
 Additionally the levelDB's per OSD are of course sync'ing heavily during
 reconstruction, so that might not be the favorite thing for your type of
 SSDs.
 
 But ultimately situational awareness is very important, as in what is
 actually going and slowing things down. 
 As usual my recommendations would be to use atop, iostat or similar on all
 your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
 maybe just one of them or something else entirely.
 
 Christian
 
 On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:
 
 Also, check if scrubbing started in the cluster or not. That may
 considerably slow down the cluster.

 -Original Message-
 From: Somnath Roy 
 Sent: Wednesday, August 19, 2015 1:35 PM
 To: 'J-P Methot'; ceph-us...@ceph.com
 Subject: RE: [ceph-users] Bad performances in recovery

 All the writes will go through the journal.
 It may happen your SSDs are not preconditioned well and after a lot of
 writes during recovery IOs are stabilized to lower number. This is quite
 common for SSDs if that is the case.

 Thanks & Regards
 Somnath

 -Original Message-
 From: J-P Methot [mailto:jpmet...@gtcomm.net]
 Sent: Wednesday, August 19, 2015 1:03 PM
 To: Somnath Roy; ceph-us...@ceph.com
 Subject: Re: [ceph-users] Bad performances in recovery

 Hi,

 Thank you for the quick reply. However, we do have those exact settings
 for recovery and it still strongly affects client io. I have looked at
 various ceph logs and osd logs and nothing is out of the ordinary.
 Here's an idea though, please tell me if I am wrong.

 We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
 explained several times on this mailing list, Samsung SSDs suck in ceph.
 They have horrible O_dsync speed and die easily, when used as journal.
 That's why we're using Intel ssds for journaling, so that we didn't end
 up putting 96 samsung SSDs in the trash.

 In recovery though, what is the ceph behaviour? What kind of write does
 it do on the OSD SSDs? Does it write directly to the SSDs or through the
 journal?

 Additionally, something else we notice: the ceph cluster is MUCH slower
 after recovery than before. Clearly there is a bottleneck somewhere and
 that bottleneck does not get cleared up after the recovery is done.


 On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery,
 use these settings..

 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1

 If you are concerned about *recovery performance*, you may want to
 bump this up, but I doubt it will help much from default settings..

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of J-P Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery

 Hi,

 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
 a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
 The ceph version is hammer v0.94.1 . There is a performance overhead
 because we're using SSDs (I've heard it gets better in infernalis, but
 we're not upgrading just yet) but we can reach numbers that I would
 consider alright.

 Now, the issue is, when the cluster goes into recovery it's very fast
 at first, but then slows down to ridiculous levels as it moves
 forward. You can go from 7% to 2% to recover in ten minutes, but it
 may take 2 hours to recover the last 2%. While this happens, the
 attached openstack setup becomes incredibly slow, even though there is
 only a small fraction of objects still recovering (less than 1%). The
 settings that may affect recovery speed are very low, as they are by
 default, yet they still affect client io speed way more than it should.

 Why would ceph recovery become so slow as it progress and affect
 client io even though it's recovering at a snail's pace? And by a
 snail's pace, I mean a few kb/second on 10gbps uplinks. --
 == Jean-Philippe Méthot
 Administrateur système / System administrator GloboTech Communications
 Phone: 1-514-907-0050
 Toll Free: 1-(888)-GTCOMM1
 Fax: 1-(514)-907-0750
 jpmet...@gtcomm.net
 http://www.gtcomm.net
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Alex Gorbachev

 Just to update the mailing list, we ended up going back to default
 ceph.conf without any additional settings than what is mandatory. We are
 now reaching speeds we never reached before, both in recovery and in
 regular usage. There was definitely something we set in the ceph.conf
 bogging everything down.

Could you please share the old and new ceph.conf, or the section that
was removed?

Best regards,
Alex



 On 2015-08-20 4:06 AM, Christian Balzer wrote:

 Hello,

 from all the pertinent points by Somnath, the one about pre-conditioning
 would be pretty high on my list, especially if this slowness persists and
 nothing else (scrub) is going on.

 This might be fixed by doing a fstrim.

 Additionally the levelDB's per OSD are of course sync'ing heavily during
 reconstruction, so that might not be the favorite thing for your type of
 SSDs.

 But ultimately situational awareness is very important, as in what is
 actually going and slowing things down.
 As usual my recommendations would be to use atop, iostat or similar on all
 your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
 maybe just one of them or something else entirely.

 Christian

 On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:

 Also, check if scrubbing started in the cluster or not. That may
 considerably slow down the cluster.

 -Original Message-
 From: Somnath Roy
 Sent: Wednesday, August 19, 2015 1:35 PM
 To: 'J-P Methot'; ceph-us...@ceph.com
 Subject: RE: [ceph-users] Bad performances in recovery

 All the writes will go through the journal.
 It may happen your SSDs are not preconditioned well and after a lot of
 writes during recovery IOs are stabilized to lower number. This is quite
 common for SSDs if that is the case.

 Thanks & Regards
 Somnath

 -Original Message-
 From: J-P Methot [mailto:jpmet...@gtcomm.net]
 Sent: Wednesday, August 19, 2015 1:03 PM
 To: Somnath Roy; ceph-us...@ceph.com
 Subject: Re: [ceph-users] Bad performances in recovery

 Hi,

 Thank you for the quick reply. However, we do have those exact settings
 for recovery and it still strongly affects client io. I have looked at
 various ceph logs and osd logs and nothing is out of the ordinary.
 Here's an idea though, please tell me if I am wrong.

 We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
 explained several times on this mailing list, Samsung SSDs suck in ceph.
 They have horrible O_dsync speed and die easily, when used as journal.
 That's why we're using Intel ssds for journaling, so that we didn't end
 up putting 96 samsung SSDs in the trash.

 In recovery though, what is the ceph behaviour? What kind of write does
 it do on the OSD SSDs? Does it write directly to the SSDs or through the
 journal?

 Additionally, something else we notice: the ceph cluster is MUCH slower
 after recovery than before. Clearly there is a bottleneck somewhere and
 that bottleneck does not get cleared up after the recovery is done.


 On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery,
 use these settings..

 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1

 If you are concerned about *recovery performance*, you may want to
 bump this up, but I doubt it will help much from default settings..

 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of J-P Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery

 Hi,

 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
 a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
 The ceph version is hammer v0.94.1 . There is a performance overhead
 because we're using SSDs (I've heard it gets better in infernalis, but
 we're not upgrading just yet) but we can reach numbers that I would
 consider alright.

 Now, the issue is, when the cluster goes into recovery it's very fast
 at first, but then slows down to ridiculous levels as it moves
 forward. You can go from 7% to 2% to recover in ten minutes, but it
 may take 2 hours to recover the last 2%. While this happens, the
 attached openstack setup becomes incredibly slow, even though there is
 only a small fraction of objects still recovering (less than 1%). The
 settings that may affect recovery speed are very low, as they are by
 default, yet they still affect client io speed way more than it should.

 Why would ceph recovery become so slow as it progress and affect
 client io even though it's recovering at a snail's pace? And by a
 snail's pace, I mean a few kb/second on 10gbps uplinks. --
 == Jean-Philippe Méthot
 Administrateur système / System administrator GloboTech Communications
 Phone: 1-514-907-0050
 Toll Free: 1-(888)-GTCOMM1
 Fax: 1-(514)-907-0750
 jpmet...@gtcomm.net
 http://www.gtcomm.net

Re: [ceph-users] Bad performances in recovery

2015-08-20 Thread Jan Schermer
Are you sure it was because of configuration changes?
Maybe it was restarting the OSDs that fixed it?
We often hit an issue with backfill_toofull where the recovery/backfill 
processes get stuck until we restart the daemons (sometimes setting 
recovery_max_active helps as well). It still shows recovery of a few objects
now and then (a few KB/s) and then stops completely.
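
When it wedges like that, something along these lines (the value is only an
example) sometimes gets it moving again, and failing that we restart the
affected OSDs:

ceph health detail | grep -i backfill     # shows which PGs/OSDs are stuck
ceph tell osd.\* injectargs '--osd-recovery-max-active 3'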

Jan

 On 20 Aug 2015, at 17:43, Alex Gorbachev a...@iss-integration.com wrote:
 
 
 Just to update the mailing list, we ended up going back to default
 ceph.conf without any additional settings than what is mandatory. We are
 now reaching speeds we never reached before, both in recovery and in
 regular usage. There was definitely something we set in the ceph.conf
 bogging everything down.
 
 Could you please share the old and new ceph.conf, or the section that
 was removed?
 
 Best regards,
 Alex
 
 
 
 On 2015-08-20 4:06 AM, Christian Balzer wrote:
 
 Hello,
 
 from all the pertinent points by Somnath, the one about pre-conditioning
 would be pretty high on my list, especially if this slowness persists and
 nothing else (scrub) is going on.
 
 This might be fixed by doing a fstrim.
 
 Additionally the levelDB's per OSD are of course sync'ing heavily during
 reconstruction, so that might not be the favorite thing for your type of
 SSDs.
 
 But ultimately situational awareness is very important, as in what is
 actually going and slowing things down.
 As usual my recommendations would be to use atop, iostat or similar on all
 your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
 maybe just one of them or something else entirely.
 
 Christian
 
 On Wed, 19 Aug 2015 20:54:11 + Somnath Roy wrote:
 
 Also, check if scrubbing started in the cluster or not. That may
 considerably slow down the cluster.
 
 -Original Message-
 From: Somnath Roy
 Sent: Wednesday, August 19, 2015 1:35 PM
 To: 'J-P Methot'; ceph-us...@ceph.com
 Subject: RE: [ceph-users] Bad performances in recovery
 
 All the writes will go through the journal.
 It may happen your SSDs are not preconditioned well and after a lot of
 writes during recovery IOs are stabilized to lower number. This is quite
 common for SSDs if that is the case.
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: J-P Methot [mailto:jpmet...@gtcomm.net]
 Sent: Wednesday, August 19, 2015 1:03 PM
 To: Somnath Roy; ceph-us...@ceph.com
 Subject: Re: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Thank you for the quick reply. However, we do have those exact settings
 for recovery and it still strongly affects client io. I have looked at
 various ceph logs and osd logs and nothing is out of the ordinary.
 Here's an idea though, please tell me if I am wrong.
 
 We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was
 explained several times on this mailing list, Samsung SSDs suck in ceph.
 They have horrible O_dsync speed and die easily, when used as journal.
 That's why we're using Intel ssds for journaling, so that we didn't end
 up putting 96 samsung SSDs in the trash.
 
 In recovery though, what is the ceph behaviour? What kind of write does
 it do on the OSD SSDs? Does it write directly to the SSDs or through the
 journal?
 
 Additionally, something else we notice: the ceph cluster is MUCH slower
 after recovery than before. Clearly there is a bottleneck somewhere and
 that bottleneck does not get cleared up after the recovery is done.
 
 
 On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery,
 use these settings..
 
 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1
 
 If you are concerned about *recovery performance*, you may want to
 bump this up, but I doubt it will help much from default settings..
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of J-P Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for
 a total of 60 OSDs. All of these are SSDs with 4 SSD journals on each.
 The ceph version is hammer v0.94.1 . There is a performance overhead
 because we're using SSDs (I've heard it gets better in infernalis, but
 we're not upgrading just yet) but we can reach numbers that I would
 consider alright.
 
 Now, the issue is, when the cluster goes into recovery it's very fast
 at first, but then slows down to ridiculous levels as it moves
 forward. You can go from 7% to 2% to recover in ten minutes, but it
 may take 2 hours to recover the last 2%. While this happens, the
 attached openstack setup becomes incredibly slow, even though there is
 only a small fraction of objects still recovering (less than 1%). The
 settings that may affect recovery speed are very low, as they are by
 default, yet

Re: [ceph-users] Bad performances in recovery

2015-08-19 Thread J-P Methot
Hi,

Thank you for the quick reply. However, we do have those exact settings
for recovery and client io is still strongly affected. I have looked at
various ceph logs and osd logs and nothing is out of the ordinary.
Here's an idea though; please tell me if I am wrong.

We use Intel SSDs for journaling and Samsung SSDs as the actual OSDs. As was
explained several times on this mailing list, Samsung SSDs suck in ceph.
They have horrible O_DSYNC speed and die easily when used as journals.
That's why we're using the Intel SSDs for journaling, so that we didn't end
up putting 96 Samsung SSDs in the trash.
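
(If anyone wants to check their own drives, the usual quick test for
journal suitability is a small direct+dsync write against a scratch file or
spare partition, for example:

dd if=/dev/zero of=/path/to/testfile bs=4k count=10000 oflag=direct,dsync

The path is just a placeholder; drives with poor O_DSYNC handling show it
immediately here.)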

In recovery though, what is the ceph behaviour? What kind of write does
it do on the OSD SSDs? Does it write directly to the SSDs or through the
journal?

Additionally, something else we notice: the ceph cluster is MUCH slower
after recovery than before. Clearly there is a bottleneck somewhere and
that bottleneck does not get cleared up after the recovery is done.


On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery, use these 
 settings..
 
 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1
 
 If you are concerned about *recovery performance*, you may want to bump this 
 up, but I doubt it will help much from default settings..
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of J-P 
 Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for a total 
 of 60 OSDs. All of these are SSDs with 4 SSD journals on each. The ceph 
 version is hammer v0.94.1 . There is a performance overhead because we're 
 using SSDs (I've heard it gets better in infernalis, but we're not upgrading 
 just yet) but we can reach numbers that I would consider alright.
 
 Now, the issue is, when the cluster goes into recovery it's very fast at 
 first, but then slows down to ridiculous levels as it moves forward. You can 
 go from 7% to 2% to recover in ten minutes, but it may take 2 hours to 
 recover the last 2%. While this happens, the attached openstack setup becomes 
 incredibly slow, even though there is only a small fraction of objects still 
 recovering (less than 1%). The settings that may affect recovery speed are 
 very low, as they are by default, yet they still affect client io speed way 
 more than it should.
 
 Why would ceph recovery become so slow as it progress and affect client io 
 even though it's recovering at a snail's pace? And by a snail's pace, I mean 
 a few kb/second on 10gbps uplinks.
 --
 ==
 Jean-Philippe Méthot
 Administrateur système / System administrator GloboTech Communications
 Phone: 1-514-907-0050
 Toll Free: 1-(888)-GTCOMM1
 Fax: 1-(514)-907-0750
 jpmet...@gtcomm.net
 http://www.gtcomm.net
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


-- 
==
Jean-Philippe Méthot
Administrateur système / System administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bad performances in recovery

2015-08-19 Thread Somnath Roy
All the writes will go through the journal.
It may be that your SSDs are not preconditioned well and that, after a lot of
writes during recovery, their IOs stabilize at a lower number. This is quite
common for SSDs if that is the case.

Thanks & Regards
Somnath

-Original Message-
From: J-P Methot [mailto:jpmet...@gtcomm.net] 
Sent: Wednesday, August 19, 2015 1:03 PM
To: Somnath Roy; ceph-us...@ceph.com
Subject: Re: [ceph-users] Bad performances in recovery

Hi,

Thank you for the quick reply. However, we do have those exact settings for 
recovery and it still strongly affects client io. I have looked at various ceph 
logs and osd logs and nothing is out of the ordinary.
Here's an idea though, please tell me if I am wrong.

We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was 
explained several times on this mailing list, Samsung SSDs suck in ceph.
They have horrible O_dsync speed and die easily, when used as journal.
That's why we're using Intel ssds for journaling, so that we didn't end up 
putting 96 samsung SSDs in the trash.

In recovery though, what is the ceph behaviour? What kind of write does it do 
on the OSD SSDs? Does it write directly to the SSDs or through the journal?

Additionally, something else we notice: the ceph cluster is MUCH slower after 
recovery than before. Clearly there is a bottleneck somewhere and that 
bottleneck does not get cleared up after the recovery is done.


On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery, use these 
 settings..
 
 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1
 
 If you are concerned about *recovery performance*, you may want to bump this 
 up, but I doubt it will help much from default settings..
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of J-P Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for a total 
 of 60 OSDs. All of these are SSDs with 4 SSD journals on each. The ceph 
 version is hammer v0.94.1 . There is a performance overhead because we're 
 using SSDs (I've heard it gets better in infernalis, but we're not upgrading 
 just yet) but we can reach numbers that I would consider alright.
 
 Now, the issue is, when the cluster goes into recovery it's very fast at 
 first, but then slows down to ridiculous levels as it moves forward. You can 
 go from 7% to 2% to recover in ten minutes, but it may take 2 hours to 
 recover the last 2%. While this happens, the attached openstack setup becomes 
 incredibly slow, even though there is only a small fraction of objects still 
 recovering (less than 1%). The settings that may affect recovery speed are 
 very low, as they are by default, yet they still affect client io speed way 
 more than it should.
 
 Why would ceph recovery become so slow as it progress and affect client io 
 even though it's recovering at a snail's pace? And by a snail's pace, I mean 
 a few kb/second on 10gbps uplinks.
 --
 ==
 Jean-Philippe Méthot
 Administrateur système / System administrator GloboTech Communications
 Phone: 1-514-907-0050
 Toll Free: 1-(888)-GTCOMM1
 Fax: 1-(514)-907-0750
 jpmet...@gtcomm.net
 http://www.gtcomm.net
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


--
==
Jean-Philippe Méthot
Administrateur système / System administrator GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bad performances in recovery

2015-08-19 Thread J-P Methot
Hi,

Our setup is currently comprised of 5 OSD nodes with 12 OSDs each, for a
total of 60 OSDs. All of these are SSDs, with 4 SSD journals on each node.
The ceph version is hammer v0.94.1. There is a performance overhead when
running on SSDs (I've heard it gets better in infernalis, but we're not
upgrading just yet), but we can reach numbers that I would consider
alright.

Now, the issue is, when the cluster goes into recovery it's very fast at
first, but then slows down to ridiculous levels as it moves forward. You
can go from 7% left to recover down to 2% in ten minutes, but it may take
2 hours to recover the last 2%. While this happens, the attached OpenStack
setup becomes incredibly slow, even though there is only a small fraction
of objects still recovering (less than 1%). The settings that may affect
recovery speed are set very low, as they are by default, yet they still
affect client io speed way more than they should.

Why would ceph recovery become so slow as it progresses, and affect client
io even though it's recovering at a snail's pace? And by a snail's pace,
I mean a few KB/second on 10 Gbps uplinks.
-- 
==
Jean-Philippe Méthot
Administrateur système / System administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bad performances in recovery

2015-08-19 Thread Somnath Roy
If you are concerned about *client io performance* during recovery, use these 
settings..

osd recovery max active = 1
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1

If you are concerned about *recovery performance*, you may want to bump this 
up, but I doubt it will help much from default settings..
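
Most of these can also be injected at runtime without restarting the OSDs,
for example:

ceph tell osd.\* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'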

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of J-P 
Methot
Sent: Wednesday, August 19, 2015 12:17 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] Bad performances in recovery

Hi,

Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for a total 
of 60 OSDs. All of these are SSDs with 4 SSD journals on each. The ceph version 
is hammer v0.94.1 . There is a performance overhead because we're using SSDs 
(I've heard it gets better in infernalis, but we're not upgrading just yet) but 
we can reach numbers that I would consider alright.

Now, the issue is, when the cluster goes into recovery it's very fast at first, 
but then slows down to ridiculous levels as it moves forward. You can go from 
7% to 2% to recover in ten minutes, but it may take 2 hours to recover the last 
2%. While this happens, the attached openstack setup becomes incredibly slow, 
even though there is only a small fraction of objects still recovering (less 
than 1%). The settings that may affect recovery speed are very low, as they are 
by default, yet they still affect client io speed way more than it should.

Why would ceph recovery become so slow as it progress and affect client io even 
though it's recovering at a snail's pace? And by a snail's pace, I mean a few 
kb/second on 10gbps uplinks.
--
==
Jean-Philippe Méthot
Administrateur système / System administrator GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bad performances in recovery

2015-08-19 Thread Somnath Roy
Also, check if scrubbing started in the cluster or not. That may considerably 
slow down the cluster.
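
A quick way to check, and to hold scrubbing off until the recovery
finishes, would be:

ceph -s                      # scrubbing PGs show up in the pg states
ceph osd set noscrub
ceph osd set nodeep-scrub    # remember to unset both flags afterwards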

-Original Message-
From: Somnath Roy 
Sent: Wednesday, August 19, 2015 1:35 PM
To: 'J-P Methot'; ceph-us...@ceph.com
Subject: RE: [ceph-users] Bad performances in recovery

All the writes will go through the journal.
It may happen your SSDs are not preconditioned well and after a lot of writes 
during recovery IOs are stabilized to lower number. This is quite common for 
SSDs if that is the case.

Thanks & Regards
Somnath

-Original Message-
From: J-P Methot [mailto:jpmet...@gtcomm.net]
Sent: Wednesday, August 19, 2015 1:03 PM
To: Somnath Roy; ceph-us...@ceph.com
Subject: Re: [ceph-users] Bad performances in recovery

Hi,

Thank you for the quick reply. However, we do have those exact settings for 
recovery and it still strongly affects client io. I have looked at various ceph 
logs and osd logs and nothing is out of the ordinary.
Here's an idea though, please tell me if I am wrong.

We use intel SSDs for journaling and samsung SSDs as proper OSDs. As was 
explained several times on this mailing list, Samsung SSDs suck in ceph.
They have horrible O_dsync speed and die easily, when used as journal.
That's why we're using Intel ssds for journaling, so that we didn't end up 
putting 96 samsung SSDs in the trash.

In recovery though, what is the ceph behaviour? What kind of write does it do 
on the OSD SSDs? Does it write directly to the SSDs or through the journal?

Additionally, something else we notice: the ceph cluster is MUCH slower after 
recovery than before. Clearly there is a bottleneck somewhere and that 
bottleneck does not get cleared up after the recovery is done.


On 2015-08-19 3:32 PM, Somnath Roy wrote:
 If you are concerned about *client io performance* during recovery, use these 
 settings..
 
 osd recovery max active = 1
 osd max backfills = 1
 osd recovery threads = 1
 osd recovery op priority = 1
 
 If you are concerned about *recovery performance*, you may want to bump this 
 up, but I doubt it will help much from default settings..
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of J-P Methot
 Sent: Wednesday, August 19, 2015 12:17 PM
 To: ceph-us...@ceph.com
 Subject: [ceph-users] Bad performances in recovery
 
 Hi,
 
 Our setup is currently comprised of 5 OSD nodes with 12 OSD each, for a total 
 of 60 OSDs. All of these are SSDs with 4 SSD journals on each. The ceph 
 version is hammer v0.94.1 . There is a performance overhead because we're 
 using SSDs (I've heard it gets better in infernalis, but we're not upgrading 
 just yet) but we can reach numbers that I would consider alright.
 
 Now, the issue is, when the cluster goes into recovery it's very fast at 
 first, but then slows down to ridiculous levels as it moves forward. You can 
 go from 7% to 2% to recover in ten minutes, but it may take 2 hours to 
 recover the last 2%. While this happens, the attached openstack setup becomes 
 incredibly slow, even though there is only a small fraction of objects still 
 recovering (less than 1%). The settings that may affect recovery speed are 
 very low, as they are by default, yet they still affect client io speed way 
 more than it should.
 
 Why would ceph recovery become so slow as it progress and affect client io 
 even though it's recovering at a snail's pace? And by a snail's pace, I mean 
 a few kb/second on 10gbps uplinks.
 --
 ==
 Jean-Philippe Méthot
 Administrateur système / System administrator GloboTech Communications
 Phone: 1-514-907-0050
 Toll Free: 1-(888)-GTCOMM1
 Fax: 1-(514)-907-0750
 jpmet...@gtcomm.net
 http://www.gtcomm.net
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


--
==
Jean-Philippe Méthot
Administrateur système / System administrator GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com