Re: [ceph-users] Bad performances in recovery
Hi,

First of all, we are sure that the return to the default configuration fixed it. As soon as we restarted only one of the Ceph nodes with the default configuration, it sped up recovery tremendously. We had already restarted before with the old conf and recovery was never that fast.

Regarding the configuration, here is the old one with comments:

[global]
fsid = *
mon_initial_members = cephmon1
mon_host = ***
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true        // lets you use the extended attributes of xfs/ext4/btrfs filesystems
osd_pool_default_pgp_num = 450         // default pgp number for new pools
osd_pg_bits = 12                       // number of bits used to designate pgs; lets you have 2^12 pgs
osd_pool_default_size = 3              // default copy number for new pools
osd_pool_default_pg_num = 450          // default pg number for new pools
public_network = *
cluster_network = ***
osd_pgp_bits = 12                      // number of bits used to designate pgps; lets you have 2^12 pgps

[osd]
filestore_queue_max_ops = 5000         // 500 by default; maximum number of in-progress operations the file store accepts before blocking on queuing new operations
filestore_fd_cache_random = true       //
journal_queue_max_ops = 100            // 500 by default; number of operations allowed in the journal queue
filestore_omap_header_cache_size = 100 // size of the LRU used to cache object omap headers; larger values use more memory but may reduce lookups on omap
filestore_fd_cache_size = 100          // not in the Ceph documentation, but seems to be a common tweak for SSD clusters
max_open_files = 100                   // lets Ceph set the max file descriptors in the OS to prevent running out of file descriptors
osd_journal_size = 1                   // journal max size for each OSD

New conf:

[global]
fsid = *
mon_initial_members = cephmon1
mon_host =
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = **
cluster_network = **

You might notice I have a few undocumented settings in the old configuration. These are settings I took from a certain OpenStack Summit presentation, and they may have contributed to this whole problem. Here is a list of the settings that I think are a possible cause of these speed issues:

filestore_fd_cache_random = true
filestore_fd_cache_size = 100

Additionally, my colleague thinks these settings may have contributed:

filestore_queue_max_ops = 5000
journal_queue_max_ops = 100

We will do further tests on these settings once we have our lab Ceph test environment, as we are also curious as to exactly what caused this.

On 2015-08-20 11:43 AM, Alex Gorbachev wrote:
Could you please share the old and new ceph.conf, or the section that was removed?
Best regards,
Alex
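For anyone who wants to double-check which of these values their OSDs are actually running with, the admin socket shows the live configuration. A minimal sketch, assuming osd.0 runs on the local node (adjust the OSD id and the option list as needed):

    # show the live values of the suspect filestore/journal settings on osd.0
    ceph daemon osd.0 config show | grep -E 'filestore_fd_cache|filestore_queue_max_ops|journal_queue_max_ops'

    # push one of them back toward its default at runtime; note that some
    # filestore options only take full effect after an OSD restart
    ceph tell 'osd.*' injectargs '--filestore_queue_max_ops 500'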
Re: [ceph-users] Bad performances in recovery
Hello,

From all the pertinent points by Somnath, the one about pre-conditioning would be pretty high on my list, especially if this slowness persists and nothing else (scrub) is going on. This might be fixed by doing an fstrim.

Additionally, the levelDBs per OSD are of course sync'ing heavily during reconstruction, so that might not be the favorite thing for your type of SSDs.

But ultimately situational awareness is very important, as in knowing what is actually going on and slowing things down. As usual my recommendation would be to use atop, iostat or similar on all your nodes and see if your OSD SSDs are indeed the bottleneck, or if it is maybe just one of them, or something else entirely.

Christian

On Wed, 19 Aug 2015, Somnath Roy wrote:
Also, check if scrubbing started in the cluster or not. That may considerably slow down the cluster.
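If it helps anyone following the same advice, the per-device check described above looks roughly like this (a sketch; the OSD mount point is the stock layout and may differ on your nodes):

    # watch utilization and latency for every drive; one SSD stuck near
    # 100% util while the others idle points at a single bad/worn device
    iostat -x 5

    # trim free space on one OSD's filesystem (repeat per OSD; expect a
    # short IO stall on that device while the trim runs)
    fstrim -v /var/lib/ceph/osd/ceph-0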
Re: [ceph-users] Bad performances in recovery
Hi,

Just to update the mailing list: we ended up going back to a default ceph.conf, without any additional settings beyond what is mandatory. We are now reaching speeds we never reached before, both in recovery and in regular usage. There was definitely something we set in the ceph.conf bogging everything down.
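For the record, the usual way to roll a ceph.conf change like this across OSD nodes without kicking off extra rebalancing is along these lines (a sketch; the restart command depends on your init system):

    ceph osd set noout         # keep restarted OSDs from being marked out
    # copy the new ceph.conf to the node, then restart its OSDs, e.g.
    #   service ceph restart osd.12        (sysvinit)
    #   systemctl restart ceph-osd@12      (systemd)
    ceph -s                    # wait for all PGs to return to active+clean
    ceph osd unset noout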
Re: [ceph-users] Bad performances in recovery
Just to update the mailing list, we ended up going back to default ceph.conf without any additional settings than what is mandatory. We are now reaching speeds we never reached before, both in recovery and in regular usage. There was definitely something we set in the ceph.conf bogging everything down.

Could you please share the old and new ceph.conf, or the section that was removed?

Best regards,
Alex
Re: [ceph-users] Bad performances in recovery
Are you sure it was because of the configuration changes? Maybe it was restarting the OSDs that fixed it?

We often hit an issue with backfill_toofull where the recovery/backfill processes get stuck until we restart the daemons (sometimes setting recovery_max_active helps as well). The cluster still shows recovery of a few objects now and then (a few KB/s) and then stops completely.

Jan
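For anyone who hits the stuck-recovery case described above, this is roughly how to spot it and nudge it without a full daemon restart (the values are only illustrative):

    # look for PGs stuck in backfill_toofull and the OSDs blamed for it
    ceph health detail | grep -i backfill

    # temporarily raise recovery/backfill concurrency on all OSDs
    ceph tell 'osd.*' injectargs '--osd-recovery-max-active 3 --osd-max-backfills 2'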
Re: [ceph-users] Bad performances in recovery
Hi,

Thank you for the quick reply. However, we do have those exact settings for recovery and it still strongly affects client IO. I have looked at various Ceph logs and OSD logs and nothing is out of the ordinary.

Here's an idea though, please tell me if I am wrong. We use Intel SSDs for journaling and Samsung SSDs as the actual OSDs. As was explained several times on this mailing list, Samsung SSDs suck in Ceph: they have horrible O_DSYNC speed and die easily when used as journals. That's why we're using Intel SSDs for journaling, so that we didn't end up putting 96 Samsung SSDs in the trash. In recovery though, what is the Ceph behaviour? What kind of writes does it do on the OSD SSDs? Does it write directly to the SSDs or through the journal?

Additionally, something else we notice: the Ceph cluster is MUCH slower after recovery than before. Clearly there is a bottleneck somewhere, and that bottleneck does not get cleared up after the recovery is done.

On 2015-08-19 3:32 PM, Somnath Roy wrote:
If you are concerned about *client io performance* during recovery, use these settings:

osd recovery max active = 1
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1

If you are concerned about *recovery performance*, you may want to bump this up, but I doubt it will help much from default settings.

--
==
Jean-Philippe Méthot
Administrateur système / System administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
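On the journal SSD question, the usual quick check for whether a drive copes with journal-style writes is a single-threaded synchronous write with fio, something like the following (a sketch; /dev/sdX is a placeholder, and this writes to the raw device and destroys data on it, so use a spare disk or partition):

    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 \
        --group_reporting

    # journal-grade SSDs sustain thousands of sync write IOPS here; drives
    # that collapse to a few hundred will drag down every write they journal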
Re: [ceph-users] Bad performances in recovery
All the writes will go through the journal. It may be that your SSDs are not preconditioned well, and after a lot of writes during recovery the IOPS have stabilized at a lower number. This is quite common for SSDs if that is the case.

Thanks & Regards
Somnath
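One way to see, on a live OSD, whether the journal or the backing filestore is the slow side is the perf counters from the admin socket (a sketch; the exact counter names vary a little between releases):

    # dump osd.0's perf counters and pick out journal/filestore latencies
    ceph daemon osd.0 perf dump | python -m json.tool | grep -i -E 'journal|commit|apply'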
[ceph-users] Bad performances in recovery
Hi,

Our setup is currently comprised of 5 OSD nodes with 12 OSDs each, for a total of 60 OSDs. All of these are SSDs, with 4 SSD journals on each. The Ceph version is hammer v0.94.1. There is a performance overhead because we're using SSDs (I've heard it gets better in infernalis, but we're not upgrading just yet), but we can reach numbers that I would consider alright.

Now, the issue is, when the cluster goes into recovery it's very fast at first, but then slows down to ridiculous levels as it moves forward. You can go from 7% to 2% left to recover in ten minutes, but it may take 2 hours to recover the last 2%. While this happens, the attached OpenStack setup becomes incredibly slow, even though there is only a small fraction of objects still recovering (less than 1%). The settings that may affect recovery speed are very low, as they are at their defaults, yet they still affect client IO speed way more than they should.

Why would Ceph recovery become so slow as it progresses, and affect client IO, even though it's recovering at a snail's pace? And by a snail's pace, I mean a few KB/second on 10 Gbps uplinks.

--
==
Jean-Philippe Méthot
Administrateur système / System administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net
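For readers following along, the recovery progress and throughput figures referred to here can be watched live with the standard status commands (output format differs slightly between releases):

    ceph -s              # overall health and the percentage of degraded objects
    ceph -w              # streaming status, including recovery MB/s and objects/s
    ceph osd pool stats  # per-pool breakdown of client IO vs recovery IO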
Re: [ceph-users] Bad performances in recovery
If you are concerned about *client io performance* during recovery, use these settings:

osd recovery max active = 1
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1

If you are concerned about *recovery performance*, you may want to bump these up, but I doubt it will help much over the default settings.

Thanks & Regards
Somnath
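In case it is useful, these throttles can be applied to a running cluster with injectargs, or persisted in the [osd] section of ceph.conf (a sketch; on the command line the option names use dashes or underscores):

    # runtime, all OSDs
    ceph tell 'osd.*' injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1 --osd-recovery-op-priority 1'

    # persistent, in ceph.conf
    [osd]
    osd recovery max active = 1
    osd max backfills = 1
    osd recovery op priority = 1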
Re: [ceph-users] Bad performances in recovery
Also, check if scrubbing started in the cluster or not. That may considerably slow down the cluster.
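A quick way to check for that scrub scenario and rule it out while recovery is running (remember to re-enable scrubbing afterwards):

    ceph -s | grep -i scrub     # PGs currently scrubbing show up in the PG states
    ceph osd set noscrub        # stop new shallow scrubs from starting
    ceph osd set nodeep-scrub   # stop new deep scrubs from starting

    # once recovery has finished
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub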