[ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Is there any way to confirm (beforehand) that using SSDs for journals will help? We're seeing very disappointing Ceph performance. We have 10GigE interconnect (as a shared public/internal network).

We're wondering whether it makes sense to buy SSDs and put journals on them. But we're looking for a way to verify that this will actually help BEFORE we splash cash on SSDs.

The problem is that with our current configuration, with journals on spinning HDDs (shared with the OSD backend storage), apart from the slow read/write performance to Ceph I already mentioned, we're also seeing fairly low disk utilization on the OSDs. This low disk utilization suggests that the journals are not being pushed to their limit, which raises the question of whether buying SSDs for journals will help at all. It also suggests that the bottleneck is NOT the disk. But, yeah, we cannot really confirm that.

Our typical data access use case is a lot of small random reads/writes. We're doing a lot of rsyncing (entire regular Linux filesystems) from one VM to another. We're using Ceph for OpenStack storage (KVM). Enabling the RBD cache didn't really help all that much.

So, is there any way to confirm beforehand that using SSDs for journals will help in our case?

Kind Regards,
Piotr
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
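[Editor's note: one common way to estimate this before buying, not from the thread itself: Ceph journal writes are small synchronous appends, so timing O_DSYNC 4k writes against a file on the device in question approximates journal behaviour. The file name below is a placeholder; put it on the HDD (or candidate SSD) you want to measure.]

```shell
# Approximate journal I/O: 4k writes, each flushed to stable storage
# with O_DSYNC. "journal-test" is a placeholder file on the device
# under test; the last line of dd's output reports the throughput.
dd if=/dev/zero of=journal-test bs=4k count=1000 oflag=dsync 2>&1 | tail -n 1
rm -f journal-test
```

On filesystems that support it, adding `direct` (i.e. `oflag=direct,dsync`) bypasses the page cache for a more pessimistic number. A good journal SSD sustains this pattern at tens of MB/s; a spinning disk typically manages only a few MB/s, since every write costs a seek plus a platter sync.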
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Thanks for your answer, Nick.

Typically it's a single rsync session at a time (sometimes two, but rarely more concurrently). So it's a single ~5GB typical Linux filesystem from one random VM to another random VM.

Apart from using the RBD cache, is there any other way to improve the overall performance of such a use case in a Ceph cluster? In theory I guess we could always tarball it and rsync the tarball, thus effectively using sequential IO rather than random, but that's simply not feasible for us at the moment. Any other ways?

Side question: does using the RBD cache impact the way data is stored on the client? (e.g. a write call returning after data has been written to the journal (fast) vs written all the way to the OSD data store (slow)). I'm guessing it's always the first one, regardless of whether the client uses the RBD cache or not, right? My logic here is that otherwise that would imply that clients can impact the way OSDs behave, which could be dangerous in some situations.

Kind Regards,
Piotr
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
On 01-05-15 11:42, Nick Fisk wrote:
> Yeah, that's your problem: doing a single-threaded rsync when you have quite poor write latency will not be quick. SSD journals should give you a fair performance boost, otherwise you need to coalesce the writes at the client so that Ceph is given bigger IOs at higher queue depths.

Exactly. Ceph doesn't excel at serial I/O streams like these; it performs best when I/O is done in parallel. So if you can figure out a way to run multiple rsyncs at the same time, you might see a great performance boost. That way all the OSDs can process I/O at once, instead of one by one.

> RBD Cache can help here, as can potentially FS tuning to buffer more aggressively. If writeback RBD cache is enabled, data will be buffered by RBD until a sync is called by the client, so data loss can occur during this period if the app is not issuing fsyncs properly. Once a sync is called, data is flushed to the journals and then later to the actual OSD store.
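[Editor's note: since each RBD object lives on a single primary OSD, the parallelism has to come from the client side. A hypothetical way to fan out a single rsync job without tarballing, assuming the top-level directories are of roughly comparable size, is one rsync process per directory. The /tmp paths below are placeholders standing in for the two VMs' filesystems:]

```shell
# Demo: rsync the top-level directories of a tree with 4 parallel
# rsync processes instead of one serial rsync of the whole tree.
# /tmp/rsync-src and /tmp/rsync-dst are stand-in paths for the demo.
rm -rf /tmp/rsync-src /tmp/rsync-dst
mkdir -p /tmp/rsync-src/etc /tmp/rsync-src/var /tmp/rsync-src/home /tmp/rsync-dst
echo "hello" > /tmp/rsync-src/etc/hosts
# One rsync per top-level directory, up to 4 running at once.
ls /tmp/rsync-src | xargs -P 4 -I{} rsync -a /tmp/rsync-src/{}/ /tmp/rsync-dst/{}/
ls /tmp/rsync-dst
```

Against a remote VM the per-directory rsync would target `user@host:/path/{}/` instead; the same `-P` fan-out keeps several OSDs busy concurrently.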
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
How many rsyncs are you doing at a time? If it is only a couple, you will not be able to take advantage of the full number of OSDs, as each block of data is only located on one OSD (not including replicas). When you look at disk statistics you are seeing an average over time, so it will look like the OSDs are not very busy, when in fact each one is busy for a very brief period.

SSD journals will help your write latency, probably going down from around 15-30ms to under 5ms.
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Also remember to drive your Ceph cluster as hard as you have the means to, e.g. by tuning the VM OSes/IO subsystems: use multiple RBD devices per VM (to issue more outstanding IOPS from the VM IO subsystem), pick the best IO scheduler, give each VM enough CPU power and memory, and also ensure low network latency + bandwidth between your rsyncing VMs, etc.
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Yeah, that's your problem: doing a single-threaded rsync when you have quite poor write latency will not be quick. SSD journals should give you a fair performance boost, otherwise you need to coalesce the writes at the client so that Ceph is given bigger IOs at higher queue depths. RBD Cache can help here, as can potentially FS tuning to buffer more aggressively.

If writeback RBD cache is enabled, data will be buffered by RBD until a sync is called by the client, so data loss can occur during this period if the app is not issuing fsyncs properly. Once a sync is called, data is flushed to the journals and then later to the actual OSD store.
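[Editor's note: for reference, a minimal client-side ceph.conf fragment enabling the writeback RBD cache described above might look like this. The sizes are illustrative, not tuned recommendations:]

```ini
[client]
rbd cache = true
# Behave as writethrough until the guest issues its first flush, so
# data is safe even if the guest never calls fsync during early boot.
rbd cache writethrough until flush = true
rbd cache size = 33554432        # 32 MB per-client cache (illustrative)
rbd cache max dirty = 25165824   # 24 MB dirty limit (illustrative)
```

With KVM/libvirt the disk's cache mode must also be set to writeback for the guest's flushes to reach librbd correctly.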
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Piotr,

You may also investigate whether a cache tier made of a couple of SSDs could help you. Not sure how the data is used in your company, but if you have a bunch of hot data that moves around from one VM to another, it might greatly speed up the rsync. On the other hand, if a lot of the rsync data is cold, it might have an adverse effect on performance.

As a test, you could try to create a small pool with a couple of SSDs in a cache tier on top of your spinning OSDs. You don't need to purchase tons of SSDs in advance; 2-4 SSDs in a cache tier should be okay for the PoC.

Andrei
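[Editor's note: a configuration sketch of what such a PoC might look like on a Hammer-era cluster. Pool names are placeholders, and the cache pool must be placed on a CRUSH rule that selects the SSDs:]

```shell
# Create a small SSD-backed pool and layer it over the existing RBD pool.
# "rbd" and "ssd-cache" are placeholder pool names.
ceph osd pool create ssd-cache 128 128
ceph osd tier add rbd ssd-cache
ceph osd tier cache-mode ssd-cache writeback
ceph osd tier set-overlay rbd ssd-cache
# Hit-set tracking is required for the tiering agent to work.
ceph osd pool set ssd-cache hit_set_type bloom
ceph osd pool set ssd-cache hit_set_count 1
ceph osd pool set ssd-cache hit_set_period 3600
# Bound the cache so the agent starts flushing and evicting.
ceph osd pool set ssd-cache target_max_bytes 100000000000
```

Without `hit_set_type` and a `target_max_bytes` (or `target_max_objects`) bound, the tiering agent has nothing to work from and the cache pool will simply fill up.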
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Hi,

On 01.05.2015 10:30, Piotr Wachowicz wrote:
> Is there any way to confirm (beforehand) that using SSDs for journals will help?

Yes, an SSD journal helps a lot for write speed (if you use the right SSDs), and in my experience it also helped (though not as much) for read performance.

> We're seeing very disappointing Ceph performance. We have 10GigE interconnect (as a shared public/internal network).

Which kind of CPU do you use for the OSD hosts?

> We're wondering whether it makes sense to buy SSDs and put journals on them. But we're looking for a way to verify that this will actually help BEFORE we splash cash on SSDs.

I can recommend the Intel DC S3700 SSD for journaling! In the beginning I started with different, much cheaper models, but this was the wrong decision.

> Enabling RBD cache didn't really help all that much.

The read speed can be optimized with a bigger read-ahead cache inside the VM, like:

echo 4096 > /sys/block/vda/queue/read_ahead_kb

Udo
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
> yes SSD-Journal helps a lot (if you use the right SSDs)

Which SSDs would you avoid for journaling, from your experience? And why?

> Which kind of CPU do you use for the OSD-hosts?

Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. FYI, we are hosting VMs on our OSD nodes, but the VMs use very small amounts of CPU and RAM.

> I can recommend the Intel DC S3700 SSD for journaling! In the beginning I started with different, much cheaper models, but this was the wrong decision.

What, apart from the price, made the difference? Sustained read/write bandwidth? IOPS? We're considering this one (a PCIe SSD). What do you think? http://www.plextor-digital.com/index.php/en/M6e-BK/m6e-bk.html PX-128M6e-BK

Also, we're thinking about sharing one SSD between two OSDs. Any reason why this would be a bad idea?

> The read speed can be optimized with a bigger read-ahead cache inside the VM, like:
> echo 4096 > /sys/block/vda/queue/read_ahead_kb

Thanks, we will try that.