Re: [ceph-users] SSDs for journals vs SSDs for a cache tier, which is better?
Thanks for your reply.

> > Let's consider both cases:
> >
> > Journals on SSDs - for writes, the write operation returns right after
> > data lands on the Journal's SSDs, but before it's written to the backing
> > HDD. So, for writes, the SSD journal approach should be comparable to
> > having an SSD cache tier.
>
> Not quite, see below.

Could you elaborate a bit more? Are you saying that with a journal on an SSD, a client write must end up on both the SSD (journal) *and* the HDD (the actual data store behind that journal) before the operation can return to the client? I was under the impression that one of the benefits of having a journal on an SSD is deferring the write to the slow HDD until after the write call has returned to the client. Is that not the case? If so, that would mean an SSD cache tier should have much lower write latency than an SSD journal.

> In your specific case writes to the OSDs (HDDs) will be (at least) 50%
> slower if your journals are on disk instead of the SSD.

Is that because of the above -- with the journal on the same HDD as the data, everything has to be written to that HDD twice (assuming no btrfs/zfs CoW): once to the journal and once to the data store? Whereas with the journal on an SSD, the write to the journal can happen in parallel with the write to the HDD (though both still have to complete before the write operation returns to the client)?

> (Which SSDs do you plan to use anyway?)

Intel DC S3700

Thanks,
Piotr
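As a side note, for anyone setting this up: a rough sketch of how a journal can be placed on a separate SSD partition at OSD creation time. Device names here are made up; adjust to your own layout.

    # Prepare an OSD with data on the HDD and its journal on an SSD partition
    # (hypothetical devices: /dev/sdb = HDD, /dev/sdf1 = partition on the shared SSD).
    ceph-disk prepare /dev/sdb /dev/sdf1
    ceph-disk activate /dev/sdb1

    # The journal size is taken from ceph.conf at prepare time, e.g.:
    #   [osd]
    #   osd journal size = 10240   # MB

With one SSD shared by several OSDs, each OSD simply gets its own journal partition on that SSD.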
[ceph-users] SSDs for journals vs SSDs for a cache tier, which is better?
Hey,

Which one's "better": to use SSDs for storing journals, or to use them as a writeback cache tier? All other things being equal.

The use case is a 15-OSD-node cluster, with 6 HDDs and 1 SSD per node, used for block storage for a typical 20-hypervisor OpenStack cloud (with a bunch of VMs running Linux). 10GigE public net + 10GigE replication network.

Let's consider both cases:

Journals on SSDs - for writes, the write operation returns right after data lands on the journal's SSD, but before it's written to the backing HDD. So, for writes, the SSD journal approach should be comparable to having an SSD cache tier. In both cases we're writing to an SSD (and to the replicas' SSDs) and returning to the client immediately after that; data is only flushed to the HDD later on.

However, for reads (of hot data) I would expect an SSD cache tier to be faster/better. That's because, in the case of having journals on SSDs, even if data is in the journal, it's always read from the (slow) backing disk anyway, right? But with an SSD cache tier, if the data is hot, it would be read from the (fast) SSD.

I'm sure both approaches have their own merits and might each be better for some specific tasks, but with all other things being equal, I would expect that using SSDs as a writeback cache tier should, on average, provide better performance than using the same SSDs for journals, specifically in the area of read throughput/latency.

The main difference, I suspect, is that in the case of multiple HDDs (multiple ceph-osd processes), all of those processes share access to the same SSD storing their journals, whereas that's likely not the case with cache tiering, right? Though I must say I failed to find any detailed info on this. Any clarification will be appreciated.

So, is the above correct, or am I missing some pieces here? Any other major differences between the two approaches?

Thanks.
P.
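For completeness, the cache-tier side of the comparison would look roughly like this. Pool names, PG counts, the CRUSH rule id and the size limit are placeholders, not recommendations:

    # Create an SSD-backed pool (assumes a CRUSH rule that maps onto the SSDs)
    ceph osd pool create cache-pool 128 128
    ceph osd pool set cache-pool crush_ruleset 4        # hypothetical SSD rule id

    # Attach it as a writeback cache tier in front of an existing 'volumes' pool
    ceph osd tier add volumes cache-pool
    ceph osd tier cache-mode cache-pool writeback
    ceph osd tier set-overlay volumes cache-pool

    # Bound how much data the cache may hold before flushing/evicting
    ceph osd pool set cache-pool target_max_bytes 200000000000   # ~200 GB, example value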
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
> yes SSD-Journal helps a lot (if you use the right SSDs)

Which SSDs should be avoided for journaling, in your experience? And why?

> > We're seeing very disappointing Ceph performance. We have 10GigE
> > interconnect (as a shared public/internal network).
>
> Which kind of CPU do you use for the OSD-hosts?

Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. FYI, we are hosting VMs on our OSD nodes, but the VMs use very small amounts of CPU and RAM.

> > We're wondering whether it makes sense to buy SSDs and put journals on
> > them. But we're looking for a way to verify that this will actually
> > help BEFORE we splash cash on SSDs.
>
> I can recommend the Intel DC S3700 SSD for journaling! In the beginning
> I started with different, much cheaper models, but this was the wrong
> decision.

What, apart from the price, made the difference? Sustained read/write bandwidth? IOPS?

We're considering this one (PCIe SSD). What do you think?
http://www.plextor-digital.com/index.php/en/M6e-BK/m6e-bk.html PX-128M6e-BK

Also, we're thinking about sharing one SSD between two OSDs. Any reason why this would be a bad idea?

> > We're using Ceph for OpenStack storage (kvm). Enabling RBD cache
> > didn't really help all that much.
>
> The read speed can be optimized with a bigger read-ahead cache inside
> the VM, like:
> echo 4096 > /sys/block/vda/queue/read_ahead_kb

Thanks, we will try that.
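Regarding "the right SSDs": a common way to check whether a given SSD is suitable as a journal device is to measure its synchronous, queue-depth-1 write performance, since that approximates the journal write pattern. A rough sketch with fio (destructive if pointed at a raw device, so use a scratch disk or file; /dev/sdf is a made-up device name):

    # Single-job, queue-depth-1 sync 4k writes. Good journal SSDs (e.g. DC S3700)
    # sustain thousands of IOPS here; many consumer SSDs drop to a few hundred.
    fio --name=journal-test --filename=/dev/sdf --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based --group_reporting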
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Thanks for your answer, Nick.

Typically it's a single rsync session at a time (sometimes two, but rarely more concurrently). So it's a single ~5GB, fairly typical Linux filesystem being copied from one random VM to another random VM.

Apart from using the RBD cache, is there any other way to improve the overall performance of such a use case in a Ceph cluster? In theory I guess we could always tarball it and rsync the tarball, thus effectively using sequential I/O rather than random, but that's simply not feasible for us at the moment. Any other ways?

Side question: does using the RBD cache have any impact on how the data ends up being stored? E.g. does a write call return after the data has been written to the journal (fast), or only after it's written all the way to the OSD data store (slow)? I'm guessing it's always the former, regardless of whether the client uses the RBD cache or not, right? My logic here is that otherwise clients could influence the way OSDs behave, which could be dangerous in some situations.

Kind Regards,
Piotr

On Fri, May 1, 2015 at 10:59 AM, Nick Fisk wrote:
> How many rsyncs are you doing at a time? If it is only a couple, you will not
> be able to take advantage of the full number of OSDs, as each block of
> data is only located on 1 OSD (not including replicas). When you look at
> disk statistics you are seeing an average over time, so it will look like
> the OSDs are not very busy, when in fact each one is busy for a very brief
> period.
>
> SSD journals will help your write latency, probably going down from around
> 15-30ms to under 5ms.
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Piotr Wachowicz
> Sent: 01 May 2015 09:31
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] How to estimate whether putting a journal on SSD
> will help with performance?
>
> Is there any way to confirm (beforehand) that using SSDs for journals will
> help?
>
> We're seeing very disappointing Ceph performance. We have 10GigE
> interconnect (as a shared public/internal network).
>
> We're wondering whether it makes sense to buy SSDs and put journals on
> them. But we're looking for a way to verify that this will actually help
> BEFORE we splash cash on SSDs.
>
> The problem is that the way we have things configured now, with journals
> on spinning HDDs (shared with OSDs as the backend storage), apart from the
> slow read/write performance to Ceph I already mentioned, we're also seeing
> fairly low disk utilization on the OSDs.
>
> This low disk utilization suggests that the journals are not really used to
> their max, which begs the question of whether buying SSDs for journals
> will help.
>
> This kind of suggests that the bottleneck is NOT the disk. But, yeah, we
> cannot really confirm that.
>
> Our typical data access use case is a lot of small random reads/writes.
> We're doing a lot of rsyncing (entire regular Linux filesystems) from one
> VM to another.
>
> We're using Ceph for OpenStack storage (kvm). Enabling the RBD cache didn't
> really help all that much.
>
> So, is there any way to confirm beforehand that using SSDs for journals
> will help in our case?
>
> Kind Regards,
> Piotr
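On the RBD cache point: for it to help at all it has to be enabled on the client (hypervisor) side, and the libvirt disk needs cache=writeback. A minimal sketch of the client-side settings, with purely illustrative sizes:

    # Append client-side RBD cache settings to the hypervisor's ceph.conf
    # (sizes are examples only, not tuned values).
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
    rbd cache = true
    rbd cache writethrough until flush = true
    rbd cache size = 67108864        # 64 MB per RBD image
    rbd cache max dirty = 50331648   # 48 MB
    EOF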
[ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Is there any way to confirm (beforehand) that using SSDs for journals will help?

We're seeing very disappointing Ceph performance. We have a 10GigE interconnect (as a shared public/internal network).

We're wondering whether it makes sense to buy SSDs and put journals on them. But we're looking for a way to verify that this will actually help BEFORE we splash cash on SSDs.

The problem is that with the way we have things configured now - journals on the spinning HDDs, shared with the OSD backend storage - apart from the slow read/write performance to Ceph I already mentioned, we're also seeing fairly low disk utilization on the OSDs.

This low disk utilization suggests that the journals are not really being used to their max, which begs the question of whether buying SSDs for journals will help. It also kind of suggests that the bottleneck is NOT the disk. But, yeah, we cannot really confirm that.

Our typical data access use case is a lot of small random reads/writes. We're doing a lot of rsyncing (entire regular Linux filesystems) from one VM to another.

We're using Ceph for OpenStack storage (kvm). Enabling the RBD cache didn't really help all that much.

So, is there any way to confirm beforehand that using SSDs for journals will help in our case?

Kind Regards,
Piotr
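One way to get a before-the-purchase signal is to watch per-OSD journal/commit latency and per-device write await while the rsync workload is running. A small sketch (the device glob is a made-up example):

    # Per-OSD commit (journal) and apply latencies in ms; consistently high
    # commit latency on HDD-journal OSDs is what an SSD journal would remove.
    ceph osd perf

    # Per-device utilization and write await on an OSD host, 10-second intervals
    iostat -xm 10 /dev/sd[b-g]

If the disks show low %util but high await on small sync writes, that points at per-write latency rather than raw disk throughput as the bottleneck, which is the case SSD journals help with most.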
[ceph-users] Why is running OSDs on a Hypervisors a bad idea?
Hey,

We keep hearing that running hypervisors (KVM) on the OSD nodes is a bad idea. But why exactly is that the case?

In our use case, under normal operations our VMs use relatively low amounts of CPU resources, and so do the OSD daemons, so why not combine them? (We use Ceph for OpenStack volume/image storage; 7 shared OSD/KVM nodes, 2 pools, 128 PGs per pool, 2 OSDs per node, 10GigE.)

I know that OSD memory usage spikes during recovery, so I guess that might be one of the reasons. But are there any other concrete examples of situations where the hypervisor could compete for CPU/memory resources with the OSD daemons running on the same node, in a way that would noticeably impact the performance of either?

Kind Regards,
Piotr
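If the main worry is recovery-time CPU/RAM spikes, one possible mitigation is to cap the OSD daemons so they can't starve the co-located VMs, and to throttle recovery itself. A rough sketch, assuming the OSDs run as systemd units; the limits and OSD ids below are made-up examples, not recommendations:

    # Cap each local OSD daemon's CPU and memory (example values).
    for id in 0 1; do
        systemctl set-property ceph-osd@${id}.service CPUQuota=200% MemoryLimit=4G
    done

    # Throttle recovery/backfill so it competes less with client I/O and VMs.
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'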