Re: [ceph-users] SSDs for journals vs SSDs for a cache tier, which is better?

2016-02-17 Thread Piotr Wachowicz
Thanks for your reply.


> > Let's consider both cases:
> > Journals on SSDs - for writes, the write operation returns right after
> > data lands on the Journal's SSDs, but before it's written to the backing
> > HDD. So, for writes, SSD journal approach should be comparable to having
> > a SSD cache tier.
> Not quite, see below.
>
>
Could you elaborate a bit more?

Are you saying that with a journal on an SSD, a client write must land on
both the SSD (journal) *and* the HDD (the actual data store behind that
journal) before the operation can return to the client? I was under the
impression that one of the benefits of having the journal on an SSD is that
the write to the slow HDD is deferred until after the write call has
returned to the client. Is that not the case? If it isn't, that would mean
an SSD cache tier should be much faster than an SSD journal in terms of
write latency.


> In your specific case writes to the OSDs (HDDs) will be (at least) 50%
> slower if your journals are on disk instead of the SSD.
>

Is that because of the above -- with the journal on the same disk (HDD) as
the data, every write has to hit that HDD twice (journal, then data;
assuming no btrfs/zfs CoW), whereas with the journal on an SSD the journal
write and the HDD write go to different devices and can proceed in
parallel? (But both still have to complete before the write operation
returns to the client.)



> (Which SSDs do you plan to use anyway?)
>

Intel DC S3700


Thanks,
Piotr


[ceph-users] SSDs for journals vs SSDs for a cache tier, which is better?

2016-02-16 Thread Piotr Wachowicz
Hey,

Which one is "better": using SSDs for storing journals, or using them as a
writeback cache tier? All other things being equal.

The use case is a 15-OSD-node cluster with 6 HDDs and 1 SSD per node, used
as block storage for a typical 20-hypervisor OpenStack cloud (with a bunch
of VMs running Linux). 10GigE public network + 10GigE replication network.

Let's consider both cases:
Journals on SSDs - for writes, the write operation returns right after the
data lands on the journal SSD, but before it's written to the backing HDD.
So, for writes, the SSD journal approach should be comparable to having an
SSD cache tier. In both cases we're writing to an SSD (and to the replicas'
SSDs), and returning to the client immediately after that. Data is only
flushed to the HDD later on.
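
Side note: how long data may sit only in the journal before being flushed
to the HDD is bounded by the filestore sync settings in ceph.conf; the
values below are, as far as I remember, the defaults:

  [osd]
      filestore min sync interval = 0.01
      filestore max sync interval = 5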

However, for reads (of hot data) I would expect an SSD cache tier to be
faster/better. That's because, in the case of journals on SSDs, even if the
data is still in the journal, it's always read from the (slow) backing disk
anyway, right? But with an SSD cache tier, if the data is hot, it would be
read from the (fast) SSD.

I'm sure both approaches have their own merits, and each might be better
for some specific tasks, but with all other things being equal I would
expect that using SSDs as a writeback cache tier should, on average,
provide better performance than using the same SSDs for journals,
specifically in the area of read throughput/latency.

The main difference between the two approaches, I suspect, is that with
multiple HDDs (multiple ceph-osd processes), all of those processes share
access to the same SSD storing their journals, whereas that is likely not
the case with cache tiering, right? Though I must say I have failed to find
any detailed information on this. Any clarification would be appreciated.
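
For reference, this is roughly how a writeback cache tier gets attached to
a backing pool (a sketch; the pool names rbd / rbd-cache are hypothetical,
and the cache pool would need its own CRUSH rule mapping it to the SSDs):

  ceph osd tier add rbd rbd-cache
  ceph osd tier cache-mode rbd-cache writeback
  ceph osd tier set-overlay rbd rbd-cache
  ceph osd pool set rbd-cache hit_set_type bloom
  ceph osd pool set rbd-cache target_max_bytes 200000000000   # example cap, ~200 GB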

So, is the above correct, or am I missing some pieces here? Any other major
differences between the two approaches?

Thanks.
P.


Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?

2015-05-01 Thread Piotr Wachowicz
> yes SSD-Journal helps a lot (if you use the right SSDs)
>

Which SSDs would you avoid for journaling, based on your experience? And why?

>
> > We're seeing very disappointing Ceph performance. We have 10GigE
> > interconnect (as a shared public/internal network).
> Which kind of CPU do you use for the OSD-hosts?
>
>
Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz

FYI, we are hosting VMs on our OSD nodes, but the VMs use very small
amounts of CPU and RAM.

>
> > We're wondering whether it makes sense to buy SSDs and put journals on
> > them. But we're looking for a way to verify that this will actually
> > help BEFORE we splash cash on SSDs.
> I can recommend the Intel DC S3700 SSD for journaling! In the beginning
> I started with different much cheaper models, but this was the wrong
> decision.
>

What, apart from the price, made the difference? Sustained read/write
bandwidth? IOPS?
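
For what it's worth, a common way to check whether a given SSD copes with
the journal's small synchronous direct writes is an fio run along these
lines -- the device name is hypothetical, and it overwrites the target, so
only run it against a disk with nothing on it:

  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based

Drives without power-loss protection often drop to a few hundred IOPS under
this pattern, which is presumably what separates the DC S3700 class from
cheaper models.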

We're considering this one (PCI-e SSD). What do you think?
http://www.plextor-digital.com/index.php/en/M6e-BK/m6e-bk.html
PX-128M6e-BK


Also, we're thinking about sharing one SSD between two OSDs. Any reason why
this would be a bad idea?
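
For context, this is how I understand two OSDs would end up sharing one
journal SSD with ceph-deploy (host and device names are hypothetical); each
prepare call carves its own journal partition out of the shared SSD:

  ceph-deploy osd prepare osd-node1:sdb:/dev/sdg
  ceph-deploy osd prepare osd-node1:sdc:/dev/sdg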


> > We're using Ceph for OpenStack storage (kvm). Enabling RBD cache
> > didn't really help all that much.
> The read speed can be optimized with a bigger read-ahead cache inside
> the VM, like:
> echo 4096 > /sys/block/vda/queue/read_ahead_kb



Thanks, we will try that.
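
To make that read-ahead setting survive reboots inside the VMs, a udev rule
along these lines should work (the vd[a-z] match assumes virtio disks):

  echo 'ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/read_ahead_kb}="4096"' \
      > /etc/udev/rules.d/99-read-ahead.rules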


Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?

2015-05-01 Thread Piotr Wachowicz
Thanks for your answer, Nick.

Typically it's a single rsync session at a time (sometimes two, but rarely
more concurrently). So it's a single ~5GB typical Linux filesystem being
rsynced from one random VM to another random VM.

Apart from using RBD Cache, is there any other way to improve the overall
performance of such a use case in a Ceph cluster?

In theory I guess we could always tarball it, and rsync the tarball, thus
effectively using sequential IO rather than random. But that's simply not
feasible for us at the moment. Any other ways?

Side question: does using RBD Cache change when a write is acknowledged to
the client? (e.g. the write call returning after the data has been written
to the journal (fast) vs. written all the way to the OSD data store
(slow)). I'm guessing it's always the former, regardless of whether the
client uses RBD Cache or not, right? My reasoning is that otherwise clients
could influence the way OSDs behave, which could be dangerous in some
situations.
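
For reference, the client-side cache settings in question live in ceph.conf
on the hypervisor; as far as I know these are the relevant options with
their default values:

  [client]
      rbd cache = true
      rbd cache writethrough until flush = true
      rbd cache size = 33554432          # 32 MB

(If my understanding is correct, RBD Cache only affects what librbd
acknowledges to the guest; the OSD-side journal/data write path stays the
same either way.)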

Kind Regards,
Piotr



On Fri, May 1, 2015 at 10:59 AM, Nick Fisk  wrote:

> How many rsyncs are you doing at a time? If it is only a couple, you will
> not be able to take advantage of the full number of OSDs, as each block of
> data is only located on 1 OSD (not including replicas). When you look at
> disk statistics you are seeing an average over time, so it will look like
> the OSDs are not very busy, when in fact each one is busy for a very brief
> period.
>
>
>
> SSD journals will help your write latency, probably going down from around
> 15-30ms to under 5ms
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Piotr Wachowicz
> *Sent:* 01 May 2015 09:31
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] How to estimate whether putting a journal on SSD
> will help with performance?
>
>
>
> Is there any way to confirm (beforehand) that using SSDs for journals will
> help?
>
> We're seeing very disappointing Ceph performance. We have 10GigE
> interconnect (as a shared public/internal network).
>
>
>
> We're wondering whether it makes sense to buy SSDs and put journals on
> them. But we're looking for a way to verify that this will actually help
> BEFORE we splash cash on SSDs.
>
>
>
> The problem is that the way we have things configured now, with journals
> on spinning HDDs (on the same disks as the OSD data), apart from the slow
> read/write performance to Ceph I already mentioned, we're also seeing
> fairly low disk utilization on the OSDs.
>
>
>
> This low disk utilization suggests that the journals are not really used
> to their max, which begs the question of whether buying SSDs for the
> journals will help.
>
>
>
> This kind of suggests that the bottleneck is NOT the disk. But, yeah, we
> cannot really confirm that.
>
>
>
> Our typical data access use case is a lot of small random read/writes.
> We're doing a lot of rsyncing (entire regular linux filesystems) from one
> VM to another.
>
>
>
> We're using Ceph for OpenStack storage (kvm). Enabling RBD cache didn't
> really help all that much.
>
>
>
> So, is there any way to confirm beforehand that using SSDs for journals
> will help in our case?
>
>
>
> Kind Regards,
> Piotr
>
>


[ceph-users] How to estimate whether putting a journal on SSD will help with performance?

2015-05-01 Thread Piotr Wachowicz
Is there any way to confirm (beforehand) that using SSDs for journals will
help?

We're seeing very disappointing Ceph performance. We have a 10GigE
interconnect (as a shared public/internal network).

We're wondering whether it makes sense to buy SSDs and put journals on
them. But we're looking for a way to verify that this will actually help
BEFORE we splash cash on SSDs.

The problem is that the way we have things configured now, with journals on
spinning HDDs (on the same disks as the OSD data), apart from the slow
read/write performance to Ceph I already mentioned, we're also seeing fairly
low disk utilization on the OSDs.

This low disk utilization suggests that the journals are not really used to
their max, which begs the question of whether buying SSDs for the journals
will help.

This kind of suggests that the bottleneck is NOT the disk. But, yeah, we
cannot really confirm that.
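
One rough way to probe this before spending money, assuming admin access to
the OSD hosts, is to watch per-OSD journal latency and disk saturation
while the rsync workload is running, e.g.:

  ceph osd perf            # per-OSD journal commit/apply latency in ms
  iostat -x 5              # %util and await on the journal/data HDDs
  ceph tell osd.0 bench    # raw write throughput of one OSD (id is just an example)

If commit latencies sit in the tens of milliseconds while iostat shows the
HDDs far from 100% utilized, that would suggest the synchronous journal
writes, rather than raw disk bandwidth, are what SSD journals would speed
up.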

Our typical data access pattern is a lot of small random reads/writes.
We're doing a lot of rsyncing (entire regular Linux filesystems) from one
VM to another.

We're using Ceph for OpenStack storage (kvm). Enabling RBD cache didn't
really help all that much.

So, is there any way to confirm beforehand that using SSDs for journals
will help in our case?

Kind Regards,
Piotr


[ceph-users] Why is running OSDs on a Hypervisors a bad idea?

2015-04-06 Thread Piotr Wachowicz
Hey,

We keep hearing that running hypervisors (KVM) on OSD nodes is a bad idea.
But why exactly is that the case?

In our use case, under normal operation our VMs use relatively low amounts
of CPU. So do the OSD services, so why not combine them? (We use Ceph for
OpenStack volume/image storage: 7 shared OSD/KVM nodes, 2 pools, 128 PGs
per pool, 2 OSDs per node, 10GigE.)

I know that during recovery the OSD memory usage spikes. So I guess that
might be one of the reasons.

But are there any other concrete examples of situations where the
hypervisor could compete for CPU/memory resources with the OSD services
running on the same node in a way that would noticeably impact the
performance of either?
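
One partial mitigation, if co-locating anyway, would be to cap what each
OSD may consume so that a recovery spike cannot starve the VMs. A sketch,
assuming the OSDs run as systemd units named ceph-osd@N (older
sysvinit/upstart setups would need cgroups configured by hand); the limits
are arbitrary examples, not recommendations:

  systemctl set-property ceph-osd@0.service CPUQuota=100% MemoryLimit=2G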

Kind Regards,
Piotr