How do you test the random behavior of the disks, and what's a good setup?
If I understand correctly, Ceph writes in 4 MB blocks. I also expect roughly a 
50/50 read/write ratio from our workloads; what else do I have to take into 
consideration?
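
For the random test I was thinking of something like the following fio run 
(just a sketch: the 4 MB block size and 50/50 mix mirror my assumptions 
above, and /dev/sdX is a placeholder for one of the OSD disks):

fio --name=randrw --filename=/dev/sdX --rw=randrw --rwmixread=50 \
    --bs=4M --iodepth=16 --direct=1 --runtime=60 --time_based \
    --group_reporting

Would that be representative, or is there a better-established setup?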

Also, one thing I don't yet understand: in my performance tests I get pretty 
nice rados bench results
(the OSD nodes have a 1 Gb public and a 1 Gb sync interface; the test node has 
a 10 Gb NIC to the public network):

rados bench -p test 30 write --no-cleanup
Bandwidth (MB/sec):     128.801 (here the 1 Gb sync network is clearly the 
bottleneck)

rados bench -p test 30 seq
Bandwidth (MB/sec):    303.140 (here the limit is the combined 1 Gb public 
interfaces of the 3 nodes)

But if I run the same sequential workload against an RBD device, backed by a 
pool with the same settings as the test pool above, the results are as follows:

sudo dd if=/dev/zero of=/mnt/testfile bs=4M count=100 oflag=direct
419430400 bytes (419 MB) copied, 5.97832 s, 70.2 MB/s

I cannot identify the bottleneck here: no network interface is at its limit, 
the CPUs are < 10%, and iostat shows all disks working with OK numbers. The 
only difference I see is that ceph -w shows many more ops than during the 
rados bench. Any idea how I could identify the bottleneck here? Or is it just 
the single dd thread?
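
One comparison I could try (a sketch, if I have the defaults right): rados 
bench runs 16 concurrent operations by default (-t 16), while dd with 
oflag=direct issues one I/O at a time, so a closer apples-to-apples test 
might be

rados bench -p test 30 write -t 1 --no-cleanup

If that also lands around 70 MB/s, the gap would just be queue depth rather 
than a real bottleneck.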

Regards
Andi


-----Original Message-----
From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Martin Rudat
Sent: Monday, 2 September 2013 01:44
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] To put journals to SSD or not?

On 2013-09-02 05:19, Fuchs, Andreas (SwissTXT) wrote:
> Reading through the documentation and talking to several people leads to the 
> conclusion that it's best practice to place the journal of an OSD instance 
> on a separate SSD to speed up writes.
>
> But is this true? I have 3 new Dell servers available for testing, with 12 x 4 
> TB SATA and 2 x 100 GB SSD disks. I don't have the exact specs at hand, but 
> tests show:
>
> The SATA disks' sequential write speed is 300 MB/s. The SSD, which is in a 
> RAID1 config, is only 270 MB/s! It was probably not the most expensive one.
>
> When we put the journals on the OSDs, I can expect a sequential write speed 
> of 12 x 150 MB/s (one write to the journal, one to disk), i.e. 1800 MB/s per node.
The thing is that, unless you've got a magical workload, you're not going to see 
sequential write speeds from your spinning disks. At a minimum, a write to the 
journal at the beginning of the disk followed by a write to the data area 
elsewhere on the disk is going to perform like random I/O, because the disk has 
to seek, on average, half-way across the platter each time it commits a new 
transaction. This gets worse when you also take random reads into account, 
which cause even more seeks.
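
As a rough back-of-the-envelope (my numbers are assumptions, not measurements): 
with ~8 ms average seek time and 4 MB client writes, each commit is two seeks 
(~16 ms) plus writing 8 MB in total (journal + data) at 300 MB/s (~27 ms), so 
roughly 4 MB of client data every ~43 ms, i.e. ~90 MB/s per disk instead of the 
hoped-for 150 MB/s, even for a purely sequential client workload.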

Sequential read on the disks I've got is about 180 MB/s (they're cheap, slow 
disks); random read/write on the array seems to peak around 10 MB/s per disk.

I'd benchmark your random I/O performance, and use that to choose how many, and 
how fast, SSDs you will need.
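
Something like the following against a single spinner, and then against the 
SSD, would give you numbers to compare (just a sketch: /dev/sdX is a 
placeholder, the job parameters are my guesses, and it writes to the raw 
device, so only run it on a disk you can scribble on):

fio --name=randwrite --filename=/dev/sdX --rw=randwrite --bs=4k \
    --iodepth=32 --direct=1 --runtime=60 --time_based --group_reporting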

I've actually got a 4-disk external hot-swap SATA cage on order that connects 
over a USB3 or eSATA link. Sequential read/write, even with the slow disks I've 
got, will saturate the link; but filled with spinning disks doing random I/O, 
there should be plenty of headroom. It'll be interesting to see whether it's a 
worthwhile investment, compared with having to open a computer up to change 
disks.

--
Martin Rudat


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com