On 05/13/2013 09:52 AM, Greg wrote:
On 13/05/2013 15:55, Mark Nelson wrote:
On 05/13/2013 07:26 AM, Greg wrote:
On 13/05/2013 07:38, Olivier Bonvalet wrote:
On Friday 10 May 2013 at 19:16 +0200, Greg wrote:
Hello folks,

I'm in the process of testing Ceph and RBD. I have set up a small
cluster of hosts, each running a MON and an OSD with both journal and
data on the same SSD (OK, this is stupid, but it makes it simple to
verify that the disks are not the bottleneck for 1 client). All nodes
are connected on a 1Gb network (no dedicated network for OSDs, shame
on me :).

Summary: RBD performance is poor compared to the rados benchmark.

A 5-second seq read benchmark shows something like this:
    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
      0       0         0         0         0         0         -         0
      1      16        39        23   91.9586        92  0.966117  0.431249
      2      16        64        48   95.9602       100  0.513435   0.53849
      3      16        90        74   98.6317       104   0.25631   0.55494
      4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719
91 MB/s read performance, quite good!

Now the RBD performance :
root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s
That is a 3x difference (same for writes: ~60 MB/s in the benchmark,
~20 MB/s with dd on the block device).
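
As a unit check on these figures (a sketch, not part of either tool):
the bench output above is consistent with MB meaning 2^20 bytes
(95 reads x 4194304 bytes / 4.165747 s), while dd reports 10^6 bytes
per MB. The hypothetical helper `throughput` below converts a byte
count and a duration to both units so the two results can be compared
like for like.

```shell
# Sketch: print throughput in MB/s (10^6 bytes) and MiB/s (2^20 bytes).
# "throughput" is an illustrative helper name, not an existing tool.
throughput() {
    # $1 = bytes transferred, $2 = elapsed seconds
    LC_ALL=C awk -v b="$1" -v s="$2" \
        'BEGIN { printf "%.1f MB/s, %.1f MiB/s\n", b / s / 1e6, b / s / 1048576 }'
}
```

`throughput 398458880 4.165747` prints `95.7 MB/s, 91.2 MiB/s` for the
benchmark, and `throughput 419430400 13.0568` prints
`32.1 MB/s, 30.6 MiB/s` for dd, so the gap stays close to 3x in either
unit.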

The network is OK, and the CPU is also OK on all OSDs.
Ceph is Bobtail 0.56.4, Linux is 3.8.1 on ARM (vanilla release + some
patches for the SoC being used).

Can you point me to a starting point for digging into this?

You should try increasing read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.
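
A minimal sketch of applying this on the client, assuming the device
shows up as rbd1 as in the dd test above. The function name
`set_read_ahead` and the SYSFS_ROOT variable are hypothetical; the
latter exists only so the function can be tried against a scratch
directory instead of the real /sys/block.

```shell
# Sketch: show the old readahead value for one block device, then set a new one.
# SYSFS_ROOT is a hypothetical override; by default the real /sys/block is used.
set_read_ahead() {
    dev="$1"    # device name as listed under /sys/block, e.g. rbd1
    kb="$2"     # new readahead size in KiB, e.g. 512
    knob="${SYSFS_ROOT:-/sys/block}/$dev/queue/read_ahead_kb"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (wrong device name, or not root?)" >&2
        return 1
    fi
    old=$(cat "$knob")
    echo "$kb" > "$knob"
    echo "$dev: read_ahead_kb $old -> $(cat "$knob")"
}
```

Run as root on the client, e.g. `set_read_ahead rbd1 512`. This is a
runtime setting, so it has to be reapplied after a reboot or after the
device is mapped again.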

Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* difference!
# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
(caches dropped before each test of course)
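
For anyone repeating the test, dropping caches between runs can be
sketched as below. The function name `drop_caches` and the
DROP_CACHES_FILE variable are hypothetical; the latter is only there so
the function can be tried without root. The real target is
/proc/sys/vm/drop_caches.

```shell
# Sketch: flush dirty pages, then ask the kernel to drop clean caches.
# DROP_CACHES_FILE is a hypothetical override for trying this without root.
drop_caches() {
    target="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"
    sync                  # write dirty pages out first; only clean pages can be dropped
    echo 3 > "$target"    # 3 = page cache plus dentries and inodes
}
```

Run as root before each dd, so the read has to go to the cluster
rather than being served from the client's page cache.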

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,

Out of curiosity, has your rados bench performance improved as well?
We've also seen improvements for sequential read throughput when
increasing read_ahead_kb. (it may decrease random iops in some cases
though!)  The reason I didn't think to mention it here though is
because I was just focused on the difference between rados bench and
rbd.  It would be interesting to know if rbd has improved more
dramatically than rados bench.

Mark, the read-ahead is set on the RBD block device (on the client), so
it doesn't improve the rados bench results, as the benchmark doesn't
use the block layer.

Ah, I was thinking you had increased it on the OSDs (which can also help). On the OSD side, if you are targeting spinning disks, it can depend a lot on how much data is stored per track and the cost of head switches and track switches.


One question remains: why did I get poor performance with a single
writing thread?

In general, parallelism is really helpful because it hides latency and also helps you spread the load over all of your OSDs. Even on a single disk, having concurrent requests lets the scheduler/controller do a better job of ordering requests. Even on high-performance distributed file systems like Lustre, you will generally do best with lots of IO nodes reading/writing multiple files.
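
To see the effect from the client side, one rough sketch is to run
several dd readers at once, each on its own slice of the device. The
bench output earlier in the thread shows 16 ops in flight ("Cur ops"),
whereas a single dd issues one request at a time. The function name
`parallel_read` and its arguments are hypothetical, and it only reads,
since parallel writes to a block device would destroy its contents.

```shell
# Sketch: read one device or file with several concurrent dd processes,
# each covering its own contiguous slice, then wait for all of them.
parallel_read() {
    src="$1"           # device or file to read, e.g. /dev/rbd1
    jobs="$2"          # number of concurrent dd processes
    blocks="$3"        # blocks read by each job
    bs="${4:-4M}"      # block size, 4M as in the dd test
    pids=""
    i=0
    while [ "$i" -lt "$jobs" ]; do
        # skip moves each job to the start of its own slice
        dd if="$src" of=/dev/null bs="$bs" count="$blocks" \
            skip=$((i * blocks)) 2>/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1    # remember if any reader failed
    done
    return "$rc"
}
```

For example, `parallel_read /dev/rbd1 4 25` covers the same 100 x 4M
as the single dd above, split over 4 readers. Timing it (for instance
with the `time` keyword in bash) against the single dd gives a rough
idea of how much concurrency helps on a given setup.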


Regards,

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
