Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Mark Nelson

On 05/13/2013 09:52 AM, Greg wrote:

On 13/05/2013 15:55, Mark Nelson wrote:

On 05/13/2013 07:26 AM, Greg wrote:

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to
verify the
disks are not the bottleneck for 1 client). All nodes are connected
on a
1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.
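A minimal sketch of that change (the device name rbd1 and the 512 KB value are taken from the thread; the sysfs root is parameterised only so the function can be exercised outside a real /sys):

```shell
# Bump the client-side readahead for a block device and echo it back.
# Assumes the rbd device is already mapped; run as root on a real system.
set_readahead() {
    dev=$1 kb=$2
    sysfs=${SYSFS:-/sys}
    echo "$kb" > "$sysfs/block/$dev/queue/read_ahead_kb"
    cat "$sysfs/block/$dev/queue/read_ahead_kb"
}
# e.g. set_readahead rbd1 512
```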


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge*
difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,


Out of curiosity, has your rados bench performance improved as well?
We've also seen improvements for sequential read throughput when
increasing read_ahead_kb. (it may decrease random iops in some cases
though!)  The reason I didn't think to mention it here though is
because I was just focused on the difference between rados bench and
rbd.  It would be interesting to know if rbd has improved more
dramatically than rados bench.

Mark, the read ahead is set on the RBD block device (on the client), so
it doesn't improve benchmark results as the benchmark doesn't use the
block layer.


Ah, I was thinking you had increased it on the OSDs (which can also 
help).  On the OSD side, if you are targeting spinning disks, it can 
depend a lot on how much data is stored per track and the cost of head 
switches and track switches.




One question remains: why did I have poor performance with a single
writing thread?


In general, parallelism is really helpful because it hides latency and 
also helps you spread the load over all of your OSDs.  Even on a single 
disk, having concurrent requests lets the scheduler/controller do a 
better job of ordering requests.  Even on high-performance distributed 
file systems like Lustre you will generally do best with lots of 
IO nodes reading/writing multiple files.
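A rough illustration of that point with dd (the path /dev/rbd1 is the thread's device; here the source is a parameter so the sketch works on any file or block device):

```shell
# Split one sequential read into N concurrent dd readers at disjoint
# offsets, so several requests are in flight at once.
parallel_read() {
    src=$1 jobs=$2 chunks_per_job=$3   # chunks are 4M each, as in the thread
    i=0
    while [ "$i" -lt "$jobs" ]; do
        dd if="$src" of=/dev/null bs=4M count="$chunks_per_job" \
           skip=$((i * chunks_per_job)) 2>/dev/null &
        i=$((i + 1))
    done
    wait
}
# e.g. parallel_read /dev/rbd1 4 25   # 4 readers x 100 MB each
```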




Regards,


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Greg

On 13/05/2013 17:01, Gandalf Corvotempesta wrote:

2013/5/13 Greg :

thanks a lot for pointing this out, it indeed makes a *huge* difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100

100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

What if you set 1024 or greater value ?
Is bandwidth relative to the read ahead size?

Setting the value too high degrades performance, especially random IO
performance.  You have to determine the right value for your usage.

Cheers,


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Mark Nelson

On 05/13/2013 10:01 AM, Gandalf Corvotempesta wrote:

2013/5/13 Greg :

thanks a lot for pointing this out, it indeed makes a *huge* difference !


# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100

100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s


(caches dropped before each test of course)


What if you set 1024 or greater value ?
Is bandwidth relative to the read ahead size?


It may help with sequential reads, but it may also slow down small 
random reads if you set it too big.  Probably a whole new article could 
be written on testing the effects of read_ahead at different levels in 
the storage stack.








Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Gandalf Corvotempesta
2013/5/13 Greg :
> thanks a lot for pointing this out, it indeed makes a *huge* difference !
>>
>> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
>>
>> 100+0 records in
>> 100+0 records out
>> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
>
> (caches dropped before each test of course)

What if you set 1024 or greater value ?
Is bandwidth relative to the read ahead size?


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Greg

On 13/05/2013 15:55, Mark Nelson wrote:

On 05/13/2013 07:26 AM, Greg wrote:

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to 
verify the
disks are not the bottleneck for 1 client). All nodes are connected 
on a

1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* 
difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,


Out of curiosity, has your rados bench performance improved as well? 
We've also seen improvements for sequential read throughput when 
increasing read_ahead_kb. (it may decrease random iops in some cases 
though!)  The reason I didn't think to mention it here though is 
because I was just focused on the difference between rados bench and 
rbd.  It would be interesting to know if rbd has improved more 
dramatically than rados bench.
Mark, the read ahead is set on the RBD block device (on the client), so 
it doesn't improve benchmark results as the benchmark doesn't use the 
block layer.


One question remains: why did I have poor performance with a single 
writing thread?


Regards,


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Mark Nelson

On 05/13/2013 07:26 AM, Greg wrote:

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to verify the
disks are not the bottleneck for 1 client). All nodes are connected on a
1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and
explain in a "tweaking" topic of the documentation.

Regards,


Out of curiosity, has your rados bench performance improved as well? 
We've also seen improvements for sequential read throughput when 
increasing read_ahead_kb. (it may decrease random iops in some cases 
though!)  The reason I didn't think to mention it here though is because 
I was just focused on the difference between rados bench and rbd.  It 
would be interesting to know if rbd has improved more dramatically than 
rados bench.


Mark


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-13 Thread Greg

On 13/05/2013 07:38, Olivier Bonvalet wrote:

On Friday 10 May 2013 at 19:16 +0200, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to verify the
disks are not the bottleneck for 1 client). All nodes are connected on a
1Gb network (no dedicated network for OSDs, shame on me :).

Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719

91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?

You should try to increase read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.


Olivier,

thanks a lot for pointing this out, it indeed makes a *huge* difference !

# dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s

(caches dropped before each test of course)

Mark, this is probably something you will want to investigate and 
explain in a "tweaking" topic of the documentation.


Regards,


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-12 Thread Olivier Bonvalet
On Friday 10 May 2013 at 19:16 +0200, Greg wrote:
> Hello folks,
> 
> I'm in the process of testing CEPH and RBD, I have set up a small 
> cluster of  hosts running each a MON and an OSD with both journal and 
> data on the same SSD (ok this is stupid but this is simple to verify the 
> disks are not the bottleneck for 1 client). All nodes are connected on a 
> 1Gb network (no dedicated network for OSDs, shame on me :).
> 
> Summary : the RBD performance is poor compared to benchmark
> 
> A 5 seconds seq read benchmark shows something like this :
> > sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >   0       0         0         0         0         0         -         0
> >   1      16        39        23   91.9586        92  0.966117  0.431249
> >   2      16        64        48   95.9602       100  0.513435   0.53849
> >   3      16        90        74   98.6317       104   0.25631   0.55494
> >   4      11        95        84   83.9735        40   1.80038   0.58712
> > Total time run:        4.165747
> > Total reads made:      95
> > Read size:             4194304
> > Bandwidth (MB/sec):    91.220
> >
> > Average Latency:       0.678901
> > Max latency:           1.80038
> > Min latency:           0.104719
> 
> 91MB read performance, quite good !
> 
> Now the RBD performance :
> > root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
> > 100+0 records in
> > 100+0 records out
> > 419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s
> 
> There is a 3x performance factor (same for write: ~60M benchmark, ~20M 
> dd on block device)
> 
> The network is ok, the CPU is also ok on all OSDs.
> CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some 
> patches for the SoC being used)
> 
> Can you show me the starting point for digging into this ?
> 
> Thanks!

You should try to increase read_ahead to 512K instead of the default
128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
on reads with that.





Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-12 Thread w sun
that helps. thx

CC: pi...@pioto.org; ceph-users@lists.ceph.com
From: j.michael.l...@gmail.com
Subject: Re: [ceph-users] RBD vs RADOS benchmark performance
Date: Sat, 11 May 2013 13:16:18 -0400
To: ws...@hotmail.com

Hmm, try searching the libvirt git for Josh as an author; you should see the 
commit from Josh Durgin about whitelisting rbd migration.


On May 11, 2013, at 10:53 AM, w sun  wrote:




The link Mike provided does not work for me.  Has anyone else had the same 
problem? --weiguo

From: j.michael.l...@gmail.com
Date: Sat, 11 May 2013 08:45:41 -0400
To: pi...@pioto.org
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RBD vs RADOS benchmark performance

I believe that this is fixed in the most recent versions of libvirt; sheepdog 
and rbd were erroneously marked as unsafe.
http://libvirt.org/git/?p=libvirt.git;a=commit;h=78290b1641e95304c862062ee0aca95395c5926c

Sent from my iPad
On May 11, 2013, at 8:36 AM, Mike Kelly  wrote:

(Sorry for sending this twice... Forgot to reply to the list)
Is rbd caching safe to enable when you may need to do a live migration of the 
guest later on? It was my understanding that it wasn't, and that libvirt 
prevented you from doing the migration if it knew about the caching setting.

If it isn't, is there anything else that could help performance? Like, some 
tuning of block size parameters for the rbd image or the qemu 
On May 10, 2013 8:57 PM, "Mark Nelson"  wrote:

On 05/10/2013 07:21 PM, Yun Mao wrote:

Hi Mark,

Given the same hardware, optimal configuration (I have no idea what that
means exactly but feel free to specify), which is supposed to perform
better, kernel rbd or qemu/kvm? Thanks,

Yun


Hi Yun,

I'm in the process of actually running some tests right now.

In previous testing, it looked like kernel rbd and qemu/kvm performed about the
same with cache off.  With cache on (in cuttlefish), small sequential write
performance improved pretty dramatically vs without cache.  Large write
performance seemed to take more concurrency to reach peak performance, but
ultimately aggregate throughput was about the same.

Hopefully I should have some new results published in the near future.

Mark


On Fri, May 10, 2013 at 6:56 PM, Mark Nelson <mark.nel...@inktank.com> wrote:



On 05/10/2013 12:16 PM, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to verify the
disks are not the bottleneck for 1 client). All nodes are connected on a
1Gb network (no dedicated network for OSDs, shame on me :).



Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719





91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?





Hi Greg, First things first, are you doing kernel rbd or qemu/kvm?
If you are doing qemu/kvm, make sure you are using virtio disks.
This can have a pretty big performance impact.  Next, are you
using RBD cache? With 0.56.4 there are some performance issues with
large sequential writes if cache is on, but it does provide benefit
for small sequential writes.  In general RBD cache behaviour has
improved with Cuttlefish.

Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-11 Thread Mike Lowe
Hmm, try searching the libvirt git for Josh as an author; you should see the 
commit from Josh Durgin about whitelisting rbd migration.




On May 11, 2013, at 10:53 AM, w sun  wrote:

> The link Mike provided does not work for me.  Has anyone else had the same 
> problem? --weiguo
> 
> From: j.michael.l...@gmail.com
> Date: Sat, 11 May 2013 08:45:41 -0400
> To: pi...@pioto.org
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RBD vs RADOS benchmark performance
> 
> I believe that this is fixed in the most recent versions of libvirt; sheepdog 
> and rbd were erroneously marked as unsafe.
> 
> http://libvirt.org/git/?p=libvirt.git;a=commit;h=78290b1641e95304c862062ee0aca95395c5926c
> 
> Sent from my iPad
> 
> On May 11, 2013, at 8:36 AM, Mike Kelly  wrote:
> 
> (Sorry for sending this twice... Forgot to reply to the list)
> 
> Is rbd caching safe to enable when you may need to do a live migration of the 
> guest later on? It was my understanding that it wasn't, and that libvirt 
> prevented you from doing the migration if it knew about the caching setting.
> 
> If it isn't, is there anything else that could help performance? Like, some 
> tuning of block size parameters for the rbd image or the qemu
> 
> On May 10, 2013 8:57 PM, "Mark Nelson"  wrote:
> On 05/10/2013 07:21 PM, Yun Mao wrote:
> Hi Mark,
> 
> Given the same hardware, optimal configuration (I have no idea what that
> means exactly but feel free to specify), which is supposed to perform
> better, kernel rbd or qemu/kvm? Thanks,
> 
> Yun
> 
> Hi Yun,
> 
> I'm in the process of actually running some tests right now.
> 
> In previous testing, it looked like kernel rbd and qemu/kvm performed about 
> the same with cache off.  With cache on (in cuttlefish), small sequential 
> write performance improved pretty dramatically vs without cache.  Large write 
> performance seemed to take more concurrency to reach peak performance, but 
> ultimately aggregate throughput was about the same.
> 
> Hopefully I should have some new results published in the near future.
> 
> Mark
> 
> 
> 
> On Fri, May 10, 2013 at 6:56 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
> 
> On 05/10/2013 12:16 PM, Greg wrote:
> 
> Hello folks,
> 
> I'm in the process of testing CEPH and RBD, I have set up a small
> cluster of  hosts running each a MON and an OSD with both
> journal and
> data on the same SSD (ok this is stupid but this is simple to
> verify the
> disks are not the bottleneck for 1 client). All nodes are
> connected on a
> 1Gb network (no dedicated network for OSDs, shame on me :).
> 
> Summary : the RBD performance is poor compared to benchmark
> 
> A 5 seconds seq read benchmark shows something like this :
> 
> sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>   0       0         0         0         0         0         -         0
>   1      16        39        23   91.9586        92  0.966117  0.431249
>   2      16        64        48   95.9602       100  0.513435   0.53849
>   3      16        90        74   98.6317       104   0.25631   0.55494
>   4      11        95        84   83.9735        40   1.80038   0.58712
> Total time run:        4.165747
> Total reads made:      95
> Read size:             4194304
> Bandwidth (MB/sec):    91.220
> 
> Average Latency:       0.678901
> Max latency:           1.80038
> Min latency:           0.104719
> 
> 
> 91MB read performance, quite good !
> 
> Now the RBD performance :
> 
> root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
> 100+0 records in
> 100+0 records out
> 419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s
> 
> 
> There is a 3x performance factor (same for write: ~60M
> benchmark, ~20M
> dd on block device)
> 
> The network is ok, the CPU is also ok on all OSDs.
> CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
> patches for the SoC being used)
> 
> Can you show me the starting point for digging into this ?
> 
> 
> Hi Greg, First things first, are you doing kernel rbd or qemu/kvm?
>   If you are doing qemu/kvm, make sure you are using virtio disks.
>   This can ha

Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-11 Thread w sun
The link Mike provided does not work for me.  Has anyone else had the same 
problem? --weiguo

From: j.michael.l...@gmail.com
Date: Sat, 11 May 2013 08:45:41 -0400
To: pi...@pioto.org
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RBD vs RADOS benchmark performance

I believe that this is fixed in the most recent versions of libvirt; sheepdog 
and rbd were erroneously marked as unsafe.
http://libvirt.org/git/?p=libvirt.git;a=commit;h=78290b1641e95304c862062ee0aca95395c5926c

Sent from my iPad
On May 11, 2013, at 8:36 AM, Mike Kelly  wrote:

(Sorry for sending this twice... Forgot to reply to the list)
Is rbd caching safe to enable when you may need to do a live migration of the 
guest later on? It was my understanding that it wasn't, and that libvirt 
prevented you from doing the migration if it knew about the caching setting.

If it isn't, is there anything else that could help performance? Like, some 
tuning of block size parameters for the rbd image or the qemu 
On May 10, 2013 8:57 PM, "Mark Nelson"  wrote:

On 05/10/2013 07:21 PM, Yun Mao wrote:

Hi Mark,

Given the same hardware, optimal configuration (I have no idea what that
means exactly but feel free to specify), which is supposed to perform
better, kernel rbd or qemu/kvm? Thanks,

Yun


Hi Yun,

I'm in the process of actually running some tests right now.

In previous testing, it looked like kernel rbd and qemu/kvm performed about the
same with cache off.  With cache on (in cuttlefish), small sequential write
performance improved pretty dramatically vs without cache.  Large write
performance seemed to take more concurrency to reach peak performance, but
ultimately aggregate throughput was about the same.

Hopefully I should have some new results published in the near future.

Mark


On Fri, May 10, 2013 at 6:56 PM, Mark Nelson <mark.nel...@inktank.com> wrote:



On 05/10/2013 12:16 PM, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD, I have set up a small
cluster of  hosts running each a MON and an OSD with both journal and
data on the same SSD (ok this is stupid but this is simple to verify the
disks are not the bottleneck for 1 client). All nodes are connected on a
1Gb network (no dedicated network for OSDs, shame on me :).



Summary : the RBD performance is poor compared to benchmark

A 5 seconds seq read benchmark shows something like this :

sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16        39        23   91.9586        92  0.966117  0.431249
  2      16        64        48   95.9602       100  0.513435   0.53849
  3      16        90        74   98.6317       104   0.25631   0.55494
  4      11        95        84   83.9735        40   1.80038   0.58712
Total time run:        4.165747
Total reads made:      95
Read size:             4194304
Bandwidth (MB/sec):    91.220

Average Latency:       0.678901
Max latency:           1.80038
Min latency:           0.104719





91MB read performance, quite good !

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s

There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this ?





Hi Greg, First things first, are you doing kernel rbd or qemu/kvm?
If you are doing qemu/kvm, make sure you are using virtio disks.
This can have a pretty big performance impact.  Next, are you
using RBD cache? With 0.56.4 there are some performance issues with
large sequential writes if cache is on, but it does provide benefit
for small sequential writes.  In general RBD cache behaviour has
improved with Cuttlefish.

Beyond that, are the pools being targeted by RBD and rados bench
setup the same way?  Same number of PGs?  Same replication?
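For reference, enabling the RBD cache Mark mentions was a ceph.conf client-side setting; a minimal sketch (the size and max-dirty values shown are the commonly cited defaults of that era, not from this thread; verify against your release):

```ini
[client]
    rbd cache = true
    # assumed defaults at the time: 32 MiB cache, 24 MiB max dirty
    rbd cache size = 33554432
    rbd cache max dirty = 25165824
```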







Thanks!


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-11 Thread Michael Lowe
I believe that this is fixed in the most recent versions of libvirt; sheepdog 
and rbd were erroneously marked as unsafe.

http://libvirt.org/git/?p=libvirt.git;a=commit;h=78290b1641e95304c862062ee0aca95395c5926c

Sent from my iPad

On May 11, 2013, at 8:36 AM, Mike Kelly  wrote:

> (Sorry for sending this twice... Forgot to reply to the list)
> 
> Is rbd caching safe to enable when you may need to do a live migration of the 
> guest later on? It was my understanding that it wasn't, and that libvirt 
> prevented you from doing the migration if it knew about the caching setting.
> 
> If it isn't, is there anything else that could help performance? Like, some 
> tuning of block size parameters for the rbd image or the qemu
> 
> On May 10, 2013 8:57 PM, "Mark Nelson"  wrote:
>> On 05/10/2013 07:21 PM, Yun Mao wrote:
>>> Hi Mark,
>>> 
>>> Given the same hardware, optimal configuration (I have no idea what that
>>> means exactly but feel free to specify), which is supposed to perform
>>> better, kernel rbd or qemu/kvm? Thanks,
>>> 
>>> Yun
>> 
>> Hi Yun,
>> 
>> I'm in the process of actually running some tests right now.
>> 
>> In previous testing, it looked like kernel rbd and qemu/kvm performed about 
>> the same with cache off.  With cache on (in cuttlefish), small sequential 
>> write performance improved pretty dramatically vs without cache.  Large 
>> write performance seemed to take more concurrency to reach peak performance, 
>> but ultimately aggregate throughput was about the same.
>> 
>> Hopefully I should have some new results published in the near future.
>> 
>> Mark
>> 
>>> 
>>> 
>>> On Fri, May 10, 2013 at 6:56 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
>>> 
>>> On 05/10/2013 12:16 PM, Greg wrote:
>>> 
>>> Hello folks,
>>> 
>>> I'm in the process of testing CEPH and RBD, I have set up a small
>>> cluster of  hosts running each a MON and an OSD with both
>>> journal and
>>> data on the same SSD (ok this is stupid but this is simple to
>>> verify the
>>> disks are not the bottleneck for 1 client). All nodes are
>>> connected on a
>>> 1Gb network (no dedicated network for OSDs, shame on me :).
>>> 
>>> Summary : the RBD performance is poor compared to benchmark
>>> 
>>> A 5 seconds seq read benchmark shows something like this :
>>> 
>>> sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>   0       0         0         0         0         0         -         0
>>>   1      16        39        23   91.9586        92  0.966117  0.431249
>>>   2      16        64        48   95.9602       100  0.513435   0.53849
>>>   3      16        90        74   98.6317       104   0.25631   0.55494
>>>   4      11        95        84   83.9735        40   1.80038   0.58712
>>> Total time run:        4.165747
>>> Total reads made:      95
>>> Read size:             4194304
>>> Bandwidth (MB/sec):    91.220
>>> 
>>> Average Latency:       0.678901
>>> Max latency:           1.80038
>>> Min latency:           0.104719
>>> 
>>> 
>>> 91MB read performance, quite good !
>>> 
>>> Now the RBD performance :
>>> 
>>> root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
>>> 100+0 records in
>>> 100+0 records out
>>> 419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s
>>> 
>>> 
>>> There is a 3x performance factor (same for write: ~60M
>>> benchmark, ~20M
>>> dd on block device)
>>> 
>>> The network is ok, the CPU is also ok on all OSDs.
>>> CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
>>> patches for the SoC being used)
>>> 
>>> Can you show me the starting point for digging into this ?
>>> 
>>> 
>>> Hi Greg, First things first, are you doing kernel rbd or qemu/kvm?
>>>   If you are doing qemu/kvm, make sure you are using virtio disks.
>>>   This can have a pretty big performance impact.  Next, are you
>>> using RBD cache? With 0.56.4 there are some performance issues with
>>> large sequential writes if cache is on, but it does provide benefit
>>> for small sequential writes.  In general RBD cache behaviour has
>>> improved with Cuttlefish.
>>> 
>>> Beyond that, are the pools being targeted by RBD and rados bench
>>> setup the same way?  Same number of Pgs?  Same replication?
>>> 
>>> 
>>> 
>>> Thanks!
>>> _
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/__listinfo.cgi/ceph-users-cep

Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-11 Thread Mike Kelly
(Sorry for sending this twice... Forgot to reply to the list)

Is rbd caching safe to enable when you may need to do a live migration of
the guest later on? It was my understanding that it wasn't, and that
libvirt prevented you from doing the migration if it knew about the caching
setting.

If it isn't, is there anything else that could help performance? Like some
tuning of block size parameters for the rbd image or the qemu
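For context, with qemu/kvm guests the cache mode is part of the libvirt disk definition, which is what libvirt inspects when deciding whether to allow a migration. A minimal sketch of such a stanza follows; the pool, image, and monitor names are invented for illustration, and exact attributes depend on your libvirt version:

```xml
<disk type='network' device='disk'>
  <!-- cache='writeback' turns the RBD cache on; cache='none' leaves it off -->
  <driver name='qemu' type='raw' cache='writeback'/>
  <source protocol='rbd' name='rbd/vm-disk-1'>
    <host name='mon1.example.com' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```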

Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-11 Thread Greg

On 11/05/2013 13:24, Greg wrote:

On 11/05/2013 02:52, Mark Nelson wrote:

Interesting.  Does your rados bench performance change if you run a 
longer test?  So far I've been seeing about a 20-30% performance 
overhead for kernel RBD, but 3x is excessive!  It might be worth 
watching the underlying IO sizes to the OSDs in each case with 
something like "collectl -sD -oT" to see if there's any significant 
differences.

Mark,

I'll gather some more data with collectl; meanwhile, I noticed a
difference: the benchmark performs 16 concurrent reads while RBD only
does 1. That shouldn't be a problem, but they are still two different
usage patterns.


OK, I ran the benchmark with only 1 concurrent thread, and here is the
result:

Total time run:5.118677
Total reads made: 56
Read size:4194304
Bandwidth (MB/sec):43.761

Average Latency:   0.09
Max latency:   0.096591
Min latency:   0.076976


So the benchmark is about 36% faster, which correlates with your numbers.
Now the question is: why so much difference between 1 and 16 concurrent
workloads? I guess I know the answer: because of latency.

So the next question is: how can I optimize latency? :)
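As a rough sanity check on the latency explanation (back-of-the-envelope arithmetic, not a measurement): with a single request in flight, throughput is bounded by request size divided by average latency. A 4 MB read at the observed ~0.09 s latency can't exceed about 44 MB/s, matching the 1-thread bench above, and dd's 32.1 MB/s implies roughly 0.125 s per 4 MB request:

```shell
# Upper bound on single-threaded throughput: request_size / avg_latency
awk 'BEGIN { printf "bench bound: %.1f MB/s\n", 4 / 0.09 }'

# Implied per-request latency for dd at 32.1 MB/s with 4 MB requests
awk 'BEGIN { printf "dd latency:  %.3f s\n", 4 / 32.1 }'
```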

Cheers,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-11 Thread Greg

On 11/05/2013 02:52, Mark Nelson wrote:

Interesting.  Does your rados bench performance change if you run a 
longer test?  So far I've been seeing about a 20-30% performance 
overhead for kernel RBD, but 3x is excessive!  It might be worth 
watching the underlying IO sizes to the OSDs in each case with 
something like "collectl -sD -oT" to see if there's any significant 
differences.

Mark,

I'll gather some more data with collectl; meanwhile, I noticed a
difference: the benchmark performs 16 concurrent reads while RBD only
does 1. That shouldn't be a problem, but they are still two different
usage patterns.


Cheers,


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-10 Thread Mark Nelson

On 05/10/2013 07:21 PM, Yun Mao wrote:

Hi Mark,

Given the same hardware, optimal configuration (I have no idea what that
means exactly but feel free to specify), which is supposed to perform
better, kernel rbd or qemu/kvm? Thanks,

Yun


Hi Yun,

I'm in the process of actually running some tests right now.

In previous testing, it looked like kernel rbd and qemu/kvm performed 
about the same with cache off.  With cache on (in cuttlefish), small 
sequential write performance improved pretty dramatically vs without 
cache.  Large write performance seemed to take more concurrency to reach 
peak performance, but ultimately aggregate throughput was about the same.


Hopefully I should have some new results published in the near future.

Mark






Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-10 Thread Mark Nelson

On 05/10/2013 07:20 PM, Greg wrote:


Mark, thanks for your prompt reply.

I'm doing kernel RBD, so I have not enabled the cache (is that the
default setting?)
Sorry, I forgot to mention the pool used for bench and RBD is the same.


Interesting.  Does your rados bench performance change if you run a 
longer test?  So far I've been seeing about a 20-30% performance 
overhead for kernel RBD, but 3x is excessive!  It might be worth 
watching the underlying IO sizes to the OSDs in each case with something 
like "collectl -sD -oT" to see if there's any significant differences.






Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-10 Thread Yun Mao
Hi Mark,

Given the same hardware, optimal configuration (I have no idea what that
means exactly but feel free to specify), which is supposed to perform
better, kernel rbd or qemu/kvm? Thanks,

Yun




Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-10 Thread Greg

On 11/05/2013 00:56, Mark Nelson wrote:


Hi Greg, First things first, are you doing kernel rbd or qemu/kvm?  If 
you are doing qemu/kvm, make sure you are using virtio disks.  This 
can have a pretty big performance impact. Next, are you using RBD 
cache? With 0.56.4 there are some performance issues with large 
sequential writes if cache is on, but it does provide benefit for 
small sequential writes.  In general RBD cache behaviour has improved 
with Cuttlefish.


Beyond that, are the pools being targeted by RBD and rados bench setup 
the same way?  Same number of Pgs?  Same replication?

Mark, thanks for your prompt reply.

I'm doing kernel RBD, so I have not enabled the cache (is that the default setting?)
Sorry, I forgot to mention the pool used for bench and RBD is the same.
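Worth noting: the RBD cache is a librbd feature (qemu/kvm), so the kernel driver ignores it. If qemu were in use, it would be switched on from ceph.conf on the client side; a minimal sketch, with the size shown being the documented default rather than a tuned value:

```ini
[client]
    ; librbd-only: the kernel RBD driver ignores these settings
    rbd cache = true
    rbd cache size = 33554432   ; 32 MB
```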

Regards,


Re: [ceph-users] RBD vs RADOS benchmark performance

2013-05-10 Thread Mark Nelson

On 05/10/2013 12:16 PM, Greg wrote:

Hello folks,

I'm in the process of testing CEPH and RBD. I have set up a small
cluster of  hosts, each running a MON and an OSD with both journal and
data on the same SSD (OK, this is stupid, but it makes it simple to verify
the disks are not the bottleneck for 1 client). All nodes are connected on
a 1Gb network (no dedicated network for OSDs, shame on me :).

Summary: the RBD performance is poor compared to the benchmark.

A 5 second seq read benchmark shows something like this:

   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        39        23   91.9586        92  0.966117  0.431249
     2      16        64        48   95.9602       100  0.513435   0.53849
     3      16        90        74   98.6317       104   0.25631   0.55494
     4      11        95        84   83.9735        40   1.80038   0.58712

 Total time run:       4.165747
Total reads made:     95
Read size:            4194304
Bandwidth (MB/sec):   91.220

Average Latency:      0.678901
Max latency:          1.80038
Min latency:          0.104719


91 MB/s read performance, quite good!

Now the RBD performance :

root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s


There is a 3x performance factor (same for write: ~60M benchmark, ~20M
dd on block device)

The network is ok, the CPU is also ok on all OSDs.
CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
patches for the SoC being used)

Can you show me the starting point for digging into this?


Hi Greg, First things first, are you doing kernel rbd or qemu/kvm?  If 
you are doing qemu/kvm, make sure you are using virtio disks.  This can 
have a pretty big performance impact.  Next, are you using RBD cache? 
With 0.56.4 there are some performance issues with large sequential 
writes if cache is on, but it does provide benefit for small sequential 
writes.  In general RBD cache behaviour has improved with Cuttlefish.


Beyond that, are the pools being targeted by RBD and rados bench setup 
the same way?  Same number of Pgs?  Same replication?



