Hi,

>> We tried adding more RBDs to a single VM, but had no luck.

If you want to scale with more disks in a single QEMU VM, you need to use 
QEMU's iothread feature and assign one iothread per disk (this works with 
virtio-blk).
It works for me: I can scale by adding more disks.
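
To make that concrete, here is a minimal sketch of the libvirt side (the 
pool/image names are just illustrative, not my actual config):

  <domain type='kvm'>
    ...
    <!-- allocate one iothread per virtio-blk disk -->
    <iothreads>2</iothreads>
    <devices>
      <disk type='network' device='disk'>
        <!-- pin this disk to iothread 1 -->
        <driver name='qemu' type='raw' iothread='1'/>
        <source protocol='rbd' name='rbd/vm-disk-0'/>
        <target dev='vda' bus='virtio'/>
      </disk>
      <disk type='network' device='disk'>
        <!-- second disk gets its own iothread -->
        <driver name='qemu' type='raw' iothread='2'/>
        <source protocol='rbd' name='rbd/vm-disk-1'/>
        <target dev='vdb' bus='virtio'/>
      </disk>
    </devices>
  </domain>

On a raw qemu command line the equivalent is "-object iothread,id=iothread1" 
plus ",iothread=iothread1" on each virtio-blk-pci device.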


My benchmarks here were done with fio's rbd engine on the host.
I can scale up to 400k iops with 10 clients and rbd_cache=off on a single host, 
and to around 250k iops with 10 clients and rbd_cache=on.
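
By "10 clients" I just mean 10 parallel fio rbd jobs; a job file roughly along 
these lines (pool/image names are placeholders):

  [global]
  ioengine=rbd
  clientname=admin
  pool=rbd
  # placeholder image name
  rbdname=fio-test
  rw=randread
  bs=4k
  iodepth=32
  direct=1
  # "10 clients": each job opens its own librbd client handle
  numjobs=10
  group_reporting

  [rbd-4k-randread]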


I'm just wondering why I don't see a performance decrease around 30k iops with 1 OSD.

I'm going to check whether this tracker issue
http://tracker.ceph.com/issues/11056
could be the cause.

(My master build was done some weeks ago.)



----- Original Message -----
From: "pushpesh sharma" <pushpesh....@gmail.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "ceph-devel" <ceph-de...@vger.kernel.org>, "ceph-users" 
<ceph-users@lists.ceph.com>
Sent: Tuesday, 9 June 2015 09:21:04
Subject: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre, 

We have also seen something very similar on Hammer (0.94-1). We were doing some 
benchmarking for VMs hosted on a hypervisor (QEMU-KVM, OpenStack Juno). Each 
Ubuntu VM has an RBD as its root disk and one RBD as additional storage. For some 
strange reason, 4K random-read IOPS on each VM would not scale beyond 35-40k. 
We tried adding more RBDs to a single VM, but had no luck. However, increasing 
the number of VMs on a single hypervisor to 4 did scale to some extent. Beyond 
that, we got little additional benefit from adding more VMs. 

Here is the trend we have seen (x-axis is the number of hypervisors; each 
hypervisor has 4 VMs, each VM has 1 RBD): 

VDbench was used as the benchmarking tool. We were not saturating the network or 
the CPUs at the OSD nodes. We were also not able to saturate the CPUs at the 
hypervisors, which is where we suspected some throttling effect. However, we have 
not set any such limits on the nova or KVM side. We tried some CPU pinning and 
other KVM-related tuning as well, but had no luck. 

We tried the same experiment on bare metal. There, 4K RR IOPS scaled from 
40K (1 RBD) to 180K (4 RBDs). But beyond that point, rather than scaling further, 
the numbers actually degraded (single pipe, more congestion effect). 

We never suspected that enabling rbd cache could be detrimental to performance. 
It would be nice to root-cause the problem if that is the case. 

On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER <aderum...@odiso.com> 
wrote: 


Hi, 

I'm doing benchmarks (ceph master branch) with randread 4k, qdepth=32, 
and rbd_cache=true seems to limit the iops to around 40k. 


no cache 
-------- 
1 client - rbd_cache=false - 1osd : 38300 iops 
1 client - rbd_cache=false - 2osd : 69073 iops 
1 client - rbd_cache=false - 3osd : 78292 iops 


cache 
----- 
1 client - rbd_cache=true - 1osd : 38100 iops 
1 client - rbd_cache=true - 2osd : 42457 iops 
1 client - rbd_cache=true - 3osd : 45823 iops 
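
(For reference, rbd_cache here is the librbd client-side cache, which can be 
toggled between runs in the client's ceph.conf, along these lines:

  [client]
  # set to false for the "no cache" runs
  rbd cache = true

fio's rbd engine picks this up through librbd.)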



Is this expected? 



fio result rbd_cache=false 3 osd 
-------------------------------- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 
2015 
read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec 
slat (usec): min=5, max=530, avg=11.77, stdev= 6.77 
clat (usec): min=70, max=2240, avg=336.08, stdev=94.82 
lat (usec): min=101, max=2247, avg=347.84, stdev=95.49 
clat percentiles (usec): 
| 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262], 
| 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346], 
| 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506], 
| 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948], 
| 99.99th=[ 1176] 
bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21 
lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23% 
lat (msec) : 2=0.03%, 4=0.01% 
cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0% 
issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, 
mint=32698msec, maxt=32698msec 

Disk stats (read/write): 
dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, 
aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% 
sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00% 




fio result rbd_cache=true 3osd 
------------------------------ 

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 
00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9 07:47:30 
2015 
read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec 
slat (usec): min=7, max=805, avg=21.26, stdev=15.84 
clat (usec): min=101, max=4602, avg=478.55, stdev=143.73 
lat (usec): min=123, max=4669, avg=499.80, stdev=146.03 
clat percentiles (usec): 
| 1.00th=[ 227], 5.00th=[ 274], 10.00th=[ 306], 20.00th=[ 350], 
| 30.00th=[ 390], 40.00th=[ 430], 50.00th=[ 470], 60.00th=[ 506], 
| 70.00th=[ 548], 80.00th=[ 596], 90.00th=[ 660], 95.00th=[ 724], 
| 99.00th=[ 844], 99.50th=[ 908], 99.90th=[ 1112], 99.95th=[ 1288], 
| 99.99th=[ 2192] 
bw (KB /s): min=115280, max=204416, per=100.00%, avg=183315.10, stdev=15079.93 
lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28% 
lat (msec) : 2=0.19%, 4=0.01%, 10=0.01% 
cpu : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100 
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0% 
issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, 
mint=55866msec, maxt=55866msec 

Disk stats (read/write): 
dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, aggrios=0/29, 
aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01% 
sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01% 







-- 
-Pushpesh 

