Hi Andrei,

Yes, I’m testing from within the guest. Here is an example. First, I do 2MB reads with max_sectors_kb=512, and we see each read split into four (fio sees 25 iops, while iostat reports 100 smaller iops):
# echo 512 > /sys/block/vdb/queue/max_sectors_kb   # this is the default
# fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
/dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0 iops] [eta 00m:00s]

Meanwhile iostat is reporting 100 iops with an average request size of 1024 sectors (i.e. 512kB):

Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
vdb        0.00    0.00  100.00  0.00   50.00   0.00   1024.00      3.02  30.25  10.00  100.00

Now increase max_sectors_kb to 4MB, and the IOs are no longer split:

# echo 4096 > /sys/block/vdb/queue/max_sectors_kb
# fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
/dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0 iops] [eta 00m:00s]

iostat reports 100 iops of 4096 sectors (i.e. 2MB) each:

Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
vdb      300.00    0.00  100.00  0.00  200.00   0.00   4096.00      0.99   9.94   9.94   99.40

Cheers, Dan

On 28 Nov 2014, at 15:28, Andrei Mikhailovsky <and...@arhont.com> wrote:

Dan, are you setting this on the guest vm side?

Did you run some tests to see if this impacts performance? Like small block size performance, etc?

Cheers

________________________________
From: "Dan Van Der Ster" <daniel.vanders...@cern.ch>
To: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Friday, 28 November, 2014 1:33:20 PM
Subject: Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

Hi,

After some more tests we’ve found that max_sectors_kb is the reason for splitting large IOs. We increased it to 4MB:

  echo 4096 > /sys/block/vdb/queue/max_sectors_kb

and now fio/iostat show that reads of up to 4MB get through to the block device unsplit.

We use 4MB to match the size of the underlying RBD objects. I can’t think of a reason to split IOs smaller than the RBD objects -- with a small max_sectors_kb the client would use 8 IOs to read a single object. Does anyone know of a reason that max_sectors_kb should not be set to the RBD object size?

Is there any udev rule or similar that could set max_sectors_kb when a RBD device is attached? (A rough rule is sketched at the bottom of this thread.)

Cheers, Dan

On 27 Nov 2014, at 20:29, Dan Van Der Ster <daniel.vanders...@cern.ch> wrote:

Oops, I was off by a factor of 1000 in my original subject. We actually have 4M and 8M reads being split into 100 512kB reads per second. So perhaps these are limiting:

# cat /sys/block/vdb/queue/max_sectors_kb
512
# cat /sys/block/vdb/queue/read_ahead_kb
512

Questions below remain.

Cheers, Dan

On 27 Nov 2014 18:26, Dan Van Der Ster <daniel.vanders...@cern.ch> wrote:

Hi all,

We throttle (with qemu-kvm) rbd devices to 100 w/s and 100 r/s (and 80MB/s write and read). With fio we cannot exceed 51.2MB/s sequential or random reads, no matter the read block size. (But with large writes we can achieve 80MB/s.)
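For context, limits like these are usually applied either through qemu -drive options or, for a libvirt-managed guest, through the <iotune> element on the disk definition. Below is a rough sketch of the libvirt form only, assuming libvirt is in use; the pool/image name, cache mode, and monitor details are illustrative placeholders, and the byte values simply restate the 100 iops / 80MB/s limits mentioned above:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd' name='pool/image'/>   <!-- placeholder pool/image -->
      <target dev='vdb' bus='virtio'/>
      <iotune>
        <read_iops_sec>100</read_iops_sec>
        <write_iops_sec>100</write_iops_sec>
        <read_bytes_sec>83886080</read_bytes_sec>    <!-- 80*1024*1024 bytes/s -->
        <write_bytes_sec>83886080</write_bytes_sec>  <!-- 80*1024*1024 bytes/s -->
      </iotune>
    </disk>

With a per-request iops cap like this, each 512kB request the guest issues counts as one IO against the 100 r/s limit, so splitting 2MB reads into 512kB pieces tops out at about 100 x 512kB ≈ 51.2MB/s, matching the ceiling reported above.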
I just realised that the VM subsystem is probably splitting large reads into 512 byte reads, following at least one of:

# cat /sys/block/vdb/queue/hw_sector_size
512
# cat /sys/block/vdb/queue/minimum_io_size
512
# cat /sys/block/vdb/queue/optimal_io_size
0

vdb is an RBD device coming over librbd, with rbd cache=true, and mounted like this:

/dev/vdb on /vicepa type xfs (rw)

Did anyone observe this before? Is there a kernel setting to stop splitting reads like that, or a way to change the io_sizes reported by RBD to the kernel? (I found a similar thread on the lvm mailing list, but lvm shouldn’t be involved here.)

All components here are running latest dumpling. Client VM is running CentOS 6.6.

Cheers, Dan
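As a possible answer to the udev question earlier in the thread: a guest-side udev rule can raise max_sectors_kb whenever a virtio disk appears. A rough sketch follows; the file name is arbitrary, vd* matches every virtio disk in the guest (not only RBD-backed ones), and for kernel-mapped RBD devices on a host the match would be KERNEL=="rbd*" instead:

    # /etc/udev/rules.d/80-virtio-queue.rules  (illustrative file name)
    # Raise the per-request size limit to the RBD object size (4MB).
    # Note: the kernel rejects values larger than queue/max_hw_sectors_kb.
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="vd*", ATTR{queue/max_sectors_kb}="4096"

After reloading the rules (udevadm control --reload-rules; udevadm trigger --subsystem-match=block), the effect can be verified with:

    # cat /sys/block/vdb/queue/max_sectors_kb
    4096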
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com