Hi Andrei,

Yes, I’m testing from within the guest. Here is an example. First, I do 2MB reads with max_sectors_kb=512, and we see each read split into four (fio sees 25 iops, while iostat reports 100 smaller iops):
# echo 512 > /sys/block/vdb/queue/max_sectors_kb   # this is the default
# fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
/dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0 iops] [eta 00m:00s]

Meanwhile iostat is reporting 100 iops with an average request size of 1024 sectors (i.e. 512kB):

Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
vdb        0.00    0.00  100.00  0.00   50.00   0.00   1024.00      3.02  30.25  10.00  100.00

Now increase max_sectors_kb to 4MB, and the IOs are no longer split:

# echo 4096 > /sys/block/vdb/queue/max_sectors_kb
# fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
/dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0 iops] [eta 00m:00s]

iostat reports 100 iops of 4096 sectors (i.e. 2MB) each:

Device:  rrqm/s  wrqm/s     r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
vdb      300.00    0.00  100.00  0.00  200.00   0.00   4096.00      0.99   9.94   9.94   99.40

Cheers, Dan

On 28 Nov 2014, at 15:28, Andrei Mikhailovsky <and...@arhont.com> wrote:

Dan, are you setting this on the guest vm side?

Did you run some tests to see if this impacts performance? Like small block size performance, etc?

Cheers

________________________________
From: "Dan Van Der Ster" <daniel.vanders...@cern.ch>
To: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Friday, 28 November, 2014 1:33:20 PM
Subject: Re: [ceph-users] large reads become 512 kbyte reads on qemu-kvm rbd

Hi,

After some more tests we’ve found that max_sectors_kb is the reason for splitting large IOs. We increased it to 4MB:

  echo 4096 > /sys/block/vdb/queue/max_sectors_kb

and now fio/iostat show that reads of up to 4MB get through to the block device unsplit.

We use 4MB to match the size of the underlying RBD objects. I can’t think of a reason to split IOs smaller than the RBD objects -- with a small max_sectors_kb the client would use 8 IOs to read a single object. Does anyone know of a reason that max_sectors_kb should not be set to the RBD object size?

Is there any udev rule or similar that could set max_sectors_kb when a RBD device is attached? (A rough rule is sketched at the bottom of this thread.)

Cheers, Dan

On 27 Nov 2014, at 20:29, Dan Van Der Ster <daniel.vanders...@cern.ch> wrote:

Oops, I was off by a factor of 1000 in my original subject. We actually have 4M and 8M reads being split into 100 512kB reads per second. So perhaps these are limiting:

# cat /sys/block/vdb/queue/max_sectors_kb
512
# cat /sys/block/vdb/queue/read_ahead_kb
512

Questions below remain.

Cheers, Dan

On 27 Nov 2014 18:26, Dan Van Der Ster <daniel.vanders...@cern.ch> wrote:

Hi all,

We throttle (with qemu-kvm) rbd devices to 100 w/s and 100 r/s (and 80MB/s write and read). With fio we cannot exceed 51.2MB/s sequential or random reads, no matter the read block size. (But with large writes we can achieve 80MB/s.)
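For context, limits like these are usually applied either through qemu -drive options or, for a libvirt-managed guest, through the <iotune> element on the disk definition. Below is a rough sketch of the libvirt form only, assuming libvirt is in use; the pool/image name, cache mode, and monitor details are illustrative placeholders, and the byte values simply restate the 100 iops / 80MB/s limits mentioned above:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd' name='pool/image'/>   <!-- placeholder pool/image -->
      <target dev='vdb' bus='virtio'/>
      <iotune>
        <read_iops_sec>100</read_iops_sec>
        <write_iops_sec>100</write_iops_sec>
        <read_bytes_sec>83886080</read_bytes_sec>    <!-- 80*1024*1024 bytes/s -->
        <write_bytes_sec>83886080</write_bytes_sec>  <!-- 80*1024*1024 bytes/s -->
      </iotune>
    </disk>

With a per-request iops cap like this, each 512kB request the guest issues counts as one IO against the 100 r/s limit, so splitting 2MB reads into 512kB pieces tops out at about 100 x 512kB ≈ 51.2MB/s, matching the ceiling reported above.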
I just realised that the VM subsystem is probably splitting large reads into 512 byte reads, following at least one of:

# cat /sys/block/vdb/queue/hw_sector_size
512
# cat /sys/block/vdb/queue/minimum_io_size
512
# cat /sys/block/vdb/queue/optimal_io_size
0

vdb is an RBD device coming over librbd, with rbd cache=true, and mounted like this:

/dev/vdb on /vicepa type xfs (rw)

Did anyone observe this before? Is there a kernel setting to stop splitting reads like that, or a way to change the io_sizes reported by RBD to the kernel? (I found a similar thread on the lvm mailing list, but lvm shouldn’t be involved here.)

All components here are running latest dumpling. Client VM is running CentOS 6.6.

Cheers, Dan
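As a possible answer to the udev question earlier in the thread: a guest-side udev rule can raise max_sectors_kb whenever a virtio disk appears. A rough sketch follows; the file name is arbitrary, vd* matches every virtio disk in the guest (not only RBD-backed ones), and for kernel-mapped RBD devices on a host the match would be KERNEL=="rbd*" instead:

    # /etc/udev/rules.d/80-virtio-queue.rules  (illustrative file name)
    # Raise the per-request size limit to the RBD object size (4MB).
    # Note: the kernel rejects values larger than queue/max_hw_sectors_kb.
    ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="vd*", ATTR{queue/max_sectors_kb}="4096"

After reloading the rules (udevadm control --reload-rules; udevadm trigger --subsystem-match=block), the effect can be verified with:

    # cat /sys/block/vdb/queue/max_sectors_kb
    4096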
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com