On Mon, Aug 13, 2018 at 9:32 AM Emmanuel Lacour <elac...@easter-eggs.com> wrote:
> On 13/08/2018 at 15:21, Jason Dillaman wrote:
> > Is this a clean (new) cluster and RBD image you are using for your
> > test, or has it been burned in? When possible (i.e. when it has enough
> > free space), bluestore will essentially turn your random RBD image
> > writes into sequential writes. This optimization doesn't work for
> > random reads unless your read pattern matches your original random
> > write pattern.
>
> The cluster is a new one but already hosts some VM images; it is not yet
> used in production, but it already has data and has had writes/reads.
>
> > Note that with the default "stupid" allocator, this optimization will
> > at some point hit a massive performance cliff because the allocator
> > will aggressively try to re-use free slots that best match the IO
> > size, even if that means it will require massive seeking around the
> > disk. Hopefully the "bitmap" allocator will address this issue once it
> > becomes the stable default in a future release of Ceph.
>
> Well, but that's not as bad as what I'm seeing here:
>
> New cluster
> ===========
>
> file1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
> fio-2.16
> Starting 1 process
> file1: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [r(1)] [100.0% done] [876KB/0KB/0KB /s] [219/0/0 iops] [eta 00m:00s]
> file1: (groupid=0, jobs=1): err= 0: pid=3289045: Mon Aug 13 14:58:22 2018
>   read : io=16072KB, bw=822516B/s, iops=200, runt= 20009msec
>
> An old cluster with fewer disks and older hardware, running ceph hammer
> ========================================================================
>
> file1: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
> fio-2.16
> Starting 1 process
> file1: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=0): [f(1)] [100.0% done] [6350KB/0KB/0KB /s] [1587/0/0 iops] [eta 00m:00s]
> file1: (groupid=0, jobs=1): err= 0: pid=15596: Mon Aug 13 14:59:22 2018
>   read : io=112540KB, bw=5626.8KB/s, iops=1406, runt= 20001msec
>
> So around 7 times fewer iops :(
>
> When using rados bench, the new cluster has better results:
>
> New:
>
> Total time run:       10.080886
> Total reads made:     3724
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   1477.65
> Average IOPS:         369
> Stddev IOPS:          59
> Max IOPS:             451
> Min IOPS:             279
> Average Latency(s):   0.0427141
> Max latency(s):       0.320013
> Min latency(s):       0.00142682
>
> Old:
>
> Total time run:       10.276202
> Total reads made:     724
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   281.816
> Average IOPS:         70
> Stddev IOPS:          5
> Max IOPS:             76
> Min IOPS:             59
> Average Latency(s):   0.226087
> Max latency(s):       0.981571
> Min latency(s):       0.00343391
>
> so the problem seems to be located on the "rbd" side ...

That's a pretty big apples-to-oranges comparison (4KiB random IO vs. 4MiB
full-object IO). With your RBD workload, the OSDs will be seeking after
each 4KiB read, but with your RADOS bench workload, they read a full 4MiB
object before seeking.

-- 
Jason
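For readers who want to reproduce the comparison, a fio job roughly
matching the parameters visible in the quoted output (4KiB random reads,
libaio, iodepth 16, a 2048MB test file, ~20s runs) might look like the
sketch below. The actual job file and target path are not shown in the
thread, so the filename and the direct=1 setting here are only
assumptions.

    ; minimal sketch of a 4KiB random-read job similar to the quoted runs
    ; /mnt/test/file1 is a placeholder path on an RBD-backed filesystem
    [file1]
    ioengine=libaio
    direct=1
    rw=randread
    bs=4k
    iodepth=16
    size=2048m
    runtime=20
    time_based
    filename=/mnt/test/file1

To turn this into an apples-to-apples counterpart of the rados bench
numbers, one option is to rerun the same job with bs=4m so that, like
rados bench, each request reads a full 4MiB before the disk has to seek
again; if fio was built with RBD support, the image can also be driven
directly (ioengine=rbd with pool= and rbdname= pointing at the test
image) to take the guest filesystem out of the picture.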