Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
Nick, all,

fantastic, that did it! I installed kernel 4.5.2 on the client, and now the
single-threaded read performance using a krbd mount is up to about 370 MB/s
with the standard readahead size of 256, touching 400 MB/s with larger
readahead sizes. Multi-threaded krbd reads on the same mount also improved;
a 10 Gbit/s network connection is easily saturated.

Thank you all so much for the discussion and the hints.

Regards, Mike

On 4/23/16 9:51 AM, n...@fisk.me.uk wrote:
> I've just looked through github for the Linux kernel and it looks like
> that readahead fix was introduced in 4.4, so is it worth trying a
> slightly newer kernel?
>
> Sent from Nine
>
> *From:* Mike Miller
> *Sent:* 21 Apr 2016 2:20 pm
> *To:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
>
> Hi Udo,
>
> thanks, just to make sure, I further increased the readahead:
>
> $ sudo blockdev --getra /dev/rbd0
> 1048576
> $ cat /sys/block/rbd0/queue/read_ahead_kb
> 524288
>
> No difference here. The first value is in sectors (512 bytes), the
> second in KB.
>
> The second read (after dropping caches) is somewhat faster (10%-20%),
> but not much.
>
> I also found this info: http://tracker.ceph.com/issues/9192
>
> Maybe Ilya can help us; he probably knows best how this can be improved.
>
> Thanks and cheers, Mike
>
> On 4/21/16 4:32 PM, Udo Lembke wrote:
>> Hi Mike,
>>
>> On 21.04.2016 at 09:07, Mike Miller wrote:
>>> Hi Nick and Udo,
>>>
>>> thanks, very helpful. I tweaked some of the config parameters along
>>> the lines Udo suggests, but still only get some 80 MB/s or so.
>> That means you have reached factor 3 (roughly the value I see with a
>> single thread on RBD too). Better than nothing.
>>
>>> Kernel 4.3.4 is running on the client machine with a comfortable
>>> readahead configured:
>>>
>>> $ sudo blockdev --getra /dev/rbd0
>>> 262144
>>>
>>> Still not more than about 80-90 MB/s.
>> There are two possibilities for readahead.
>> Take a look here (and change with echo):
>> cat /sys/block/rbd0/queue/read_ahead_kb
>>
>> Perhaps there are slight differences?
>>
>>> For writing, the parallelization is amazing and I see very impressive
>>> speeds, but why is reading performance so far behind? Why is it not
>>> parallelized the same way writing is? Is this something coming up in
>>> the jewel release? Or is it planned further down the road?
>> If you read a big file, clear your cache ("echo 3 >
>> /proc/sys/vm/drop_caches") on the client, and read again, is the
>> second read very fast? I assume yes. In that case the read data is in
>> the cache on the OSD nodes... so the tuning must be there (and I'm
>> very interested in improvements).
>>
>> Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
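An aside on the two readahead knobs discussed above: `blockdev --getra` works in 512-byte sectors, while `/sys/block/<dev>/queue/read_ahead_kb` works in KiB, so Mike's two readings describe the same window, not two independent settings. A minimal sketch of the conversion, using the values from the thread:

```shell
# blockdev --getra / --setra report and accept 512-byte sectors;
# /sys/block/<dev>/queue/read_ahead_kb reports and accepts KiB.
ra_sectors=1048576                    # from: sudo blockdev --getra /dev/rbd0
ra_kb=$((ra_sectors * 512 / 1024))    # convert sectors -> KiB
echo "$ra_kb"                         # prints 524288, matching read_ahead_kb
```

Both interfaces therefore show the same 512 MiB readahead window; changing either one changes the other.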
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
Hi Mike,

On 21.04.2016 at 15:20, Mike Miller wrote:
> Hi Udo,
>
> thanks, just to make sure, I further increased the readahead:
>
> $ sudo blockdev --getra /dev/rbd0
> 1048576
> $ cat /sys/block/rbd0/queue/read_ahead_kb
> 524288
>
> No difference here. The first value is in sectors (512 bytes), the
> second in KB.
oops, sorry! My fault. Sectors/KB make sense...

> The second read (after dropping caches) is somewhat faster (10%-20%),
> but not much.
That's very strange! Looks like there are tuning possibilities.
Do your OSD nodes have enough RAM? Are they very busy?

If I do single-threaded reading on a test VM I get the following results
(very small test cluster: two nodes with 10 Gb NICs and one node with a
1 Gb NIC):

support@upgrade-test:~/fio$ dd if=fiojo.0.0 of=/dev/null bs=1M
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 62.0267 s, 69.2 MB/s

### as root, "echo 3 > /proc/sys/vm/drop_caches", and the same on the VM host

support@upgrade-test:~/fio$ dd if=fiojo.0.0 of=/dev/null bs=1M
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.0987 s, 143 MB/s
# this is due to cached data on the OSD nodes

# with cleared caches on all nodes (VM, VM host, OSD nodes)
# I get the same value as on the first run:
support@upgrade-test:~/fio$ dd if=fiojo.0.0 of=/dev/null bs=1M
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 61.8995 s, 69.4 MB/s

I don't know why this should not be the same with krbd.

Udo
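Udo's dd runs follow a standard cold-cache read benchmark pattern. Here is a sketch of the same procedure; the file path is a placeholder (his `fiojo.0.0` file and mount point are specific to his test VM):

```shell
# Single-threaded sequential read benchmark, as in Udo's runs.
# FILE is a placeholder for a large file on the RBD-backed filesystem.
FILE=${FILE:-/mnt/rbd/bigfile}

# Drop the client's page cache first (needs root); without this, a
# repeat run measures the local page cache, not the cluster.
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null

dd if="$FILE" of=/dev/null bs=1M
```

As Udo notes, for a true cold-cluster number the caches must also be cleared on the VM host and the OSD nodes; otherwise the OSDs' page caches inflate the second run, which is exactly the 69 vs 143 MB/s difference above.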
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
I've just looked through github for the Linux kernel and it looks like
that readahead fix was introduced in 4.4, so is it worth trying a
slightly newer kernel?

Sent from Nine

From: Mike Miller
Sent: 21 Apr 2016 2:20 pm
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5

Hi Udo,

thanks, just to make sure, I further increased the readahead:

$ sudo blockdev --getra /dev/rbd0
1048576
$ cat /sys/block/rbd0/queue/read_ahead_kb
524288

No difference here. The first value is in sectors (512 bytes), the
second in KB.

The second read (after dropping caches) is somewhat faster (10%-20%),
but not much.

I also found this info: http://tracker.ceph.com/issues/9192

Maybe Ilya can help us; he probably knows best how this can be improved.

Thanks and cheers, Mike

On 4/21/16 4:32 PM, Udo Lembke wrote:
> Hi Mike,
>
> On 21.04.2016 at 09:07, Mike Miller wrote:
>> Hi Nick and Udo,
>>
>> thanks, very helpful. I tweaked some of the config parameters along
>> the lines Udo suggests, but still only get some 80 MB/s or so.
> That means you have reached factor 3 (roughly the value I see with a
> single thread on RBD too). Better than nothing.
>
>> Kernel 4.3.4 is running on the client machine with a comfortable
>> readahead configured:
>>
>> $ sudo blockdev --getra /dev/rbd0
>> 262144
>>
>> Still not more than about 80-90 MB/s.
> There are two possibilities for readahead.
> Take a look here (and change with echo):
> cat /sys/block/rbd0/queue/read_ahead_kb
>
> Perhaps there are slight differences?
>
>> For writing, the parallelization is amazing and I see very impressive
>> speeds, but why is reading performance so far behind? Why is it not
>> parallelized the same way writing is? Is this something coming up in
>> the jewel release? Or is it planned further down the road?
> If you read a big file, clear your cache ("echo 3 >
> /proc/sys/vm/drop_caches") on the client, and read again, is the second
> read very fast? I assume yes.
> In that case the read data is in the cache on the OSD nodes... so the
> tuning must be there (and I'm very interested in improvements).
>
> Udo
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
Hi Udo,

thanks, just to make sure, I further increased the readahead:

$ sudo blockdev --getra /dev/rbd0
1048576
$ cat /sys/block/rbd0/queue/read_ahead_kb
524288

No difference here. The first value is in sectors (512 bytes), the
second in KB.

The second read (after dropping caches) is somewhat faster (10%-20%),
but not much.

I also found this info: http://tracker.ceph.com/issues/9192

Maybe Ilya can help us; he probably knows best how this can be improved.

Thanks and cheers, Mike

On 4/21/16 4:32 PM, Udo Lembke wrote:
> Hi Mike,
>
> On 21.04.2016 at 09:07, Mike Miller wrote:
>> Hi Nick and Udo,
>>
>> thanks, very helpful. I tweaked some of the config parameters along
>> the lines Udo suggests, but still only get some 80 MB/s or so.
> That means you have reached factor 3 (roughly the value I see with a
> single thread on RBD too). Better than nothing.
>
>> Kernel 4.3.4 is running on the client machine with a comfortable
>> readahead configured:
>>
>> $ sudo blockdev --getra /dev/rbd0
>> 262144
>>
>> Still not more than about 80-90 MB/s.
> There are two possibilities for readahead.
> Take a look here (and change with echo):
> cat /sys/block/rbd0/queue/read_ahead_kb
>
> Perhaps there are slight differences?
>
>> For writing, the parallelization is amazing and I see very impressive
>> speeds, but why is reading performance so far behind? Why is it not
>> parallelized the same way writing is? Is this something coming up in
>> the jewel release? Or is it planned further down the road?
> If you read a big file, clear your cache ("echo 3 >
> /proc/sys/vm/drop_caches") on the client, and read again, is the second
> read very fast? I assume yes.
> In that case the read data is in the cache on the OSD nodes... so the
> tuning must be there (and I'm very interested in improvements).
>
> Udo
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
Hi Mike,

On 21.04.2016 at 09:07, Mike Miller wrote:
> Hi Nick and Udo,
>
> thanks, very helpful. I tweaked some of the config parameters along
> the lines Udo suggests, but still only get some 80 MB/s or so.
That means you have reached factor 3 (roughly the value I see with a
single thread on RBD too). Better than nothing.

> Kernel 4.3.4 is running on the client machine with a comfortable
> readahead configured:
>
> $ sudo blockdev --getra /dev/rbd0
> 262144
>
> Still not more than about 80-90 MB/s.
There are two possibilities for readahead.
Take a look here (and change with echo):
cat /sys/block/rbd0/queue/read_ahead_kb

Perhaps there are slight differences?

> For writing, the parallelization is amazing and I see very impressive
> speeds, but why is reading performance so far behind? Why is it not
> parallelized the same way writing is? Is this something coming up in
> the jewel release? Or is it planned further down the road?
If you read a big file, clear your cache ("echo 3 >
/proc/sys/vm/drop_caches") on the client, and read again, is the second
read very fast? I assume yes.
In that case the read data is in the cache on the OSD nodes... so the
tuning must be there (and I'm very interested in improvements).

Udo
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
Hi Nick and Udo,

thanks, very helpful. I tweaked some of the config parameters along the
lines Udo suggests, but still only get some 80 MB/s or so.

Kernel 4.3.4 is running on the client machine with a comfortable
readahead configured:

$ sudo blockdev --getra /dev/rbd0
262144

Still not more than about 80-90 MB/s.

For writing, the parallelization is amazing and I see very impressive
speeds, but why is reading performance so far behind? Why is it not
parallelized the same way writing is? Is this something coming up in
the jewel release? Or is it planned further down the road?

Please let me know if there is a way to give clients better
single-threaded read performance for large files.

Thanks and regards, Mike

On 4/20/16 10:43 PM, Nick Fisk wrote:
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Udo Lembke
>> Sent: 20 April 2016 07:21
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
>>
>> Hi Mike,
>> I don't have experience with RBD mounts, but I see the same effect
>> with RBD.
>>
>> You can do some tuning to get better results (disable debug and so on).
>>
>> As a hint, some values from a ceph.conf:
>> [osd]
>> debug asok = 0/0
>> debug auth = 0/0
>> debug buffer = 0/0
>> debug client = 0/0
>> debug context = 0/0
>> debug crush = 0/0
>> debug filer = 0/0
>> debug filestore = 0/0
>> debug finisher = 0/0
>> debug heartbeatmap = 0/0
>> debug journal = 0/0
>> debug journaler = 0/0
>> debug lockdep = 0/0
>> debug mds = 0/0
>> debug mds balancer = 0/0
>> debug mds locker = 0/0
>> debug mds log = 0/0
>> debug mds log expire = 0/0
>> debug mds migrator = 0/0
>> debug mon = 0/0
>> debug monc = 0/0
>> debug ms = 0/0
>> debug objclass = 0/0
>> debug objectcacher = 0/0
>> debug objecter = 0/0
>> debug optracker = 0/0
>> debug osd = 0/0
>> debug paxos = 0/0
>> debug perfcounter = 0/0
>> debug rados = 0/0
>> debug rbd = 0/0
>> debug rgw = 0/0
>> debug throttle = 0/0
>> debug timer = 0/0
>> debug tp = 0/0
>> filestore_op_threads = 4
>> osd max backfills = 1
>> osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
>> osd mkfs options xfs = "-f -i size=2048"
>> osd recovery max active = 1
>> osd_disk_thread_ioprio_class = idle
>> osd_disk_thread_ioprio_priority = 7
>> osd_disk_threads = 1
>> osd_enable_op_tracker = false
>> osd_op_num_shards = 10
>> osd_op_num_threads_per_shard = 1
>> osd_op_threads = 4
>>
>> Udo
>>
>> On 19.04.2016 11:21, Mike Miller wrote:
>>> Hi,
>>>
>>> RBD mount
>>> ceph v0.94.5
>>> 6 OSDs with 9 HDDs each
>>> 10 Gbit/s public and private networks
>>> 3 MON nodes on a 1 Gbit/s network
>>>
>>> An rbd mounted with a btrfs filesystem performs really badly when
>>> reading. I tried readahead in all combinations, but that does not
>>> help in any way.
>>>
>>> Write rates are very good, in excess of 600 MB/s up to 1200 MB/s,
>>> average 800 MB/s. Read rates on the same mounted rbd are about
>>> 10-30 MB/s !?
>
> What kernel are you running? Older kernels had an issue where readahead
> was capped at 2MB. In order to get good read speeds you need readahead
> set to about 32MB+.
>
>>> Of course, both writes and reads are from a single client machine
>>> with a single write/read command. So I am looking at single-threaded
>>> performance. Actually, I was hoping to see at least 200-300 MB/s when
>>> reading, but I am seeing 10% of that at best.
>>>
>>> Thanks for your help.
>>>
>>> Mike
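Nick's "32MB+" readahead advice can be applied as follows. This is just a sketch using the device name from the thread; the numbers are the sector and KiB forms of 32 MiB:

```shell
DEV=/dev/rbd0                # krbd device, as used throughout this thread

# blockdev --setra takes 512-byte sectors: 32 MiB = 65536 sectors.
sudo blockdev --setra 65536 "$DEV"

# Equivalent sysfs form, which takes KiB: 32 MiB = 32768 KiB.
echo 32768 | sudo tee "/sys/block/${DEV#/dev/}/queue/read_ahead_kb"
```

Note this setting does not persist across a reboot or `rbd unmap`/`rbd map`; a udev rule or boot script is the usual way to reapply it.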
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Udo Lembke
> Sent: 20 April 2016 07:21
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
>
> Hi Mike,
> I don't have experience with RBD mounts, but I see the same effect with
> RBD.
>
> You can do some tuning to get better results (disable debug and so on).
>
> As a hint, some values from a ceph.conf:
> [osd]
> debug asok = 0/0
> debug auth = 0/0
> debug buffer = 0/0
> debug client = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug filer = 0/0
> debug filestore = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug journal = 0/0
> debug journaler = 0/0
> debug lockdep = 0/0
> debug mds = 0/0
> debug mds balancer = 0/0
> debug mds locker = 0/0
> debug mds log = 0/0
> debug mds log expire = 0/0
> debug mds migrator = 0/0
> debug mon = 0/0
> debug monc = 0/0
> debug ms = 0/0
> debug objclass = 0/0
> debug objectcacher = 0/0
> debug objecter = 0/0
> debug optracker = 0/0
> debug osd = 0/0
> debug paxos = 0/0
> debug perfcounter = 0/0
> debug rados = 0/0
> debug rbd = 0/0
> debug rgw = 0/0
> debug throttle = 0/0
> debug timer = 0/0
> debug tp = 0/0
> filestore_op_threads = 4
> osd max backfills = 1
> osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
> osd mkfs options xfs = "-f -i size=2048"
> osd recovery max active = 1
> osd_disk_thread_ioprio_class = idle
> osd_disk_thread_ioprio_priority = 7
> osd_disk_threads = 1
> osd_enable_op_tracker = false
> osd_op_num_shards = 10
> osd_op_num_threads_per_shard = 1
> osd_op_threads = 4
>
> Udo
>
> On 19.04.2016 11:21, Mike Miller wrote:
> > Hi,
> >
> > RBD mount
> > ceph v0.94.5
> > 6 OSDs with 9 HDDs each
> > 10 Gbit/s public and private networks
> > 3 MON nodes on a 1 Gbit/s network
> >
> > An rbd mounted with a btrfs filesystem performs really badly when
> > reading. I tried readahead in all combinations, but that does not help
> > in any way.
> >
> > Write rates are very good, in excess of 600 MB/s up to 1200 MB/s,
> > average 800 MB/s. Read rates on the same mounted rbd are about 10-30
> > MB/s !?

What kernel are you running? Older kernels had an issue where readahead
was capped at 2MB. In order to get good read speeds you need readahead
set to about 32MB+.

> > Of course, both writes and reads are from a single client machine with
> > a single write/read command. So I am looking at single-threaded
> > performance.
> > Actually, I was hoping to see at least 200-300 MB/s when reading, but
> > I am seeing 10% of that at best.
> >
> > Thanks for your help.
> >
> > Mike
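Udo's debug-silencing list is long and easy to mistype. Purely as a convenience sketch, the single-word entries can be generated rather than typed by hand (the multi-word `mds ...` variants from his example would still need adding manually):

```shell
# Emit a ceph.conf [osd] fragment that silences debug logging for the
# subsystems in Udo's example. Multi-word subsystems ("mds balancer",
# "mds log", ...) are omitted here and must be appended by hand.
subsystems="asok auth buffer client context crush filer filestore finisher \
heartbeatmap journal journaler lockdep mds mon monc ms objclass objectcacher \
objecter optracker osd paxos perfcounter rados rbd rgw throttle timer tp"

echo "[osd]"
for s in $subsystems; do
    echo "debug $s = 0/0"
done
```

The same settings can also be pushed to running OSDs without a restart via `ceph tell osd.* injectargs '--debug-osd 0/0 ...'`, though some options may warn that a restart is required to take full effect.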
Re: [ceph-users] Slow read on RBD mount, Hammer 0.94.5
Hi Mike,

I don't have experience with RBD mounts, but I see the same effect with
RBD.

You can do some tuning to get better results (disable debug and so on).
As a hint, some values from a ceph.conf:

[osd]
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
filestore_op_threads = 4
osd max backfills = 1
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"
osd recovery max active = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
osd_disk_threads = 1
osd_enable_op_tracker = false
osd_op_num_shards = 10
osd_op_num_threads_per_shard = 1
osd_op_threads = 4

Udo

On 19.04.2016 11:21, Mike Miller wrote:
> Hi,
>
> RBD mount
> ceph v0.94.5
> 6 OSDs with 9 HDDs each
> 10 Gbit/s public and private networks
> 3 MON nodes on a 1 Gbit/s network
>
> An rbd mounted with a btrfs filesystem performs really badly when
> reading. I tried readahead in all combinations, but that does not help
> in any way.
>
> Write rates are very good, in excess of 600 MB/s up to 1200 MB/s,
> average 800 MB/s.
> Read rates on the same mounted rbd are about 10-30 MB/s !?
>
> Of course, both writes and reads are from a single client machine with
> a single write/read command. So I am looking at single-threaded
> performance.
> Actually, I was hoping to see at least 200-300 MB/s when reading, but
> I am seeing 10% of that at best.
>
> Thanks for your help.
>
> Mike