Re: Ceph Benchmark HowTo
On Wed, Aug 01, 2012 at 09:06:44AM -0500, Mark Nelson wrote:
> I haven't actually used bonnie++ myself, but I've read some rather bad
> reports from various other people in the industry. Not sure how much it's
> changed since then...
>
> https://blogs.oracle.com/roch/entry/decoding_bonnie
> http://www.quora.com/What-are-some-file-system-benchmarks
> http://scalability.org/?p=1685
> http://scalability.org/?p=1688
>
> I'd say to just take extra care to make sure that it's behaving the way you
> intended it to (probably good advice no matter which benchmark you use!)

Thanks for these good links :). I have started to try fio too, for its flexibility.

>> All results are good; my benchmark is clearly limited by my network
>> connection, ~110MB/s.
>
> Gigabit Ethernet is definitely going to be a limitation with large block
> sequential IO for most modern disks. I'm concerned with your 6 client
> numbers though. I assume those numbers are per client? Even so, with 10
> OSDs that performance is pretty bad! Are you getting a good distribution of
> writes across all OSDs? Consistent throughput over time on each?

This is a network issue too: the 6-client tests are not really representative, since all clients share the same 1 gigabit link. I will acquire more hardware to be more realistic soon (and replace these results). Some clarifications have been added to the benchmark page.

>> Except for the rest-api bench, the value seems really low.
>> ...
>> Is my rest-bench result normal? Have I missed something?
>
> You may want to try increasing the number of concurrent rest-bench
> operations. Also I'd explicitly specify the number of PGs for the pool you
> create to make sure that you are getting a good distribution.

During my test the number of PGs is 640 for 10 OSDs. I have tried with more concurrent operations (32 and 64), but the result is almost the same, with more latency.

Cheers,
-- 
Mehdi Abaakouk for eNovance
mail: sil...@sileht.net irc: sileht
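For anyone wanting to repeat the concurrency sweep described above, here is a minimal sketch that runs rest-bench at several concurrency levels and records the reported bandwidth. It assumes rest-bench is already configured (API host and access/secret key via its options or environment), that the summary line format matches the runs quoted in this thread, and that the --seconds/-t flags are available in your version; check rest-bench --help first.

#!/usr/bin/env python
# Sketch: sweep rest-bench concurrency and collect the reported bandwidth.
# Assumes rest-bench is configured (api host, access/secret key) and that
# the "Bandwidth (MB/sec):" summary line matches your version's output.
import re
import subprocess

SECONDS = 900              # same duration as the wiki runs
CONCURRENCY = [16, 32, 64]

def run_rest_bench(threads):
    out = subprocess.check_output(
        ["rest-bench", "--seconds", str(SECONDS), "-t", str(threads), "write"],
        stderr=subprocess.STDOUT)
    m = re.search(r"Bandwidth \(MB/sec\):\s*([\d.]+)", out.decode())
    return float(m.group(1)) if m else None

if __name__ == "__main__":
    for t in CONCURRENCY:
        print("concurrency=%d -> %s MB/s" % (t, run_rest_bench(t)))

Comparing the per-second latency lines of each run, not just the final bandwidth, is what shows whether the extra concurrency is simply queueing requests.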
Re: Ceph Benchmark HowTo
Hi all,

I have updated the how-to here: http://ceph.com/wiki/Benchmark
And published the results of my latest tests: http://ceph.com/wiki/Benchmark#First_Example

All results are good; my benchmark is clearly limited by my network connection, ~110MB/s. Except for the rest-api bench, the value seems really low. I have configured radosgw with this: http://ceph.com/docs/master/radosgw/config/
I clean the disk cache on all servers before the bench, and start rest-bench for 900 seconds with default values.

Is my rest-bench result normal? Have I missed something? Don't hesitate if you need more information on my setup.

And then, I have another question: how is the Standard Deviation calculated with rados bench and rest-bench? With the reported value printed each second by the benchmark client? If yes, when latency is too high the reported bandwidth is sometimes zero, so does the calculated StdDev for bandwidth make sense?

Cheers,
-- 
Mehdi Abaakouk for eNovance
mail: sil...@sileht.net irc: sileht
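To make the concern concrete, here is a small sketch (using Python's statistics module and made-up per-second bandwidth samples, not numbers from the actual run) comparing the mean and standard deviation of a bandwidth series with and without stalled seconds reported as zero:

# Sketch with made-up per-second bandwidth samples (MB/s), illustrating how
# seconds where no I/O completed (reported as 0) inflate the StdDev even
# though the sustained throughput barely changes.
import statistics

steady = [110, 108, 112, 104, 109, 111, 107, 110]       # no stalls
with_stalls = [110, 108, 0, 104, 109, 0, 107, 110]      # two stalled seconds

for name, samples in (("steady", steady), ("with stalls", with_stalls)):
    print("%-12s mean=%6.1f  stdev=%6.1f  median=%6.1f" % (
        name,
        statistics.mean(samples),
        statistics.stdev(samples),
        statistics.median(samples)))

With the two zero seconds the standard deviation jumps by more than an order of magnitude while the median barely moves, which is exactly why the question about how the reported StdDev is computed matters.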
Re: Ceph Benchmark HowTo
On Tue, Jul 24, 2012 at 10:55:37AM -0500, Mark Nelson wrote:
> On 07/24/2012 09:43 AM, Mehdi Abaakouk wrote:
> Thanks for taking the time to put all of your benchmarking procedures into
> writing! Having this kind of community ...

Thanks for your comments and these tools, they will help me for sure.

-- 
Mehdi Abaakouk
mail: sil...@sileht.net irc: sileht
Re: Ceph Benchmark HowTo
On Tue, Jul 24, 2012 at 6:19 PM, Tommi Virtanen <t...@inktank.com> wrote:
> On Tue, Jul 24, 2012 at 8:55 AM, Mark Nelson <mark.nel...@inktank.com> wrote:
>> personally I think it's fine to have it on the wiki. I do want to stress
>> that performance is going to be (hopefully!) improving over the next
>> couple of months so we will probably want to have updated results (or at
>> least remove old results!) as things improve. Also, I'm not sure if we
>> will be keeping the wiki around in its current form. There was some talk
>> about migrating to something else, but I don't really remember the details.
>
> Sounds like a job for doc/dev/benchmark/index.rst! (It, or parts of it, can
> move out from under Internal if/when it gets user-friendly enough to not
> need as much skill to use.)

If John is currently busy (which I assume he always is :) ), I should be able to take care of that. In that case, would someone please open a documentation bug and assign it to me?

Cheers,
Florian
Re: Ceph Benchmark HowTo
Hi Mehdi,

Great work! A few questions (for you, Mark, and anyone else watching this thread) regarding the content of that wiki page:

For the OSD tests, which OSD filesystem are you testing on? Are you using a separate journal device? If yes, what type?

For the RADOS benchmarks:

# rados bench -p pbench 900 seq
...
  611      16     17010     16994   111.241       104   1.05852  0.574897
  612      16     17037     17021   111.236       108   1.17321  0.574932
  613      16     17056     17040   111.178        76   1.01611  0.574903
Total time run:        613.339616
Total reads made:      17056
Read size:             4194304
Bandwidth (MB/sec):    111.234
Average Latency:       0.575252
Max latency:           1.65182
Min latency:           0.07418

How meaningful is it to use an (arithmetic) average here, considering the min and max differ by a factor of 22? Aren't we being bitten by outliers pretty severely here, and wouldn't, say, a median be much more useful? (Actually, would the max latency include the initial hunt for a mon and the mon/osdmap exchange?)

> seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
> if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0

Just making sure: are you getting the same numbers just with dd, rather than dd invoked by seekwatcher?

Also, for your dd latency test of 4M direct I/O reads and writes, you seem to be getting 39 and 300 ms average latency, yet further down it says "RBD latency read/write: 28ms and 114.5ms". Any explanation for the write latency being cut in half on what was apparently a different test run?

Also, were read and write caches cleared between tests? (echo 3 > /proc/sys/vm/drop_caches)

Cheers,
Florian
Re: Ceph Benchmark HowTo
On Wed, Jul 25, 2012 at 1:06 PM, Florian Haas <flor...@hastexo.com> wrote:
> Hi Mehdi,
>
> Great work! A few questions (for you, Mark, and anyone else watching this
> thread) regarding the content of that wiki page:
>
> For the OSD tests, which OSD filesystem are you testing on? Are you using a
> separate journal device? If yes, what type?
>
> For the RADOS benchmarks:
>
> # rados bench -p pbench 900 seq
> ...
>   611      16     17010     16994   111.241       104   1.05852  0.574897
>   612      16     17037     17021   111.236       108   1.17321  0.574932
>   613      16     17056     17040   111.178        76   1.01611  0.574903
> Total time run:        613.339616
> Total reads made:      17056
> Read size:             4194304
> Bandwidth (MB/sec):    111.234
> Average Latency:       0.575252
> Max latency:           1.65182
> Min latency:           0.07418
>
> How meaningful is it to use an (arithmetic) average here, considering the
> min and max differ by a factor of 22? Aren't we being bitten by outliers
> pretty severely here, and wouldn't, say, a median be much more useful?
> (Actually, would the max latency include the initial hunt for a mon and the
> mon/osdmap exchange?)

Yeah, an average isn't necessarily very useful here — it's what you get because that's easy to implement (with a sum and a counter variable, instead of binning). The inclusion of max and min latencies is an attempt to cheaply compensate for that... but if somebody wants to find/write an appropriately-licensed statistical counting library and integrate it with rados bench, then (say it with me) contributions are welcome! ;)

> seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
> if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0
>
> Just making sure: are you getting the same numbers just with dd, rather
> than dd invoked by seekwatcher?
>
> Also, for your dd latency test of 4M direct I/O reads and writes, you seem
> to be getting 39 and 300 ms average latency, yet further down it says "RBD
> latency read/write: 28ms and 114.5ms". Any explanation for the write
> latency being cut in half on what was apparently a different test run?
>
> Also, were read and write caches cleared between tests? (echo 3 >
> /proc/sys/vm/drop_caches)
>
> Cheers,
> Florian
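For what it's worth, a running mean and variance can be kept with the same "sum and counter" footprint described above, using Welford's online algorithm. A rough sketch of what such a counter could look like (illustrative only, not the actual rados bench code):

# Sketch of an online (single-pass) latency counter in the spirit of
# Welford's algorithm: constant memory, like the existing sum+counter
# approach, but it also yields the variance/StdDev. Not rados bench code.
class OnlineLatencyStats(object):
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0              # sum of squared deviations from the mean
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def stddev(self):
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

# Example: feed it per-op latencies (seconds)
stats = OnlineLatencyStats()
for lat in (0.07418, 0.57, 0.61, 1.65182, 0.55):
    stats.add(lat)
print(stats.mean, stats.stddev, stats.min, stats.max)

A true median or percentile still needs binning or sampling of the observed latencies, which is presumably why the current output settles for min/max alongside the mean.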
Re: Ceph Benchmark HowTo
On Wed, Jul 25, 2012 at 1:25 PM, Gregory Farnum <g...@inktank.com> wrote:
> Yeah, an average isn't necessarily very useful here — it's what you get
> because that's easy to implement (with a sum and a counter variable,
> instead of binning). The inclusion of max and min latencies is an attempt
> to cheaply compensate for that... but if somebody wants to find/write an
> appropriately-licensed statistical counting library and integrate it with
> rados bench, then (say it with me) contributions are welcome! ;)

How about "output results in a good machine-readable format" and "here's the pandas script to crunch it to a useful summary"?

http://pandas.pydata.org/
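In that spirit, a rough sketch of such a post-processing step, assuming the per-second rados bench status lines have been captured to a file (the file name is made up; the column layout is taken from the run quoted earlier in the thread):

# Sketch: crunch per-second rados bench status lines with pandas.
# Assumes the bench output was captured to rados-bench.log and that the
# periodic lines have the 8-column layout shown earlier in this thread.
import pandas as pd

cols = ["sec", "cur_ops", "started", "finished",
        "avg_mb_s", "cur_mb_s", "last_lat", "avg_lat"]

rows = []
with open("rados-bench.log") as f:
    for line in f:
        fields = line.split()
        # keep only the periodic status lines (8 numeric columns)
        if len(fields) == len(cols) and fields[0].isdigit():
            rows.append([float(x) for x in fields])

df = pd.DataFrame(rows, columns=cols)
# median, 95th and 99th percentile instead of a plain average
print(df[["cur_mb_s", "last_lat"]].describe(percentiles=[0.5, 0.95, 0.99]))

This sidesteps changing rados bench at all: the tool keeps printing what it prints, and the summary statistics are computed offline.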
Re: Ceph Benchmark HowTo
Hi Florian,

On Wed, Jul 25, 2012 at 10:06:04PM +0200, Florian Haas wrote:
> Hi Mehdi,
>
> For the OSD tests, which OSD filesystem are you testing on? Are you using a
> separate journal device? If yes, what type?

Actually, I use xfs and the journal is on the same disk, in another partition. After reading the documentation, it seems that using a dedicated disk is better, and an SSD is a good choice.

> seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
> if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0
>
> Just making sure: are you getting the same numbers just with dd, rather
> than dd invoked by seekwatcher?

Yes.

> Also, for your dd latency test of 4M direct I/O reads and writes, you seem
> to be getting 39 and 300 ms average latency, yet further down it says "RBD
> latency read/write: 28ms and 114.5ms". Any explanation for the write
> latency being cut in half on what was apparently a different test run?

Yes, this is a different run; the one at the bottom was with fewer servers but with better hardware.

> Also, were read and write caches cleared between tests? (echo 3 >
> /proc/sys/vm/drop_caches)

No, I will add it.

I know that my setup is not really optimal. Writing these tests helps me understand how Ceph works, and I'm sure with your advice I will build a better cluster :) Thanks for your help.

Cheers,
-- 
Mehdi Abaakouk
mail: sil...@sileht.net irc: sileht
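For the cache-clearing step, something along these lines could be run against every node between tests. The host names are made up, and it assumes passwordless root SSH from the machine driving the benchmark:

# Sketch: sync and drop the page/dentry/inode caches on all benchmark nodes
# before each test run. Host names are hypothetical; assumes root SSH access.
import subprocess

NODES = ["osd1", "osd2", "osd3", "client1"]   # hypothetical node names

def drop_caches(host):
    # flush dirty pages first, then drop the clean caches
    subprocess.check_call(
        ["ssh", "root@" + host,
         "sync && echo 3 > /proc/sys/vm/drop_caches"])

if __name__ == "__main__":
    for node in NODES:
        drop_caches(node)
        print("caches dropped on", node)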
Ceph Benchmark HowTo
Hi all,

I am currently doing some tests on Ceph, more precisely on the RBD and RADOSGW parts. My goal is to get some performance metrics according to the hardware and the Ceph setup. To do so, I am preparing a benchmark how-to, to help people compare their metrics.

I have started the how-to here: http://ceph.com/w/index.php?title=Benchmark
I have linked it in the misc section of the main page.

Then, first question: is it alright if I continue publishing this procedure on your wiki? The how-to is not finished yet; this is only a first draft. My test platform is not ready yet either, so the results of the bench can't be used yet. The next work I will do on the how-to is to add some explanations on how to interpret the benchmark results.

So, if you have some comments, ideas for benchmarks, or anything that can help me improve the how-to and/or compare future results, I would be glad to read them. And thanks a lot for your work on Ceph, this is a great storage system :)

Best Regards,
-- 
Mehdi Abaakouk for eNovance
mail: sil...@sileht.net irc: sileht
Re: Ceph Benchmark HowTo
On 07/24/2012 09:43 AM, Mehdi Abaakouk wrote:
> Hi all,
>
> I am currently doing some tests on Ceph, more precisely on the RBD and
> RADOSGW parts. My goal is to get some performance metrics according to the
> hardware and the Ceph setup. To do so, I am preparing a benchmark how-to,
> to help people compare their metrics.
>
> I have started the how-to here: http://ceph.com/w/index.php?title=Benchmark
> I have linked it in the misc section of the main page.
>
> Then, first question: is it alright if I continue publishing this procedure
> on your wiki? The how-to is not finished yet; this is only a first draft.
> My test platform is not ready yet either, so the results of the bench can't
> be used yet. The next work I will do on the how-to is to add some
> explanations on how to interpret the benchmark results.
>
> So, if you have some comments, ideas for benchmarks, or anything that can
> help me improve the how-to and/or compare future results, I would be glad
> to read them. And thanks a lot for your work on Ceph, this is a great
> storage system :)
>
> Best Regards,

Hi Mehdi,

Thanks for taking the time to put all of your benchmarking procedures into writing! Having this kind of community participation is really important for a project like Ceph. We use many of the same tools internally, and personally I think it's fine to have it on the wiki. I do want to stress that performance is going to be (hopefully!) improving over the next couple of months, so we will probably want to have updated results (or at least remove old results!) as things improve. Also, I'm not sure if we will be keeping the wiki around in its current form. There was some talk about migrating to something else, but I don't really remember the details.

Some comments:

- 60s is a pretty short test. You may get a more accurate representation of throughput by running longer tests.
- Performance degradation on aged filesystems can be an issue, so you may see different results if you run the test on a fresh filesystem vs one that has already had a lot of data written to it.
- Depending on the number of OSDs you have, you may want to explicitly set the number of PGs when creating the benchmarking pool.
- We also have a tool called test_filestore_workloadgen which lets you directly test the filestore (data disk and journal), which can be useful when doing strace/perf/valgrind tests.

Also, we have some scripts in our ceph-tools repo that may be useful for anyone who is interested in performance profiling or benchmarking. Specifically:

- analysis/log_analyzer.py - lets you analyze where high-latency requests are spending their time if debugging/tracker options are turned on for the logs.
- analysis/strace_parser.py - Rough tool to let you examine the frequency of various write/read/etc operations as reported by strace. Useful for analyzing IO for things other than Ceph as well, but still in progress.
- aging/runtests.py - A tool we use for running rados bench and rest bench internally on multiple clients. Eventually this may be folded into our teuthology project as much of the functionality overlaps. Requires pdsh, collectl, blktrace, perf, and possibly some other dependencies.

Thanks,
Mark

-- 
Mark Nelson
Performance Engineer
Inktank
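Following the first and third comments above (longer runs, explicit PG count), a run along these lines could replace the 60s default used on the wiki. Pool name, PG count and duration are illustrative only, and depending on your Ceph version you may need to add --no-cleanup to the write pass so the seq pass has objects to read:

# Sketch: create a benchmark pool with an explicit PG count and run a
# longer rados bench write followed by a sequential read pass.
# Pool name, PG count and duration are illustrative, not prescriptive.
import subprocess

POOL = "pbench"
PGS = 640        # e.g. ~64 PGs per OSD for a 10-OSD cluster
SECONDS = 900    # longer run to smooth out journal and cache effects

subprocess.check_call(["ceph", "osd", "pool", "create", POOL, str(PGS)])
# write phase (newer versions: add "--no-cleanup" to keep the objects)
subprocess.check_call(["rados", "bench", "-p", POOL, str(SECONDS),
                       "write", "-t", "16"])
# sequential read phase over the objects written above
subprocess.check_call(["rados", "bench", "-p", POOL, str(SECONDS), "seq"])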
Re: Ceph Benchmark HowTo
On Tue, Jul 24, 2012 at 8:55 AM, Mark Nelson <mark.nel...@inktank.com> wrote:
> personally I think it's fine to have it on the wiki. I do want to stress
> that performance is going to be (hopefully!) improving over the next couple
> of months so we will probably want to have updated results (or at least
> remove old results!) as things improve. Also, I'm not sure if we will be
> keeping the wiki around in its current form. There was some talk about
> migrating to something else, but I don't really remember the details.

Sounds like a job for doc/dev/benchmark/index.rst! (It, or parts of it, can move out from under Internal if/when it gets user-friendly enough to not need as much skill to use.)

All of the rest in your email should go in there too, in some form or another.