Re: Ceph Benchmark HowTo

2012-08-02 Thread Mehdi Abaakouk
On Wed, Aug 01, 2012 at 09:06:44AM -0500, Mark Nelson wrote:
 I haven't actually used bonnie++ myself, but I've read some rather
 bad reports from various other people in the industry.  Not sure how
 much it's changed since then...
 
 https://blogs.oracle.com/roch/entry/decoding_bonnie
 http://www.quora.com/What-are-some-file-system-benchmarks
 http://scalability.org/?p=1685
 http://scalability.org/?p=1688
 
 I'd say to just take extra care to make sure that it's behaving
 the way you intended it to (probably good advice no matter which
 benchmark you use!)

Thanks for these good links :). I have also started trying fio, for its
flexibility.

 All results are good, my benchmark is clearly limited by my network
 connection ~ 110MB/s.
 
 Gigabit Ethernet is definitely going to be a limitation with large
 block sequential IO for most modern disks.  I'm concerned with your
 6 client numbers though.  I assume those numbers are per client?
 Even so, with 10 OSDs that performance is pretty bad!  Are you
 getting a good distribution of writes across all OSDs?  Consistent
 throughput over time on each?

This is a network issue too: the 6-client tests are not really
representative, since all clients share the same 1-gigabit link. I will
soon acquire more hardware to make the tests more realistic (and replace
these results).

I have added some clarifications to the benchmark page.

 The exception is the rest-api bench, where the values seem really low.
  ...
 Is my rest-bench result normal? Have I missed something?
 
 You may want to try increasing the number of concurrent rest-bench
 operations.  Also I'd explicitly specify the number of PGs for the
 pool you create to make sure that you are getting a good
 distribution.

During my tests the number of PGs was 640 for 10 OSDs. I have tried with
more concurrent operations (32 and 64), but the result is almost the
same, just with more latency.
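For reference, the kind of invocation described above might look like this (a sketch only: the endpoint, credentials, and pool name are placeholders, and rest-bench flag names may differ between versions):

```shell
# Create a benchmarking pool with an explicit PG count (640 for 10 OSDs)
ceph osd pool create pbench 640 640

# Drive the gateway harder: 64 concurrent operations instead of the
# default 16, for 900 seconds, with 4 MB objects (matching rados bench)
rest-bench --api-host=gateway.example.com \
           --access-key=KEY --secret=SECRET \
           --seconds=900 -t 64 -b 4194304 write
```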


Cheers,
-- 
Mehdi Abaakouk for eNovance
mail: sil...@sileht.net
irc: sileht


signature.asc
Description: Digital signature


Re: Ceph Benchmark HowTo

2012-07-31 Thread Mehdi Abaakouk
Hi all,

I have updated the how-to here:
http://ceph.com/wiki/Benchmark

And published the results of my latest tests:
http://ceph.com/wiki/Benchmark#First_Example

All results are good, my benchmark is clearly limited by my network
connection ~ 110MB/s.

The exception is the rest-api bench, where the values seem really low.

I have configured radosgw with this:
http://ceph.com/docs/master/radosgw/config/
I cleared the disk caches on all servers before the bench,
and ran rest-bench for 900 seconds with default values.

Is my rest-bench result normal? Have I missed something?

Don't hesitate to ask if you need more information on my setup.

I also have another question: how is the standard deviation calculated
in rados bench and rest-bench? From the values printed each second by
the benchmark client?
If so, when latency is too high the reported bandwidth is sometimes zero;
does the calculated StdDev for bandwidth still make sense then?
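To illustrate the concern (with made-up sample values, not real bench output): a handful of zero-bandwidth seconds dominates the standard deviation even when the steady-state throughput is stable.

```python
import statistics

# Hypothetical per-second bandwidth samples (MB/s), as printed by the
# bench client; during a latency spike the reported value drops to 0.
samples = [110, 112, 0, 0, 108, 111, 0, 109, 113, 110]
nonzero = [s for s in samples if s > 0]

print("all samples:   mean=%.1f stdev=%.1f"
      % (statistics.mean(samples), statistics.pstdev(samples)))
print("zeros removed: mean=%.1f stdev=%.1f"
      % (statistics.mean(nonzero), statistics.pstdev(nonzero)))
```

The stalled seconds drag the mean well below the steady-state rate and inflate the deviation by more than an order of magnitude, so a StdDev computed this way mostly measures the stalls.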


Cheers,
-- 
Mehdi Abaakouk for eNovance
mail: sil...@sileht.net
irc: sileht




Re: Ceph Benchmark HowTo

2012-07-25 Thread Mehdi Abaakouk
On Tue, Jul 24, 2012 at 10:55:37AM -0500, Mark Nelson wrote:
 On 07/24/2012 09:43 AM, Mehdi Abaakouk wrote:
 
 Thanks for taking the time to put all of your benchmarking
 procedures into writing!  Having this kind of community

 ...


Thanks for your comments and these tools; they will help me for sure.


-- 
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht




Re: Ceph Benchmark HowTo

2012-07-25 Thread Florian Haas
On Tue, Jul 24, 2012 at 6:19 PM, Tommi Virtanen t...@inktank.com wrote:
 On Tue, Jul 24, 2012 at 8:55 AM, Mark Nelson mark.nel...@inktank.com wrote:
 personally I think it's fine to have it on the wiki.  I do want to stress
 that performance is going to be (hopefully!) improving over the next couple
 of months so we will probably want to have updated results (or at least
 remove old results!) as things improve.  Also, I'm not sure if we will be
 keeping the wiki around in its current form. There was some talk about
 migrating to something else, but I don't really remember the details.

 Sounds like a job for doc/dev/benchmark/index.rst!  (It, or parts of
 it, can move out from under Internal if/when it gets user friendly
 enough to not need as much skill to use.)

If John is currently busy (which I assume he always is :) ), I should
be able to take care of that. In that case, would someone please open
a documentation bug and assign that to me?

Cheers,
Florian
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ceph Benchmark HowTo

2012-07-25 Thread Florian Haas
Hi Mehdi,

great work! A few questions (for you, Mark, and anyone else watching
this thread) regarding the content of that wiki page:

For the OSD tests, which OSD filesystem are you testing on? Are you
using a separate journal device? If yes, what type?

For the RADOS benchmarks:

# rados bench -p pbench 900 seq
...
   611      16     17010     16994   111.241       104   1.05852  0.574897
   612      16     17037     17021   111.236       108   1.17321  0.574932
   613      16     17056     17040   111.178        76   1.01611  0.574903
 Total time run:        613.339616
Total reads made:      17056
Read size:             4194304
Bandwidth (MB/sec):    111.234

Average Latency:       0.575252
Max latency:           1.65182
Min latency:           0.07418

How meaningful is it to use an (arithmetic) average here, considering
the min and max differ by a factor of 22? Aren't we being bitten by
outliers pretty severely here, and wouldn't, say, a median be much
more useful? (Actually, would the max latency include the initial
hunt for a mon and the mon/osdmap exchange?)
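A toy example (made-up latencies, not taken from the run above) of how a couple of outliers move the mean while the median stays at the typical value:

```python
import statistics

# Hypothetical per-op latencies (seconds): most ops sit near the min,
# two outliers sit near the max, as in the rados bench summary above.
latencies = [0.08, 0.09, 0.10, 0.09, 0.11, 0.10, 0.09, 1.60, 1.65, 0.10]

print("mean   = %.3f s" % statistics.mean(latencies))    # pulled up by outliers
print("median = %.3f s" % statistics.median(latencies))  # the typical op
```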



seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0

Just making sure: are you getting the same numbers just with dd,
rather than dd invoked by seekwatcher?

Also, for your dd latency test of 4M direct I/O reads/writes, you seem
to be getting 39 and 300 ms average latency, yet further down it says
RBD latency read/write: 28ms and 114.5ms. Any explanation for the
write latency being cut in half on what was apparently a different
test run?

Also, were read and write caches cleared between tests? (echo 3 >
/proc/sys/vm/drop_caches)

Cheers,
Florian


Re: Ceph Benchmark HowTo

2012-07-25 Thread Gregory Farnum
On Wed, Jul 25, 2012 at 1:06 PM, Florian Haas flor...@hastexo.com wrote:
 Hi Mehdi,

 great work! A few questions (for you, Mark, and anyone else watching
 this thread) regarding the content of that wiki page:

 For the OSD tests, which OSD filesystem are you testing on? Are you
 using a separate journal device? If yes, what type?

 For the RADOS benchmarks:

 # rados bench -p pbench 900 seq
 ...
    611      16     17010     16994   111.241       104   1.05852  0.574897
    612      16     17037     17021   111.236       108   1.17321  0.574932
    613      16     17056     17040   111.178        76   1.01611  0.574903
  Total time run:        613.339616
 Total reads made:      17056
 Read size:             4194304
 Bandwidth (MB/sec):    111.234

 Average Latency:       0.575252
 Max latency:           1.65182
 Min latency:           0.07418

 How meaningful is it to use an (arithmetic) average here, considering
 the min and max differ by a factor of 22? Aren't we being bitten by
 outliers pretty severely here, and wouldn't, say, a median be much
 more useful? (Actually, would the max latency include the initial
 hunt for a mon and the mon/osdmap exchange?)

Yeah, an average isn't necessarily very useful here — it's what you
get because that's easy to implement (with a sum and a counter
variable, instead of binning). The inclusion of max and min latencies
is an attempt to cheaply compensate for that...but if somebody wants
to find/write an appropriately-licensed statistical counting library
and integrate it with rados bench, then (say it with me) contributions
are welcome! ;)
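For what it's worth, a running variance needs no binning either: Welford's online algorithm keeps just a counter, the mean, and a running sum of squared deviations. A sketch of the technique (not the actual rados bench code):

```python
class RunningStats:
    """Online mean/variance (Welford's algorithm): O(1) state, one pass,
    i.e. barely more than the sum-and-counter average kept today."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stdev(self):
        # population standard deviation
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0


stats = RunningStats()
for bw in [111.241, 111.236, 111.178, 0.0, 110.9]:  # per-second samples
    stats.add(bw)
print("mean=%.3f stdev=%.3f" % (stats.mean, stats.stdev()))
```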


 seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
 if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0

 Just making sure: are you getting the same numbers just with dd,
 rather than dd invoked by seekwatcher?

 Also, for your dd latency test of 4M direct I/O reads/writes, you seem
 to be getting 39 and 300 ms average latency, yet further down it says
 RBD latency read/write: 28ms and 114.5ms. Any explanation for the
 write latency being cut in half on what was apparently a different
 test run?

 Also, were read and write caches cleared between tests? (echo 3 >
 /proc/sys/vm/drop_caches)

 Cheers,
 Florian


Re: Ceph Benchmark HowTo

2012-07-25 Thread Tommi Virtanen
On Wed, Jul 25, 2012 at 1:25 PM, Gregory Farnum g...@inktank.com wrote:
 Yeah, an average isn't necessarily very useful here — it's what you
 get because that's easy to implement (with a sum and a counter
 variable, instead of binning). The inclusion of max and min latencies
 is an attempt to cheaply compensate for that...but if somebody wants
 to find/write an appropriately-licensed statistical counting library
 and integrate it with rados bench, then (say it with me) contributions
 are welcome! ;)

How about outputting results in a good machine-readable format, plus a
pandas script to crunch it into a useful summary?

http://pandas.pydata.org/
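A sketch of that idea (column names and values are invented; with pandas this would just be `read_csv(...).describe()`), here dependency-free:

```python
import csv
import io
import statistics

# Hypothetical machine-readable bench output: one CSV row per second
# instead of the human-oriented table rados bench prints today.
raw = """sec,bandwidth_mb,avg_lat
1,111.2,0.57
2,108.4,0.59
3,0.0,1.60
4,112.1,0.56
"""

bw = [float(row["bandwidth_mb"]) for row in csv.DictReader(io.StringIO(raw))]

summary = {
    "mean": statistics.mean(bw),
    "median": statistics.median(bw),
    "stdev": statistics.pstdev(bw),
}
print(summary)
```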


Re: Ceph Benchmark HowTo

2012-07-25 Thread Mehdi Abaakouk
Hi Florian,

On Wed, Jul 25, 2012 at 10:06:04PM +0200, Florian Haas wrote:
 Hi Mehdi,
 For the OSD tests, which OSD filesystem are you testing on? Are you
 using a separate journal device? If yes, what type?

Actually, I use xfs, and the journal is on the same disk in another
partition. After reading the documentation, it seems that using a
dedicated disk is better, and an SSD is a good choice.

 seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
 if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0
 
 Just making sure: are you getting the same numbers just with dd,
 rather than dd invoked by seekwatcher?

Yes.

 
 Also, for your dd latency test of 4M direct I/O reads/writes, you seem
 to be getting 39 and 300 ms average latency, yet further down it says
 RBD latency read/write: 28ms and 114.5ms. Any explanation for the
 write latency being cut in half on what was apparently a different
 test run?

Yes, this is a different run; the one at the bottom was with fewer
servers but better hardware.

 
 Also, were read and write caches cleared between tests? (echo 3 >
 /proc/sys/vm/drop_caches)

No, I will add it.

 Cheers,
 Florian

I know that my setup is not really optimal.
Writing these tests helps me understand how Ceph works, and
I'm sure that with your advice I will build a better cluster :)

Thanks for your help.

Cheers,
-- 
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht




Ceph Benchmark HowTo

2012-07-24 Thread Mehdi Abaakouk
Hi all,

I am currently doing some tests on Ceph, more precisely on the RBD and RADOSGW
parts.
My goal is to get some performance metrics depending on the hardware and
the Ceph setup.

To do so, I am preparing a benchmark how-to, to help people compare their
metrics.

I have started the how-to here: http://ceph.com/w/index.php?title=Benchmark
I have linked it in the misc section of the main page.

So, first question: is it alright if I continue publishing this procedure
on your wiki?

The how-to is not finished yet; this is only a first draft.
My test platform is not ready yet either, so the benchmark results can't be
used yet.

The next work I will do on the how-to is to add some explanations on how
to interpret the benchmark results.

So, if you have some comments, ideas for benchmarks, or anything that could
help me improve the how-to and/or compare future results,
I would be glad to read them.

And thanks a lot for your work on Ceph, this is a great storage system :)

Best Regards,
-- 
Mehdi Abaakouk for eNovance
mail: sil...@sileht.net
irc: sileht




Re: Ceph Benchmark HowTo

2012-07-24 Thread Mark Nelson

On 07/24/2012 09:43 AM, Mehdi Abaakouk wrote:

Hi all,

I am currently doing some tests on Ceph, more precisely on the RBD and RADOSGW
parts.
My goal is to get some performance metrics depending on the hardware and
the Ceph setup.

To do so, I am preparing a benchmark how-to, to help people compare their
metrics.

I have started the how-to here: http://ceph.com/w/index.php?title=Benchmark
I have linked it in the misc section of the main page.

So, first question: is it alright if I continue publishing this procedure
on your wiki?

The how-to is not finished yet; this is only a first draft.
My test platform is not ready yet either, so the benchmark results can't be
used yet.

The next work I will do on the how-to is to add some explanations on how
to interpret the benchmark results.

So, if you have some comments, ideas for benchmarks, or anything that could
help me improve the how-to and/or compare future results,
I would be glad to read them.

And thanks a lot for your work on Ceph, this is a great storage system :)

Best Regards,


Hi Mehdi,

Thanks for taking the time to put all of your benchmarking procedures 
into writing!  Having this kind of community participation is really 
important for a project like Ceph.  We use many of the same tools 
internally and personally I think it's fine to have it on the wiki.  I 
do want to stress that performance is going to be (hopefully!) improving 
over the next couple of months so we will probably want to have updated 
results (or at least remove old results!) as things improve.  Also, I'm 
not sure if we will be keeping the wiki around in its current form. 
There was some talk about migrating to something else, but I don't 
really remember the details.


Some comments:

- 60s is a pretty short test.  You may get a more accurate 
representation of throughput by running longer tests.
- Performance degradation on aged filesystems can be an issue, so you 
may see different results if you run the test on a fresh filesystem vs 
one that has already had a lot of data written to it.
- Depending on the number of OSDs you have, you may want to explicitly 
set the number of PGs when creating the benchmarking pool.
- We also have a tool called test_filestore_workloadgen which lets you 
directly test the filestore (data disk and journal) which can be useful 
when doing strace/perf/valgrind tests.
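Putting the first and third points above together, a longer run against a pool with an explicit PG count might look like this (a sketch: the pool name and numbers are illustrative, and option availability can vary by Ceph version):

```shell
# Benchmarking pool with an explicit PG count (e.g. 640 for 10 OSDs)
ceph osd pool create pbench 640 640

# 900 s write phase, keeping the objects so the read test can replay them
rados bench -p pbench 900 write --no-cleanup
rados bench -p pbench 900 seq
```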


Also, we have some scripts in our ceph-tools repo that may be 
useful for anyone who is interested in performance profiling or 
benchmarking.  Specifically:


analysis/log_analyzer.py - lets you analyze where high latency requests 
are spending their time if debugging/tracker options are turned on for 
the logs.


analysis/strace_parser.py - Rough tool to let you examine the frequency 
of various write/read/etc operations as reported by strace.  Useful for 
analyzing IO for things other than Ceph as well, but still in progress.


aging/runtests.py - A tool we use for running rados bench and rest bench 
internally on multiple clients.  Eventually this may be folded into our 
teuthology project as much of the functionality overlaps.  Requires 
pdsh, collectl, blktrace, perf, and possibly some other dependencies.


Thanks,
Mark
--
Mark Nelson
Performance Engineer
Inktank


Re: Ceph Benchmark HowTo

2012-07-24 Thread Tommi Virtanen
On Tue, Jul 24, 2012 at 8:55 AM, Mark Nelson mark.nel...@inktank.com wrote:
 personally I think it's fine to have it on the wiki.  I do want to stress
 that performance is going to be (hopefully!) improving over the next couple
 of months so we will probably want to have updated results (or at least
 remove old results!) as things improve.  Also, I'm not sure if we will be
 keeping the wiki around in its current form. There was some talk about
 migrating to something else, but I don't really remember the details.

Sounds like a job for doc/dev/benchmark/index.rst!  (It, or parts of
it, can move out from under Internal if/when it gets user friendly
enough to not need as much skill to use.)

All of the rest in your email should go in there too, in some form or another.