Re: monitor start up question
On Wed, 25 Jul 2012, Mandell Degerness wrote:
> When a cluster has been shut down and then re-started, how do the
> monitors know what the cluster fsid is? Is it stored somewhere?

It's embedded in the monmap, currently found at $mon_data/monmap/. Not terribly convenient, sorry!

> I would like to be able to verify, before starting a monitor on a
> given server, if an existing monitor directory belongs to the current
> cluster or to a previous cluster incarnation. With the OSDs, I can
> just check the cluster_fsid file.

We can add a similar file in the $mon_data directory in a future version.

sage
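For anyone needing to script this check today, a rough sketch of doing it by hand; the paths and epoch number below are examples for one particular setup, not fixed locations:

  # print the fsid embedded in a monitor's newest monmap epoch file
  # ($mon_data location and epoch number are examples; use the
  # highest-numbered file found in $mon_data/monmap/)
  monmaptool --print /var/lib/ceph/mon/ceph-a/monmap/42 | grep ^fsid

  # compare against the fsid an OSD in the same cluster recorded
  cat /var/lib/ceph/osd/ceph-0/cluster_fsid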
monitor start up question
When a cluster has been shut down and then re-started, how do the monitors know what the cluster fsid is? Is it stored somewhere?

I would like to be able to verify, before starting a monitor on a given server, if an existing monitor directory belongs to the current cluster or to a previous cluster incarnation. With the OSDs, I can just check the cluster_fsid file.

Regards,
Mandell Degerness
Re: Ceph Benchmark HowTo
Hi Florian,

On Wed, Jul 25, 2012 at 10:06:04PM +0200, Florian Haas wrote:
> For the OSD tests, which OSD filesystem are you testing on? Are you
> using a separate journal device? If yes, what type?

Actually, I use xfs, and the journal is on the same disk in another partition. After reading the documentation, it seems that using a dedicated disk is better and an SSD is a good choice.

> seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
> if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0
>
> Just making sure: are you getting the same numbers just with dd,
> rather than dd invoked by seekwatcher?

Yes.

> Also, for your dd latency test of 4M direct I/O reads and writes, you seem
> to be getting 39 and 300 ms average latency, yet further down it says
> "RBD latency read/write: 28ms and 114.5ms". Any explanation for the
> write latency being cut in half on what was apparently a different
> test run?

Yes, this is a different run; the one at the bottom was with fewer servers but better hardware.

> Also, were read and write caches cleared between tests? (echo 3 >
> /proc/sys/vm/drop_caches)

No, I will add that.

> Cheers,
> Florian

I know that my setup is not really optimal. Writing these tests helps me understand how Ceph works, and I'm sure that with your advice I will build a better cluster :)

Thanks for your help.

Cheers,

--
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht
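As a point of reference, the dedicated-journal layout discussed above boils down to a ceph.conf entry like this minimal sketch (the host name and device path are hypothetical, not from this thread):

  [osd.0]
          host = node1
          osd data = /var/lib/ceph/osd/ceph-0
          ; journal on a partition of a separate SSD rather than on
          ; the data disk (device path is an example)
          osd journal = /dev/sdb1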
Re: Ceph Benchmark HowTo
On Wed, Jul 25, 2012 at 1:25 PM, Gregory Farnum wrote:
> Yeah, an average isn't necessarily very useful here — it's what you
> get because that's easy to implement (with a sum and a counter
> variable, instead of binning). The inclusion of max and min latencies
> is an attempt to cheaply compensate for that...but if somebody wants
> to find/write an appropriately-licensed statistical counting library
> and integrate it with rados bench, then (say it with me) contributions
> are welcome! ;)

How about "output the results in a good machine-readable format, and here's the pandas script to crunch them into a useful summary"?

http://pandas.pydata.org/
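As a minimal sketch of that idea (assuming pandas is installed, and reading the per-op latency column of the existing plain-text output rather than a new machine-readable format):

  # feed the "last lat" column of rados bench's per-second lines
  # (the 8-field rows starting with the seconds counter) into pandas
  # for count/mean/std/min/median/max -- a sketch, not rados bench itself
  rados bench -p pbench 60 write \
    | awk 'NF == 8 && $1 ~ /^[0-9]+$/ {print $7}' \
    | python -c "import sys, pandas; print(pandas.Series([float(l) for l in sys.stdin]).describe())"

The describe() summary includes the 50% percentile, i.e. the median Florian asked about.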
Re: Ceph Benchmark HowTo
On Wed, Jul 25, 2012 at 1:06 PM, Florian Haas wrote:
> Hi Mehdi,
>
> great work! A few questions (for you, Mark, and anyone else watching
> this thread) regarding the content of that wiki page:
>
> For the OSD tests, which OSD filesystem are you testing on? Are you
> using a separate journal device? If yes, what type?
>
> For the RADOS benchmarks:
>
> # rados bench -p pbench 900 seq
> ...
>   611      16     17010     16994   111.241       104   1.05852  0.574897
>   612      16     17037     17021   111.236       108   1.17321  0.574932
>   613      16     17056     17040   111.178        76   1.01611  0.574903
> Total time run:        613.339616
> Total reads made:      17056
> Read size:             4194304
> Bandwidth (MB/sec):    111.234
>
> Average Latency:       0.575252
> Max latency:           1.65182
> Min latency:           0.07418
>
> How meaningful is it to use an (arithmetic) average here, considering
> the min and max differ by a factor of 22? Aren't we being bitten by
> outliers pretty severely here, and wouldn't, say, a median be much
> more useful? (Actually, would the "max latency" include the initial
> hunt for a mon and the mon/osdmap exchange?)

Yeah, an average isn't necessarily very useful here — it's what you get because that's easy to implement (with a sum and a counter variable, instead of binning). The inclusion of max and min latencies is an attempt to cheaply compensate for that...but if somebody wants to find/write an appropriately-licensed statistical counting library and integrate it with rados bench, then (say it with me) contributions are welcome! ;)

> seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd
> if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0
>
> Just making sure: are you getting the same numbers just with dd,
> rather than dd invoked by seekwatcher?
>
> Also, for your dd latency test of 4M direct I/O reads and writes, you seem
> to be getting 39 and 300 ms average latency, yet further down it says
> "RBD latency read/write: 28ms and 114.5ms". Any explanation for the
> write latency being cut in half on what was apparently a different
> test run?
>
> Also, were read and write caches cleared between tests? (echo 3 >
> /proc/sys/vm/drop_caches)
>
> Cheers,
> Florian
Re: Ceph Benchmark HowTo
Hi Mehdi,

great work! A few questions (for you, Mark, and anyone else watching this thread) regarding the content of that wiki page:

For the OSD tests, which OSD filesystem are you testing on? Are you using a separate journal device? If yes, what type?

For the RADOS benchmarks:

# rados bench -p pbench 900 seq
...
  611      16     17010     16994   111.241       104   1.05852  0.574897
  612      16     17037     17021   111.236       108   1.17321  0.574932
  613      16     17056     17040   111.178        76   1.01611  0.574903
Total time run:        613.339616
Total reads made:      17056
Read size:             4194304
Bandwidth (MB/sec):    111.234

Average Latency:       0.575252
Max latency:           1.65182
Min latency:           0.07418

How meaningful is it to use an (arithmetic) average here, considering the min and max differ by a factor of 22? Aren't we being bitten by outliers pretty severely here, and wouldn't, say, a median be much more useful? (Actually, would the "max latency" include the initial hunt for a mon and the mon/osdmap exchange?)

seekwatcher -t rbd-latency-write.trace -o rbd-latency-write.png -p 'dd if=/dev/zero of=/dev/rbd0 bs=4M count=1000 oflag=direct' -d /dev/rbd0

Just making sure: are you getting the same numbers just with dd, rather than dd invoked by seekwatcher?

Also, for your dd latency test of 4M direct I/O reads and writes, you seem to be getting 39 and 300 ms average latency, yet further down it says "RBD latency read/write: 28ms and 114.5ms". Any explanation for the write latency being cut in half on what was apparently a different test run?

Also, were read and write caches cleared between tests? (echo 3 > /proc/sys/vm/drop_caches)

Cheers,
Florian
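For what it's worth, folding that cache flush into each read run might look like the sketch below; the dd read invocation is an assumed mirror of the write test above, run as root between test runs:

  # flush the page cache so reads actually hit the cluster,
  # then time a direct-I/O read pass
  sync
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/rbd0 of=/dev/null bs=4M count=1000 iflag=direct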
Re: Ceph Benchmark HowTo
On Tue, Jul 24, 2012 at 6:19 PM, Tommi Virtanen wrote:
> On Tue, Jul 24, 2012 at 8:55 AM, Mark Nelson wrote:
>> Personally I think it's fine to have it on the wiki. I do want to stress
>> that performance is going to be (hopefully!) improving over the next couple
>> of months, so we will probably want to have updated results (or at least
>> remove old results!) as things improve. Also, I'm not sure if we will be
>> keeping the wiki around in its current form. There was some talk about
>> migrating to something else, but I don't really remember the details.
>
> Sounds like a job for doc/dev/benchmark/index.rst! (It, or parts of
> it, can move out from under "Internal" if/when it gets user-friendly
> enough to not need as much skill to use.)

If John is currently busy (which I assume he always is :) ), I should be able to take care of that. In that case, would someone please open a documentation bug and assign it to me?

Cheers,
Florian
Re: Ceph Benchmark HowTo
On Tue, Jul 24, 2012 at 10:55:37AM -0500, Mark Nelson wrote:
> On 07/24/2012 09:43 AM, Mehdi Abaakouk wrote:
>
> Thanks for taking the time to put all of your benchmarking
> procedures into writing! Having this kind of community
> ...

Thanks for your comments and these tools; they will help me for sure.

--
Mehdi Abaakouk
mail: sil...@sileht.net
irc: sileht
Re: Clusters and pools
On 07/25/2012 06:34 AM, Ryan Nicholson wrote:
> I'm running a cluster based on 4 hosts that each have 3 fast SCSI OSDs and
> 1 very large SATA OSD, meaning 12 fast OSDs and 4 slow OSDs total. I wish
> to segregate these into 2 pools that operate independently. The goal is to
> use the faster disks as an area to hold RBD-based VMs, and the larger area
> to host RBD-based large volumes (to start), and possibly have that become
> just a big cephfs area once the fs side of things is considered more stable.
>
> Now, I've been thrown a couple of options, and am still unsettled. Which is
> best?
>
> - Create 2 independent clusters: one with the 12 SCSI OSDs and the other
>   with just the 4 large OSDs on the same hosts. This seems to be more
>   complex from a scripting and boot-time standpoint, but easier for my head.
>
> - Create a single cluster and use CRUSH rules to separate the two. This one
>   STILL has me lost, as I'm having trouble understanding the crushmap
>   syntax, the crushmap import/export commands, and the other mkpool or
>   otherwise commands from the docs in order to "make rbd's come from this
>   faster pool", while "cephfs, you come from this slower pool". I really
>   would like to entertain this path, however, as this allows Ceph to handle
>   the entire situation, and it would seem more elegant.
>
> I'm also open to other options as well.
>
> Thanks!
>
> Ryan Nicholson

The "easiest" way to approach this:

Set up the cluster with the 12 fast OSDs first and leave the other 4 out of the configuration. Get everything up and running and play with it.

Then, add the 4 remaining OSDs to the cluster:

1. Add them to ceph.conf
2. Increment max_osd
3. Add them to the keyring
4. Format the OSDs
5. Start the OSDs

Now they should show up in your "ceph -s" output, but no data will go to them. The next step is to export your current crushmap:

$ ceph osd getcrushmap -o crushmap
$ crushtool -d crushmap -o crushmap.txt

You should now add 4 new hosts to the crushmap, something like "hostA-slow", and add one OSD under each of them. Now you can add a new rack called "slowrbd", for example, then add a new pool and a new rule afterwards. Compile crushmap.txt back into "crushmap" and load it into the cluster.

You can now create a new pool with a specific crush rule. All the data in that pool will go onto those 4 slower OSDs.

Wido
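To make those steps concrete, here is a rough sketch of the crushmap additions and follow-up commands; all names, IDs, weights, OSD numbers, pg counts, and the ruleset number are hypothetical examples, not values from this thread:

  # one new host bucket per server, holding its slow OSD
  host hostA-slow {
          id -10                  # hypothetical; must be unique
          alg straw
          hash 0  # rjenkins1
          item osd.12 weight 1.000
  }
  # ...repeat for hostB-slow (osd.13), hostC-slow (osd.14), hostD-slow (osd.15)

  rack slowrbd {
          id -20                  # hypothetical
          alg straw
          hash 0  # rjenkins1
          item hostA-slow weight 1.000
          item hostB-slow weight 1.000
          item hostC-slow weight 1.000
          item hostD-slow weight 1.000
  }

  rule slowrule {
          ruleset 3               # hypothetical ruleset number
          type replicated
          min_size 1
          max_size 10
          step take slowrbd
          step chooseleaf firstn 0 type host
          step emit
  }

Then compile, load, and point a new pool at the rule:

  $ crushtool -c crushmap.txt -o crushmap.new
  $ ceph osd setcrushmap -i crushmap.new
  $ ceph osd pool create slow 128              # pool name and pg count are examples
  $ ceph osd pool set slow crush_ruleset 3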
EU Ceph.com mirror for Debian/Ubuntu packages
Hi,

On a couple of systems I'm using the Debian packages provided on Ceph.com, but these packages are hosted on a CA-based server. In the EU that's rather slow, especially when updating multiple servers and when downloading the debug packages.

As I'm lazy, I don't want to maintain my own mirror with my own built packages; I'd rather use the ones built by Ceph.com.

Could we set up an EU mirror like eu.ceph.com or nl.ceph.com?

deb http://eu.ceph.com/debian/ precise main

That could update from the main mirror with rsync every hour. I could offer some space if needed?

Thanks,
Wido
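For illustration, the hourly update could be as simple as a crontab entry like the following sketch; the rsync source path on ceph.com and the local webroot are assumptions, since whatever the master mirror actually exports would have to be used:

  # hypothetical crontab entry on the EU mirror host
  0 * * * * rsync -a --delete rsync://ceph.com/debian/ /var/www/eu.ceph.com/debian/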