G'day Wido

Great advice, thanks! We settled on 1x LVM partition on SSD for OSD-Journal.
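For reference, roughly the layout we ended up with - the device, VG and LV names and the sizes below are illustrative only, not our exact commands:

  pvcreate /dev/sdb1                 # the single LVM partition on the SSD
  vgcreate journal /dev/sdb1
  lvcreate -L 4G -n osd0 journal     # one logical volume per OSD journal
  lvcreate -L 4G -n osd1 journal

  # then per OSD in ceph.conf:
  # [osd.0]
  #     osd journal = /dev/journal/osd0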
A quick follow up if I may please?

> "A last note: if you use an SSD for your journaling, make sure that you align
> your partitions with the page size of the SSD, otherwise you'd run into the
> write amplification of the SSD, resulting in a performance loss."

Do you have any technical doco on how to achieve this? I am happy to value-add and write it up in a format that can go back into the wiki for others to follow.

And secondly, should the SSD journal sizes be large or small? I.e. is, say, a 1GB partition per paired 2-3TB SATA disk OK? Or as large an SSD as possible? There are many forum posts that say 100-200MB will suffice.

A quick piece of advice will hopefully save us several days of reconfiguring and benchmarking the cluster :-)

Thanks

Paul

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Wido den Hollander
Sent: Tuesday, 14 February 2012 10:46 PM
To: Paul Pettigrew
Cc: ceph-devel@vger.kernel.org
Subject: Re: Which SSD method is better for performance?

Hi,

On 02/14/2012 01:39 AM, Paul Pettigrew wrote:
> G'day all
>
> About to commence an R&D eval of the Ceph platform having been impressed with the momentum achieved over the past 12 months.
>
> I have one question re design before rolling out to metal........
>
> I will be using 1x SSD drive per storage server node (assume it is /dev/sdb for this discussion), and cannot readily determine the pros/cons of the two methods of using it for the OSD journal, being:
>
> #1. place it in the main [osd] stanza and reference the whole drive as a single partition; or

That won't work. If you do that, all OSDs will try to open the same journal. The journal for each OSD has to be unique.

> #2. partition up the disk, so 1x partition per SATA HDD, and place each partition in the [osd.N] section

That would be your best option. I'm doing the same: http://zooi.widodh.nl/ceph/ceph.conf

The VG "data" is placed on an SSD (Intel X25-M).

> So if I were to code #1 in the ceph.conf file, it would be:
>
> [osd]
>     osd journal = /dev/sdb
>
> Or, #2 would be like:
>
> [osd.0]
>     host = ceph1
>     btrfs devs = /dev/sdc
>     osd journal = /dev/sdb5
> [osd.1]
>     host = ceph1
>     btrfs devs = /dev/sdd
>     osd journal = /dev/sdb6
> [osd.2]
>     host = ceph1
>     btrfs devs = /dev/sde
>     osd journal = /dev/sdb7
> [osd.3]
>     host = ceph1
>     btrfs devs = /dev/sdf
>     osd journal = /dev/sdb8
>
> I am asking therefore, is the added work (and constraints) of specifying down to individual partitions per #2 worth it in performance gains? Does it not also have a constraint, in that if I wanted to add more HDDs into the server (we buy 45-bay units, and typically provision HDDs "on demand", i.e. 15x at a time as usage grows), I would have to additionally partition the SSD (taking it offline) - but if it were option #1, I would only have to add more [osd.N] sections (and not have to worry about ending up with 45x partitions on the SSD)?

You'd still have to go for #2. However, running 45 OSDs on a single machine is a bit tricky IMHO. If that machine fails you would lose 45 OSDs at once, which will put a lot of stress on the recovery of your cluster. You'd also need a lot of RAM to accommodate those 45 OSDs, at least 48GB of RAM I guess.

A last note: if you use an SSD for your journaling, make sure that you align your partitions with the page size of the SSD, otherwise you'd run into the write amplification of the SSD, resulting in a performance loss.

Wido
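(A minimal sketch of the alignment step above - assuming GNU parted, a GPT label and 1MiB partition boundaries, which are a multiple of common 4KB/8KB SSD page sizes; the device name and partition sizes are examples only, so check them against your drive's specs:)

  parted -s -a optimal /dev/sdb mklabel gpt
  parted -s -a optimal /dev/sdb mkpart osd0-journal 1MiB 4097MiB     # each start/end falls on a 1MiB boundary
  parted -s -a optimal /dev/sdb mkpart osd1-journal 4097MiB 8193MiB
  parted -s /dev/sdb align-check optimal 1                           # should report partition 1 as aligned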
> One final related question: if I were to use the #1 method (which I would prefer if there is no material performance or other reason to use #2), then that specification (i.e. the "osd journal = /dev/sdb") SSD disk reference would have to be identical on all other hardware nodes, yes (I want to use the same ceph.conf file on all servers per the doco recommendations)? What would happen if, for example, the SSD was on /dev/sde on a new node added into the cluster? References to /dev/disk/by-id etc. are clearly no help, so should a symlink be used from the get-go? E.g. something like "ln -s /dev/sdb /srv/ssd" on one box, and "ln -s /dev/sde /srv/ssd" on the other box, so that in the [osd] section we could use this line, which would find the SSD disk on all nodes: "osd journal = /srv/ssd"?
>
> Many thanks for any advice provided.
>
> Cheers
>
> Paul
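(On the device-naming question above, one possible alternative to hand-made symlinks is a per-node udev rule that gives the journal SSD a stable name; this is a sketch only - the rule file name and the serial number are placeholders:)

  # /etc/udev/rules.d/90-ceph-journal.rules  (one per node, with that node's SSD serial)
  # Creates /dev/ceph-journal regardless of whether the SSD shows up as sdb or sde,
  # so every node's ceph.conf can use:  osd journal = /dev/ceph-journal
  KERNEL=="sd?", ENV{ID_SERIAL}=="<this node's SSD serial>", SYMLINK+="ceph-journal"

  # The serial can be found with:
  #   udevadm info --query=property --name=/dev/sdb | grep ID_SERIAL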