Hey All,

Thanks for the quick responses! I chose the Micron PCIe card based on its benchmark results at
http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review . Per the vendor, the
card has a 25PB write endurance rating, so I'm not terribly worried about it failing on me too
soon :)
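For anyone curious how I sanity-checked that figure, here is the back-of-the-envelope arithmetic
(a rough sketch only; the 5-year service life and even write distribution are my assumptions, not
vendor numbers):

# Rough endurance check for the P320h journal card.
# Assumption (not a vendor figure): 5-year service life with writes spread evenly.
ENDURANCE_PB = 25        # vendor-quoted lifetime write endurance
SERVICE_YEARS = 5        # assumed service life
CAPACITY_GB = 350        # 350GB model

endurance_gb = ENDURANCE_PB * 1000 * 1000            # 25 PB expressed in GB
gb_per_day = endurance_gb / (SERVICE_YEARS * 365)    # sustainable daily write volume
dwpd = gb_per_day / CAPACITY_GB                      # drive writes per day

print(f"~{gb_per_day:,.0f} GB/day over {SERVICE_YEARS} years "
      f"(~{dwpd:.0f} drive writes/day on the {CAPACITY_GB}GB model)")
# -> roughly 13,700 GB/day, i.e. ~39 drive writes per day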
Christian Balzer <chibi@...> writes:
>
> On Wed, 14 May 2014 19:28:17 -0500 Mark Nelson wrote:
>
> > On 05/14/2014 06:36 PM, Tyler Wilson wrote:
> > > Hey All,
> >
> > Hi!
> >
> > >
> > > I am setting up a new storage cluster that absolutely must have the
> > > best read/write sequential speed at 128k and the highest IOPS at 4k
> > > read/write as possible.
> >
> > I assume random?
> >
> > >
> > > My current specs for each storage node are currently;
> > > CPU: 2x E5-2670V2
> > > Motherboard: SM X9DRD-EF
> > > OSD Disks: 20-30 Samsung 840 1TB
> > > OSD Journal(s): 1-2 Micron RealSSD P320h
> > > Network: 4x 10gb, Bridged
> I assume you mean 2x10Gb bonded for the public and 2x10Gb for the cluster network?
>
> The SSDs you specified would read at about 500MB/s, meaning that only 4 of
> them would already saturate your network uplink.
> For writes (assuming journal on SSDs, see below) you reach that point with
> just 8 SSDs.
>

The 4x 10Gb will carry Ceph storage traffic only, with public and management traffic on the
on-board interfaces. This is expandable to 80Gbps if needed.

> > > Memory: 32-96GB depending on need
> RAM is pretty cheap these days and a large pagecache on the storage nodes
> is always quite helpful.
>

Noted, I wasn't sure how Ceph used the Linux page cache or whether it would benefit us.

> >
> How many of these nodes are you planning to deploy initially?
> As always, and especially when going for performance, more and smaller
> nodes tend to be better, also less impact if one goes down.
> And in your case it is easier to balance storage and network bandwidth,
> see above.
>

Two storage nodes per location at the start; these serve OpenStack VMs, so we'll add more
whenever utilization grows enough to warrant it.

> > > Does anyone see any potential bottlenecks in the above specs? What kind
> > > of improvements or configurations can we make on the OSD config side?
> > > We are looking to run this with 2 replication.
> >
> > Likely you'll run into latency due to context switching and lock
> > contention in the OSDs and maybe even some kernel slowness. Potentially
> > you could end up CPU limited too, even with E5-2670s, given how fast all
> > of those SSDs are. I'd suggest considering a chassis without an
> > expander backplane and using multiple controllers with the drives
> > directly attached.
> >
> Indeed, I'd be worried about that as well, same with the
> chassis/controller bit.
>

Thanks for the advice on the controller card; we will look into different chassis options with
the LSI cards recommended in the Inktank docs. Would running a different distribution affect
this at all? Our target was CentOS 6, but if a more recent kernel would make a difference we
could switch.

> > There's work going into improving things on the Ceph side but I don't
> > know how much of it has even hit our wip branches in github yet. So for
> > now ymmv, but there's a lot of work going on in this area as it's
> > something that lots of folks are interested in.
> >
> If you look at the current "Slow IOPS on RBD compared to journal and
> backing devices" thread and the Inktank document referenced in it
>
> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
>
> you should probably assume no more than 800 random write IOPS and 4000
> random read IOPS per OSD (4KB block size).
> That latter number I can also reproduce with my cluster.
>
> Now I expect those numbers to go up as Ceph is improved, but for the time
> being those limits might influence your choice of hardware.
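Those per-OSD numbers are helpful. This is the quick per-node sizing sketch we've been using to
compare the disk, network, and IOPS ceilings. The 500MB/s per-SSD figure and the 800/4000
IOPS-per-OSD limits are taken from this thread and the Inktank guide; the OSD count and the
journal placement are our own assumptions:

# Quick per-node sizing sketch, using the figures quoted in this thread.
# Assumptions: 4x10GbE bonded for Ceph traffic, 20 SSD OSDs per node.
# The write ceiling below assumes journal and data share the same SSDs
# (Christian's scenario); with journals on the PCIe cards, the journal
# devices become the write ceiling instead.
NET_GBPS = 4 * 10                      # Ceph-facing bandwidth per node (Gbit/s)
NET_MBS = NET_GBPS * 1000 / 8          # ~5000 MB/s usable ceiling, ignoring overhead

SSD_MBS = 500                          # per-SSD sequential throughput (from the thread)
OSDS = 20                              # low end of our 20-30 range

read_mbs = OSDS * SSD_MBS              # sequential read ceiling from the disks
write_mbs = OSDS * SSD_MBS / 2         # journal double-write halves effective write speed

# Per-OSD 4k random IOPS limits from the Inktank hardware guide / this thread.
write_iops = OSDS * 800
read_iops = OSDS * 4000

print(f"network ceiling : {NET_MBS:,.0f} MB/s")
print(f"disk read/write : {read_mbs:,.0f} / {write_mbs:,.0f} MB/s")
print(f"4k rand IOPS    : {read_iops:,} read / {write_iops:,} write per node")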
>
> > I'd also suggest testing whether or not putting all of the journals on
> > the RealSSD cards actually helps you that much over just putting your
> > journals on the other SSDs. The advantage here is that by putting
> > journals on the 2.5" SSDs, you don't lose a pile of OSDs if one of those
> > PCIE cards fails.
> >
> More than seconded, I could only find READ values on the Micron site which
> makes me very suspicious, as the journal's main role is to be able to
> WRITE as fast as possible. Also all journals combined ought to be faster
> than your final storage.
> Lastly there was no endurance data on the Micron site either, and with ALL
> your writes having to go through those devices I'd be dead scared to deploy
> them.
>
> I'd spend that money on the case and controllers as mentioned above and
> better storage SSDs.
>
> I was going to pipe up about the Samsungs, but Mark Kirkwood did beat me
> to it.
> Unless you can be 100% certain that your workload per storage SSD
> doesn't exceed 40GB/day I'd stay very clear of them.
>
> Christian
>

Would it be possible to have redundant journals in this case? Per
http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review the 350GB model has a
25PB endurance rating. On a pure IOPS basis, the Micron was 25x faster in our 4k write benchmarks
than the Samsung 840s we tested, hence the move to PCIe journals.

> > The only other thing I would be careful about is making sure that your
> > SSDs are good about dealing with power failure during writes. Not all
> > SSDs behave as you would expect.
> >
> > >
> > > Thanks for your guys' assistance with this.
> >
> > np, good luck!
>

Thanks again for the responses!
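P.S. To put Christian's endurance concern in numbers, this is the rough write-volume sketch we
used; the client write rate is a hypothetical placeholder, not a measurement:

# Rough journal/endurance check: with 2x replication, every client write lands
# twice in the cluster, and each copy passes through a journal before the data SSD.
# The client write rate below is a hypothetical placeholder, not a measurement.
CLIENT_WRITE_GB_PER_DAY = 2000         # assumed aggregate client writes per day
REPLICATION = 2
NODES = 2                              # per location, at the start
JOURNALS_PER_NODE = 2                  # Micron P320h cards per node
DATA_SSDS_PER_NODE = 20

cluster_writes = CLIENT_WRITE_GB_PER_DAY * REPLICATION    # total GB/day written cluster-wide
per_journal = cluster_writes / (NODES * JOURNALS_PER_NODE)
per_data_ssd = cluster_writes / (NODES * DATA_SSDS_PER_NODE)

print(f"per journal card : {per_journal:,.0f} GB/day (vs ~13,700 GB/day from a 25PB/5yr budget)")
print(f"per data SSD     : {per_data_ssd:,.0f} GB/day (vs the ~40 GB/day limit quoted for the 840s)")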