Hey All,

Thanks for the quick responses! I chose the Micron PCIe card based on its
benchmark results at
http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review .
Per the vendor, the card has a 25PB write endurance rating, so I'm not terribly
worried about it failing on me too soon :)

Christian Balzer <chibi@...> writes:

>
> On Wed, 14 May 2014 19:28:17 -0500 Mark Nelson wrote:
>
> > On 05/14/2014 06:36 PM, Tyler Wilson wrote:
> > > Hey All,
> >
> > Hi!
> >
> > >
> > > I am setting up a new storage cluster that absolutely must have the
> > > best read/write sequential speed @ 128k and the highest IOPS at 4k
> > > read/write as possible.
> >
> > I assume random?
> >
> > >
> > > My current specs for each storage node are currently;
> > > CPU: 2x E5-2670V2
> > > Motherboard: SM X9DRD-EF
> > > OSD Disks: 20-30 Samsung 840 1TB
> > > OSD Journal(s): 1-2 Micron RealSSD P320h
> > > Network: 4x 10gb, Bridged
> I assume you mean 2x10Gb bonded for public and 2x10Gb for cluster network?
>
> The SSDs you specified would read at about 500MB/s, meaning that only 4 of
> them would already saturate your network uplink.
> For writes (assuming journal on SSDs, see below) you reach that point with
> just 8 SSDs.
>

The 4x 10Gb bond will carry Ceph storage traffic only, with public and
management traffic on the on-board interfaces.
This is expandable to 80Gbps if needed.
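
For what it's worth, here is a quick back-of-envelope sketch (Python) of the
saturation math Christian describes; the ~500 MB/s per-SSD figure and the ~80%
usable-link efficiency are assumptions, not measurements:

# Back-of-envelope link saturation check (assumed figures, not measurements).
# ~500 MB/s sequential read per SATA SSD; with journals co-located on the
# same SSDs each client write is written twice, leaving ~250 MB/s usable.

def ssds_to_saturate(link_gbps, per_ssd_mb_s, efficiency=0.8):
    """How many SSDs it takes to fill a link, assuming ~80% usable bandwidth."""
    usable_mb_s = link_gbps * 1000 / 8 * efficiency
    return usable_mb_s / per_ssd_mb_s

print(ssds_to_saturate(20, 500))   # ~4 SSDs fill a 2x10Gb bond with reads
print(ssds_to_saturate(20, 250))   # ~8 SSDs fill it with journaled writes
print(ssds_to_saturate(40, 250))   # ~16 SSDs even at the full 4x10Gb

So even with all 4x 10Gb dedicated to storage traffic, a 20-30 SSD node can
outrun the network on paper.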


> > > Memory: 32-96GB depending on need
> RAM is pretty cheap these days and a large pagecache on the storage nodes
> is always quite helpful.
>

Noted; I wasn't sure how Ceph uses the Linux page cache or whether extra RAM
would benefit us.

> > >
>
> How many of these nodes are you planning to deploy initially?
> As always and especially when going for performance, more and smaller
> nodes tend to be better, also less impact if one goes down.
> And in your case it is easier to balance storage and network bandwidth,
> see above.
>

Two storage nodes per location to start. These will be serving OpenStack VMs,
so we'll add more nodes whenever utilization warrants it.

> > > Does anyone see any potential bottlenecks in the above specs? What kind
> > > of improvements or configurations can we make on the OSD config side?
> > > We are looking to run this with 2 replication.
> >
> > Likely you'll run into latency due to context switching and lock
> > contention in the OSDs and maybe even some kernel slowness.  Potentially
> > you could end up CPU limited too, even with E5-2670s given how fast all
> > of those SSDs are.  I'd suggest considering a chassis without an
> > expander backplane and using multiple controllers with the drives
> > directly attached.
> >
>
> Indeed, I'd be worried about that as well, same with the
> chassis/controller bit.
>


Thanks for the advice on the controller card; we will look into different
chassis options with the LSI cards recommended in the Inktank docs.
Would running a different distribution affect this at all? Our target was
CentOS 6, but if a more recent kernel would make a difference we could switch.

> > There's work going into improving things on the Ceph side but I don't
> > know how much of it has even hit our wip branches in github yet.  So for
> > now ymmv, but there's a lot of work going on in this area as it's
> > something that lots of folks are interested in.
> >
> If you look at the current "Slow IOPS on RBD compared to journal and
> backing devices" thread and the Inktank document referenced in it
>
>
> https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
>
> you should probably assume no more than 800 random write IOPS and 4000
> random read IOPS per OSD (4KB block size).
> That latter number I can also reproduce with my cluster.
>
> Now I expect those numbers to go up as Ceph is improved, but for the time
> being those limits might influence your choice of hardware.
>
> > I'd also suggest testing whether or not putting all of the journals on
> > the RealSSD cards actually helps you that much over just putting your
> > journals on the other SSDs.  The advantage here is that by putting
> > journals on the 2.5" SSDs, you don't lose a pile of OSDs if one of those
> > PCIE cards fails.
> >
> More than seconded, I could only find READ values on the Micron site which
> makes me very suspicious, as the journal's main role is to be able to
> WRITE as fast as possible. Also all journals combined ought to be faster
> than your final storage.
> Lastly there was no endurance data on the Micron site either and with ALL
> your writes having to go through those devices I'd be dead scared to deploy
> them.
>
> I'd spend that money on the case and controllers as mentioned above and
> better storage SSDs.
>
> I was going to pipe up about the Samsungs, but Mark Kirkwood did beat me
> to it.
> Unless you can be 100% certain that your workload per storage SSD
> doesn't exceed 40GB/day I'd stay very clear of them.
>
> Christian
>

Would it be possible to have redundant journals in this case? Per
http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review
the 350GB model has a 25PB write endurance rating. On a pure IOPS level, in our
4k write benchmarks the Micron was 25x faster than the Samsung 840s we tested,
hence the move to PCIe journals.
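
To put those numbers side by side, a rough sketch (Python) combining the
endurance and per-OSD IOPS figures quoted in this thread; the daily client
write rate is a made-up placeholder and the node/OSD counts are taken from the
specs above:

# Rough endurance / IOPS sanity check using figures quoted in this thread.
# The client write rate below is a placeholder -- substitute your real workload.

client_write_gb_per_day = 2000     # placeholder, not a real number
total_osds = 2 * 25                # 2 nodes x ~25 Samsung 840s each
replicas = 2

# Journal endurance: with 2 nodes and 2x replication, each node's P320h
# journals roughly the full client write rate.
p320h_endurance_pb = 25            # vendor figure cited above
days = p320h_endurance_pb * 1e6 / client_write_gb_per_day
print("P320h journal lifetime: ~%.0f years" % (days / 365))

# Storage SSD endurance vs Christian's ~40 GB/day rule of thumb for the 840s.
per_ssd_gb_day = client_write_gb_per_day * replicas / total_osds
print("Writes per storage SSD: ~%.0f GB/day" % per_ssd_gb_day)

# Per-OSD ceilings from the Inktank guide: ~800 random 4k writes, ~4000 reads.
print("Cluster 4k write IOPS: ~%d" % (total_osds * 800 / replicas))
print("Cluster 4k read IOPS:  ~%d" % (total_osds * 4000))

Under these placeholder numbers the journal endurance looks comfortable, but
the per-840 write load and the per-OSD IOPS ceiling are the figures worth
checking against the actual VM workload.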


> > The only other thing I would be careful about is making sure that your
> > SSDs are good about dealing with power failure during writes.  Not all
> > SSDs behave as you would expect.
> >
> > >
> > > Thanks for your guys assistance with this.
> >
> > np, good luck!
> >
>

Thanks again for the responses!
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
