FYI, most Juniper switches hash LAGs on IP+port, so you'd get somewhat better performance than you would with simple MAC or IP hashing. 10G is better if you can afford it, though.
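To see why that helps Ceph specifically: a client holds separate TCP connections to many OSDs, and a layer-3+4 (IP+port) hash can spread those connections across all four 1Gbit members, while a MAC- or IP-only hash pins everything between the same two hosts onto a single link. A rough Python sketch of the idea (the hash function, IPs and ports are made up for illustration, not Juniper's actual algorithm):

import hashlib

LAG_MEMBERS = 4  # e.g. a 4 x 1Gbit LACP bond

def pick_member(src_ip, dst_ip, src_port=None, dst_port=None):
    # Toy flow hash: map a flow's header fields onto one LAG member.
    # With only IPs (or MACs) in the key, every connection between the
    # same two hosts lands on the same link; adding the TCP ports lets
    # different connections between the same hosts use different links.
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % LAG_MEMBERS

# One client talking to four OSD ports on the same storage node:
for port in (6800, 6801, 6802, 6803):
    ip_only = pick_member("10.0.0.10", "10.0.0.20")
    ip_port = pick_member("10.0.0.10", "10.0.0.20", 50000 + port, port)
    print(f"OSD port {port}: IP-only -> link {ip_only}, IP+port -> link {ip_port}")

With IP-only keys every connection in that example maps to the same member; once the ports are in the key the connections can spread over all four links, which is what you want from a 4x1Gbit bond.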
On Tue, Apr 28, 2015 at 9:55 AM Nick Fisk <n...@fisk.me.uk> wrote:
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Dominik Hannen
> > Sent: 28 April 2015 17:08
> > To: Nick Fisk
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Cost- and Powerefficient OSD-Nodes
> >
> > >> Interconnect as currently planned:
> > >> 4 x 1Gbit LACP bonds over a pair of MLAG-capable switches (planned:
> > >> EX3300)
> >
> > > If you can do 10Gb networking it's really worth it. I found that with
> > > 1Gb, latency affects your performance before you max out the
> > > bandwidth. We got some Supermicro servers with 10GBase-T onboard for
> > > a tiny price difference and some basic 10GBase-T switches.
> >
> > I do not expect to max out the bandwidth. My estimate is that 200 MB/s
> > read/write is needed at maximum.
> >
> > The performance metric that suffers most, as far as I have read, would
> > be IOPS? How many IOPS do you think will be possible with 8 x 4-OSD
> > nodes with 4x1Gbit (distributed among all the clients, VMs, etc.)?
>
> It's all about the total latency per operation. Most IO sizes over 10Gb
> don't make much difference to the round-trip time, but comparatively even
> 128KB IOs over 1Gb take quite a while. For example, ping a host with a
> payload of 64k over 1Gb and 10Gb networks and look at the difference in
> times. Now double this for Ceph (client -> primary OSD -> secondary OSD).
>
> When you are using SSD journals you normally end up with a write latency
> of 3-4ms over 10Gb; 1Gb networking will probably increase this by another
> 2-4ms. IOPS = 1000 / latency in ms [see the sketch below the quoted
> thread].
>
> I guess it all really depends on how important performance is.
>
> > >> 250GB SSD - Journal (MX200 250GB with extreme over-provisioning,
> > >> staggered deployment, monitored for TBW value)
> >
> > > Not sure if that SSD would be suitable for a journal. I would
> > > recommend going with one of the Intel S3700s. You could also save a
> > > bit and run the OS from it.
> >
> > I am still on the fence about ditching the SATA-DOM and installing the
> > OS on the SSD as well.
> >
> > If the MX200s turn out to be unsuited, I can still use them for other
> > purposes and fetch some better SSDs later.
> >
> > >> Seagate Surveillance HDD (ST3000VX000) 7200rpm
> >
> > > Would also possibly consider a more NAS/enterprise-friendly HDD.
> >
> > I thought video-surveillance HDDs would be a nice fit; they are built
> > to run 24/7 and to write multiple data streams to disk at the same
> > time. Also cheap, which enables me to get more nodes from the start.
>
> Just had a look, and the Seagate Surveillance disks spin at 7200RPM
> (missed that you put that there), whereas the WD ones that I am familiar
> with spin at 5400RPM, so not as bad as I thought.
>
> So probably OK to use, but I don't see many people using them for Ceph or
> generic NAS, so I can't be sure there are no hidden gotchas.
>
> > > CPU might be on the limit, but would probably suffice. If anything
> > > you won't max out all the cores, but the overall speed of the CPU
> > > might increase latency, which may or may not be a problem for you.
> >
> > Do you have some values, so that I can imagine the difference?
> > I also maintain another cluster with dual-socket hexa-core Xeon 12-OSD
> > nodes, and all the CPUs do is idle. And the 2x10G LACP link is usually
> > never used above 1 Gbit.
> > Hence the focus on cost-efficiency with this build.
>
> Sorry, nothing in detail. I did actually build a Ceph cluster on the same
> 8-core CPU as you have listed. I didn't have any performance problems,
> but I do remember that with SSD journals, when doing high queue-depth
> writes, I could get the CPU quite high. It's like what I said before
> about the 1Gb vs 10Gb networking: how important is performance? If using
> this CPU gives you an extra 1ms of latency per OSD, is that acceptable?
>
> Agree, 12 cores (guessing 2.5GHz each) will be overkill for just 12 OSDs.
> I have a very similar spec and see exactly the same as you, but will
> change the nodes to one CPU each when I expand and use the spare CPUs for
> the new nodes.
>
> I'm using this:
>
> http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPTL_.cfm
>
> Mainly because of rack density, which I know doesn't apply to you. But
> the fact they share PSUs/rails/chassis helps reduce power a bit and
> drives down cost.
>
> I can get 14 disks in each and they have 10Gb on board. The SAS
> controller is flashable to JBOD mode.
>
> Maybe one of the other Twin solutions might be suitable?
>
> > >> Are there any cost-effective suggestions to improve this
> > >> configuration?
>
> > > Have you looked at a normal Xeon-based server but with more disks per
> > > node? Depending on how much capacity you need, spending a little more
> > > per server but allowing you to have more disks per server might work
> > > out cheaper.
>
> > > There are some interesting Supermicro combinations, or if you want to
> > > go really cheap, you could buy case, MB, CPU, etc. separately and
> > > build it yourself.
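As a quick back-of-the-envelope for the latency discussion quoted above: per-stream (queue depth 1) write IOPS follow directly from IOPS = 1000 / latency in ms. A small sketch using only the ranges Nick quotes, not measured values:

# Rough queue-depth-1 write IOPS from the numbers quoted in the thread.
def iops(latency_ms):
    return 1000.0 / latency_ms

ssd_journal_write_ms = 3.5   # "3-4ms over 10Gb" with SSD journals
extra_1gbe_penalty_ms = 3.0  # "another 2-4ms" when dropping to 1Gbit

print("10GbE: ~%.0f IOPS per stream" % iops(ssd_journal_write_ms))
print(" 1GbE: ~%.0f IOPS per stream" % iops(ssd_journal_write_ms + extra_1gbe_penalty_ms))

# To see the network component yourself, compare a large-payload ping
# over both networks (e.g. `ping -s 65000 <host>` on Linux) and remember
# Ceph roughly doubles that hop: client -> primary OSD -> secondary OSD.

So with those midpoint numbers, roughly 286 per-stream IOPS over 10GbE drops to about 154 over 1Gbit; higher queue depths and more clients will push aggregate IOPS well above either figure.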
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com