FYI, most Juniper switches hash LAGs on IP+port, so you'd get somewhat better performance than you would with simple MAC or IP hashing. 10G is better if you can afford it, though.
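To see why that helps Ceph specifically: a client holds separate TCP connections to many OSDs, and a layer-3+4 (IP+port) hash can spread those connections across all four 1Gbit members, while a MAC- or IP-only hash pins everything between the same two hosts onto a single link. A rough Python sketch of the idea (the hash function, IPs and ports are made up for illustration, not Juniper's actual algorithm):

import hashlib

LAG_MEMBERS = 4  # e.g. a 4 x 1Gbit LACP bond

def pick_member(src_ip, dst_ip, src_port=None, dst_port=None):
    # Toy flow hash: map a flow's header fields onto one LAG member.
    # With only IPs (or MACs) in the key, every connection between the
    # same two hosts lands on the same link; adding the TCP ports lets
    # different connections between the same hosts use different links.
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % LAG_MEMBERS

# One client talking to four OSD ports on the same storage node:
for port in (6800, 6801, 6802, 6803):
    ip_only = pick_member("10.0.0.10", "10.0.0.20")
    ip_port = pick_member("10.0.0.10", "10.0.0.20", 50000 + port, port)
    print(f"OSD port {port}: IP-only -> link {ip_only}, IP+port -> link {ip_port}")

With IP-only keys every connection in that example maps to the same member; once the ports are in the key the connections can spread over all four links, which is what you want from a 4x1Gbit bond.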
On Tue, Apr 28, 2015 at 9:55 AM Nick Fisk <n...@fisk.me.uk> wrote:
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Dominik Hannen
> > Sent: 28 April 2015 17:08
> > To: Nick Fisk
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Cost- and Powerefficient OSD-Nodes
> >
> > >> Interconnect as currently planned:
> > >> 4 x 1Gbit LACP bonds over a pair of MLAG-capable switches (planned:
> > >> EX3300)
> >
> > > If you can do 10Gb networking it's really worth it. I found that with
> > > 1Gb, latency affects your performance before you max out the
> > > bandwidth. We got some Supermicro servers with 10GBase-T onboard for
> > > a tiny price difference and some basic 10GBase-T switches.
> >
> > I do not expect to max out the bandwidth. My estimate is that 200 MB/s
> > read/write is needed at maximum.
> >
> > The performance metric that suffers most, as far as I have read, would
> > be IOPS? How many IOPS do you think will be possible with 8 x 4-OSD
> > nodes with 4x1Gbit (distributed among all the clients, VMs, etc.)?
>
> It's all about the total latency per operation. Most IO sizes over 10Gb
> don't make much difference to the round-trip time, but comparatively even
> 128KB IOs over 1Gb take quite a while. For example, ping a host with a
> payload of 64k over 1Gb and 10Gb networks and look at the difference in
> times. Now double this for Ceph (client -> primary OSD -> secondary OSD).
>
> When you are using SSD journals you normally end up with a write latency
> of 3-4ms over 10Gb; 1Gb networking will probably increase this by another
> 2-4ms. IOPS = 1000 / latency in ms [see the sketch below the quoted
> thread].
>
> I guess it all really depends on how important performance is.
>
> > >> 250GB SSD - Journal (MX200 250GB with extreme over-provisioning,
> > >> staggered deployment, monitored for TBW value)
> >
> > > Not sure if that SSD would be suitable for a journal. I would
> > > recommend going with one of the Intel S3700s. You could also save a
> > > bit and run the OS from it.
> >
> > I am still on the fence about ditching the SATA-DOM and installing the
> > OS on the SSD as well.
> >
> > If the MX200s turn out to be unsuited, I can still use them for other
> > purposes and fetch some better SSDs later.
> >
> > >> Seagate Surveillance HDD (ST3000VX000) 7200rpm
> >
> > > Would also possibly consider a more NAS/enterprise-friendly HDD.
> >
> > I thought video-surveillance HDDs would be a nice fit; they are built
> > to run 24/7 and to write multiple data streams to disk at the same
> > time. Also cheap, which enables me to get more nodes from the start.
>
> Just had a look, and the Seagate Surveillance disks spin at 7200RPM
> (missed that you put that there), whereas the WD ones that I am familiar
> with spin at 5400RPM, so not as bad as I thought.
>
> So probably OK to use, but I don't see many people using them for Ceph or
> generic NAS, so I can't be sure there are no hidden gotchas.
>
> > > CPU might be on the limit, but would probably suffice. If anything
> > > you won't max out all the cores, but the overall speed of the CPU
> > > might increase latency, which may or may not be a problem for you.
> >
> > Do you have some values, so that I can imagine the difference?
> > I also maintain another cluster with dual-socket hexa-core Xeon 12-OSD
> > nodes, and all the CPUs do is idle. And the 2x10G LACP link is usually
> > never used above 1 Gbit.
> > Hence the focus on cost-efficiency with this build.
>
> Sorry, nothing in detail. I did actually build a Ceph cluster on the same
> 8-core CPU as you have listed. I didn't have any performance problems,
> but I do remember that with SSD journals, when doing high queue-depth
> writes, I could get the CPU quite high. It's like what I said before
> about the 1Gb vs 10Gb networking: how important is performance? If using
> this CPU gives you an extra 1ms of latency per OSD, is that acceptable?
>
> Agree, 12 cores (guessing 2.5GHz each) will be overkill for just 12 OSDs.
> I have a very similar spec and see exactly the same as you, but will
> change the nodes to one CPU each when I expand and use the spare CPUs for
> the new nodes.
>
> I'm using this:
>
> http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPTL_.cfm
>
> Mainly because of rack density, which I know doesn't apply to you. But
> the fact they share PSUs/rails/chassis helps reduce power a bit and
> drives down cost.
>
> I can get 14 disks in each and they have 10Gb on board. The SAS
> controller is flashable to JBOD mode.
>
> Maybe one of the other Twin solutions might be suitable?
>
> > >> Are there any cost-effective suggestions to improve this
> > >> configuration?
>
> > > Have you looked at a normal Xeon-based server but with more disks per
> > > node? Depending on how much capacity you need, spending a little more
> > > per server but allowing you to have more disks per server might work
> > > out cheaper.
>
> > > There are some interesting Supermicro combinations, or if you want to
> > > go really cheap, you could buy case, MB, CPU, etc. separately and
> > > build it yourself.
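As a quick back-of-the-envelope for the latency discussion quoted above: per-stream (queue depth 1) write IOPS follow directly from IOPS = 1000 / latency in ms. A small sketch using only the ranges Nick quotes, not measured values:

# Rough queue-depth-1 write IOPS from the numbers quoted in the thread.
def iops(latency_ms):
    return 1000.0 / latency_ms

ssd_journal_write_ms = 3.5   # "3-4ms over 10Gb" with SSD journals
extra_1gbe_penalty_ms = 3.0  # "another 2-4ms" when dropping to 1Gbit

print("10GbE: ~%.0f IOPS per stream" % iops(ssd_journal_write_ms))
print(" 1GbE: ~%.0f IOPS per stream" % iops(ssd_journal_write_ms + extra_1gbe_penalty_ms))

# To see the network component yourself, compare a large-payload ping
# over both networks (e.g. `ping -s 65000 <host>` on Linux) and remember
# Ceph roughly doubles that hop: client -> primary OSD -> secondary OSD.

So with those midpoint numbers, roughly 286 per-stream IOPS over 10GbE drops to about 154 over 1Gbit; higher queue depths and more clients will push aggregate IOPS well above either figure.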
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com