So 3 servers are the entirety of your Ceph storage nodes, right?
Exactly, plus 3 OpenStack compute nodes.


Have you been able to determine what causes the drops?
My first guess would be that this bonding is simply not compatible with
what the switches can do/expect.
Yeah, something like that. Round-robin load balancing kind of works, but it's a 'server side' bonding mode; the switches don't know anything about that particular configuration.
LACP isn't round-robin, but it does distribute traffic in a fashion, and given
the fact that it actually works you should try it.

To be more specific, LACP distribution is based on "sessions", so if you
have enough variety in there you will get something that's good enough.
A single session however will not be faster than an individual link, IIRC.
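
To make the "sessions" point concrete, here is a rough sketch in Python of
hash-based link selection. It is only an illustration: the real hash depends
on the switch and on the bonding driver's transmit hash policy (on Linux,
xmit_hash_policy with values such as layer2 or layer3+4). The pick_link
function, the CRC32 hash, and the addresses and ports are made up for this
example.

```python
import zlib
from collections import Counter


def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Map one flow (session) to one member link of the bond.

    Hypothetical stand-in for a layer3+4 style hash; real switches and
    the Linux bonding driver use their own hash functions.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_links


# One session: every packet of the same flow hashes to the same link,
# so that session can never exceed the speed of a single link.
single = {pick_link("10.0.0.1", "10.0.0.2", 50000, 6800, 2) for _ in range(1000)}

# Many sessions ("variety"): different address/port combinations hash to
# different links, so the aggregate load is spread over the bond.
flows = [
    (f"10.0.0.{i % 3 + 1}", f"10.0.0.{i % 3 + 4}", 40000 + i, 6800 + i % 8)
    for i in range(200)
]
usage = Counter(pick_link(*flow, 2) for flow in flows)

print("links used by one session:", len(single))
print("flows per link with many sessions:", dict(usage))
```

"Variety" here means many distinct source/destination address and port
combinations, which a Ceph cluster with several clients and OSDs tends to
produce on its own.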

What do you mean by 'variety'? Do you mean I/O?



Why a single switch and thus a SPoF?
Or are you planning to get 2 switches and plan for more clients and Ceph
nodes down the road?
Sorry I wasn't clearer. Yes, two 48-port switches. And yes, I am planning to add more Ceph nodes. The backend network also runs on only one failover Gigabit interface right now, and I'm planning to utilize the 2 remaining interfaces as well.

If I were in your shoes, I'd look at 2 switches running MC-LAG (in any of
the happy variations there are)
https://en.wikipedia.org/wiki/MC-LAG

And since you're on a budget, something like the Cumulus-based offerings
(Penguin Computing, etc.).
Thanks, I'll look into it. Never heard of that protocol before.

Regards
David

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
