Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Adrian Saul

> > For two links it should be quite good - it seemed to balance across
> > that quite well, but with 4 links it seemed to really prefer 2 in my case.
> >
> Just for the record, did you also change the LACP policies on the switches?
>
> From what I gather, having fancy pants L3+4 hashing on the Linux side will not
> fix imbalances by itself, the switches need to be configured likewise.

Yes - I was changing policies on both sides in similar manners, but it seemed 
that the way the OSDs selected their service ports just happened to hash 
consistently to the same links.   There just wasn't enough variation in the 
combinations of L3+L4 (or even L2) hash output to utilise more of the links - 
the even-numbered ports and consistent IP pairs kept returning the same link 
output from the hash algorithm.   Some of the more simplistic round-robin 
methods might have got better results, but I didn't want to stick with those, 
for future scalability reasons.

In a larger scale deployment with more clients or a wider pool of OSDs that 
would probably not be the case as there would be greater distribution of hash 
inputs.  Just something to be aware of when you look to do LACP with more than 
2 links.
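
If anyone wants to sanity-check their own bond, a rough sketch along the
lines below (sampling the per-slave TX byte counters from sysfs over an
interval) makes any imbalance obvious fairly quickly.  The slave names and
interval are only examples - adjust them for your setup and run it while the
cluster is busy.

#!/usr/bin/env python
# Rough sketch: sample per-slave TX byte counters from sysfs to see how
# evenly a bond is spreading traffic.  Slave names and interval are
# examples only.
import time

SLAVES = ["eth0", "eth1", "eth2", "eth3"]   # members of the bond (assumed)
INTERVAL = 10                               # seconds between samples

def tx_bytes(iface):
    with open("/sys/class/net/%s/statistics/tx_bytes" % iface) as f:
        return int(f.read())

before = dict((s, tx_bytes(s)) for s in SLAVES)
time.sleep(INTERVAL)
after = dict((s, tx_bytes(s)) for s in SLAVES)

total = sum(after[s] - before[s] for s in SLAVES) or 1
for s in SLAVES:
    delta = after[s] - before[s]
    print("%s: %7.1f MB/s  (%4.1f%% of bond TX)" %
          (s, delta / float(INTERVAL) / 1e6, 100.0 * delta / total))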



>
> Christian
> >
> > > -Original Message-
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > > Behalf Of David Riedl
> > > Sent: Thursday, 2 June 2016 2:12 AM
> > > To: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Best Network Switches for Redundancy
> > >
> > >
> > > > 4. As Ceph has lots of connections on lots of IP's and port's,
> > > > LACP or the Linux ALB mode should work really well to balance
> connections.
> > > Linux ALB Mode looks promising. Does that work with two switches?
> > > Each server has 4 ports which are 'splitted' and connected to each switch.
> > >  _
> > >/ _[switch]
> > >   / /  ||
> > > [server] ||
> > >  \ \_ ||
> > >   \__[switch]


Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Christian Balzer

Hello Adrian,

On Thu, 2 Jun 2016 00:53:41 + Adrian Saul wrote:

> 
> I am currently running our Ceph POC environment using dual Nexus 9372TX
> 10G-T switches, each OSD host has two connections to each switch and
> they are formed into a single 4 link VPC (MC-LAG), which is bonded under
> LACP on the host side.
> 
> What I have noticed is that the various hashing policies for LACP do not
> guarantee you will make full use of all the links.  I tried various
> policies and from what I could see the normal L3+L4 IP and port hashing
> generally worked as good as anything else, but if you have lots of
> similar connections it doesn't seem to hash across all the links and say
> 2 will be heavily used while not much is hashed onto the other links.
> This might have just been because it was a fairly small pool of IPs and
> fairly similar port numbers that just happened to keep hashing to the
> same links (I ended up going to the point of tcpdumping traffic and
> scripting a calculation of what link it should use, it just happened to
> be so consistent).
> 
> For two links it should be quite good - it seemed to balance across that
> quite well, but with 4 links it seemed to really prefer 2 in my case.
>
Just for the record, did you also change the LACP policies on the
switches?

From what I gather, having fancy pants L3+4 hashing on the Linux side will
not fix imbalances by itself; the switches need to be configured likewise.

Christian 
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of David Riedl
> > Sent: Thursday, 2 June 2016 2:12 AM
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Best Network Switches for Redundancy
> >
> >
> > > 4. As Ceph has lots of connections on lots of IP's and port's, LACP
> > > or the Linux ALB mode should work really well to balance connections.
> > Linux ALB Mode looks promising. Does that work with two switches? Each
> > server has 4 ports which are 'splitted' and connected to each switch.
> >  _
> >/ _[switch]
> >   / /  ||
> > [server] ||
> >  \ \_ ||
> >   \__[switch]


-- 
Christian Balzer    Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Adrian Saul

I am currently running our Ceph POC environment using dual Nexus 9372TX 10G-T 
switches, each OSD host has two connections to each switch and they are formed 
into a single 4 link VPC (MC-LAG), which is bonded under LACP on the host side.

What I have noticed is that the various hashing policies for LACP do not 
guarantee you will make full use of all the links.  I tried various policies 
and from what I could see the normal L3+L4 IP and port hashing generally worked 
as well as anything else, but if you have lots of similar connections it 
doesn't seem to hash across all the links - say, two will be heavily used while 
little is hashed onto the other links.  This might have just been because it 
was a fairly small pool of IPs and fairly similar port numbers that just 
happened to keep hashing to the same links (I ended up going to the point of 
tcpdumping traffic and scripting a calculation of which link each flow should 
use; it just happened to be that consistent).

For two links it should be quite good - it seemed to balance across that quite 
well, but with 4 links it seemed to really prefer 2 in my case.
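
For reference, the calculation I scripted was roughly along the lines of the
sketch below.  It uses the layer3+4 hash formula as given in the kernel's
bonding documentation; the exact maths differs between kernel versions, and
the switch hashes the return path with its own algorithm, so treat it as a
rough predictor only.  The flows listed are made up for illustration.

#!/usr/bin/env python
# Rough predictor for xmit_hash_policy=layer3+4, per the formula in the
# kernel's bonding documentation:
#   ((sport XOR dport) XOR ((saddr XOR daddr) AND 0xffff)) modulo slave count
import socket
import struct

def ip_to_int(ip):
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def layer34_slave(src_ip, sport, dst_ip, dport, nslaves=4):
    h = (sport ^ dport) ^ ((ip_to_int(src_ip) ^ ip_to_int(dst_ip)) & 0xffff)
    return h % nslaves

if __name__ == "__main__":
    # Made-up flows between the same two hosts on consecutive even ports --
    # with this formula they all land on the same slave, which is the sort
    # of clustering I was seeing.
    flows = [("10.0.0.11", 6800, "10.0.0.12", 6802),
             ("10.0.0.11", 6802, "10.0.0.12", 6804),
             ("10.0.0.11", 6804, "10.0.0.12", 6806),
             ("10.0.0.11", 6806, "10.0.0.12", 6808)]
    for src, sp, dst, dp in flows:
        print("%s:%d -> %s:%d  slave %d"
              % (src, sp, dst, dp, layer34_slave(src, sp, dst, dp)))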


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> David Riedl
> Sent: Thursday, 2 June 2016 2:12 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Best Network Switches for Redundancy
>
>
> > 4. As Ceph has lots of connections on lots of IP's and port's, LACP or
> > the Linux ALB mode should work really well to balance connections.
> Linux ALB Mode looks promising. Does that work with two switches? Each
> server has 4 ports which are 'splitted' and connected to each switch.
>  _
>/ _[switch]
>   / /  ||
> [server] ||
>  \ \_ ||
>   \__[switch]


Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Christian Balzer
On Wed, 1 Jun 2016 18:11:54 +0200 David Riedl wrote:

> 
> > 4. As Ceph has lots of connections on lots of IP's and port's, LACP or
> > the Linux ALB mode should work really well to balance connections.
> Linux ALB Mode looks promising. Does that work with two switches? Each 
> server has 4 ports which are 'splitted' and connected to each switch.
>  _
>/ _[switch]
>   / /  ||
> [server] ||
>  \ \_ ||
>   \__[switch]
>
Will it work for the scenario above? 
Yes, and it will probably be better than the bonding mode you use now.
https://en.wikipedia.org/wiki/Link_aggregation

However, if you read the article above, you will notice that it may NOT do
a fully automatic failover in case of switch failure, as it does its magic
by ARP trickery. 

LACP (with MC-LAG capable switches) will just work [TM].

Christian
-- 
Christian Balzer    Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread David Riedl



> 4. As Ceph has lots of connections on lots of IP's and port's, LACP or the
> Linux ALB mode should work really well to balance connections.
Linux ALB Mode looks promising. Does that work with two switches? Each 
server has 4 ports which are 'split' and connected to each switch.

_
  / _[switch]
 / /  ||
[server] ||
\ \_ ||
 \__[switch]


Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Nick Fisk
Just a couple of points.

1. I know you said 10G was not an option, but I would really push for it.
You can pick up Dell 10G-T switches (N4032) for not a lot more than a 48
port 1G switch. They make a lot more difference than just 10x the bandwidth -
with Ceph, latency is critical. As it's 10G-T, you can use the existing 1GbE
NICs with the new switches until you can upgrade the NICs.

2. If you still only want 1GbE, maybe something like the HP 2920s?

3. RR load balancing is probably not working well for you, due to out-of-order
TCP packets. As the amount of traffic increases with RR, packets arrive
slightly out of order, which causes them to be retransmitted. This eventually
snowballs and all sorts of nasty stuff starts happening (a quick way to spot
it is sketched just after this list).

4. As Ceph has lots of connections on lots of IPs and ports, LACP or the
Linux ALB mode should work really well to balance connections.

5. Read #1 again, if you are buying a new switch anyway, don't throw good
money after bad.
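
If you want to confirm whether #3 is what's biting you, a rough check along
these lines (watching the kernel's TCP retransmission counter while the
cluster is busy) will show it quickly; the interval is arbitrary.

#!/usr/bin/env python
# Rough sketch: watch the TCP retransmission counter under load.  A steadily
# climbing rate on a balance-rr bond is a strong hint that out-of-order
# delivery is causing retransmits.
import time

def retrans_segs():
    # /proc/net/snmp contains a "Tcp:" header line followed by a value line.
    with open("/proc/net/snmp") as f:
        tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]
    return int(values[header.index("RetransSegs")])

prev = retrans_segs()
while True:
    time.sleep(5)
    cur = retrans_segs()
    print("TCP segments retransmitted in the last 5s: %d" % (cur - prev))
    prev = cur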

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Christian Balzer
> Sent: 01 June 2016 10:16
> To: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Best Network Switches for Redundancy
> 
> 
> Hello,
> 
> On Wed, 1 Jun 2016 11:03:16 +0200 David Riedl wrote:
> 
> >
> > > So 3 servers are the entirety of your Ceph storage nodes, right?
> > Exactly. + 3 Openstack Compute Nodes
> >
> >
> > > Have you been able to determine what causes the drops?
> > > My first guess would be that this bonding is simply not compatible
> > > with what the switches can do/expect.
> > >
> > Yeah, something like that. load balancing round robin kinda works, but
> > it's a 'server side' bonding protocol. The switches don't know
> > anything about that particular configuration.
> > > LACP isn't round-robin, but it does distribute things in fashion and
> > > given the fact that it actually works you should try it.
> > >
> > > To be more specific, LACP distribution is based on "sessions", so if
> > > you have enough variety in there you will get something that's good
> > > enough. A single session however will not be faster than an
> > > individual link, IIRC.
> > >
> > What do you mean by 'variety'? Do you mean I/O?
> >
> Variety as in what those sessions (hashes) are based on.
> Usually IP addresses.
> So if you were to send data over just one specific TCP session from
> 10.0.0.1 to 10.0.0.2 it would go over one of your interfaces only, not
> both.
> 
> So on some of my servers with LACP I see noticeable differences between
> interface usage, especially when traffic is mostly to one other node.
> On others, with a sufficiently large number of connections to various
> hosts, it approaches uniform utilization.
> 
> > >
> > >
> > > Why a single switch and thus a SPoF?
> > > Or are you planning to get 2 switches and plan for more clients and
> > > Ceph nodes down the road?
> > Sorry I wasn't more clear. Yes, 2 48 port switches. And yes, I am
> > planning to add more Ceph nodes. The backend network also runs on only
> > one failover Gigabit interface right now and I'm planning to utilize
> > the
> > 2 remaining interfaces as well.
> >
> Then mLAG, mc-lag, vlag, clag is for you.
> 
> Also consider a flat network consisting of 4 mc-LAGed interfaces instead
> of a private cluster and client network.
> At 4Gb/s total, your local storage is still most likely going to be faster
> than your network bandwidth.
> 
> > >
> > > If I were in your shoes, I'd look at 2 switches running MC-LAG (in
> > > any of the happy variations there are)
> > > https://en.wikipedia.org/wiki/MC-LAG
> > >
> > > And since you're on a budget, something like the Cumulus based
> > > offerings (Penguin computing, etc).
> > Thanks, I'll look into it. Never heard of that protocol before.
> >
> It's just LACP over multiple switches, giving you full redundancy AND
> bandwidth.
> 
> Christian



Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Christian Balzer

Hello,

On Wed, 1 Jun 2016 11:03:16 +0200 David Riedl wrote:

> 
> > So 3 servers are the entirety of your Ceph storage nodes, right?
> Exactly. + 3 Openstack Compute Nodes
> 
> 
> > Have you been able to determine what causes the drops?
> > My first guess would be that this bonding is simply not compatible with
> > what the switches can do/expect.
> >   
> Yeah, something like that. load balancing round robin kinda works, but 
> it's a 'server side' bonding protocol. The switches don't know anything 
> about that particular configuration.
> > LACP isn't round-robin, but it does distribute things in fashion and
> > given the fact that it actually works you should try it.
> >
> > To be more specific, LACP distribution is based on "sessions", so if
> > you have enough variety in there you will get something that's good
> > enough. A single session however will not be faster than an individual
> > link, IIRC.
> >
> What do you mean by 'variety'? Do you mean I/O?
> 
Variety as in what those sessions (hashes) are based on - usually IP addresses.
So if you were to send data over just one specific TCP session from
10.0.0.1 to 10.0.0.2, it would go over only one of your interfaces, not both.

So on some of my servers with LACP I see noticeable differences between
interface usage, especially when traffic is mostly to one other node.
On others, with a sufficiently large number of connections to various
hosts, it approaches uniform utilization. 
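
As a toy illustration (invented addresses, deliberately simplified
address-only hash - real kernels and switches each use their own formula),
a handful of peers can never populate more slaves than there are distinct
address pairs, while many peers spread out roughly evenly:

#!/usr/bin/env python
# Toy illustration: address-based hashing over few vs. many peers.
from collections import Counter

NSLAVES = 4
LOCAL = 0x0A000001                      # 10.0.0.1 as a 32-bit integer

def slave_for(peer):
    return (LOCAL ^ peer) % NSLAVES     # simplified address-only hash

def spread(n_peers):
    peers = [0x0A000010 + i for i in range(n_peers)]
    return Counter(slave_for(p) for p in peers)

print("3 peers:  ", dict(spread(3)))    # only 3 of the 4 slaves can be hit
print("100 peers:", dict(spread(100)))  # roughly 25 peers per slave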

> >
> >
> > Why a single switch and thus a SPoF?
> > Or are you planning to get 2 switches and plan for more clients and
> > Ceph nodes down the road?
> Sorry I wasn't more clear. Yes, 2 48 port switches. And yes, I am 
> planning to add more Ceph nodes. The backend network also runs on only 
> one failover Gigabit interface right now and I'm planning to utilize the 
> 2 remaining interfaces as well.
>
Then mLAG, mc-lag, vlag, clag is for you.

Also consider a flat network consisting of 4 mc-LAGed interfaces instead of
a private cluster and client network.
At 4Gb/s total, your local storage is still most likely going to be faster
than your network bandwidth.
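
Back of the envelope (disk figures purely illustrative):

# 4 x 1GbE in an mc-LAG vs. a modest set of local disks
bond_mb_s = 4 * 1000 / 8.0            # ~500 MB/s of wire bandwidth
hdds, hdd_mb_s = 6, 150               # e.g. 6 spinners at ~150 MB/s sequential
print("bond %.0f MB/s vs local disks %d MB/s" % (bond_mb_s, hdds * hdd_mb_s))
# -> bond 500 MB/s vs local disks 900 MB/s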

> >
> > If I were in your shoes, I'd look at 2 switches running MC-LAG (in any
> > of the happy variations there are)
> > https://en.wikipedia.org/wiki/MC-LAG
> >
> > And since you're on a budget, something like the Cumulus based
> > offerings (Penguin computing, etc).
> Thanks, I'll look into it. Never heard of that protocol before.
> 
It's just LACP over multiple switches, giving you full redundancy AND
bandwidth.

Christian
-- 
Christian Balzer    Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread David Riedl



> So 3 servers are the entirety of your Ceph storage nodes, right?

Exactly. + 3 Openstack Compute Nodes



> Have you been able to determine what causes the drops?
> My first guess would be that this bonding is simply not compatible with
> what the switches can do/expect.
Yeah, something like that. Load balancing round-robin kinda works, but 
it's a 'server side' bonding protocol. The switches don't know anything 
about that particular configuration.

> LACP isn't round-robin, but it does distribute things in a fashion, and given
> the fact that it actually works you should try it.
>
> To be more specific, LACP distribution is based on "sessions", so if you
> have enough variety in there you will get something that's good enough.
> A single session, however, will not be faster than an individual link, IIRC.


What do you mean by 'variety'? Do you mean I/O?




> Why a single switch and thus a SPoF?
> Or are you planning to get 2 switches and plan for more clients and Ceph
> nodes down the road?
Sorry I wasn't more clear. Yes, 2 48 port switches. And yes, I am 
planning to add more Ceph nodes. The backend network also runs on only 
one failover Gigabit interface right now and I'm planning to utilize the 
2 remaining interfaces as well.


> If I were in your shoes, I'd look at 2 switches running MC-LAG (in any of
> the happy variations there are)
> https://en.wikipedia.org/wiki/MC-LAG
>
> And since you're on a budget, something like the Cumulus based offerings
> (Penguin computing, etc).

Thanks, I'll look into it. Never heard of that protocol before.

Regards
David



Re: [ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread Christian Balzer

Hello,

firstly, I'm not the main network guy here by a long shot; OTOH I do know
a thing or two, though that may just be from trial and error.

On Wed, 1 Jun 2016 09:49:53 +0200 David Riedl wrote:

> Hello everybody,
> 
> we want to upgrade/fix our SAN switches. I kinda screwed up when I was 
> first planning our CEPH storage cluster.
> 
> Right now we have 2 x HP 2530-24G Switch (J9776A). We have 3 server each 
> outfittet with 2 x 4 gigabit cards. (Don't judge me, I also was on a
> budget)
> 
So 3 servers are the entirety of your Ceph storage nodes, right?

> Each card goes with 2 cables to one of the two switches for redundancy.
> 
> The bonds on the servers are configured as mode 0 (load balancing 
> (round-robin)).
> 
> The cluster works but I guess due to high drop rate on the switch 
> interfaces we have a pretty bad latency.
>
Have you been able to determine what causes the drops?
My first guess would be that this bonding is simply not compatible with
what the switches can do/expect. 
 
> I also read that LACP would be the answer to my problem, since I want to 
> utilize all interfaces and have redundancy at the same time.
> 
LACP isn't round-robin, but it does distribute things in a fashion, and given
the fact that it actually works you should try it. 

To be more specific, LACP distribution is based on "sessions", so if you
have enough variety in there you will get something that's good enough.
A single session, however, will not be faster than an individual link, IIRC.

> 
> So, back to my Question:
> 
> What 48 port gigabit switch is the best replacement for that type of 
> configuration? 10GB is not an option.
> 

Why a single switch and thus a SPoF?
Or are you planning to get 2 switches and plan for more clients and Ceph
nodes down the road?

If I were in your shoes, I'd look at 2 switches running MC-LAG (in any of
the happy variations there are)
https://en.wikipedia.org/wiki/MC-LAG

And since you're on a budget, something like the Cumulus based offerings
(Penguin computing, etc).

Christian
> 
> 
> 
> Regards
> 
> David


-- 
Christian Balzer    Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


[ceph-users] Best Network Switches for Redundancy

2016-06-01 Thread David Riedl

Hello everybody,

we want to upgrade/fix our SAN switches. I kinda screwed up when I was 
first planning our CEPH storage cluster.


Right now we have 2 x HP 2530-24G Switch (J9776A). We have 3 servers, each 
outfitted with 2 x 4 gigabit cards. (Don't judge me, I also was on a budget.)


Each card goes with 2 cables to one of the two switches for redundancy.

The bonds on the servers are configured as mode 0 (round-robin load 
balancing).


The cluster works, but I guess due to the high drop rate on the switch 
interfaces we have pretty bad latency.


I also read that LACP would be the answer to my problem, since I want to 
utilize all interfaces and have redundancy at the same time.



So, back to my Question:

What 48-port gigabit switch is the best replacement for that type of 
configuration? 10G is not an option.





Regards

David