Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-28 Thread Jarod Wilson

Uwe Koziolek wrote:

On 03.09.2015 at 17:05, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On Tue, Sep 01, 2015 at 05:41 PM +0200, Andy Gospodarek wrote:

On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:

...

I probably should have asked this in the beginning, but at what
range of arp_interval values does the problem manifest? If it's a race
condition with the switch update, I'd expect that only very small
arp_interval values would be affected.

Also, your proposed comment wraps past 80 columns.

-J


Only a 500 ms arp_interval was used; no other values were tested.
The line wraps in the patch are now removed.

diff -up ./drivers/net/bonding/bond_main.c.orig ./drivers/net/bonding/bond_main.c
--- ./drivers/net/bonding/bond_main.c.orig 2015-08-30 20:34:09.0 +0200
+++ ./drivers/net/bonding/bond_main.c 2015-09-04 11:59:05.755897182 +0200
@@ -2795,6 +2795,17 @@ static bool bond_ab_arp_probe(struct bon
 		return should_notify_rtnl;
 	}
 
+	/* sometimes the forwarding tables of the switches are not updated
+	 * fast enough. the first arp response after a slave change is received
+	 * on the wrong slave.
+	 * the arp requests will be retried 2 times on the same slave
+	 */
+
+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
 	bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
 
 	bond_for_each_slave_rcu(bond, slave, iter) {
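
For reference, bond_time_in_interval() is the bonding driver's existing
jiffies-window helper; in bond_main.c of this era it reads roughly as
below (a sketch of the upstream helper, not part of this patch). The
mod argument of 2 is what turns the check above into "within roughly
two arp_intervals of last_link_up":

static bool bond_time_in_interval(struct bonding *bond, unsigned long last_act,
				  int mod)
{
	int delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);

	/* true while jiffies lies in the window
	 * [last_act - interval, last_act + mod * interval + interval / 2]
	 */
	return time_in_range(jiffies,
			     last_act - delta_in_ticks,
			     last_act + mod * delta_in_ticks + delta_in_ticks / 2);
}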


Jay, any further issues with this patch? I know Veaceslav was concerned 
about it breaking the logic for link state management if there's no 
current active slave for 2 * arp_interval, while Andy seemed okay with 
it, provided there was a comment explaining. Just looking at what might 
have to be done next here to keep heading towards a resolution.


Thanks much,

--
Jarod Wilson
ja...@redhat.com




Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-04 Thread Uwe Koziolek

On 03.09.2015 at 17:05, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On Tue, Sep 01, 2015 at 05:41 PM +0200, Andy Gospodarek wrote:

On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On 2015-08-17 07:12 PM, Jarod Wilson wrote:

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting
lost, so we wind up unable to get any interface up and active.
Sometimes, the issue sorts itself out after a while, sometimes it
doesn't.

Testing with num_grat_arp has proven fruitless, but sending an
additional arp on curr_arp_slave if we're still in the arp_interval
timeslice in bond_ab_arp_probe() has been shown to produce 100%
reliability in testing with this hardware combination.

Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also breaks completely the logic for link state management in case
of no current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it
works? :)

I can either duplicate some information from the bug, or Uwe can, to
illustrate the exact nature of the problem.


p.s. num_grat_arp maybe could help?

That was my thought as well, but as I understand it, that route was
explored, and it didn't help any. I don't actually have a reproducer
setup of my own, unfortunately, so I'm kind of caught in the middle
here...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.

The bonding slaves are connected to highly available switches; each of
the slaves is connected to a different switch. If the bond is starting,
only the selected slave sends one arp request. If a matching arp
response was received, this slave and the bond go into state up,
sending the gratuitous arps...
But if you got no arp reply, the next slave was selected.
With most of the newer switches, when not overloaded or hit by other
software bugs, or with a single-switch configuration, you would get an
arp response on the first arp request.
But in the case of a high-availability configuration with non-perfect
switches like the HP ProCurve 54xx, and also with some Cisco models,
you may not get a response on the first arp request.

I have seen network snoops where the switches do not respond to the
first arp request on slave 1; the second arp request was sent on slave
2, but the response was received on slave 1, and all following arp
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.
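
Spelled out as a timeline, that suspected race would look like this
(illustrative only, assuming arp_interval = 500 ms and fail_over_mac
left at its default, so both slaves probe with the same source MAC):

  t=0 ms    arp request sent on slave 1; switch learns MAC -> slave 1's port
  t<500 ms  no reply arrives in time; bond rotates to slave 2
  t=500 ms  arp request sent on slave 2 with the same source MAC
  t>500 ms  forwarding tables still map the MAC to slave 1's port, so
            the late reply is delivered on slave 1 and ignored there
  ...       each rotation re-teaches the tables one step too late, and
            the ping-pong continues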

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as
I remember, the HP E 3500 and E 3800 are also affected; for the
affected Cisco models I can't answer today.
Affected single-switch configurations were not seen.

Yes, a race condition with delayed updates of the forwarding tables is
a well-matching explanation for the problem.


The proposed change sends up to 3 arp requests on a down bond using the
same slave, delayed by arp_interval.
Using problematic switches, I have seen the arp response on the right
slave at the latest on the second arp request. So the bond is going
into state up.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles
the slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.
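
To make the window concrete, here is a small stand-alone C sketch that
mirrors the bond_time_in_interval() semantics in user space (the
simulation and its names are illustrative, assuming arp_interval =
500 ms as in the tests above):

#include <stdbool.h>
#include <stdio.h>

/* same window as the kernel helper:
 * [last_act - interval, last_act + mod * interval + interval / 2]
 */
static bool time_in_interval(long now, long last_act, long interval, int mod)
{
	return now >= last_act - interval &&
	       now <= last_act + mod * interval + interval / 2;
}

int main(void)
{
	const long interval = 500;   /* arp_interval in ms */
	const long last_link_up = 0; /* curr_arp_slave came up at t = 0 */

	for (long t = 0; t <= 4 * interval; t += interval)
		printf("t=%4ld ms: %s\n", t,
		       time_in_interval(t, last_link_up, interval, 2)
		       ? "probe again on the same slave"
		       : "fall through to slave rotation");
	return 0;
}

With mod = 2, the probes at t = 0, 500, and 1000 ms all stay on the
same slave, and rotation resumes at t = 1500 ms; that is the "up to 3
arp requests" described above.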

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; this is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the state up can be increased by up to 2 times
arp_interval, if the selected interface did not come up. If
well-working switches are used, and everything else is also OK, there
are no impacts.


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-03 Thread Jay Vosburgh
Uwe Koziolek  wrote:

>On Tue, Sep 01, 2015 at 05:41 PM +0200, Andy Gospodarek wrote:
>> On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:
>>> On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:
 Uwe Koziolek  wrote:

> On 2015-08-17 07:12 PM, Jarod Wilson wrote:
>> On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
>>> On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
 From: Uwe Koziolek 

 With some very finicky switch hardware, active backup bonding can get
 into a situation where we play ping-pong between interfaces, trying to
 get one to come up as the active slave. There seems to be an issue
 with the switch's arp replies either taking too long, or simply
 getting lost, so we wind up unable to get any interface up and active.
 Sometimes, the issue sorts itself out after a while, sometimes it
 doesn't.

 Testing with num_grat_arp has proven fruitless, but sending an
 additional arp on curr_arp_slave if we're still in the arp_interval
 timeslice in bond_ab_arp_probe() has been shown to produce 100%
 reliability in testing with this hardware combination.
>>> Sorry, I don't understand the logic of why it works, and what
>>> exactly we are fixing here.
>>>
>>> It also breaks completely the logic for link state management in
>>> case of no current active slave for 2*arp_interval.
>>>
>>> Could you please elaborate what exactly is fixed here, and how it
>>> works? :)
>> I can either duplicate some information from the bug, or Uwe can, to
>> illustrate the exact nature of the problem.
>>
>>> p.s. num_grat_arp maybe could help?
>> That was my thought as well, but as I understand it, that route was
>> explored, and it didn't help any. I don't actually have a reproducer
>> setup of my own, unfortunately, so I'm kind of caught in the middle
>> here...
>>
>> Uwe, can you perhaps further enlighten us as to what num_grat_arp
>> settings were tried that didn't help? I'm still of the mind that if
>> num_grat_arp *didn't* help, we probably need to do something keyed off
>> num_grat_arp.
> The bonding slaves are connected to highly available switches; each of
> the slaves is connected to a different switch. If the bond is starting,
> only the selected slave sends one arp request. If a matching arp
> response was received, this slave and the bond go into state up,
> sending the gratuitous arps...
> But if you got no arp reply, the next slave was selected.
> With most of the newer switches, when not overloaded or hit by other
> software bugs, or with a single-switch configuration, you would get an
> arp response on the first arp request.
> But in the case of a high-availability configuration with non-perfect
> switches like the HP ProCurve 54xx, and also with some Cisco models,
> you may not get a response on the first arp request.
>
> I have seen network snoops where the switches do not respond to the
> first arp request on slave 1; the second arp request was sent on slave
> 2, but the response was received on slave 1, and all following arp
> requests were answered on the wrong slave for a longer time.
Could you elaborate on the exact "high availability
 configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
 updating the forwarding tables and the bond flip flopping between the
 slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
 the forwarding table, but (for whatever reason) there is no reply.  ARP
 on slave 2 is sent (using the same source MAC, unless you set
 fail_over_mac), but forwarding tables still send that MAC to slave 1, so
 reply is sent there.
>>> High availability:
>>> 2 managed switches with routing capabilities have an interconnect.
>>> One slave of a bonding interface is connected to the first switch, the
>>> second slave is connected to the other switch.
>>> The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far
>>> as I remember, the HP E 3500 and E 3800 are also affected; for the
>>> affected Cisco models I can't answer today.
>>> Affected single-switch configurations were not seen.
>>>
>>> Yes, a race condition with delayed updates of the forwarding tables
>>> is a well-matching explanation for the problem.
>>>
> The proposed change sends up to 3 arp requests on a down bond using
> the same slave, delayed by arp_interval.
> Using problematic switches, I have seen the arp response on the right
> slave at the latest on the second arp request. So the bond is going
> into state up.
>
> How does it work:
> Bonds in the up state are handled at the beginning of the
> bond_ab_arp_probe procedure; the other part of this procedure handles
> the slave change.
> The proposed change bypasses the slave change for 2 additional calls
> of bond_ab_arp_probe.
> Now the retries are not only available for an up bond, they are also
> implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-01 Thread Uwe Koziolek

On Tue, 01.09.2015 at 00:21 +0200 Jarod Wilson wrote:

On 2015-08-17 4:51 PM, Uwe Koziolek wrote:

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On 2015-08-17 07:12 PM, Jarod Wilson wrote:

...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.
The bonding slaves are connected to highly available switches; each of
the slaves is connected to a different switch. If the bond is starting,
only the selected slave sends one arp request. If a matching arp
response was received, this slave and the bond go into state up,
sending the gratuitous arps...
But if you got no arp reply, the next slave was selected.
With most of the newer switches, when not overloaded or hit by other
software bugs, or with a single-switch configuration, you would get an
arp response on the first arp request.
But in the case of a high-availability configuration with non-perfect
switches like the HP ProCurve 54xx, and also with some Cisco models,
you may not get a response on the first arp request.

I have seen network snoops where the switches do not respond to the
first arp request on slave 1; the second arp request was sent on slave
2, but the response was received on slave 1, and all following arp
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1,
so reply is sent there.

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as
I remember, the HP E 3500 and E 3800 are also affected; for the
affected Cisco models I can't answer today.
Affected single-switch configurations were not seen.

Yes, a race condition with delayed updates of the forwarding tables is
a well-matching explanation for the problem.

The proposed change sends up to 3 arp requests on a down bond using the
same slave, delayed by arp_interval.
Using problematic switches, I have seen the arp response on the right
slave at the latest on the second arp request. So the bond is going
into state up.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles
the slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; this is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the state up can be increased by up to 2 times
arp_interval, if the selected interface did not come up. If
well-working switches are used, and everything else is also OK, there
are no impacts.
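(With Jay's example of arp_interval = 1000, that worst case is an extra
2 seconds before a fully down bond comes up; failover between the
slaves of an already-up bond is unchanged.)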


Jay, any further thoughts on this given Uwe's reply? Uwe, did you have 
a chance to get affected Cisco model numbers too?



The affected Cisco model was a C3750.




Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-01 Thread Uwe Koziolek

On Tue, Sep 01, 2015 at 05:41 PM +0200, Andy Gospodarek wrote:

On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On 2015-08-17 07:12 PM, Jarod Wilson wrote:

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting
lost, so we wind up unable to get any interface up and active.
Sometimes, the issue sorts itself out after a while, sometimes it
doesn't.

Testing with num_grat_arp has proven fruitless, but sending an
additional arp on curr_arp_slave if we're still in the arp_interval
timeslice in bond_ab_arp_probe() has been shown to produce 100%
reliability in testing with this hardware combination.

Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also breaks completely the logic for link state management in case
of no current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it
works? :)

I can either duplicate some information from the bug, or Uwe can, to
illustrate the exact nature of the problem.


p.s. num_grat_arp maybe could help?

That was my thought as well, but as I understand it, that route was
explored, and it didn't help any. I don't actually have a reproducer
setup of my own, unfortunately, so I'm kind of caught in the middle
here...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.

The bonding slaves are connected to highly available switches; each of
the slaves is connected to a different switch. If the bond is starting,
only the selected slave sends one arp request. If a matching arp
response was received, this slave and the bond go into state up,
sending the gratuitous arps...
But if you got no arp reply, the next slave was selected.
With most of the newer switches, when not overloaded or hit by other
software bugs, or with a single-switch configuration, you would get an
arp response on the first arp request.
But in the case of a high-availability configuration with non-perfect
switches like the HP ProCurve 54xx, and also with some Cisco models,
you may not get a response on the first arp request.

I have seen network snoops where the switches do not respond to the
first arp request on slave 1; the second arp request was sent on slave
2, but the response was received on slave 1, and all following arp
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as
I remember, the HP E 3500 and E 3800 are also affected; for the
affected Cisco models I can't answer today.
Affected single-switch configurations were not seen.

Yes, a race condition with delayed updates of the forwarding tables is
a well-matching explanation for the problem.


The proposed change sends up to 3 arp requests on a down bond using the
same slave, delayed by arp_interval.
Using problematic switches, I have seen the arp response on the right
slave at the latest on the second arp request. So the bond is going
into state up.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles
the slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; this is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the state up can be increased by up to 2 times
arp_interval, if the selected interface did not come up. If
well-working switches are used, and everything else is also OK, there
are no impacts.

Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-09-01 Thread Andy Gospodarek
On Mon, Aug 17, 2015 at 10:51:27PM +0200, Uwe Koziolek wrote:
> On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:
> >Uwe Koziolek  wrote:
> >
> >>On 2015-08-17 07:12 PM, Jarod Wilson wrote:
> >>>On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
> On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
> >From: Uwe Koziolek 
> >
> >With some very finicky switch hardware, active backup bonding can get
> >into a situation where we play ping-pong between interfaces, trying
> >to get one to come up as the active slave. There seems to be an issue
> >with the switch's arp replies either taking too long, or simply
> >getting lost, so we wind up unable to get any interface up and
> >active. Sometimes, the issue sorts itself out after a while,
> >sometimes it doesn't.
> >
> >Testing with num_grat_arp has proven fruitless, but sending an
> >additional arp on curr_arp_slave if we're still in the arp_interval
> >timeslice in bond_ab_arp_probe() has been shown to produce 100%
> >reliability in testing with this hardware combination.
> Sorry, I don't understand the logic of why it works, and what exactly
> we are fixing here.
> 
> It also breaks completely the logic for link state management in case
> of no current active slave for 2*arp_interval.
> 
> Could you please elaborate what exactly is fixed here, and how it
> works? :)
> >>>I can either duplicate some information from the bug, or Uwe can, to
> >>>illustrate the exact nature of the problem.
> >>>
> p.s. num_grat_arp maybe could help?
> >>>That was my thought as well, but as I understand it, that route was
> >>>explored, and it didn't help any. I don't actually have a reproducer
> >>>setup of my own, unfortunately, so I'm kind of caught in the middle
> >>>here...
> >>>
> >>>Uwe, can you perhaps further enlighten us as to what num_grat_arp
> >>>settings were tried that didn't help? I'm still of the mind that if
> >>>num_grat_arp *didn't* help, we probably need to do something keyed off
> >>>num_grat_arp.
> >>The bonding slaves are connected to highly available switches; each
> >>of the slaves is connected to a different switch. If the bond is
> >>starting, only the selected slave sends one arp request. If a
> >>matching arp response was received, this slave and the bond go into
> >>state up, sending the gratuitous arps...
> >>But if you got no arp reply, the next slave was selected.
> >>With most of the newer switches, when not overloaded or hit by other
> >>software bugs, or with a single-switch configuration, you would get
> >>an arp response on the first arp request.
> >>But in the case of a high-availability configuration with
> >>non-perfect switches like the HP ProCurve 54xx, and also with some
> >>Cisco models, you may not get a response on the first arp request.
> >>
> >>I have seen network snoops where the switches do not respond to the
> >>first arp request on slave 1; the second arp request was sent on
> >>slave 2, but the response was received on slave 1, and all following
> >>arp requests were answered on the wrong slave for a longer time.
> > Could you elaborate on the exact "high availability
> >configuration" here, including the model(s) of switch(es) involved?
> >
> > Is this some kind of race between the switch or switches
> >updating the forwarding tables and the bond flip flopping between the
> >slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
> >the forwarding table, but (for whatever reason) there is no reply.  ARP
> >on slave 2 is sent (using the same source MAC, unless you set
> >fail_over_mac), but forwarding tables still send that MAC to slave 1, so
> >reply is sent there.
> High availability:
> 2 managed switches with routing capabilities have an interconnect.
> One slave of a bonding interface is connected to the first switch, the
> second slave is connected to the other switch.
> The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as
> I remember, the HP E 3500 and E 3800 are also affected; for the
> affected Cisco models I can't answer today.
> Affected single-switch configurations were not seen.
> 
> Yes, a race condition with delayed updates of the forwarding tables is
> a well-matching explanation for the problem.
> 
> >>The proposed change sends up to 3 arp requests on a down bond using
> >>the same slave, delayed by arp_interval.
> >>Using problematic switches, I have seen the arp response on the
> >>right slave at the latest on the second arp request. So the bond is
> >>going into state up.
> >>
> >>How does it work:
> >>Bonds in the up state are handled at the beginning of the
> >>bond_ab_arp_probe procedure; the other part of this procedure
> >>handles the slave change.
> >>The proposed change bypasses the slave change for 2 additional calls
> >>of bond_ab_arp_probe.
> >>Now the retries are not only available for an up bond, they are also
> >>implemented for a down bond.


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-31 Thread Jarod Wilson

On 2015-08-17 4:51 PM, Uwe Koziolek wrote:

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On 2015-08-17 07:12 PM, Jarod Wilson wrote:

...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.

The bonding slaves are connected to highly available switches; each of
the slaves is connected to a different switch. If the bond is starting,
only the selected slave sends one arp request. If a matching arp
response was received, this slave and the bond go into state up,
sending the gratuitous arps...
But if you got no arp reply, the next slave was selected.
With most of the newer switches, when not overloaded or hit by other
software bugs, or with a single-switch configuration, you would get an
arp response on the first arp request.
But in the case of a high-availability configuration with non-perfect
switches like the HP ProCurve 54xx, and also with some Cisco models,
you may not get a response on the first arp request.

I have seen network snoops where the switches do not respond to the
first arp request on slave 1; the second arp request was sent on slave
2, but the response was received on slave 1, and all following arp
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as
I remember, the HP E 3500 and E 3800 are also affected; for the
affected Cisco models I can't answer today.
Affected single-switch configurations were not seen.

Yes, a race condition with delayed updates of the forwarding tables is
a well-matching explanation for the problem.


The proposed change sends up to 3 arp requests on a down bond using the
same slave, delayed by arp_interval.
Using problematic switches, I have seen the arp response on the right
slave at the latest on the second arp request. So the bond is going
into state up.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles
the slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; this is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the state up can be increased by up to 2 times
arp_interval, if the selected interface did not come up. If
well-working switches are used, and everything else is also OK, there
are no impacts.


Jay, any further thoughts on this given Uwe's reply? Uwe, did you have a 
chance to get affected Cisco model numbers too?


--
Jarod Wilson
ja...@redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-31 Thread Jarod Wilson

On 2015-08-17 4:51 PM, Uwe Koziolek wrote:

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On2015-08-17 07:12 PM,Jarod Wilson wrote:

...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.

The bonding slaves are connected to highly available switches; each of the
slaves is connected to a different switch. If the bond is starting, only
the selected slave sends one ARP request. If a matching ARP response was
received, this slave and the bond go into the up state, sending the
gratuitous ARPs...
But if no ARP reply was received, the next slave was selected.
With most of the newer switches (not overloaded, and without other
software bugs), or with a single-switch configuration, you would get an
ARP response to the first ARP request.
But in the case of a high-availability configuration with imperfect
switches like the HP ProCurve 54xx, and also with some Cisco models, you
may not get a response to the first ARP request.

I have seen network snoops where the switches do not respond to the
first ARP request on slave 1; the second ARP request was sent on slave 2,
but the response was received on slave 1, and all following ARP
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as I
remember, the HP E 3500 and E 3800 are also affected; for the affected
Cisco models I can't answer today.
The problem was not seen in single-switch configurations.

Yes, a race condition with delayed updates of the forwarding tables is a
well-matching explanation for the problem.


The proposed change sends up to 3 ARP requests on a down bond using the
same slave, delayed by arp_interval.
With the problematic switches I have seen the ARP response on the right
slave at the latest on the second ARP request, so the bond goes into the
up state.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles the
slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; that case is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the up state can be delayed, by up to 2 times arp_interval,
if the selected interface did not come up. If well-working switches are
used, and everything else is also OK, there is no impact.


Jay, any further thoughts on this given Uwe's reply? Uwe, did you have a 
chance to get affected Cisco model numbers too?


--
Jarod Wilson
ja...@redhat.com


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Uwe Koziolek

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek  wrote:


On 2015-08-17 07:12 PM, Jarod Wilson wrote:

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting lost,
so we wind up unable to get any interface up and active. Sometimes, the
issue sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing
with this hardware combination.

Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case
of no current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it
works? :)

I can either duplicate some information from the bug, or Uwe can, to
illustrate the exact nature of the problem.


p.s. num_grat_arp maybe could help?

That was my thought as well, but as I understand it, that route was
explored, and it didn't help any. I don't actually have a reproducer
setup of my own, unfortunately, so I'm kind of caught in the middle
here...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.

The bonding slaves are connected to highly available switches; each of the
slaves is connected to a different switch. If the bond is starting, only
the selected slave sends one ARP request. If a matching ARP response was
received, this slave and the bond go into the up state, sending the
gratuitous ARPs...
But if no ARP reply was received, the next slave was selected.
With most of the newer switches (not overloaded, and without other
software bugs), or with a single-switch configuration, you would get an
ARP response to the first ARP request.
But in the case of a high-availability configuration with imperfect
switches like the HP ProCurve 54xx, and also with some Cisco models, you
may not get a response to the first ARP request.

I have seen network snoops where the switches do not respond to the
first ARP request on slave 1; the second ARP request was sent on slave 2,
but the response was received on slave 1, and all following ARP
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as I
remember, the HP E 3500 and E 3800 are also affected; for the affected
Cisco models I can't answer today.
The problem was not seen in single-switch configurations.

Yes, a race condition with delayed updates of the forwarding tables is a
well-matching explanation for the problem.



The proposed change sends up to 3 ARP requests on a down bond using the
same slave, delayed by arp_interval.
With the problematic switches I have seen the ARP response on the right
slave at the latest on the second ARP request, so the bond goes into the
up state.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles the
slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; that case is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the up state can be delayed, by up to 2 times arp_interval,
if the selected interface did not come up. If well

Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Jay Vosburgh
Uwe Koziolek  wrote:

>On 2015-08-17 07:12 PM, Jarod Wilson wrote:
>> On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
>>> On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
 From: Uwe Koziolek 

 With some very finicky switch hardware, active backup bonding can get
 into a situation where we play ping-pong between interfaces, trying to
 get one to come up as the active slave. There seems to be an issue with
 the switch's arp replies either taking too long, or simply getting lost,
 so we wind up unable to get any interface up and active. Sometimes, the
 issue sorts itself out after a while, sometimes it doesn't.

 Testing with num_grat_arp has proven fruitless, but sending an
 additional arp on curr_arp_slave if we're still in the arp_interval
 timeslice in bond_ab_arp_probe(), has shown to produce 100% reliability
 in testing with this hardware combination.
>>>
>>> Sorry, I don't understand the logic of why it works, and what exactly
>>> we are fixing here.
>>>
>>> It also completely breaks the logic for link state management in case
>>> of no current active slave for 2*arp_interval.
>>>
>>> Could you please elaborate what exactly is fixed here, and how it
>>> works? :)
>>
>> I can either duplicate some information from the bug, or Uwe can, to
>> illustrate the exact nature of the problem.
>>
>>> p.s. num_grat_arp maybe could help?
>>
>> That was my thought as well, but as I understand it, that route was
>> explored, and it didn't help any. I don't actually have a reproducer
>> setup of my own, unfortunately, so I'm kind of caught in the middle
>> here...
>>
>> Uwe, can you perhaps further enlighten us as to what num_grat_arp
>> settings were tried that didn't help? I'm still of the mind that if
>> num_grat_arp *didn't* help, we probably need to do something keyed off
>> num_grat_arp.
>The bonding slaves are connected to highly available switches; each of the
>slaves is connected to a different switch. If the bond is starting, only
>the selected slave sends one ARP request. If a matching ARP response was
>received, this slave and the bond go into the up state, sending the
>gratuitous ARPs...
>But if no ARP reply was received, the next slave was selected.
>With most of the newer switches (not overloaded, and without other
>software bugs), or with a single-switch configuration, you would get an
>ARP response to the first ARP request.
>But in the case of a high-availability configuration with imperfect
>switches like the HP ProCurve 54xx, and also with some Cisco models, you
>may not get a response to the first ARP request.
>
>I have seen network snoops where the switches do not respond to the
>first ARP request on slave 1; the second ARP request was sent on slave 2,
>but the response was received on slave 1, and all following ARP
>requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

>The proposed change sends up to 3 ARP requests on a down bond using the
>same slave, delayed by arp_interval.
>With the problematic switches I have seen the ARP response on the right
>slave at the latest on the second ARP request, so the bond goes into the
>up state.
>
>How does it work:
>Bonds in the up state are handled at the beginning of the
>bond_ab_arp_probe procedure; the other part of this procedure handles
>the slave change.
>The proposed change bypasses the slave change for 2 additional calls of
>bond_ab_arp_probe.
>Now the retries are not only available for an up bond, they are also
>implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

>The num_grat_arp has no chance to solve the problem. num_grat_arp is
>only used if a different slave is going active.
>But in our case, the bonding slaves do not go into the active state
>for a longer time.
>>
 [jarod: manufacturing of changelog]
 CC: Jay Vosburgh 
 CC: Veaceslav Falico 
 CC: Andy Gospodarek 
 CC: net...@vger.kernel.org
 Signed-off-by: Uwe Koziolek 
 Signed-off-by: Jarod Wilson 
 ---
 drivers/net/bonding/bond_main.c | 5 +++++
 1 file changed, 5 insertions(+)

 diff --git a/drivers/net/bonding/bond_main.c
 b/drivers/net/bonding/bond_main.c
 index 0c627b4..60b9483 100644
 --- 

Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Uwe Koziolek

On 2015-08-17 07:12 PM, Jarod Wilson wrote:

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting lost,
so we wind up unable to get any interface up and active. Sometimes, the
issue sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing
with this hardware combination.


Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case
of no current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it 
works? :)


I can either duplicate some information from the bug, or Uwe can, to 
illustrate the exact nature of the problem.



p.s. num_grat_arp maybe could help?


That was my thought as well, but as I understand it, that route was 
explored, and it didn't help any. I don't actually have a reproducer 
setup of my own, unfortunately, so I'm kind of caught in the middle 
here...


Uwe, can you perhaps further enlighten us as to what num_grat_arp 
settings were tried that didn't help? I'm still of the mind that if 
num_grat_arp *didn't* help, we probably need to do something keyed off 
num_grat_arp.
The bonding slaves are connected to highly available switches; each of the
slaves is connected to a different switch. If the bond is starting, only
the selected slave sends one ARP request. If a matching ARP response was
received, this slave and the bond go into the up state, sending the
gratuitous ARPs...

But if no ARP reply was received, the next slave was selected.
With most of the newer switches (not overloaded, and without other
software bugs), or with a single-switch configuration, you would get an
ARP response to the first ARP request.
But in the case of a high-availability configuration with imperfect
switches like the HP ProCurve 54xx, and also with some Cisco models, you
may not get a response to the first ARP request.


I have seen network snoops where the switches do not respond to the
first ARP request on slave 1; the second ARP request was sent on slave 2,
but the response was received on slave 1, and all following ARP
requests were answered on the wrong slave for a longer time.


The proposed change sends up to 3 ARP requests on a down bond using the
same slave, delayed by arp_interval.
With the problematic switches I have seen the ARP response on the right
slave at the latest on the second ARP request, so the bond goes into the
up state.


How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles the
slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.


The num_grat_arp has no chance to solve the problem. num_grat_arp is
only used if a different slave is going active.
But in our case, the bonding slaves do not go into the active state
for a longer time.
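
That matches the shape of the code path: the gratuitous-ARP burst is
armed when an active slave is chosen, so it never fires while no slave
becomes active. A toy reduction (illustrative only; the names here are
not the kernel's actual notification plumbing):

#include <stdio.h>

static int num_grat_arp = 5;  /* module parameter in the real driver */
static int send_peer_notif;   /* pending gratuitous ARP count */

static void change_active_slave(int new_active_slave)
{
	if (new_active_slave >= 0)
		send_peer_notif = num_grat_arp;  /* armed only on failover */
}

int main(void)
{
	/* probing keeps failing, so no slave ever becomes active ... */
	change_active_slave(-1);
	/* ... and no gratuitous ARPs are ever queued */
	printf("pending gratuitous ARPs: %d\n", send_peer_notif);
	return 0;
}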



[jarod: manufacturing of changelog]
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: net...@vger.kernel.org
Signed-off-by: Uwe Koziolek 
Signed-off-by: Jarod Wilson 
---
drivers/net/bonding/bond_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index 0c627b4..60b9483 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);

bond_for_each_slave_rcu(bond, slave, iter) {
--
1.8.3.1
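
For context, the helper the patch relies on checks whether a timestamp
lies within a given number of arp_intervals. From memory of the bonding
code of that era it looks roughly like the following (a sketch, not a
verbatim copy; the half-interval slack in particular is an assumption):

static bool bond_time_in_interval(struct bonding *bond,
				  unsigned long last_act, int mod)
{
	int delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);

	/* true if jiffies lies in
	 * [last_act - delta, last_act + mod * delta + delta / 2]
	 */
	return time_in_range(jiffies,
			     last_act - delta_in_ticks,
			     last_act + mod * delta_in_ticks +
			     delta_in_ticks / 2);
}

So bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2) reads as
"this slave's link came up within roughly the last 2 arp_intervals",
which is what bounds the extra probes to about two additional calls.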








Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Jarod Wilson

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting lost,
so we wind up unable to get any interface up and active. Sometimes, the
issue sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing
with this hardware combination.


Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case of no
current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it works? :)


I can either duplicate some information from the bug, or Uwe can, to 
illustrate the exact nature of the problem.



p.s. num_grat_arp maybe could help?


That was my thought as well, but as I understand it, that route was 
explored, and it didn't help any. I don't actually have a reproducer 
setup of my own, unfortunately, so I'm kind of caught in the middle here...


Uwe, can you perhaps further enlighten us as to what num_grat_arp 
settings were tried that didn't help? I'm still of the mind that if 
num_grat_arp *didn't* help, we probably need to do something keyed off 
num_grat_arp.




[jarod: manufacturing of changelog]
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: net...@vger.kernel.org
Signed-off-by: Uwe Koziolek 
Signed-off-by: Jarod Wilson 
---
drivers/net/bonding/bond_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index 0c627b4..60b9483 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);

bond_for_each_slave_rcu(bond, slave, iter) {
--
1.8.3.1




--
Jarod Wilson
ja...@redhat.com


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Veaceslav Falico

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get into
a situation where we play ping-pong between interfaces, trying to get one
to come up as the active slave. There seems to be an issue with the
switch's arp replies either taking too long, or simply getting lost, so we
wind up unable to get any interface up and active. Sometimes, the issue
sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing with
this hardware combination.


Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case of no
current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it works? :)

p.s. num_grat_arp maybe could help?



[jarod: manufacturing of changelog]
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: net...@vger.kernel.org
Signed-off-by: Uwe Koziolek 
Signed-off-by: Jarod Wilson 
---
drivers/net/bonding/bond_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0c627b4..60b9483 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);

bond_for_each_slave_rcu(bond, slave, iter) {
--
1.8.3.1




Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Jarod Wilson

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek uwe.kozio...@redknee.com

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting lost,
so we wind up unable to get any interface up and active. Sometimes, the
issue sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing
with this hardware combination.


Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case of no
current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it works? :)


I can either duplicate some information from the bug, or Uwe can, to 
illustrate the exact nature of the problem.



p.s. num_grat_arp maybe could help?


That was my thought as well, but as I understand it, that route was 
explored, and it didn't help any. I don't actually have a reproducer 
setup of my own, unfortunately, so I'm kind of caught in the middle here...


Uwe, can you perhaps further enlighten us as to what num_grat_arp 
settings were tried that didn't help? I'm still of the mind that if 
num_grat_arp *didn't* help, we probably need to do something keyed off 
num_grat_arp.




[jarod: manufacturing of changelog]
CC: Jay Vosburgh j.vosbu...@gmail.com
CC: Veaceslav Falico vfal...@gmail.com
CC: Andy Gospodarek go...@cumulusnetworks.com
CC: net...@vger.kernel.org
Signed-off-by: Uwe Koziolek uwe.kozio...@redknee.com
Signed-off-by: Jarod Wilson ja...@redhat.com
---
drivers/net/bonding/bond_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index 0c627b4..60b9483 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);

bond_for_each_slave_rcu(bond, slave, iter) {
--
1.8.3.1




--
Jarod Wilson
ja...@redhat.com


Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Veaceslav Falico

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek uwe.kozio...@redknee.com

With some very finicky switch hardware, active backup bonding can get into
a situation where we play ping-pong between interfaces, trying to get one
to come up as the active slave. There seems to be an issue with the
switch's arp replies either taking too long, or simply getting lost, so we
wind up unable to get any interface up and active. Sometimes, the issue
sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing with
this hardware combination.


Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case of no
current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it works? :)

p.s. num_grat_arp maybe could help?



[jarod: manufacturing of changelog]
CC: Jay Vosburgh j.vosbu...@gmail.com
CC: Veaceslav Falico vfal...@gmail.com
CC: Andy Gospodarek go...@cumulusnetworks.com
CC: net...@vger.kernel.org
Signed-off-by: Uwe Koziolek uwe.kozio...@redknee.com
Signed-off-by: Jarod Wilson ja...@redhat.com
---
drivers/net/bonding/bond_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0c627b4..60b9483 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);

bond_for_each_slave_rcu(bond, slave, iter) {
--
1.8.3.1




Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Uwe Koziolek

On 2015-08-17 07:12 PM, Jarod Wilson wrote:

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek uwe.kozio...@redknee.com

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting lost,
so we wind up unable to get any interface up and active. Sometimes, the
issue sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing
with this hardware combination.


Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case
of no current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it 
works? :)


I can either duplicate some information from the bug, or Uwe can, to 
illustrate the exact nature of the problem.



p.s. num_grat_arp maybe could help?


That was my thought as well, but as I understand it, that route was 
explored, and it didn't help any. I don't actually have a reproducer 
setup of my own, unfortunately, so I'm kind of caught in the middle 
here...


Uwe, can you perhaps further enlighten us as to what num_grat_arp 
settings were tried that didn't help? I'm still of the mind that if 
num_grat_arp *didn't* help, we probably need to do something keyed off 
num_grat_arp.
The bonding slaves are connected to highly available switches; each of the
slaves is connected to a different switch. If the bond is starting, only
the selected slave sends one ARP request. If a matching ARP response was
received, this slave and the bond go into the up state, sending the
gratuitous ARPs...

But if no ARP reply was received, the next slave was selected.
With most of the newer switches (not overloaded, and without other
software bugs), or with a single-switch configuration, you would get an
ARP response to the first ARP request.
But in the case of a high-availability configuration with imperfect
switches like the HP ProCurve 54xx, and also with some Cisco models, you
may not get a response to the first ARP request.


I have seen network snoops where the switches do not respond to the
first ARP request on slave 1; the second ARP request was sent on slave 2,
but the response was received on slave 1, and all following ARP
requests were answered on the wrong slave for a longer time.


The proposed change sends up to 3 ARP requests on a down bond using the
same slave, delayed by arp_interval.
With the problematic switches I have seen the ARP response on the right
slave at the latest on the second ARP request, so the bond goes into the
up state.


How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles the
slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.


The num_grat_arp has no chance to solve the problem. num_grat_arp is
only used if a different slave is going active.
But in our case, the bonding slaves do not go into the active state
for a longer time.



[jarod: manufacturing of changelog]
CC: Jay Vosburgh j.vosbu...@gmail.com
CC: Veaceslav Falico vfal...@gmail.com
CC: Andy Gospodarek go...@cumulusnetworks.com
CC: net...@vger.kernel.org
Signed-off-by: Uwe Koziolek uwe.kozio...@redknee.com
Signed-off-by: Jarod Wilson ja...@redhat.com
---
drivers/net/bonding/bond_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index 0c627b4..60b9483 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2794,6 +2794,11 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+	if (bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+		bond_arp_send_all(bond, curr_arp_slave);
+		return should_notify_rtnl;
+	}
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);

bond_for_each_slave_rcu(bond, slave, iter) {
--
1.8.3.1








Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Jay Vosburgh
Uwe Koziolek uwe.kozio...@redknee.com wrote:

On 2015-08-17 07:12 PM, Jarod Wilson wrote:
 On 2015-08-17 12:55 PM, Veaceslav Falico wrote:
 On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:
 From: Uwe Koziolek uwe.kozio...@redknee.com

 With some very finicky switch hardware, active backup bonding can get
 into a situation where we play ping-pong between interfaces, trying to
 get one to come up as the active slave. There seems to be an issue with
 the switch's arp replies either taking too long, or simply getting lost,
 so we wind up unable to get any interface up and active. Sometimes, the
 issue sorts itself out after a while, sometimes it doesn't.

 Testing with num_grat_arp has proven fruitless, but sending an
 additional arp on curr_arp_slave if we're still in the arp_interval
 timeslice in bond_ab_arp_probe(), has shown to produce 100% reliability
 in testing with this hardware combination.

 Sorry, I don't understand the logic of why it works, and what exactly
 we are fixing here.

 It also completely breaks the logic for link state management in case
 of no current active slave for 2*arp_interval.

 Could you please elaborate what exactly is fixed here, and how it
 works? :)

 I can either duplicate some information from the bug, or Uwe can, to
 illustrate the exact nature of the problem.

 p.s. num_grat_arp maybe could help?

 That was my thought as well, but as I understand it, that route was
 explored, and it didn't help any. I don't actually have a reproducer
 setup of my own, unfortunately, so I'm kind of caught in the middle
 here...

 Uwe, can you perhaps further enlighten us as to what num_grat_arp
 settings were tried that didn't help? I'm still of the mind that if
 num_grat_arp *didn't* help, we probably need to do something keyed off
 num_grat_arp.
The bonding slaves are connected to highly available switches; each of the
slaves is connected to a different switch. If the bond is starting, only
the selected slave sends one ARP request. If a matching ARP response was
received, this slave and the bond go into the up state, sending the
gratuitous ARPs...
But if no ARP reply was received, the next slave was selected.
With most of the newer switches (not overloaded, and without other
software bugs), or with a single-switch configuration, you would get an
ARP response to the first ARP request.
But in the case of a high-availability configuration with imperfect
switches like the HP ProCurve 54xx, and also with some Cisco models, you
may not get a response to the first ARP request.

I have seen network snoops where the switches do not respond to the
first ARP request on slave 1; the second ARP request was sent on slave 2,
but the response was received on slave 1, and all following ARP
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

The proposed change sends up to 3 ARP requests on a down bond using the
same slave, delayed by arp_interval.
With the problematic switches I have seen the ARP response on the right
slave at the latest on the second ARP request, so the bond goes into the
up state.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles the
slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

The num_grat_arp has no chance to solve the problem. num_grat_arp is
only used if a different slave is going active.
But in our case, the bonding slaves do not go into the active state
for a longer time.

 [jarod: manufacturing of changelog]
 CC: Jay Vosburgh j.vosbu...@gmail.com
 CC: Veaceslav Falico vfal...@gmail.com
 CC: Andy Gospodarek go...@cumulusnetworks.com
 CC: net...@vger.kernel.org
 Signed-off-by: Uwe Koziolek uwe.kozio...@redknee.com
 Signed-off-by: Jarod Wilson ja...@redhat.com
 ---
 drivers/net/bonding/bond_main.c | 5 +++++
 1 file changed, 5 insertions(+)

 diff --git a/drivers/net/bonding/bond_main.c
 b/drivers/net/bonding/bond_main.c
 index 0c627b4..60b9483 100644
 --- a/drivers/net/bonding/bond_main.c
 +++ b/drivers/net/bonding/bond_main.c
 @@ -2794,6 +2794,11 @@ static bool 

Re: [PATCH] net/bonding: send arp in interval if no active slave

2015-08-17 Thread Uwe Koziolek

On Mon, Aug 17, 2015 at 09:14PM +0200, Jay Vosburgh wrote:

Uwe Koziolek uwe.kozio...@redknee.com wrote:


On 2015-08-17 07:12 PM, Jarod Wilson wrote:

On 2015-08-17 12:55 PM, Veaceslav Falico wrote:

On Mon, Aug 17, 2015 at 12:23:03PM -0400, Jarod Wilson wrote:

From: Uwe Koziolek uwe.kozio...@redknee.com

With some very finicky switch hardware, active backup bonding can get
into a situation where we play ping-pong between interfaces, trying to
get one to come up as the active slave. There seems to be an issue with
the switch's arp replies either taking too long, or simply getting lost,
so we wind up unable to get any interface up and active. Sometimes, the
issue sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing
with this hardware combination.

Sorry, I don't understand the logic of why it works, and what exactly
we are fixing here.

It also completely breaks the logic for link state management in case
of no current active slave for 2*arp_interval.

Could you please elaborate what exactly is fixed here, and how it
works? :)

I can either duplicate some information from the bug, or Uwe can, to
illustrate the exact nature of the problem.


p.s. num_grat_arp maybe could help?

That was my thought as well, but as I understand it, that route was
explored, and it didn't help any. I don't actually have a reproducer
setup of my own, unfortunately, so I'm kind of caught in the middle
here...

Uwe, can you perhaps further enlighten us as to what num_grat_arp
settings were tried that didn't help? I'm still of the mind that if
num_grat_arp *didn't* help, we probably need to do something keyed off
num_grat_arp.

The bonding slaves are connected to highly available switches; each of the
slaves is connected to a different switch. If the bond is starting, only
the selected slave sends one ARP request. If a matching ARP response was
received, this slave and the bond go into the up state, sending the
gratuitous ARPs...
But if no ARP reply was received, the next slave was selected.
With most of the newer switches (not overloaded, and without other
software bugs), or with a single-switch configuration, you would get an
ARP response to the first ARP request.
But in the case of a high-availability configuration with imperfect
switches like the HP ProCurve 54xx, and also with some Cisco models, you
may not get a response to the first ARP request.

I have seen network snoops where the switches do not respond to the
first ARP request on slave 1; the second ARP request was sent on slave 2,
but the response was received on slave 1, and all following ARP
requests were answered on the wrong slave for a longer time.

Could you elaborate on the exact "high availability
configuration" here, including the model(s) of switch(es) involved?

Is this some kind of race between the switch or switches
updating the forwarding tables and the bond flip flopping between the
slaves?  E.g., source MAC from ARP sent on slave 1 is used to populate
the forwarding table, but (for whatever reason) there is no reply.  ARP
on slave 2 is sent (using the same source MAC, unless you set
fail_over_mac), but forwarding tables still send that MAC to slave 1, so
reply is sent there.

High availability:
2 managed switches with routing capabilities have an interconnect.
One slave of a bonding interface is connected to the first switch, the
second slave is connected to the other switch.
The switch models are HP ProCurve 5406 and HP ProCurve 5412. As far as I
remember, the HP E 3500 and E 3800 are also affected; for the affected
Cisco models I can't answer today.
The problem was not seen in single-switch configurations.

Yes, a race condition with delayed updates of the forwarding tables is a
well-matching explanation for the problem.



The proposed change sends up to 3 ARP requests on a down bond using the
same slave, delayed by arp_interval.
With the problematic switches I have seen the ARP response on the right
slave at the latest on the second ARP request, so the bond goes into the
up state.

How does it work:
Bonds in the up state are handled at the beginning of the
bond_ab_arp_probe procedure; the other part of this procedure handles the
slave change.
The proposed change bypasses the slave change for 2 additional calls of
bond_ab_arp_probe.
Now the retries are not only available for an up bond, they are also
implemented for a down bond.

Does this delay failover or bringup on switches that are not
"problematic"?  I.e., if arp_interval is, say, 1000 (1 second), will
this impact failover / recovery times?

-J

It depends.
Failover times are not impacted; that case is handled differently.
Only the transition from a down bonding interface (bond and all slaves
are down) to the up state can be delayed, by up to 2 times arp_interval,
if the