Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-21 Thread David Ahern
On 11/21/18 5:54 PM, Alexis Bauvin wrote:
>>> There is one issue I can see with SO_REUSEPORT (if my understanding of it is
>>> correct). From what I understood, enabling this option will balance incoming
>>> connections (for TCP) / dgrams (for UDP) based on a 4-tuple hash (sip, dip,
>>> sport, dport) between sockets listening on the same port.
>>
>> AFAIK there is no balancing done. There is an order to which socket is
>> selected - and it includes the VRF device if relevant.
> 
> Maybe balance was not the correct word, "route" may be more appropriate. 
> Still you
> understood me, thanks for the details!
> 
> Yet, the "if relevant" part is interesting. Does enabling the
> net.ipv4.udp_l3mdev_accept sysctl counts as making vrfs not releavant? In 
> that case,
> both sockets are treated equally, right?

If udp_l3mdev_accept is disabled the default VRF is treated like a  real
VRF as opposed to no VRF (Vyatta's changes), meaning the scope of the
socket is just the VRF (or just the default VRF).

If udp_l3mdev_accept is enabled it allows an unbound UDP socket to work
across all VRFs.

Gives user's options for how to deploy their s/w especially when it
comes to all of the existing software that is has no idea about VRFs.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-21 Thread Alexis Bauvin
Le 21 nov. 2018 à 20:28, David Ahern  a écrit :
> On 11/21/18 7:05 AM, Alexis Bauvin wrote:
>> Le 20 nov. 2018 à 18:09, David Ahern  a écrit :
>>> On 11/20/18 9:58 AM, Alexis Bauvin wrote:
 A socket bound to vrf-blue listens on *:4789, thus owning the port. If 
 moving an
 underlay to the default vrf (ip link set dummy-b nomaster), a new socket 
 will be
 created, unbound to any interface and listening on *:4789. However, 
 because it
 will be in the default vrf, it will try to take ownership of port 4789 on 
 ALL
 vrfs, and fail because this port is already owned in vrf-blue for vxlan-a.
>>> 
>>> SO_REUSEPORT will fix that and incoming traffic through a vrf and
>>> default (non-)vrf will work. The recent changes by Vyatta provide even
>>> better isolation of default vrf and overlapping ports.
>> 
>> Did not think about that one, I will give it a shot.
>> 
>> There is one issue I can see with SO_REUSEPORT (if my understanding of it is
>> correct). From what I understood, enabling this option will balance incoming
>> connections (for TCP) / dgrams (for UDP) based on a 4-tuple hash (sip, dip,
>> sport, dport) between sockets listening on the same port.
> 
> AFAIK there is no balancing done. There is an order to which socket is
> selected - and it includes the VRF device if relevant.

Maybe balance was not the correct word, "route" may be more appropriate. Still 
you
understood me, thanks for the details!

Yet, the "if relevant" part is interesting. Does enabling the
net.ipv4.udp_l3mdev_accept sysctl counts as making vrfs not releavant? In that 
case,
both sockets are treated equally, right?

>> 
>> If we have two separate vxlan fabrics, with one underlay in the default vrf, 
>> and
>> another in some random vrf. Since the socket for the default vrf would own 
>> the
>> port on all vrfs, the port would effectively be reused between the two vrfs.
>> Wouldn't vxlan packets be directed to "random" (as in not related to the 
>> vxlan
>> fabric itself) sockets, meaning a complete mix of the two fabrics? This would
>> imply a complete drop of the packets not directed to the correct socket.
>> 
>> I guess the Vyatta changes you are talking about are "vrf: allow simultaneous
>> service instances in default and other VRFs"? If so, it looks like it would
>> solve the default vrf problem, not even requiring SO_REUSEPORT.
>> 
> 
> yes.

So it would seem with the above that SO_REUSEPORT is a solution when not using
Vyatta's changes, which are a solution themselves (I will test those anyways).


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-21 Thread David Ahern
On 11/21/18 7:05 AM, Alexis Bauvin wrote:
> Le 20 nov. 2018 à 18:09, David Ahern  a écrit :
>> On 11/20/18 9:58 AM, Alexis Bauvin wrote:
>>> A socket bound to vrf-blue listens on *:4789, thus owning the port. If 
>>> moving an
>>> underlay to the default vrf (ip link set dummy-b nomaster), a new socket 
>>> will be
>>> created, unbound to any interface and listening on *:4789. However, because 
>>> it
>>> will be in the default vrf, it will try to take ownership of port 4789 on 
>>> ALL
>>> vrfs, and fail because this port is already owned in vrf-blue for vxlan-a.
>>
>> SO_REUSEPORT will fix that and incoming traffic through a vrf and
>> default (non-)vrf will work. The recent changes by Vyatta provide even
>> better isolation of default vrf and overlapping ports.
> 
> Did not think about that one, I will give it a shot.
> 
> There is one issue I can see with SO_REUSEPORT (if my understanding of it is
> correct). From what I understood, enabling this option will balance incoming
> connections (for TCP) / dgrams (for UDP) based on a 4-tuple hash (sip, dip,
> sport, dport) between sockets listening on the same port.

AFAIK there is no balancing done. There is an order to which socket is
selected - and it includes the VRF device if relevant.

> 
> If we have two separate vxlan fabrics, with one underlay in the default vrf, 
> and
> another in some random vrf. Since the socket for the default vrf would own the
> port on all vrfs, the port would effectively be reused between the two vrfs.
> Wouldn't vxlan packets be directed to "random" (as in not related to the vxlan
> fabric itself) sockets, meaning a complete mix of the two fabrics? This would
> imply a complete drop of the packets not directed to the correct socket.
> 
> I guess the Vyatta changes you are talking about are "vrf: allow simultaneous
> service instances in default and other VRFs"? If so, it looks like it would
> solve the default vrf problem, not even requiring SO_REUSEPORT.
> 

yes.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-21 Thread Alexis Bauvin
Le 20 nov. 2018 à 18:09, David Ahern  a écrit :
> On 11/20/18 9:58 AM, Alexis Bauvin wrote:
>> A socket bound to vrf-blue listens on *:4789, thus owning the port. If 
>> moving an
>> underlay to the default vrf (ip link set dummy-b nomaster), a new socket 
>> will be
>> created, unbound to any interface and listening on *:4789. However, because 
>> it
>> will be in the default vrf, it will try to take ownership of port 4789 on ALL
>> vrfs, and fail because this port is already owned in vrf-blue for vxlan-a.
> 
> SO_REUSEPORT will fix that and incoming traffic through a vrf and
> default (non-)vrf will work. The recent changes by Vyatta provide even
> better isolation of default vrf and overlapping ports.

Did not think about that one, I will give it a shot.

There is one issue I can see with SO_REUSEPORT (if my understanding of it is
correct). From what I understood, enabling this option will balance incoming
connections (for TCP) / dgrams (for UDP) based on a 4-tuple hash (sip, dip,
sport, dport) between sockets listening on the same port.

If we have two separate vxlan fabrics, with one underlay in the default vrf, and
another in some random vrf. Since the socket for the default vrf would own the
port on all vrfs, the port would effectively be reused between the two vrfs.
Wouldn't vxlan packets be directed to "random" (as in not related to the vxlan
fabric itself) sockets, meaning a complete mix of the two fabrics? This would
imply a complete drop of the packets not directed to the correct socket.

I guess the Vyatta changes you are talking about are "vrf: allow simultaneous
service instances in default and other VRFs"? If so, it looks like it would
solve the default vrf problem, not even requiring SO_REUSEPORT.

Thanks!


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread Alexis Bauvin
Le 20 nov. 2018 à 17:13, David Ahern  a écrit :
> On 11/20/18 8:48 AM, David Ahern wrote:
>> On 11/20/18 8:35 AM, Roopa Prabhu wrote:
>>> On Tue, Nov 20, 2018 at 7:04 AM David Ahern  
>>> wrote:
 
 On 11/20/18 7:23 AM, Alexis Bauvin wrote:
> When underlay VRF changes, either because the lower device itself changed,
> or its VRF changed, this patch releases the current socket of the VXLAN
> device and recreates another one in the right VRF. This allows for
> on-the-fly change of the underlay VRF of a VXLAN device.
> 
> Signed-off-by: Alexis Bauvin 
> Reviewed-by: Amine Kherbouche 
> Tested-by: Amine Kherbouche 
> ---
> drivers/net/vxlan.c | 94 +
> 1 file changed, 94 insertions(+)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index a3de08122269..1e6ccad6df6a 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
> *first_remote_rtnl(struct vxlan_fdb *fdb)
>  return list_first_entry(>remotes, struct vxlan_rdst, list);
> }
> 
> +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
> + struct net_device *dev)
> +{
> + if (!chain)
> + return 0;
> +
> + if (chain->ifindex == dev->ifindex)
> + return 1;
> + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
> + dev);
> +}
 
 This should return bool and true/false.
 
 Also, why l3mdev in the name? None of the checks look at whether it is
 an l3mdev master.
 
 And again here, someone more familiar with the vxlan code should review it.
 
>>> 
>>> 
>>> I understand the need for patch 2. But I don't understand the need for
>>> the complexity this patch introduces (especially implicit down and up
>>> of the vxlan device).
>>> Alexis, If your underlay routing changes,  you can down and up the
>>> vxlan device from user-space correct ?. This should be true for any
>>> tunnel device.
>>> 
>> 
>> I believe this patch handles changes to the VRF association of the bridge.
>> 
> Re-reading the commit message, this handles changes in VRF association
> of the lower device.

Yes

> If the vxlan socket in general (vrf or not) can be bound to the lower
> device instead of the VRF then it simplifies things a lot.

It would indeed make things simpler, as this patch would mostly be useless.
There is no technical reason not to have done so, except for the conceptual
idea that you bind to the vrf device to add vrf-awareness.

It only removes part of the complexity, as handling a change of lower device
itself would still be needed (ip link set vxlan-blue type vxlan dev eth1)
to rebind the socket.

I will try to bind the sockets to the lower device to see if there are any
shortcomings to this method.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread David Ahern
On 11/20/18 9:58 AM, Alexis Bauvin wrote:
> A socket bound to vrf-blue listens on *:4789, thus owning the port. If moving 
> an
> underlay to the default vrf (ip link set dummy-b nomaster), a new socket will 
> be
> created, unbound to any interface and listening on *:4789. However, because it
> will be in the default vrf, it will try to take ownership of port 4789 on ALL
> vrfs, and fail because this port is already owned in vrf-blue for vxlan-a.

SO_REUSEPORT will fix that and incoming traffic through a vrf and
default (non-)vrf will work. The recent changes by Vyatta provide even
better isolation of default vrf and overlapping ports.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread Alexis Bauvin
Le 20 nov. 2018 à 16:35, Roopa Prabhu  a écrit :
> 
> On Tue, Nov 20, 2018 at 7:04 AM David Ahern  wrote:
>> 
>> On 11/20/18 7:23 AM, Alexis Bauvin wrote:
>>> When underlay VRF changes, either because the lower device itself changed,
>>> or its VRF changed, this patch releases the current socket of the VXLAN
>>> device and recreates another one in the right VRF. This allows for
>>> on-the-fly change of the underlay VRF of a VXLAN device.
>>> 
>>> Signed-off-by: Alexis Bauvin 
>>> Reviewed-by: Amine Kherbouche 
>>> Tested-by: Amine Kherbouche 
>>> ---
>>> drivers/net/vxlan.c | 94 +
>>> 1 file changed, 94 insertions(+)
>>> 
>>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>>> index a3de08122269..1e6ccad6df6a 100644
>>> --- a/drivers/net/vxlan.c
>>> +++ b/drivers/net/vxlan.c
>>> @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
>>> *first_remote_rtnl(struct vxlan_fdb *fdb)
>>>  return list_first_entry(>remotes, struct vxlan_rdst, list);
>>> }
>>> 
>>> +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
>>> + struct net_device *dev)
>>> +{
>>> + if (!chain)
>>> + return 0;
>>> +
>>> + if (chain->ifindex == dev->ifindex)
>>> + return 1;
>>> + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
>>> + dev);
>>> +}
>> 
>> This should return bool and true/false.
>> 
>> Also, why l3mdev in the name? None of the checks look at whether it is
>> an l3mdev master.
>> 
>> And again here, someone more familiar with the vxlan code should review it.
>> 
> 
> 
> I understand the need for patch 2. But I don't understand the need for
> the complexity this patch introduces (especially implicit down and up
> of the vxlan device).
> Alexis, If your underlay routing changes,  you can down and up the
> vxlan device from user-space correct ?. This should be true for any
> tunnel device.

Yes. Without this patch, down-up the vxlan interface is enough to handle
the underlay VRF change. This patch is more a convenience than a real
need.

Furthermore, an issue with automatic down/up of the interface is a silent
fail when mixing underlays in the default vrf and in specific ones. Say
you are in the following situation:

   +--+
   |  |
   | vrf-blue |
   |  |
   +--++--+
  ||
 ++++
 |  |
++++++
| || |
| dummy-a || dummy-b |
| || |
+-++-+
 .  .
 .(lower device).
 .  .
+-++-+
| || |
| vxlan-a || vxlan-b |
| || |
+-++-+

A socket bound to vrf-blue listens on *:4789, thus owning the port. If moving an
underlay to the default vrf (ip link set dummy-b nomaster), a new socket will be
created, unbound to any interface and listening on *:4789. However, because it
will be in the default vrf, it will try to take ownership of port 4789 on ALL
vrfs, and fail because this port is already owned in vrf-blue for vxlan-a.

At least, a manual down/up of the vxlan interface would fail when upping the
interface, and spit out "Address already in use" through netlink to the
responsible ip link set up.

Dropping this patch would be sensible given its implicit actions, or fixing it
to make the vxlan interface really down when such a situation happens.

It may not be the right place to ask, but I don’t know the reason behind this
default vrf behaviour.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread David Ahern
On 11/20/18 9:27 AM, Alexis Bauvin wrote:
>  maybe even move
> the function to net/core/dev.c, with other master-related functions.
> What do you think?

yes, it is a generic function.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread Alexis Bauvin
Le 20 nov. 2018 à 16:04, David Ahern  a écrit :
> 
> On 11/20/18 7:23 AM, Alexis Bauvin wrote:
>> When underlay VRF changes, either because the lower device itself changed,
>> or its VRF changed, this patch releases the current socket of the VXLAN
>> device and recreates another one in the right VRF. This allows for
>> on-the-fly change of the underlay VRF of a VXLAN device.
>> 
>> Signed-off-by: Alexis Bauvin 
>> Reviewed-by: Amine Kherbouche 
>> Tested-by: Amine Kherbouche 
>> ---
>> drivers/net/vxlan.c | 94 +
>> 1 file changed, 94 insertions(+)
>> 
>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> index a3de08122269..1e6ccad6df6a 100644
>> --- a/drivers/net/vxlan.c
>> +++ b/drivers/net/vxlan.c
>> @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
>> *first_remote_rtnl(struct vxlan_fdb *fdb)
>>  return list_first_entry(>remotes, struct vxlan_rdst, list);
>> }
>> 
>> +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
>> +struct net_device *dev)
>> +{
>> +if (!chain)
>> +return 0;
>> +
>> +if (chain->ifindex == dev->ifindex)
>> +return 1;
>> +return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
>> +dev);
>> +}
> 
> This should return bool and true/false.

Will change in v4.

> Also, why l3mdev in the name? None of the checks look at whether it is
> an l3mdev master.

The original intention was to detect if, through a master change, the
l3mdev of the lower device changed. However, it is right it is much more
generic than that. It will tell if dev is a master (direct or indirect)
of chain.

A rename to vxlan_is_upper_master would be preferred, maybe even move
the function to net/core/dev.c, with other master-related functions.
What do you think?

> And again here, someone more familiar with the vxlan code should review it.
> 
>> +
>> /* Find VXLAN socket based on network namespace, address family and UDP port
>>  * and enabled unshareable flags.
>>  */
>> @@ -3720,6 +3732,33 @@ struct net_device *vxlan_dev_create(struct net *net, 
>> const char *name,
>> }
>> EXPORT_SYMBOL_GPL(vxlan_dev_create);
>> 
>> +static int vxlan_reopen(struct vxlan_net *vn, struct vxlan_dev *vxlan)
>> +{
>> +int ret = 0;
>> +
>> +if (vxlan_addr_multicast(>default_dst.remote_ip) &&
>> +!vxlan_group_used(vn, vxlan))
>> +ret = vxlan_igmp_leave(vxlan);
>> +vxlan_sock_release(vxlan);
>> +
>> +if (ret < 0)
>> +return ret;
>> +
>> +ret = vxlan_sock_add(vxlan);
>> +if (ret < 0)
>> +return ret;
>> +
>> +if (vxlan_addr_multicast(>default_dst.remote_ip)) {
>> +ret = vxlan_igmp_join(vxlan);
>> +if (ret == -EADDRINUSE)
>> +ret = 0;
>> +if (ret)
>> +vxlan_sock_release(vxlan);
>> +}
>> +
>> +return ret;
>> +}
>> +
>> static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
>>   struct net_device *dev)
>> {
>> @@ -3742,6 +3781,55 @@ static void vxlan_handle_lowerdev_unregister(struct 
>> vxlan_net *vn,
>>  unregister_netdevice_many(_kill);
>> }
>> 
>> +static void vxlan_handle_change_upper(struct vxlan_net *vn,
>> +  struct net_device *dev)
>> +{
>> +struct vxlan_dev *vxlan, *next;
>> +
>> +list_for_each_entry_safe(vxlan, next, >vxlan_list, next) {
>> +struct net_device *lower;
>> +int err;
>> +
>> +lower = __dev_get_by_index(vxlan->net,
>> +   vxlan->cfg.remote_ifindex);
>> +if (!vxlan_is_in_l3mdev_chain(lower, dev))
>> +continue;
>> +
>> +err = vxlan_reopen(vn, vxlan);
>> +if (err < 0)
>> +netdev_err(vxlan->dev, "Failed to reopen socket: %d\n",
>> +   err);
>> +}
>> +}
>> +
>> +static void vxlan_handle_change(struct vxlan_net *vn, struct net_device 
>> *dev)
>> +{
>> +struct vxlan_dev *vxlan = netdev_priv(dev);
>> +struct vxlan_sock *sock;
>> +int l3mdev_index;
>> +
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +bool metadata = vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA;
>> +bool ipv6 = vxlan->cfg.flags & VXLAN_F_IPV6 || metadata;
>> +
>> +sock = ipv6 ? rcu_dereference(vxlan->vn6_sock)
>> +: rcu_dereference(vxlan->vn4_sock);
>> +#else
>> +sock = rcu_dereference(vxlan->vn4_sock);
>> +#endif
>> +
>> +l3mdev_index =
>> +l3mdev_master_upper_ifindex_by_index(vxlan->net,
>> + vxlan->cfg.remote_ifindex);
>> +if (sock->sock->sk->sk_bound_dev_if != l3mdev_index) {
>> +int ret = vxlan_reopen(vn, vxlan);
>> +
>> +if (ret < 0)
>> +netdev_err(vxlan->dev, "Failed to reopen socket: %d\n",
>> +   

Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread David Ahern
On 11/20/18 8:48 AM, David Ahern wrote:
> On 11/20/18 8:35 AM, Roopa Prabhu wrote:
>> On Tue, Nov 20, 2018 at 7:04 AM David Ahern  wrote:
>>>
>>> On 11/20/18 7:23 AM, Alexis Bauvin wrote:
 When underlay VRF changes, either because the lower device itself changed,
 or its VRF changed, this patch releases the current socket of the VXLAN
 device and recreates another one in the right VRF. This allows for
 on-the-fly change of the underlay VRF of a VXLAN device.

 Signed-off-by: Alexis Bauvin 
 Reviewed-by: Amine Kherbouche 
 Tested-by: Amine Kherbouche 
 ---
  drivers/net/vxlan.c | 94 +
  1 file changed, 94 insertions(+)

 diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
 index a3de08122269..1e6ccad6df6a 100644
 --- a/drivers/net/vxlan.c
 +++ b/drivers/net/vxlan.c
 @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
 *first_remote_rtnl(struct vxlan_fdb *fdb)
   return list_first_entry(>remotes, struct vxlan_rdst, list);
  }

 +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
 + struct net_device *dev)
 +{
 + if (!chain)
 + return 0;
 +
 + if (chain->ifindex == dev->ifindex)
 + return 1;
 + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
 + dev);
 +}
>>>
>>> This should return bool and true/false.
>>>
>>> Also, why l3mdev in the name? None of the checks look at whether it is
>>> an l3mdev master.
>>>
>>> And again here, someone more familiar with the vxlan code should review it.
>>>
>>
>>
>> I understand the need for patch 2. But I don't understand the need for
>> the complexity this patch introduces (especially implicit down and up
>> of the vxlan device).
>> Alexis, If your underlay routing changes,  you can down and up the
>> vxlan device from user-space correct ?. This should be true for any
>> tunnel device.
>>
> 
> I believe this patch handles changes to the VRF association of the bridge.
> 
Re-reading the commit message, this handles changes in VRF association
of the lower device.

If the vxlan socket in general (vrf or not) can be bound to the lower
device instead of the VRF then it simplifies things a lot.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread David Ahern
On 11/20/18 8:35 AM, Roopa Prabhu wrote:
> On Tue, Nov 20, 2018 at 7:04 AM David Ahern  wrote:
>>
>> On 11/20/18 7:23 AM, Alexis Bauvin wrote:
>>> When underlay VRF changes, either because the lower device itself changed,
>>> or its VRF changed, this patch releases the current socket of the VXLAN
>>> device and recreates another one in the right VRF. This allows for
>>> on-the-fly change of the underlay VRF of a VXLAN device.
>>>
>>> Signed-off-by: Alexis Bauvin 
>>> Reviewed-by: Amine Kherbouche 
>>> Tested-by: Amine Kherbouche 
>>> ---
>>>  drivers/net/vxlan.c | 94 +
>>>  1 file changed, 94 insertions(+)
>>>
>>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>>> index a3de08122269..1e6ccad6df6a 100644
>>> --- a/drivers/net/vxlan.c
>>> +++ b/drivers/net/vxlan.c
>>> @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
>>> *first_remote_rtnl(struct vxlan_fdb *fdb)
>>>   return list_first_entry(>remotes, struct vxlan_rdst, list);
>>>  }
>>>
>>> +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
>>> + struct net_device *dev)
>>> +{
>>> + if (!chain)
>>> + return 0;
>>> +
>>> + if (chain->ifindex == dev->ifindex)
>>> + return 1;
>>> + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
>>> + dev);
>>> +}
>>
>> This should return bool and true/false.
>>
>> Also, why l3mdev in the name? None of the checks look at whether it is
>> an l3mdev master.
>>
>> And again here, someone more familiar with the vxlan code should review it.
>>
> 
> 
> I understand the need for patch 2. But I don't understand the need for
> the complexity this patch introduces (especially implicit down and up
> of the vxlan device).
> Alexis, If your underlay routing changes,  you can down and up the
> vxlan device from user-space correct ?. This should be true for any
> tunnel device.
> 

I believe this patch handles changes to the VRF association of the bridge.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread Roopa Prabhu
On Tue, Nov 20, 2018 at 7:04 AM David Ahern  wrote:
>
> On 11/20/18 7:23 AM, Alexis Bauvin wrote:
> > When underlay VRF changes, either because the lower device itself changed,
> > or its VRF changed, this patch releases the current socket of the VXLAN
> > device and recreates another one in the right VRF. This allows for
> > on-the-fly change of the underlay VRF of a VXLAN device.
> >
> > Signed-off-by: Alexis Bauvin 
> > Reviewed-by: Amine Kherbouche 
> > Tested-by: Amine Kherbouche 
> > ---
> >  drivers/net/vxlan.c | 94 +
> >  1 file changed, 94 insertions(+)
> >
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index a3de08122269..1e6ccad6df6a 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
> > *first_remote_rtnl(struct vxlan_fdb *fdb)
> >   return list_first_entry(>remotes, struct vxlan_rdst, list);
> >  }
> >
> > +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
> > + struct net_device *dev)
> > +{
> > + if (!chain)
> > + return 0;
> > +
> > + if (chain->ifindex == dev->ifindex)
> > + return 1;
> > + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
> > + dev);
> > +}
>
> This should return bool and true/false.
>
> Also, why l3mdev in the name? None of the checks look at whether it is
> an l3mdev master.
>
> And again here, someone more familiar with the vxlan code should review it.
>


I understand the need for patch 2. But I don't understand the need for
the complexity this patch introduces (especially implicit down and up
of the vxlan device).
Alexis, If your underlay routing changes,  you can down and up the
vxlan device from user-space correct ?. This should be true for any
tunnel device.


Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread David Ahern
On 11/20/18 7:23 AM, Alexis Bauvin wrote:
> When underlay VRF changes, either because the lower device itself changed,
> or its VRF changed, this patch releases the current socket of the VXLAN
> device and recreates another one in the right VRF. This allows for
> on-the-fly change of the underlay VRF of a VXLAN device.
> 
> Signed-off-by: Alexis Bauvin 
> Reviewed-by: Amine Kherbouche 
> Tested-by: Amine Kherbouche 
> ---
>  drivers/net/vxlan.c | 94 +
>  1 file changed, 94 insertions(+)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index a3de08122269..1e6ccad6df6a 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
> *first_remote_rtnl(struct vxlan_fdb *fdb)
>   return list_first_entry(>remotes, struct vxlan_rdst, list);
>  }
>  
> +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
> + struct net_device *dev)
> +{
> + if (!chain)
> + return 0;
> +
> + if (chain->ifindex == dev->ifindex)
> + return 1;
> + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
> + dev);
> +}

This should return bool and true/false.

Also, why l3mdev in the name? None of the checks look at whether it is
an l3mdev master.

And again here, someone more familiar with the vxlan code should review it.

> +
>  /* Find VXLAN socket based on network namespace, address family and UDP port
>   * and enabled unshareable flags.
>   */
> @@ -3720,6 +3732,33 @@ struct net_device *vxlan_dev_create(struct net *net, 
> const char *name,
>  }
>  EXPORT_SYMBOL_GPL(vxlan_dev_create);
>  
> +static int vxlan_reopen(struct vxlan_net *vn, struct vxlan_dev *vxlan)
> +{
> + int ret = 0;
> +
> + if (vxlan_addr_multicast(>default_dst.remote_ip) &&
> + !vxlan_group_used(vn, vxlan))
> + ret = vxlan_igmp_leave(vxlan);
> + vxlan_sock_release(vxlan);
> +
> + if (ret < 0)
> + return ret;
> +
> + ret = vxlan_sock_add(vxlan);
> + if (ret < 0)
> + return ret;
> +
> + if (vxlan_addr_multicast(>default_dst.remote_ip)) {
> + ret = vxlan_igmp_join(vxlan);
> + if (ret == -EADDRINUSE)
> + ret = 0;
> + if (ret)
> + vxlan_sock_release(vxlan);
> + }
> +
> + return ret;
> +}
> +
>  static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
>struct net_device *dev)
>  {
> @@ -3742,6 +3781,55 @@ static void vxlan_handle_lowerdev_unregister(struct 
> vxlan_net *vn,
>   unregister_netdevice_many(_kill);
>  }
>  
> +static void vxlan_handle_change_upper(struct vxlan_net *vn,
> +   struct net_device *dev)
> +{
> + struct vxlan_dev *vxlan, *next;
> +
> + list_for_each_entry_safe(vxlan, next, >vxlan_list, next) {
> + struct net_device *lower;
> + int err;
> +
> + lower = __dev_get_by_index(vxlan->net,
> +vxlan->cfg.remote_ifindex);
> + if (!vxlan_is_in_l3mdev_chain(lower, dev))
> + continue;
> +
> + err = vxlan_reopen(vn, vxlan);
> + if (err < 0)
> + netdev_err(vxlan->dev, "Failed to reopen socket: %d\n",
> +err);
> + }
> +}
> +
> +static void vxlan_handle_change(struct vxlan_net *vn, struct net_device *dev)
> +{
> + struct vxlan_dev *vxlan = netdev_priv(dev);
> + struct vxlan_sock *sock;
> + int l3mdev_index;
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> + bool metadata = vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA;
> + bool ipv6 = vxlan->cfg.flags & VXLAN_F_IPV6 || metadata;
> +
> + sock = ipv6 ? rcu_dereference(vxlan->vn6_sock)
> + : rcu_dereference(vxlan->vn4_sock);
> +#else
> + sock = rcu_dereference(vxlan->vn4_sock);
> +#endif
> +
> + l3mdev_index =
> + l3mdev_master_upper_ifindex_by_index(vxlan->net,
> +  vxlan->cfg.remote_ifindex);
> + if (sock->sock->sk->sk_bound_dev_if != l3mdev_index) {
> + int ret = vxlan_reopen(vn, vxlan);
> +
> + if (ret < 0)
> + netdev_err(vxlan->dev, "Failed to reopen socket: %d\n",
> +ret);
> + }
> +}
> +
>  static int vxlan_netdevice_event(struct notifier_block *unused,
>unsigned long event, void *ptr)
>  {
> @@ -3756,6 +3844,12 @@ static int vxlan_netdevice_event(struct notifier_block 
> *unused,
>   } else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO ||
>  event == NETDEV_UDP_TUNNEL_DROP_INFO) {
>   vxlan_offload_rx_ports(dev, event == 
> NETDEV_UDP_TUNNEL_PUSH_INFO);
> + } else if (event == NETDEV_CHANGEUPPER) {