Re: [PATCH] net: bridge: Allow bridge to join multicast groups

2019-08-01 Thread Vivien Didelot
Hi Horatiu,

On Thu, 25 Jul 2019 13:44:04 +0200, Horatiu Vultur 
 wrote:
> There is no way to configure the bridge to receive only specific link
> layer multicast addresses. According to its description, the command
> 'bridge fdb append' is supposed to do that, but there was no way to
> notify the network driver that the bridge joined a group, because the
> LLADDR was added to the unicast netdev_hw_addr_list.
> 
> Therefore, update fdb_add_entry to check whether the NLM_F_APPEND flag
> is set and the source is NULL, which represents the bridge itself. In
> that case, add the address to the multicast netdev_hw_addr_list of each
> bridge interface, so that the driver's .ndo_set_rx_mode function is
> called to notify it that the list of multicast MAC addresses changed.
> 
> Signed-off-by: Horatiu Vultur 
> ---
>  net/bridge/br_fdb.c | 49 ++---
>  1 file changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b1d3248..d93746d 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>   }
>  }
>  
> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> +{
> + int err;
> + struct net_bridge_port *p;
> +
> + ASSERT_RTNL();
> +
> + list_for_each_entry(p, &br->port_list, list) {
> + if (!br_promisc_port(p)) {
> + err = dev_mc_add(p->dev, addr);
> + if (err)
> + goto undo;
> + }
> + }
> +
> + return;
> +undo:
> + list_for_each_entry_continue_reverse(p, &br->port_list, list) {
> + if (!br_promisc_port(p))
> + dev_mc_del(p->dev, addr);
> + }
> +}
> +
> +
>  /* When a static FDB entry is deleted, the HW address from that entry is
>   * also removed from the bridge private HW address list and updates all
>   * the ports with needed information.
> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>   }
>  }
>  
> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
> +{
> + struct net_bridge_port *p;
> +
> + ASSERT_RTNL();
> +
> + list_for_each_entry(p, &br->port_list, list) {
> + if (!br_promisc_port(p))
> + dev_mc_del(p->dev, addr);
> + }
> +}
> +
> +
>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>  bool swdev_notify)
>  {
>   trace_fdb_delete(br, f);
>  
> - if (f->is_static)
> + if (f->is_static) {
>   fdb_del_hw_addr(br, f->key.addr.addr);
> + fdb_del_hw_maddr(br, f->key.addr.addr);
> + }
>  
>   hlist_del_init_rcu(&f->fdb_node);
>   rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>   fdb->is_local = 1;
>   if (!fdb->is_static) {
>   fdb->is_static = 1;
> - fdb_add_hw_addr(br, addr);
> + if (flags & NLM_F_APPEND && !source)
> + fdb_add_hw_maddr(br, addr);
> + else
> + fdb_add_hw_addr(br, addr);
>   }
>   } else if (state & NUD_NOARP) {
>   fdb->is_local = 0;
>   if (!fdb->is_static) {
>   fdb->is_static = 1;
> - fdb_add_hw_addr(br, addr);
> + if (flags & NLM_F_APPEND && !source)
> + fdb_add_hw_maddr(br, addr);
> + else
> + fdb_add_hw_addr(br, addr);
>   }
>   } else {
>   fdb->is_local = 0;
> -- 
> 2.7.4
> 

I'm a bit late in the conversation. Isn't this what you want?

ip address add <multicast address> dev br0 autojoin


Thanks,
Vivien
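[Editorial note: a minimal sketch of what the suggested autojoin configuration could look like. The group address 224.0.1.1 and the interface name br0 are illustrative assumptions, not from the thread, and the commands require root.]

```shell
# Assign a multicast address to the bridge; the autojoin flag makes the
# kernel join the corresponding group on that device.
ip address add 224.0.1.1/32 dev br0 autojoin

# Inspect the multicast groups the interface is now a member of.
ip maddress show dev br0
```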


Re: [PATCH] net: bridge: Allow bridge to join multicast groups

2019-07-31 Thread Andrew Lunn
> > 2) The interface is part of a bridge. There are a few sub-cases
> > 
> > a) IGMP snooping is being performed. We can block multicast where
> > there is no interest in the group. But this is limited to IP
> > multicast.
> Agree. And this is done today by installing an explicit offload rule to limit
> the flooding of a specific group.
> 
> > b) IGMP snooping is not being used and all interfaces in the bridge
> > are ports of the switch. IP Multicast can be blocked to the CPU.
> Does it matter if IGMP snooping is used or not? A more general statement could
> be:
> 
> - "All interfaces in the bridge are ports of the switch. IP Multicast can be
>   blocked to the CPU."

We have seen IPv6 neighbour discovery break in conditions like this. I
don't know the exact details.

You also need to watch out for 224.0.0.0/24. This is the link local
multicast range. There is no need to join MC addresses in that
range. It is assumed they will always be received. So even if IGMP is
enabled, you still need to pass some multicast traffic to the CPU.

> > So one possibility here is to teach the SW bridge about non-IP
> > multicast addresses. Initially the switch should forward all MAC
> > multicast frames to the CPU. If the frame is not an IPv4 or IPv6
> > frame, and there has not been a call to set_rx_mode() for the MAC
> > address on the br0 interface, and the bridge only contains switch
> > ports, switchdev could be used to block the frame from going to the
> > CPU, but forward it out all other ports of the bridge.
> Close, but not exactly (due to the arguments above).
> 
> Here is how I see it:
> 
> Teach the SW bridge about non-IP multicast addresses. Initially the switch
> should forward all MAC multicast frames to the CPU. Today MDB rules can be
> installed (either static or dynamic by IGMP), which limit the flooding of
> IPv4/6 multicast streams. In the same way, we should have a way to install
> a rule (FDB or MDB) to limit the flooding of L2 multicast frames.
> 
> If foreign interfaces (or br0 itself) are part of the destination list, then
> traffic also needs to go to the CPU.
> 
> By doing this, for explicitly configured dst mac addresses we can:
> - limit the flooding on the SW bridge interfaces
> - limit the flooding on the HW bridge interfaces
> - prevent them from going to the CPU when they are not needed

This is all very complex because of all the different corner cases. So
I don't think we want a user API to do the CPU part; we want the
network stack to do it. Otherwise the user is going to get it wrong,
break their network, and then come running to the list for help.

This also fits with how we do things in DSA. There is deliberately no
user space concept for configuring the DSA CPU port. To user space,
the switch is just a bunch of Linux interfaces. Everything to do with
the CPU port is hidden away in the DSA core layer, the DSA drivers,
and a little bit in the bridge.

   Andrew


Re: [PATCH] net: bridge: Allow bridge to join multicast groups

2019-07-28 Thread Andrew Lunn
> Trying to get back to the original problem:
> 
> We have a network which implements the ODVA/DLR ring protocol. This protocol
> sends out a beacon frame as often as every 3 us (as far as I recall, the
> default I believe is 400 us) to this MAC address: 01:21:6C:00:00:01.
> 
> Take a quick look at slide 10 in [1].
> 
> If we assume that the SwitchDev driver is implemented such that all multicast
> traffic goes to the CPU, then we should really have a way to install a HW
> offload path in the silicon, such that these packets do not go to the CPU
> (as they are known not to be useful, and a frame every 3 us is a significant
> load on small DMA connections and CPU resources).
> 
> If we assume that the SwitchDev driver is implemented such that only "needed"
> multicast packets go to the CPU, then we need a way to get these packets in
> case we want to implement the DLR protocol.
> 
> I'm sure that both models can work, and I do not think that this is the main
> issue here.
> 
> Our initial attempt was to allow installing static L2 MAC entries and
> appending multiple ports to such an entry in the MAC table. This was
> rejected, for several good reasons it seems. But I'm not sure it was clear
> what we wanted to achieve, and why we find it to be important. Hopefully
> this is clearer with a real-world use-case.
> 
> Any hints or ideas on a better way to solve this problem will be much
> appreciated.

I always try to think about how this would work if I had a bunch of
discrete network interfaces, not a switch. What APIs are involved in
configuring such a system? How does the Linux network stack perform
software DLR? How is the reception and blocking of the multicast group
performed?

Once you understand how it works in the software implementation, it should
then be more obvious which switchdev hooks should be used to
accelerate this using hardware.
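
[Editorial note: thinking it through with ordinary network interfaces, as suggested above, the user-space side of "express interest in this L2 group" is roughly the following sketch. The interface name eth0 is an illustrative assumption, and the commands require root.]

```shell
# Ask the NIC to accept frames sent to the DLR beacon MAC address
# (this drives the SIOCADDMULTI path mentioned elsewhere in the thread).
ip maddress add 01:21:6c:00:00:01 dev eth0

# A software DLR implementation would then read the beacons from a raw
# packet socket; dropping interest again is the symmetric operation.
ip maddress del 01:21:6c:00:00:01 dev eth0
```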

   Andrew


Re: [PATCH] net: bridge: Allow bridge to join multicast groups

2019-07-26 Thread Andrew Lunn
On Fri, Jul 26, 2019 at 02:02:15PM +0200, Horatiu Vultur wrote:
> Hi Nikolay,
> 
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> > 
> > On 26/07/2019 11:41, Nikolay Aleksandrov wrote:
> > > On 25/07/2019 17:21, Horatiu Vultur wrote:
> > >> Hi Nikolay,
> > >>
> > >> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
> > >>>
> > >>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> >  On 25/07/2019 14:44, Horatiu Vultur wrote:
> > > There is no way to configure the bridge, to receive only specific link
> > > layer multicast addresses. From the description of the command 'bridge
> > > fdb append' is supposed to do that, but there was no way to notify the
> > > network driver that the bridge joined a group, because LLADDR was added
> > > to the unicast netdev_hw_addr_list.
> > >
> > > Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> > > and if the source is NULL, which represent the bridge itself. Then add
> > > address to multicast netdev_hw_addr_list for each bridge interfaces.
> > > And then the .ndo_set_rx_mode function on the driver is called. To notify
> > > the driver that the list of multicast mac addresses changed.
> > >
> > > Signed-off-by: Horatiu Vultur 
> > > ---
> > >  net/bridge/br_fdb.c | 49 ++---
> > >  1 file changed, 46 insertions(+), 3 deletions(-)
> > >
> > >
> > 
> >  Hi,
> >  I'm sorry but this patch is wrong on many levels, some notes below.
> >  In general NLM_F_APPEND is only used in vxlan, the bridge does not
> >  handle that flag at all. FDB is only for *unicast*, nothing is joined
> >  and no multicast should be used with fdbs. MDB is used for multicast
> >  handling, but both of these are used for forwarding.
> >  The reason the static fdbs are added to the filter is for non-promisc
> >  ports, so they can receive traffic destined for these FDBs for
> >  forwarding.
> >  If you'd like to join any multicast group please use the standard way,
> >  if you'd like to join it only on a specific port - join it only on that
> >  port (or ports) and the bridge and you'll
> > >>>
> > >>> And obviously this is for the case where you're not enabling port
> > >>> promisc mode (non-default).
> > >>> In general you'll only need to join the group on the bridge to
> > >>> receive traffic for it or add it as an mdb entry to forward it.
> > >>>
> >  have the effect that you're describing. What do you mean there's no
> >  way ?
> > >>
> > >> Thanks for the explanation.
> > >> There are a few things that are not 100% clear to me and maybe you can
> > >> explain them, not to go totally in the wrong direction. Currently I am
> > >> writing a network driver to which I added switchdev support. Then I was
> > >> looking for a way to configure the network driver to copy link layer
> > >> multicast addresses to the CPU port.
> > >>
> > >> If I am using bridge mdb I can do it only for IP multicast addresses,
> > >> but how should I do it if I want non-IP frames with a link layer
> > >> multicast address to be copied to the CPU? For example: all frames with
> > >> multicast address '01-21-6C-00-00-01' to be copied to the CPU. What is
> > >> the user space command for that?
> > >>
> > > 
> > > Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the
> > > port which needs to receive it and the bridge will send it up
> > > automatically since it's unknown mcast (note that if there's a querier,
> > > you'll have to make the bridge mcast router if it is not the querier
> > > itself). It would also flood it to all
> > 
> > Actually you mentioned non-IP traffic, so the querier stuff is not a
> > problem. This traffic will always be flooded by the bridge (and also a
> > copy will be locally sent up). Thus only the flooding may need to be
> > controlled.
> 
> OK, I see, but the part which is not clear to me is: which bridge
> command (from iproute2) should be used so that the bridge notifies the
> network driver (using switchdev or not) to configure the HW to copy all
> frames with dmac '01-21-6C-00-00-01' to the CPU? So that the bridge can
> receive those frames and then just pass them up.

Hi Horatiu

Something to keep in mind.

By default, multicast should be flooded, and that includes the CPU
port for a DSA driver. Adding an MDB entry allows for optimisations,
limiting which ports a multicast frame goes out of. But it is just an
optimisation.
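
[Editorial note: a concrete sketch of such an optimisation. The group address, bridge and port names are illustrative assumptions; the commands require root.]

```shell
# Restrict forwarding of one IP multicast group to a single bridge port
# ("permanent" makes it a static entry, not subject to IGMP ageing).
bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent

# Show the resulting multicast database.
bridge mdb show dev br0
```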

Andrew