On Sat, Jun 02, 2018 at 09:10:08AM -0700, Florian Fainelli wrote:
> On June 2, 2018 3:34:32 AM MST, Ilias Apalodimas 
> <ilias.apalodi...@linaro.org> wrote:
> >Hi Florian, 
> >
> >Thanks for taking time to look into this
> >
> >On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote:
> >> 
> >> 
> >> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote:
> >> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote:
> >> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote:
> >> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote:
> >> >>>> Device tree is supposed to describe the hardware. Using that
> >hardware
> >> >>>> in different ways is not something you should describe in DT.
> >> >>>>
> >> >>> The new switchdev mode is applied with a .config option in the
> >kernel. What you
> >> >>> see is pre-existing code, so i am not sure if i should change it
> >in this
> >> >>> patchset.
> >> >>
> >> >> If you break the code up into a library and two drivers, it
> >becomes a
> >> >> moot point.
> >> > Agree
> >> > 
> >> >>
> >> >> But what i don't like here is that the device tree says to do dual
> >> >> mac. But you ignore that and do sometime else. I would prefer that
> >if
> >> >> DT says dual mac, and switchdev is compiled in, the probe fails
> >with
> >> >> EINVAL. Rather than ignore something, make it clear it is invalid.
> >> > The switch has 3 modes of operation as is.
> >> > 1. switch mode, to enable that you don't need to add anything on
> >> > the DTS and linux registers a single netdev interface.
> >> > 2. dual mac mode, this is when you need to add dual_emac; on the
> >DTS.
> >> > 3. switchdev mode which is controlled by a .config option, since as
> >you 
> >> > pointed out DTS was not made for controlling config options. 
> >> > 
> >> > I agree that this is far from beautiful. If the driver remains as
> >in though,
> >> > i'd prefer either keeping what's there or making "switchdev" a DTS
> >option, 
> >> > following the pre-existing erroneous usage rather than making the
> >device 
> >> > unusable.  If we end up returning some error and refuse to
> >initialize, users 
> >> > that remote upgrade their equipment, without taking a good look at
> >changelog,
> >> > will loose access to their devices with no means of remotely fixing
> >that.
> >> 
> >> It seems to me that the mistake here is seeing multiple modes of
> >> operations for the cpsw. There are not actually many, there is one
> >> usage, and then there is what you can and cannot offload. 
> >CPSW has in fact 2 modes of operation, different FIFO usage/lookup
> >entry(it's
> >called ALE in the current driver) by-pass(which is used in dual emac
> >for 
> >example) and other features. Again Grygorii is better suited to answer
> >the 
> >exact differences.
> >> The basic
> >> premise with switchdev and DSA (which uses switchdev) is that each
> >> user-facing port of your switch needs to work as if it were a normal
> >> Ethernet NIC, that is what you call dual-MAC I believe. Then, when
> >you
> >> create a bridge and you enslave those ports into the bridge, you need
> >to
> >> have forwarding done in hardware between these two ports when the
> >> SMAC/DMAC are not for the host/CPU/management interface and you must
> >> simultaneously still have the host have the ability to send/receive
> >> traffic through the bridge device.
> >Yes dual emac does that. But dual emac configures the port facing VLAN
> >to the
> >CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is
> >configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU
> >port
> >That's exactly what the current RFC does as well, with the addition of
> >registering a sw0p0 (i'll explain why later on this mail)
> >A little more detail on the issue we are having. On my description 
> >sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the
> >ports
> >that have PHYs attached. 
> >
> >When we start in the new switchdev mode all interfaces are added to
> >VLAN 0
> >so CPU port + port1 + port2 are all in the same VLAN group. In that
> >case sw0p1
> >and sw0p2 are working as you describe. So those 2 interfaces can
> >send/receive
> >traffic normally which matches the switchdev case.
> >
> >When we add them on a bridge(let's say br0), VLAN1(or any default
> >bridge VLAN)
> >is now configured on sw0p1 and sw0p2 but *not* on the CPU port. 
> >From this point on the whole fuunctionality just collapses. The switch
> >will 
> >work and offload traffic between sw0p1/sw0p2 but the bridge won't be
> >able to 
> >get an ip address (since VLAN1 is not a member of the CPU port and the
> >packet 
> >gets dropped). 
> >IGMPv2/V3 messages will never reach the br_multicast.c code to trigger 
> >switchdev and configure the MDBs on the ports.  i am prety sure there
> >are other
> >fail scenarios which i haven't discovered already, but those 2 are the
> >most 
> >basic ones.  If we add VLAN1 on the CPU port, everything works as
> >intended of 
> >course.
> >
> >That's the reason we registered sw0p0 as the CPU port. It can't do any
> >"real"
> >traffic, but you can configure the CPU port independantly and not be
> >forced to
> >do an OR on every VLAN add/delete grouping the CPU port with your port
> >command.
> >The TL;DR version of this is that the switch is working exactly as
> >switchdev is
> >expecting offloading traffic to the hardware when possible as long as
> >the CPU
> >port is member of the proper VLANs
> >
> >Petr's patch solves this for us
> >(9c86ce2c1ae337fc10568a12aea812ed03de8319).
> >We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and
> >decide
> >when to add the CPU port or not. 
> >
> >There are still a couple of cases that are not covered though, if we
> >don't 
> >register the CPU port. We cant decide when to forward multicast
> >traffic on the CPU port if a join hasn't been sent from br0.
> >So let's say you got 2 hosts doing multicast and for whatever reason
> >the host
> >wants to see that traffic. 
> >With the CPU port present you can do a 
> >"bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will
> >offload
> >the traffic to the CPU port and thus the host. If this goes away we are
> >forced
> >to send a join.
> 
> Thanks for the detailed explanation. Somehow I was under the impression that 
> cpsw had the ability, through specific DMA descriptor bits to direct traffic 
> towards one external port or another and conversely, have that information 
> from the HW when receiving packets. 
That's one mode of operation when by-passing the ALE if my understanding of
the hardware is correct. You can choose not to do that. I am still earning all
the details of the ahrdware myself. On Rx though you still need the CPU to
participate to receive the packet(and yes the descriptor indicates the port)

> What you describe is exactly the same problem we have in DSA when the switch 
> advertises DSA_TAG_PROTO_NONE where only VLAN tags could help differentiate 
> traffic from external ports. At some point there was a discuss of making 
> DSA_TAG_PROTO_NONE automatically create one VLAN per port but this is a good 
> source for other problems...
I am pretty sure i am not the first one to encounter this kind of problems
> 
> Looking forward to your follow-up patch series!
Will do!
> 
> -- 
> Florian

Thanks
Ilias

Reply via email to