On Tue, 13 Jul 2021 12:26:35 +0300 Andrew Rybchenko <andrew.rybche...@oktetlabs.ru> wrote:
> On 7/13/21 11:18 AM, Havlík Martin wrote:
> > Dne 2021-07-12 15:07, Ori Kam napsal:
> >> Hi Jan,
> >>
> >>> -----Original Message-----
> >>> From: Jan Viktorin <vikto...@cesnet.cz>
> >>> Sent: Monday, July 12, 2021 12:46 AM
> >>>
> >>> On Sun, 11 Jul 2021 08:49:18 +0000
> >>> Ori Kam <or...@nvidia.com> wrote:
> >>>
> >>> > Hi Jan,
> >>>
> >>> Hi Ori,
> >>>
> >>> > > -----Original Message-----
> >>> > > From: dev <dev-boun...@dpdk.org> On Behalf Of Jan Viktorin
> >>> > > Sent: Wednesday, July 7, 2021 6:54 PM
> >>> > >
> >>> > > On Sun, 4 Jul 2021 15:18:01 +0000
> >>> > > Matan Azrad <ma...@nvidia.com> wrote:
> >>> > >
> >>> > > > From: Havlík Martin
> >>> > > > > Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> >>> > > > > > On 2021/6/22 17:25, Martin Havlik wrote:
> >>> > > > > >> When dedicated queues are enabled, mlx5 PMD fails to
> >>> > > > > >> install RTE Flows if the underlying ethdev is not started:
> >>> > > > > >> bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set:
> >>> > > > > >> port not started (slave_port=0 queue_id=1)
> >>> > > > > >>
> >>> > > > > > Why does flow create in the mlx5 PMD rely on the port being
> >>> > > > > > started? I noticed other PMDs do not have that reliance.
> >>> > > > > >
> >>> > > > > After looking into it, I really can't answer this mlx5-centered
> >>> > > > > question. The closest related info we found so far is the 5th
> >>> > > > > point in
> >>> > > > > https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> >>> > > > > but it only specifies that it is the application's
> >>> > > > > responsibility and that flow rules are assumed invalid after
> >>> > > > > port stop/close/restart. It doesn't say anything about
> >>> > > > > <stop - flow rule create - start> vs <stop - start - flow rule
> >>> > > > > create>, where the former is the point of failure on mlx5.
> >>> > > > > I'm addressing the maintainers for mlx5, who might know a bit
> >>> > > > > more on the topic.
> >>> > >
> >>> > > Hello Matan,
> >>> > >
> >>> > > > From rte_ethdev.h
> >>> > > >
> >>> > > >  * Please note that some configuration is not stored between
> >>> > > >  * calls to rte_eth_dev_stop()/rte_eth_dev_start(). The
> >>> > > >  * following configuration will be retained:
> >>> > > >  *
> >>> > > >  *     - MTU
> >>> > > >  *     - flow control settings
> >>> > > >  *     - receive mode configuration (promiscuous mode,
> >>> > > >  *       all-multicast mode, hardware checksum mode,
> >>> > > >  *       RSS/VMDQ settings etc.)
> >>> > > >  *     - VLAN filtering configuration
> >>> > > >  *     - default MAC address
> >>> > > >  *     - MAC addresses supplied to MAC address array
> >>> > > >  *     - flow director filtering mode (but not filtering rules)
> >>> > > >  *     - NIC queue statistics mappings
> >>> > >
> >>> > > Just after this section, you can find the following statement:
> >>> > >
> >>> > >  * Any other configuration will not be stored and will need to be
> >>> > >  * re-entered before a call to rte_eth_dev_start().
> >>> > >
> >>> > > It is not very clear how this relates to flows exactly (and this
> >>> > > applies to the whole quoted section, I think), but at least it can
> >>> > > be used as a counter argument.
> >>> > >
> >>> > I agree the doc is not clear. As I see it, flows are not part of
> >>> > configuration, at least not when we are talking about rte_flow.
> >>> >
> >>>
> >>> Agree.
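
Just to make sure we all talk about the same thing, the two orderings
Martin contrasted above look roughly like this at the application level.
This is only a minimal sketch of the sequence under discussion, not code
from the bonding driver; the helper name, the catch-all Ethernet pattern,
the queue index and the reduced error handling are made up for the
illustration.

    #include <stdio.h>

    #include <rte_ethdev.h>
    #include <rte_flow.h>

    /* Hypothetical helper: redirect all ingress traffic on @port_id to @queue. */
    static struct rte_flow *
    redirect_to_queue(uint16_t port_id, uint16_t queue, struct rte_flow_error *err)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },   /* any Ethernet frame */
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue q = { .index = queue };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &q },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, err);
    }

    /* <stop - flow rule create - start>: accepted by ixgbe/i40e in the tests
     * mentioned below, currently rejected by net/mlx5 ("port not started"). */
    static int
    flow_then_start(uint16_t port_id)
    {
        struct rte_flow_error err;

        rte_eth_dev_stop(port_id);
        if (redirect_to_queue(port_id, 1, &err) == NULL)
            printf("flow create before start failed: %s\n",
                   err.message ? err.message : "(no message)");
        return rte_eth_dev_start(port_id);
    }

    /* <stop - start - flow rule create>: works on all PMDs discussed here, but
     * leaves a window where traffic already flows and the rule is not yet
     * installed. */
    static int
    start_then_flow(uint16_t port_id)
    {
        struct rte_flow_error err;
        int ret;

        rte_eth_dev_stop(port_id);
        ret = rte_eth_dev_start(port_id);
        if (ret == 0 && redirect_to_queue(port_id, 1, &err) == NULL)
            printf("flow create after start failed: %s\n",
                   err.message ? err.message : "(no message)");
        return ret;
    }
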
> >>>
> >>> >
> >>> > >
> >>> > > > Mlx5 assumes flows are allowed to be configured only after
> >>> > > > rte_eth_dev_start(). Before start / after stop - no flow is
> >>> > > > valid anymore.
> >>> > >
> >>> > > I believe that this discussion is not about the validity of
> >>> > > flows. Let the flows be invalid after calling rte_eth_dev_stop().
> >>> > > This is OK, flows must be recreated and the bonding driver works
> >>> > > this way. But why not *before start*?
> >>> >
> >>> > Think about it this way: by changing the configuration you may
> >>> > create invalid flows. For example, you can only change the number
> >>> > of queues after port stop, so if you create a flow with jump to
> >>> > queue 3 and then you remove queue 3, the flow that is cached is
> >>> > not valid anymore. This goes for other configuration that may
> >>> > affect the validity of a flow.
> >>>
> >>> This is a valid argument, of course. The thing is whether it is the
> >>> responsibility of the PMD to take care of those corner cases or if
> >>> this is up to the application developer. If we respect the fact that
> >>> calling stop invalidates all flows, then when you do:
> >>>
> >>> > port stop 0
> >>> > flow create 0 ingress pattern ... / end actions queue 6 / end
> >>> > port config rxq 3
> >>> > port start 0
> >>>
> >>> it's clear that something is really wrong with the caller/user. I
> >>> would say that this is an application bug. I would expect that you
> >>> first reconfigure the port and after that you modify flows. It seems
> >>> quite logical and intuitive, doesn't it?
> >>>
> >>
> >> I agree the use case I described is bad programming, but the same
> >> logic can be applied to stopping the port: why should flows be
> >> deleted?

I just took into account what Matan has written. Certainly, it is not
necessary to delete flows on stop.

> >> And even more important, on some HW some flows can only be inserted
> >> after start since there is no real traffic gate; basically the start
> >> means adding the flows. If a flow is in the HW it is working; adding
> >> a gate will result in PPS degradation.

What do you mean by the term "gate" here? Some FPGA logic? Or something
else? Why would "adding a gate" result in PPS degradation? Can you make
those ideas a bit clearer?

> >>
> >>> Anyway, the PMD can possibly catch this rxq inconsistency, but I can
> >>> imagine that there are more complicated situations than just changing
> >>> the count of queues. Any idea for a more complex real case?
> >>>
> >>
> >> Some of the resources may be created only after start,
> >> but I don't have a very good example now.

Please, try to find some reasonable example, otherwise your
argumentation does not make much sense...

> >>
> >>> Martin Havlik, can you please check how ixgbe and i40e behave in the
> >>> case above with "rxq change after flow created"?
> >>>
> > Both PMDs do not raise any errors in the mentioned case. The packets
> > that should go to the non-existent queue are dropped at some point in
> > the processing (from what I could see and understand).

Thanks for the clarification. It would be nice if any other PMD could be
tested for this behaviour. Can somebody help us with other NICs?

> >
> > I'm glad for the discussion that is taking place, but I feel we have
> > strayed from its original reason: the inability to enable dedicated
> > queues for bonding mode 8023AD on mlx5.
> > We have concluded that flow creation has to be done after port/device
> > start. So moving the function call to create the flow after device
> > start should solve the issue
> > (https://mails.dpdk.org/archives/dev/2021-June/212210.html). From my
> > testing, flow create after start is not a problem on Intel PMDs.
>
> It would be good to hear what net/bonding maintainers think about it.

I see no conclusion.

> If net/mlx5 behaviour is fixed, will it fix the initial problem?

I believe that if mlx5 allows creating flows before calling
rte_eth_dev_start(), then the bonding would work as expected.

> >
> > I'm turning to you, Connor, as you have made the original question on
> > this. Is the patch I presented applicable?
> >
> > Martin
> >>> >
> >>> > > Does somebody know how other drivers behave in this situation?
> >>> > > (We know and can check for Intel, there it does not seem to be
> >>> > > an issue.)
> >>> > >
> >>> > > By the way, the mlx5 behaviour opens a (probably short) time
> >>> > > window between starting of a port and configuration of filtering
> >>> > > flows. You may want to start the port with thousands of flows
> >>> > > that apply just when the port starts (not after, that's late).
> >>> > > This may introduce glitches in filtering and measuring of
> >>> > > traffic (well, it is a question how serious an issue it could
> >>> > > be...).
> >>> > >
> >>> > Agree, but this is always true: nothing is done in zero time, and
> >>> > even if it was, the insertion is not in zero time (assuming that
> >>> > even if the flows are stored by the PMD until start, they still
> >>> > will not all be inserted at the same time).
> >>>
> >>> Sorry, I probably do not understand well. Yes, creation of a flow
> >>> would never be zero time, agree. But if I create flows before the
> >>> port starts, then I do not have to care too much about how long the
> >>> creation takes, because the card is expected to be configured
> >>> already before pushing the "start button".
> >>>
> >>> Maybe you assume that the created flows would first be cached inside
> >>> the PMD and, when the port is finally started, it then transparently
> >>> writes the queues to the HW. But this is not what I was talking
> >>> about; that's maybe even worse because you are hiding such behaviour
> >>> from users...
> >>>
> >>> (I do not know the exact mlx5 implementation details, so my answer
> >>> can miss something, sorry for that.)
> >>>
> >>
> >> You are correct, this is a HW/PMD implementation issue; in the case
> >> of Nvidia we must

By "must" you probably mean "would have to"? Is that correct?

> >> cache the flows and only insert them after start.
> >> Since RTE flow is generic and HW is not generic, sometimes the PMD
> >> needs to translate this difference, and in this gray area there is a
> >> game between best API, best performance, and understanding what the
> >> user really wants. This is why the doc should be very clear on what
> >> is expected. But this is also the reason why such a document is very
> >> hard to agree on.

Yes, as a former FPGA dev, I can understand your concerns. RTE Flow is
a difficult beast.

Jan

>
> Do I understand correctly that net/mlx5 maintainers will follow up?

> >>
> >> Ori
> >>> >
> >>> > > This matters for the bonding case as well, doesn't it? It is not
> >>> > > desirable to accidentally omit a packet that was received by
> >>> > > primary ingress logic instead of being redirected into the
> >>> > > dedicated queue.
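
For context, the rule that the dedicated-queue feature needs on every
member port is roughly the following: match the slow-protocols EtherType
0x8809 (LACP) and steer it into the extra Rx queue, so that the data-path
queues never see the control traffic. This is only a sketch from the
application point of view; the helper name is invented here, and the real
logic lives in bond_ethdev_8023ad_flow_set() in the bonding PMD.

    #include <rte_byteorder.h>
    #include <rte_ethdev.h>
    #include <rte_flow.h>

    #define SLOW_PROTOCOLS_ETHER_TYPE 0x8809    /* LACP, marker PDUs, ... */

    /* Illustrative only: steer LACP frames on @member_port to @dedicated_rxq. */
    static struct rte_flow *
    lacp_to_dedicated_queue(uint16_t member_port, uint16_t dedicated_rxq,
                            struct rte_flow_error *err)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_eth eth_spec = {
            .type = RTE_BE16(SLOW_PROTOCOLS_ETHER_TYPE),
        };
        struct rte_flow_item_eth eth_mask = {
            .type = RTE_BE16(0xffff),           /* match on EtherType only */
        };
        struct rte_flow_item pattern[] = {
            {
                .type = RTE_FLOW_ITEM_TYPE_ETH,
                .spec = &eth_spec,
                .mask = &eth_mask,
            },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = dedicated_rxq };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        /* On mlx5 this call currently succeeds only after
         * rte_eth_dev_start() on @member_port, which is the crux here. */
        return rte_flow_create(member_port, &attr, pattern, actions, err);
    }

Any LACPDU arriving in the window between rte_eth_dev_start() and this
rte_flow_create() lands on a normal data queue instead of the dedicated
one; that is the accidentally omitted packet I mean above.
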
> >>> > >
> >>> > > Are there any chances that for mlx5 it would be possible to
> >>> > > insert flow rules before calling rte_eth_dev_start()? Anyway,
> >>> > > the behaviour should be specified and documented in DPDK more
> >>> > > precisely to avoid such uncertainty in the future.
> >>> > >
> >>> > I agree the documentation should be fixed.
> >>>
> >>> +1
>
> Cc Thomas and Ferruh since ethdev documentation should be clarified.
>
> It looks like there is no consensus whether the patch is the right
> direction or the wrong one. For me it looks wrong, taking all the above
> arguments into account (mainly the necessity to be able to insert flows
> before pushing the start button which enables the traffic, if the HW
> supports it).
>
> So, I'm applying the first two patches and putting this one on hold.