> >> On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote: > >>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> testpmd will initialize default max packet length to 1518 > >>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, > >>>>>>>>>>>>>>>> send the > >>>>>>>>>>>>>>>> max mtu length packet with vlan tag, the max packet length > >>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>> exceed 1518 that will cause packets dropped directly from > >>>>>>>>>>>>>>>> NIC hw > >>>>>>>>>>>> side. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max > >>>>>>>>>>>>>>>> packet size, so, configures the correct max packet size in > >>>>>>>>>>>>>>>> dev_config > >>>>>>>>>>>>> ops. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size") > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.y...@intel.com> > >>>>>>>>>>>>>>>> --- > >>>>>>>>>>>>>>>> drivers/net/ice/ice_ethdev.c | 11 +++++++++++ > >>>>>>>>>>>>>>>> 1 file changed, 11 insertions(+) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c > >>>>>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index > >>>>>>>>>>>>>>>> cfd357b05..6b7098444 100644 > >>>>>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c > >>>>>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c > >>>>>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev > >>>>>>>>>> *dev) > >>>>>>>>>>>>>>>> struct ice_adapter *ad = > >>>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); > >>>>>>>>>>>>>>>> struct ice_pf *pf = > >>>>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private); > >>>>>>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD; > >>>>>>>>>>>>>>>> int ret; > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> /* Initialize to TRUE. If any of Rx queues doesn't > >>>>>>>>>>>>>>>> meet the @@ > >>>>>>>>>>>>>>>> -3157,6 > >>>>>>>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev) > >>>>>>>>>>>>>>>> if (dev->data->dev_conf.rxmode.mq_mode & > >>>>>>>>>> ETH_MQ_RX_RSS_FLAG) > >>>>>>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |= > >>>>>>>>>>>>> DEV_RX_OFFLOAD_RSS_HASH; > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> +/** > >>>>>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal > >>>>>>>>>>>>>>>> or > >>>>>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead. > >>>>>>>>>>>>>>>> + */ > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) > >>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Why we need this check? > >>>>>>>>>>>>>>> Can we just call ice_mtu_set directly > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I think that without that check we can silently overwrite > >>>>>>>>>>>>>> provided > >>>>>>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value. > >>>>>>>>>>>>> > >>>>>>>>>>>>> OK, I see > >>>>>>>>>>>>> > >>>>>>>>>>>>> But still have one question > >>>>>>>>>>>>> dev->data->mtu is initialized to 1518 as default , but if > >>>>>>>>>>>>> dev->data->application set > >>>>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure. > >>>>>>>>>>>>> does that mean we will still will set mtu to 1518, is this > >>>>>>>>>>>>> expected? > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should > >>>>>>>>>>>> raise > >>>>>>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: > >>>>>>>>>>>> 1500). > >>>>>>>>>>> > >>>>>>>>>>> Ok, this describe the problem more general and better to replace > >>>>>>>>>>> exist > >>>>>>>>>> code comment and commit log for easy understanding. > >>>>>>>>>>> Please send a new version for reword > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> I didn't really get this set. > >>>>>>>>>> > >>>>>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a > >>>>>>>>>> frame bigger than > >>>>>>>>>> this size is dropped. > >>>>>>>>> > >>>>>>>>> Sure, it is normal case for dropping oversize data. > >>>>>>>>> > >>>>>>>>>> Isn't this what should be, why we are trying to overwrite user > >>>>>>>>>> configuration > >>>>>>>>>> in PMD to prevent this? > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> But it is a confliction that application/user sets mtu & > >>>>>>>>> max_rx_pkt_len at the same time. > >>>>>>>>> This fix will make a decision when confliction occurred. > >>>>>>>>> MTU value will come from user operation (e.g.: port config mtu 0 > >>>>>>>>> 1500) directly, > >>>>>>>>> so, the max_rx_pkt_len will resize itself to adapt expected MTU > >>>>>>>>> value if its size is smaller than MTU + Ether overhead. > >>>>>>>>> > >>>>>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev > >>>>>>>>>> layer. > >>>>>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'. > >>>>>>>>>> I think Qi's concern above is valid, what is user set > >>>>>>>>>> 'max_rx_pkt_len' to '1000' > >>>>>>>>>> and mean it? PMD will not honor the user config. > >>>>>>>>> > >>>>>>>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to > >>>>>>>>> '1000', what's the behavior expected? > >>>>>>>>> If still keep the 'max_rx_pkt_len' value, that means the larger > >>>>>>>>> 'mtu' will be invalid. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd? > >>>>>>>>>> > >>>>>>>>> The default 'max_rx_pkt_len' has been initialized to generical > >>>>>>>>> value (1518) and default 'mtu' is '1500' in testpmd, > >>>>>>>>> But it isn't suitable to those NIC drivers which Ether overhead is > >>>>>>>>> larger than 18. (e.g.: ice, i40e) if 'mtu' value is preferable. > >>>>>>>>> > >>>>>>>>>> And I guess even better what we need is to tell to the application > >>>>>>>>>> what the > >>>>>>>>>> frame overhead PMD accepts. > >>>>>>>>>> So the application can set proper 'max_rx_pkt_len' value per port > >>>>>>>>>> for a > >>>>>>>>>> given/requested MTU value. > >>>>>>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD > >>>>>>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, > >>>>>>>>>> perhaps > >>>>>>>>>> he has a solution now? > >>>>>>>> > >>>>>>>> From my perspective the main problem here: > >>>>>>>> We have 2 different variables for nearly the same thing: > >>>>>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len. > >>>>>>>> and 2 different API to update them: dev_mtu_set() and > >>>>>>>> dev_configure(). > >>>>>>> > >>>>>>> According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled' > >>>>>>> Although not sure that is practically what is done for all drivers. > >>>>>> > >>>>>> I think most of Intel PMDs use it unconditionally. > >>>>>> > >>>>>>> > >>>>>>>> And inside majority of Intel PMDs we don't keep these 2 variables in > >>>>>>>> sync: > >>>>>>>> - mtu_set() will update both variables. > >>>>>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu > >>>>>>>> intact. > >>>>>>>> > >>>>>>>> This patch fixes this inconsistency, which I think is a good thing. > >>>>>>>> Though yes, it introduces change in behaviour. > >>>>>>>> > >>>>>>>> Let say the code: > >>>>>>>> rte_eth_dev_set_mtu(port, 1500); > >>>>>>>> dev_conf.max_rx_pkt_len = 1000; > >>>>>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf); > >>>>>>>> > >>>>>>> > >>>>>>> 'rte_eth_dev_configure()' is one of the first APIs called, it is > >>>>>>> called before > >>>>>>> 'rte_eth_dev_set_mtu(). > >>>>>> > >>>>>> Usually yes. > >>>>>> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); > >>>>>> dev_configure(); dev_start(); > >>>>>> > >>>>>>> > >>>>>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by > >>>>>>> default by > >>>>>>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' > >>>>>>> is. > >>>>>> > >>>>>> See above. > >>>>>> PMD doesn't know where this MTU value came from (default ethdev value > >>>>>> or user specified value) > >>>>>> and probably it shouldn't care. > >>>>>> > >>>>>>> > >>>>>>> And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and > >>>>>>> 'max_rx_pkt_len' > >>>>>>> are updated (mostly). > >>>>>> > >>>>>> Yes, in mtu_set() we update both. > >>>>>> But we don't update MTU in dev_configure(), only max_rx_pkt_len. > >>>>>> That what this patch tries to fix (as I understand it). > >>>>> > >>>>> To be more precise - it doesn't change MTU value in dev_configure(), > >>>>> but instead doesn't allow max_rx_pkt_len to become smaller > >>>>> then MTU + OVERHEAD. > >>>>> Probably changing MTU value instead is a better choice. > >>>>> > >>>> > >>>> +1 to change mtu for this case. > >>>> And this is what happens in practice when there is no > >>>> 'rte_eth_dev_set_mtu()' > >>>> call, since PMD is using ('max_rx_pkt_len' - OVERHEAD) to set MTU. > >>> > >>> Hmm, I don't see that happens within Intel PMDs. > >>> As I can read the code: if user never call mtu_set(), then MTU value is > >>> left intact. > >>> > >> > >> I was checking ice, > >> in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the > >> device. > > > > Yes, I am not arguing with that. > > What I am saying - dev_config() doesn't update MTU based on max_rx_pkt_len. > > While it probably should. > > > > Yes 'dev_configure()' doesn't update the 'dev->data->mtu' and > 'max_rx_pkt_len' & > 'dev->data->mtu' may diverge there. > > I think best place to update 'dev->data->mtu' is where the device is actually > updated, but to prevent the diversion above we can update 'dev->data->mtu' in > ethdev layer, in 'rte_eth_dev_configure()' based on 'max_rx_pkt_len', will it > work?
I think - yes. At least, I don't foresee any implications with that. > > Only concern I see is if user reads the MTU ('rte_eth_dev_get_mtu()') after > 'rte_eth_dev_configure()' but before device configured, user will get the > wrong > value, I guess that problem was already there but changing default value may > make it more visible. > > >> > >>>> But this won't solve the problem Steve is trying to solve. > >>> > >>> You mean we still need to update test-pmd code to calculate max_rx_pkt_len > >>> properly for default mtu value? > >>> > >> > >> Yes. > >> Because target of this set is able to receive packets with payload size > >> 'RTE_ETHER_MTU', if MTU is updated according to the provided > >> 'max_rx_pkt_len', > >> device still won't able to receive those packets. > > > > Agree. > > > >> > >>>>>>> > >>>>>>> > >>>>>>>> Before the patch will result: > >>>>>>>> mtu==1500, max_rx_pkt_len=1000; //out of sync looks wrong to me > >>>>>>>> > >>>>>>>> After the patch: > >>>>>>>> mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour. > >>>>>>>> > >>>>>>>> If you think we need to preserve current behaviour, > >>>>>>>> then I suppose the easiest thing would be to change dev_config() code > >>>>>>>> to update mtu value based on max_rx_pkt_len. > >>>>>>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...} > >>>>>>>> So the code snippet above will result: > >>>>>>>> mtu=982,max_rx_pkt_len=1000; > >>>>>>>> > >>>>>>> > >>>>>>> The 'max_rx_ptk_len' is annoyance for a long time, what do you think > >>>>>>> to just > >>>>>>> drop it? > >>>>>>> > >>>>>>> By default device will be up with default MTU (1500), later > >>>>>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size > >>>>>>> setting at all. > >>>>>>> > >>>>>>> Will this work? > >>>>>> > >>>>>> I think it might, but that's a big change, probably too risky at that > >>>>>> stage... > >>>>>> > >>>> > >>>> Defintely, I was thinking for 21.11. Let me send a deprecation notice > >>>> and see > >>>> what happens. > >>>> > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>> And for short term, for above Intel PMDs, there must be a place this > >>>>>>> 'max_rx_pkt_len' value taken into account (mostly 'start()' dev_ops), > >>>>>>> that > >>>>>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME > >>>>>>> set, > >>>>>>> otherwise use the 'MTU' value. > >>>>>> > >>>>>> Even if we'll use max_rx_pkt_len only when if JUMBO_FRAME is set, > >>>>>> I think we still need to keep max_rx_pkt_len and MTU values in sync. > >>>>>> > >>>>>>> > >>>>>>> Without 'start()' updated the current logic won't work after stop & > >>>>>>> start anyway. > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> And why this same thing can't happen to other PMDs? If this is a > >>>>>>>>>> problem for > >>>>>>>>>> all PMDs, we should solve in other level, not for only some PMDs. > >>>>>>>>>> > >>>>>>>>> No, all PMDs exist the same issue, another proposal: > >>>>>>>>> - rte_ethdev provides the unique resize 'max_rx_pkt_len' in > >>>>>>>>> rte_eth_dev_configure(); > >>>>>>>>> - provide the uniform API for fetching the NIC's supported > >>>>>>>>> Ether Overhead size; > >>>>>>>>> Is it feasible? > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip > >>>>>>>>>>>> link > >>>>>>>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len > >>>>>>>>>>>> to > >>>>>>>>>>>> satisfy mtu requirement. > >>>>>>>>>>>> > >>>>>>>>>>>>> Should we just call ice_mtu_set(dev, > >>>>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len) > >>>>>>>>>>>>> here? > >>>>>>>>>>>> ice_mtu_set(dev, mtu) will append ether overhead to > >>>>>>>>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the > >>>>>>>>>>>> 2nd > >>>>>>>>>>>> parameter, or not the max_rx_pkt_len. > >>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> And please remove above comment, since ether overhead is > >>>>>>>>>>>>>>> already > >>>>>>>>>>>>>> considered in ice_mtu_set. > >>>>>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also > >>>>>>>>>>>> should be considered as the adjustment condition that if > >>>>>>>>>>>> ice_mtu_set > >>>>>>>>>> need be invoked. > >>>>>>>>>>>> So, it perhaps should remain this comment before this if() > >>>>>>>>>>>> condition. > >>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return > >>>>>>>>>>>>>>>> +ret; } > >>>>>>>>>>>>>>>> + > >>>>>>>>>>>>>>>> ret = ice_init_rss(pf); > >>>>>>>>>>>>>>>> if (ret) { > >>>>>>>>>>>>>>>> PMD_DRV_LOG(ERR, "Failed to enable rss for PF"); > >>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>> 2.17.1 > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>>> > >>> > >