On Thu, Nov 02, 2017 at 02:52:16PM +0100, Gaëtan Rivet wrote:
> On Wed, Nov 01, 2017 at 08:12:38PM +0000, Ophir Munk wrote:
> > failsafe device has vlan stripping configured at startup however once
> > a sub device is found as non-capable of vlan-stripping failsafe
> > updates it configuration and removes vlan stripping from it.
> > This update occurs only once at startup. Following a later plugin
> > attempt and in case of vlan stripping mismatch between failsafe
> > configuration and device capability - failsafe cannot recover and the
> > device remains constantly in plug out state.
> >
> > The sequence of events leading to this situation is described as
> > follows:
> > 1. Start testpmd with failsafe where mlx4 is a sub device (not capable
> > of vlan stripping). Expected printout:
> > PMD: net_failsafe: Disabling VLAN stripping offload
> > 2. Execute:
> > testpmd> port stop all
> > testpmd> port config all max-pkt-len 2048
> > testpmd> port start all
> > 3. Do a plug out (e.g. disable sriov)
> > 4. Do a plug in (e.g. enable sriov)
> > 5. Expected result: failsafe successfully configures and starts its sub
> > devices
> > Actual result: failsafe is continuously failing with these messages:
> > PMD: net_failsafe: VLAN stripping offload requested but not supported by
> > sub_device 0
> > PMD: net_failsafe: device already configured, cannot fix live
> > configuration
> > PMD: net_failsafe: Unable to synchronize sub device state
> >
> > Root cause analysis: at startup failsafe removes vlan stripping from its
> > configuration. After executing "port config all max-pkt-len 2048"
> > testpmd marks failsafe in need for configuration update.
> > After executing "port start all" testpmd overrides failsafe
> > configuration with its own configuration which includes vlan stripping
> >
>
> Have you tried launching testpmd with the option
>
> "--disable-hw-vlan"
>
> as your mlx4 port does not support it?
>

On second thought, I think there is a simple solution:

The fail-safe should stop trying to be clever with port configuration.
On rte_eth_dev_configure, it should simply apply the user configuration,
without trying to detect support and disabling flags on the fly.

If a PMD has an issue with the configuration, it should warn the user.
If it has an issue but does not warn, that is a bug in this PMD. This is
the case for MLX4: either the PMD changes its behavior, or not, as long
as users are fine with it.

So a proper fix would be to remove the checks (fs_port_offload_validate
and fs_port_disable_offload) and depend on the sub-device for proper
configuration vetting.

Thoughts?

> > During the plugin attempt failsafe refuses to update its configuration
> > by removing vlan stripping since it has already updated its
> > configuration at startup.
> >
> > The fix is to remove the limitation of one time configuration at
> > startup and allow it during plugin attempts.
> >
> > Cc: sta...@dpdk.org
> > Fixes: bbc6a53dda44 ("net/failsafe: support Rx offload capabilities")
> >
> > Signed-off-by: Ophir Munk <ophi...@mellanox.com>
> > ---
> > The commit message includes bug and fix descriptions
> > ---
> >  drivers/net/failsafe/failsafe_ops.c | 10 ----------
> >  1 file changed, 10 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> > index f460551..953ee65 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -187,16 +187,6 @@
> >  			continue;
> >  		DEBUG("Checking capabilities for sub_device %d", i);
> >  		while ((capa_flag = fs_port_offload_validate(dev, sdev))) {
> > -			/*
> > -			 * Refuse to change configuration if multiple devices
> > -			 * are present and we already have configured at least
> > -			 * some of them.
> > -			 */
> > -			if (PRIV(dev)->state >= DEV_ACTIVE &&
> > -			    PRIV(dev)->subs_tail > 1) {
> > -				ERROR("device already configured, cannot fix live configuration");
> > -				return -1;
> > -			}
> >  			ret = fs_port_disable_offload(&dev->data->dev_conf,
> >  						      capa_flag);
> >  			if (ret) {
> > --
> > 1.8.3.1
> >
>
> --
> Gaëtan Rivet
> 6WIND

--
Gaëtan Rivet
6WIND
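
For illustration only, the behavior proposed in the reply above (apply the user configuration unchanged and let each sub-device vet it) could look roughly like the following. This is a self-contained model, not fail-safe or DPDK code: struct sub_device, OFFLOAD_VLAN_STRIP, sub_device_configure and failsafe_configure are hypothetical names, and the sketch assumes that a sub-device PMD rejects an offload it does not support, which the thread notes is not necessarily what MLX4 does today.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; these are not DPDK definitions. */
#define OFFLOAD_VLAN_STRIP (1u << 0)

struct sub_device {
	const char *name;
	uint32_t rx_offload_capa; /* Rx offloads this sub-device supports */
	uint32_t applied_conf;    /* last configuration accepted */
	int configured;           /* non-zero once configure succeeded */
};

/*
 * Models a sub-device PMD vetting its own configuration (assumption:
 * the PMD warns and fails instead of silently ignoring the request).
 */
static int
sub_device_configure(struct sub_device *sdev, uint32_t rx_offloads)
{
	uint32_t unsupported = rx_offloads & ~sdev->rx_offload_capa;

	if (unsupported != 0) {
		fprintf(stderr, "%s: unsupported Rx offloads %#x\n",
			sdev->name, (unsigned int)unsupported);
		return -1;
	}
	sdev->applied_conf = rx_offloads;
	sdev->configured = 1;
	return 0;
}

/*
 * Models the fail-safe side: the user configuration is passed through
 * as-is, with no capability check and no flag removal on the fly.
 * The first sub-device error is reported back to the caller.
 */
static int
failsafe_configure(struct sub_device *subs, int nb_subs, uint32_t rx_offloads)
{
	assert(subs != NULL && nb_subs >= 0);
	for (int i = 0; i < nb_subs; i++) {
		int ret = sub_device_configure(&subs[i], rx_offloads);

		if (ret != 0)
			return ret;
	}
	return 0;
}
```

In this model, requesting OFFLOAD_VLAN_STRIP with a sub-device that lacks the capability makes failsafe_configure fail immediately, so the application sees the error and can retry without the flag, instead of the fail-safe rewriting its own configuration behind the application's back.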