Hi Gaetan

From: Gaëtan Rivet, Thursday, February 8, 2018 7:20 PM
> Hi Matan,
>
> Thanks for dealing with this.
>
> On Thu, Feb 08, 2018 at 04:34:12PM +0000, Matan Azrad wrote:
> > Fail-safe PMD uses per sub-device flag called "remove" to indicate the
> > scope where the sub-device isn't synchronized with the fail-safe state.
> >
> > This flag is set when fail-safe gets RMV notification about the
> > physical removal of the sub-device and should be unset when the
> > sub-device completes all the configurations cause it to arrive to the
> > fail-safe state.
> >
> > The previous code wrongly unsets the flag after calling to the
> > sub-device PMD dev_configure() operation and before all the
> > configurations were done.
> >
> > Change the remove flag unsetting to be only after the sub-device
> > successes to arrive to the fail-safe state.
> >
>
> I'm not sure this is the right way to do this.
> I think it's clear that it was a mistake to set sdev->remove to 0 only during
> fs_dev_configure.
>
> The flag itself only means "there is something to be done on this device,
> please clean up".
>
> Once the clean-up has happened, then the flag is not necessary anymore
> and should be reset.
>
> So I thought that this fix would actually put the flag reset within
> fs_dev_remove, right before reinstalling the hotplug alarm.
>
> At this point, the device state would have been set back to DEV_UNDEFINED,
> so the remove flag is unnecessary for any operation trying to avoid
> unplugged slaves.
>
> The "remove" flag is initialized at 0 when sub-devices are allocated (during
> fail-safe init). This means that there would be a difference in the state of
> the slave between its first initialization and any subsequent init, after one
> successful plugout.
>

But what about the plug-in process? Do you want to allow control commands
to reach a sub-device while it is still plugging in? Unsetting the remove
flag in fs_dev_remove allows control commands to run in parallel with the
plug-in process. Maybe the flag should be renamed to "unsynchronized".

> > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Matan Azrad <ma...@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c | 2 ++
> >  drivers/net/failsafe/failsafe_ops.c   | 2 +-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> > index 4c6e938..ca42376 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -377,6 +377,8 @@
> >  				i);
> >  			goto err_remove;
> >  		}
> > +		if (PRIV(dev)->state < DEV_STARTED)
> > +			sdev->remove = 0;
>
> Here the remove flag should already be 0. If it isn't, this is a
> (logical) bug, which should be properly addressed instead of patched in this
> way.

Same answer as above.

> >  	}
> >  }
> >  /*
> > diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> > index 7a67e16..a7c2dba 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -131,7 +131,6 @@
> >  			dev->data->dev_conf.intr_conf.lsc = 0;
> >  		}
> >  		DEBUG("Configuring sub-device %d", i);
> > -		sdev->remove = 0;
>
> This is correct.
>
> >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> >  					dev->data->nb_rx_queues,
> >  					dev->data->nb_tx_queues,
> > @@ -197,6 +196,7 @@
> >  			return ret;
> >  		}
> >  		sdev->state = DEV_STARTED;
> > +		sdev->remove = 0;
>
> This seems unnecessary, if this operation was already performed once the
> device has been properly removed.

Same answer as above.

> >  	}
> >  	if (PRIV(dev)->state < DEV_STARTED)
> >  		PRIV(dev)->state = DEV_STARTED;
> > --
> > 1.8.3.1
> >
>
> --
> Gaëtan Rivet
> 6WIND
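The disagreement in this thread is about where `sdev->remove` gets cleared: in fs_dev_remove once clean-up is done, or only after the sub-device is back in the fail-safe state. The following is a minimal sketch, assuming a simplified and entirely hypothetical model of a sub-device. Only the names `state`, `remove`, `DEV_UNDEFINED` and `DEV_STARTED` come from the thread; the helpers (`on_rmv`, `dev_remove`, `dev_plug_in`, `control_allowed`, `control_allowed_while_plugging_in`) and `enum clear_point` are illustrative and are not the real failsafe driver code. The model is sequential, whereas the real plug-in is driven by the hotplug alarm, so it only shows whether the flag is set during the plug-in window, not the actual concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a fail-safe sub-device. The field and
 * state names follow the thread, but this is NOT the real
 * drivers/net/failsafe code. */
enum dev_state { DEV_UNDEFINED, DEV_STARTED };

struct sub_device {
	enum dev_state state;
	bool remove; /* true while the sub-device is out of sync with fail-safe */
};

/* The two candidate points for clearing the flag discussed in the thread. */
enum clear_point {
	CLEAR_IN_REMOVE,  /* reset once clean-up is done (review suggestion) */
	CLEAR_WHEN_SYNCED /* reset once the fail-safe state is reached (patch) */
};

/* RMV notification: the physical device went away. */
static void on_rmv(struct sub_device *sdev)
{
	sdev->remove = true;
}

/* Control operations skip sub-devices that are flagged. */
static bool control_allowed(const struct sub_device *sdev)
{
	return !sdev->remove;
}

/* Clean-up after removal: the sub-device goes back to DEV_UNDEFINED. */
static void dev_remove(struct sub_device *sdev, enum clear_point cp)
{
	sdev->state = DEV_UNDEFINED;
	if (cp == CLEAR_IN_REMOVE)
		sdev->remove = false;
}

/* Plug-in: bring a cleaned-up sub-device back to the fail-safe state. */
static void dev_plug_in(struct sub_device *sdev, enum dev_state target,
			enum clear_point cp)
{
	assert(sdev->state == DEV_UNDEFINED);
	sdev->state = target;
	if (cp == CLEAR_WHEN_SYNCED)
		sdev->remove = false;
}

/* Runs RMV -> clean-up -> plug-in and reports whether a control command
 * issued after clean-up but before plug-in completes would reach the
 * sub-device. */
static bool control_allowed_while_plugging_in(enum clear_point cp)
{
	struct sub_device sdev = { .state = DEV_STARTED, .remove = false };
	bool allowed;

	on_rmv(&sdev);
	dev_remove(&sdev, cp);
	allowed = control_allowed(&sdev); /* plug-in has not finished yet */
	dev_plug_in(&sdev, DEV_STARTED, cp);
	assert(control_allowed(&sdev)); /* synchronized again in both variants */
	return allowed;
}
```

With `CLEAR_IN_REMOVE` the function returns true, meaning a control command could reach the sub-device while it is still plugging in. With `CLEAR_WHEN_SYNCED` it returns false, which is the "unsynchronized" semantics argued for in the reply. In both variants the flag ends up cleared once plug-in completes.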