Hi Matan,

Thanks for dealing with this.

On Thu, Feb 08, 2018 at 04:34:12PM +0000, Matan Azrad wrote:
> Fail-safe PMD uses per sub-device flag called "remove" to indicate the
> scope where the sub-device isn't synchronized with the fail-safe state.
> 
> This flag is set when fail-safe gets an RMV notification about the
> physical removal of the sub-device, and should be unset only once the
> sub-device has completed all the configuration steps that bring it
> back to the fail-safe state.
> 
> The previous code wrongly unsets the flag when calling the sub-device
> PMD dev_configure() operation, before all the configuration steps
> were done.
> 
> Unset the remove flag only after the sub-device has successfully
> reached the fail-safe state.
> 

I'm not sure this is the right way to do this.
I think it's clear that it was a mistake to set sdev->remove to 0
only during fs_dev_configure.

The flag itself only means "there is something to be done on this
device, please clean up".

Once the clean-up has happened, then the flag is not necessary anymore
and should be reset.

So I expected this fix to move the flag reset into fs_dev_remove,
right before the hotplug alarm is reinstalled.

At this point, the device state would have been set back to
DEV_UNDEFINED, so the remove flag is unnecessary for any operation
trying to avoid unplugged slaves.

The "remove" flag is initialized to 0 when sub-devices are allocated
(during fail-safe init). If the flag is not reset at the end of the
removal, the state of a slave after one successful plug-out would
differ from its state at first initialization.
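The point about first versus subsequent initialization can be checked with a small model. This is a hypothetical sketch: sdev_alloc(), sdev_rmv_event(), sdev_remove() and same_as_fresh() are illustrative names, zeroing stands in for the zeroed allocation at fail-safe init, and the reset_flag parameter only exists to compare both placements of the reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum dev_state { DEV_UNDEFINED, DEV_PARSED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

/* Hypothetical, reduced sub-device. */
struct sub_device {
	enum dev_state state;
	uint8_t remove;
};

/* Fail-safe init: sub-devices start zeroed, so remove == 0. */
static void sdev_alloc(struct sub_device *sdev)
{
	memset(sdev, 0, sizeof(*sdev));
}

/* RMV notification: mark the sub-device for clean-up. */
static void sdev_rmv_event(struct sub_device *sdev)
{
	sdev->remove = 1;
}

/* Removal clean-up; reset_flag selects whether it also clears remove. */
static void sdev_remove(struct sub_device *sdev, bool reset_flag)
{
	sdev->state = DEV_UNDEFINED;
	if (reset_flag)
		sdev->remove = 0;
}

/* True if the sub-device looks exactly like a freshly allocated one. */
static bool same_as_fresh(const struct sub_device *sdev)
{
	return sdev->state == DEV_UNDEFINED && sdev->remove == 0;
}
```

Only the variant that clears the flag during removal brings the slave back to its initial state after a plug-out.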

> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Matan Azrad <ma...@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c | 2 ++
>  drivers/net/failsafe/failsafe_ops.c   | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index 4c6e938..ca42376 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -377,6 +377,8 @@
>                                     i);
>                               goto err_remove;
>                       }
> +                     if (PRIV(dev)->state < DEV_STARTED)
> +                             sdev->remove = 0;

Here the remove flag should already be 0. If it isn't, this is a
(logical) bug, which should be properly addressed instead of patched
in this way.
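One possible way to surface such a bug, sketched under assumptions: treat the invariant as an assertion instead of resetting the flag conditionally. sdev_resync() and the reduced struct are hypothetical stand-ins for the synchronization step in this hunk; standard assert() is used for illustration, and inside DPDK RTE_ASSERT would likely be the natural equivalent.

```c
#include <assert.h>
#include <stdint.h>

enum dev_state { DEV_UNDEFINED, DEV_PARSED, DEV_PROBED, DEV_ACTIVE, DEV_STARTED };

/* Hypothetical, reduced sub-device. */
struct sub_device {
	enum dev_state state;
	uint8_t remove;
};

/* Reaching this point with remove still set means the removal path
 * never cleared it: fail loudly in debug builds rather than silently
 * clearing the flag here. */
static int sdev_resync(struct sub_device *sdev)
{
	assert(sdev->remove == 0 &&
	       "remove flag must be cleared by the removal path");
	/* Placeholder for the real configuration steps. */
	if (sdev->state < DEV_ACTIVE)
		sdev->state = DEV_ACTIVE;
	return 0;
}
```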

>               }
>       }
>       /*
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index 7a67e16..a7c2dba 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -131,7 +131,6 @@
>                       dev->data->dev_conf.intr_conf.lsc = 0;
>               }
>               DEBUG("Configuring sub-device %d", i);
> -             sdev->remove = 0;

This is correct.

>               ret = rte_eth_dev_configure(PORT_ID(sdev),
>                                       dev->data->nb_rx_queues,
>                                       dev->data->nb_tx_queues,
> @@ -197,6 +196,7 @@
>                       return ret;
>               }
>               sdev->state = DEV_STARTED;
> +             sdev->remove = 0;

This seems unnecessary if the flag has already been reset once the
device was properly removed.

>       }
>       if (PRIV(dev)->state < DEV_STARTED)
>               PRIV(dev)->state = DEV_STARTED;
> -- 
> 1.8.3.1
> 

-- 
Gaëtan Rivet
6WIND
