Thank you Ferruh for the review. Please see inline.

On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <ferruh.yi...@intel.com> wrote:

> On 1/28/2022 12:48 PM, Kalesh A P wrote:
> > From: Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
> >
> > Adding support for the device reset and recovery events in the
> > rte_eth_event framework. FW error and FW reset conditions would be
> > managed internally by the PMD without needing application intervention.
> > In such cases, the PMD needs reset/recovery events to notify the application
> > that it is undergoing a reset.
> >
> > Most of the recovery process is transparent to the application, since
> > the driver ensures recovery from FW reset or FW error conditions.
> > However, the application will have to reprogram any flows which were
> > offloaded to the underlying hardware.
> >
> > Signed-off-by: Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
> > Signed-off-by: Somnath Kotur <somnath.ko...@broadcom.com>
> > Reviewed-by: Ajit Khaparde <ajit.khapa...@broadcom.com>
>
> More developers cc'ed.
>
> > ---
> >   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
> >   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
> >   2 files changed, 42 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> > index 6831289..9ecc0e4 100644
> > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > @@ -623,3 +623,27 @@ by application.
> >   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
> >   the application to handle reset event. It is duty of application to
> >   handle all synchronization before it calls rte_eth_dev_reset().
> > +
> > +Error recovery support
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +When the PMD detects a FW reset or error condition, it may try to recover
> > +from the error without application intervention. In such cases,
> > +the PMD needs events to notify the application that it is undergoing
> > +an error recovery.
> > +
> > +The PMD should trigger the RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> > +application that the PMD detected a FW reset or FW error condition. The PMD may
> > +try to recover from the error by itself. The data path may be quiesced and
> > +control path operations may fail during the recovery period. The application
> > +should stop polling until it receives the RTE_ETH_EVENT_RECOVERED event from the PMD.
> > +
>
> Between the time the FW error occurs and the time the application receives
> the RECOVERING event, the datapath will continue to poll and the application
> may call control APIs. So the event does not really solve the issue: the
> driver must somehow ensure this won't crash the application, and in that
> case I am not sure about the benefit of this event.
>
[Kalesh]: As soon as the driver detects a FW dead or reset condition, it
sets the fastpath pointers to dummy functions, which prevents the crash.
All control path operations fail with -EBUSY. This change is already in
the bnxt PMD. The event notifies the application that the PMD is
recovering from a FW error condition, so that it can stop polling and
stop issuing control path operations.
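For anyone not familiar with the bnxt change, here is a minimal, single-threaded model of the behaviour described above: on detecting a FW error the driver swaps its receive burst pointer for a dummy that returns no packets, and control-path calls return -EBUSY until recovery ends. All names (struct example_port, example_enter_recovery(), example_set_mtu(), etc.) are hypothetical illustrations, not the actual bnxt or ethdev code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-port state; NOT the real bnxt/ethdev structs. */
typedef uint16_t (*rx_burst_fn)(void *rxq, void **pkts, uint16_t nb_pkts);

struct example_port {
	rx_burst_fn rx_burst;       /* fast-path entry called by the poll loop */
	rx_burst_fn saved_rx_burst; /* real routine, restored after recovery */
	bool in_recovery;           /* set while FW reset/error is handled */
};

/* Placeholder for the real receive routine: pretends one packet arrived. */
static uint16_t example_real_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts > 0 ? 1 : 0;
}

/* Dummy routine installed during recovery: never touches hardware rings. */
static uint16_t example_dummy_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	(void)nb_pkts;
	return 0;
}

/* Called when the PMD detects a FW reset or FW error. */
static void example_enter_recovery(struct example_port *port)
{
	port->saved_rx_burst = port->rx_burst;
	port->rx_burst = example_dummy_rx_burst;
	port->in_recovery = true;
	/* A real PMD would raise RTE_ETH_EVENT_ERR_RECOVERING here. */
}

/* Called once the FW is back and the port has been reconfigured. */
static void example_exit_recovery(struct example_port *port)
{
	port->rx_burst = port->saved_rx_burst;
	port->in_recovery = false;
	/* A real PMD would raise RTE_ETH_EVENT_RECOVERED here. */
}

/* Example control-path op: rejected while recovery is in progress. */
static int example_set_mtu(struct example_port *port, uint16_t mtu)
{
	if (port->in_recovery)
		return -EBUSY;
	(void)mtu; /* a real PMD would program the device here */
	return 0;
}
```

A real PMD also has to handle lcores that are already inside the old burst function when the pointer is swapped, and the memory ordering between the recovery thread and the polling lcores; this sketch deliberately ignores both, which is the window Ferruh's comment is about.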

>
> > +The PMD should trigger the RTE_ETH_EVENT_RECOVERED event to notify the application
> > +that it has recovered from the error condition. The PMD re-configures the port
> > +to the state it was in before the error. The control path and data path are up again.
> > +Since the device has undergone a reset, flow rules offloaded before the reset
> > +may be lost and the application should recreate the rules.
> > +
>
> I think the most difficult part here is to clarify, consistently for all
> devices, what the application should do when this event is received;
> "flow rules may be lost" looks very vague to me.
> Unless it is clear to the application what to do when this event is
> received, the event is not that useful, or it will be specific to some
> PMDs. I can see it is hard to clarify this, but perhaps we can define a
> set of common behaviors.
> Not sure what others are thinking.
>
[Kalesh]: Sure, let's wait for others' opinions as well.
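As a possible starting point for that common behaviour, a minimal sketch of an application-side handler. It mirrors the shape of DPDK's rte_eth_dev_cb_fn callback (port_id, event, cb_arg, ret_param), which an application would register once per event type with rte_eth_dev_callback_register(). To keep the snippet buildable without DPDK headers, the EXAMPLE_EVENT_* values are local stand-ins for RTE_ETH_EVENT_INTR_RMV and the RTE_ETH_EVENT_ERR_RECOVERING / RTE_ETH_EVENT_RECOVERED values proposed in this patch; struct app_port_state and the helper functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so this builds without rte_ethdev.h (hypothetical). */
enum example_eth_event {
	EXAMPLE_EVENT_INTR_RMV,       /* stands in for RTE_ETH_EVENT_INTR_RMV */
	EXAMPLE_EVENT_ERR_RECOVERING, /* proposed RTE_ETH_EVENT_ERR_RECOVERING */
	EXAMPLE_EVENT_RECOVERED,      /* proposed RTE_ETH_EVENT_RECOVERED */
};

/* Per-port state shared by the event callback and the polling lcores. */
struct app_port_state {
	atomic_bool paused;       /* poll loop must not call rx/tx burst */
	atomic_bool replay_flows; /* offloaded flow rules must be re-created */
	atomic_bool removed;      /* recovery failed; port is unusable */
};

static void app_port_state_init(struct app_port_state *st)
{
	atomic_init(&st->paused, false);
	atomic_init(&st->replay_flows, false);
	atomic_init(&st->removed, false);
}

/* Same parameter shape as rte_eth_dev_cb_fn. */
static int app_event_cb(uint16_t port_id, enum example_eth_event event,
			void *cb_arg, void *ret_param)
{
	struct app_port_state *st = cb_arg;

	(void)port_id;
	(void)ret_param;

	switch (event) {
	case EXAMPLE_EVENT_ERR_RECOVERING:
		/* Stop polling and hold off control-path calls. */
		atomic_store(&st->paused, true);
		break;
	case EXAMPLE_EVENT_RECOVERED:
		/* Flows may be lost: flag them for a control thread to replay,
		 * then let the poll loop resume. */
		atomic_store(&st->replay_flows, true);
		atomic_store(&st->paused, false);
		break;
	case EXAMPLE_EVENT_INTR_RMV:
		/* Recovery failed: treat the port as gone for good. */
		atomic_store(&st->removed, true);
		atomic_store(&st->paused, true);
		break;
	}
	return 0;
}

/* Checked by each polling lcore before calling the burst functions. */
static bool app_may_poll(struct app_port_state *st)
{
	return !atomic_load(&st->paused) && !atomic_load(&st->removed);
}
```

The callback only sets flags; re-creating flow rules is left to an application control thread that checks replay_flows, rather than being done inside the event callback. An application that needs its rules in place before traffic resumes could instead keep paused set until that replay completes.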

>
> > +The PMD should trigger the RTE_ETH_EVENT_INTR_RMV event to notify the application
> > +that it has failed to recover from the error condition. The device may not be
> > +usable anymore.
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 147cc1c..a46819f 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> >       RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> >       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> > +     RTE_ETH_EVENT_ERR_RECOVERING,
> > +                     /**< port recovering from an error
> > +                      *
> > +                      * PMD detected a FW reset or error condition.
> > +                      * PMD will try to recover from the error.
> > +                      * Data path may be quiesced and control path operations
> > +                      * may fail at this time.
> > +                      */
>
> Please put multi-line comments before the enum value; Andrew did a set of
> cleanups for these.
>
[Kalesh]: Sure, will do.

>
> > +     RTE_ETH_EVENT_RECOVERED,
> > +                     /**< port recovered from an error
> > +                      *
> > +                      * PMD has recovered from the error condition.
> > +                      * Control path and Data path are up now.
> > +                      * PMD re-configures the port to the state prior to the error.
> > +                      * Since the device has undergone a reset, flow rules
> > +                      * offloaded prior to reset may be lost and
> > +                      * the application should recreate the rules again.
> > +                      */
> >       RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >   };
> >
>
>

-- 
Regards,
Kalesh A P
