Ferruh Yigit <ferruh.yi...@intel.com> writes:

> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>> From: Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
>> Adding support for the device reset and recovery events in the
>> rte_eth_event framework. FW error and FW reset conditions would be
>> managed internally by the PMD without needing application intervention.
>> In such cases, PMD would need reset/recovery events to notify application
>> that PMD is undergoing a reset.
>> While most of the recovery process is transparent to the application since
>> most of the driver ensures recovery from FW reset or FW error conditions,
>> the application will have to reprogram any flows which were offloaded to
>> the underlying hardware.
>> Signed-off-by: Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
>> Signed-off-by: Somnath Kotur <somnath.ko...@broadcom.com>
>> Reviewed-by: Ajit Khaparde <ajit.khapa...@broadcom.com>
>> ---
>>   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>>   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>>   2 files changed, 42 insertions(+)
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>> index 6831289..9ecc0e4 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -623,3 +623,27 @@ by application.
>>   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>>   the application to handle reset event. It is duty of application to
>>   handle all synchronization before it calls rte_eth_dev_reset().
>> +
>> +Error recovery support
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +When the PMD detects a FW reset or error condition, it may try to recover
>> +from the error without needing the application intervention. In such cases,
>> +PMD would need events to notify the application that it is undergoing
>> +an error recovery.
>> +
>> +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>> +application that PMD detected a FW reset or FW error condition. PMD may
>> +try to recover from the error by itself. Data path may be quiesced and
>> +control path operations may fail during the recovery period. The application
>> +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
>> +
>> +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
>> +that it has recovered from the error condition. PMD re-configures the port
>> +to the state prior to the error condition. Control path and data path are up now.
>> +Since the device has undergone a reset, flow rules offloaded prior to reset
>> +may be lost and the application should recreate the rules again.
>> +
>> +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
>> +that it has failed to recover from the error condition. The device may not be
>> +usable anymore.
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 147cc1c..a46819f 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>      RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>      RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>      RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>> +    RTE_ETH_EVENT_ERR_RECOVERING,
>> +                    /**< port recovering from an error
>> +                     *
>> +                     * PMD detected a FW reset or error condition.
>> +                     * PMD will try to recover from the error.
>> +                     * Data path may be quiesced and Control path operations
>> +                     * may fail at this time.
>> +                     */
>> +    RTE_ETH_EVENT_RECOVERED,
>> +                    /**< port recovered from an error
>> +                     *
>> +                     * PMD has recovered from the error condition.
>> +                     * Control path and Data path are up now.
>> +                     * PMD re-configures the port to the state prior to the error.
>> +                     * Since the device has undergone a reset, flow rules
>> +                     * offloaded prior to reset may be lost and
>> +                     * the application should recreate the rules again.
>> +                     */
>>      RTE_ETH_EVENT_MAX       /**< max value of this enum */
>
>
> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more
> people to evaluate if it is a false positive:
>
>
> 1 function with some indirect sub-type change:
>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>             type size hasn't changed
>             2 enumerator insertions:
>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>             1 enumerator change:
>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
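
For reference, the application-side handling that the quoted documentation
describes could be sketched roughly as below. This is only an illustration
under stated assumptions: the enum, the struct and on_port_event() are
hypothetical stand-ins that are not part of the patch, and a real application
would react to the rte_eth_event_type values from rte_ethdev.h in its own
callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three events named in the patch; a real
 * application would use the enumerators from rte_ethdev.h instead. */
enum port_event {
	EV_ERR_RECOVERING, /* PMD started recovering from a FW reset/error */
	EV_RECOVERED,      /* PMD finished recovery, port is usable again */
	EV_INTR_RMV        /* recovery failed, device may be unusable */
};

/* Hypothetical per-port bookkeeping kept by the application. */
struct port_state {
	bool polling;           /* data path may be polled */
	bool removed;           /* recovery failed, stop using the port */
	bool flows_need_replay; /* offloaded flow rules must be recreated */
};

/* Update the application's view of the port for one event. */
void on_port_event(struct port_state *st, enum port_event ev)
{
	switch (ev) {
	case EV_ERR_RECOVERING:
		/* Data path may be quiesced: stop polling until recovered. */
		st->polling = false;
		break;
	case EV_RECOVERED:
		/* Port state is restored by the PMD, but flow rules may be
		 * lost, so mark them for re-creation before resuming. */
		if (!st->removed) {
			st->flows_need_replay = true;
			st->polling = true;
		}
		break;
	case EV_INTR_RMV:
		/* Recovery failed: never resume polling on this port. */
		st->polling = false;
		st->removed = true;
		break;
	}
}
```

The handler only records state; the polling loop and the flow re-creation
would act on these flags outside the callback context.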

I don't immediately see the problem that this would cause.
There are no array sizes, etc., dependent on the value of MAX, for instance.

Looks safe?
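
To make the concern concrete, here is a minimal sketch of the kind of code
that would be affected if it existed: a table sized with the old MAX at build
time and indexed by an event value coming from a newer library. The enum
names, the table and count_event() are hypothetical and only model the values
shown in the ABI report (MAX moving from 11 to 13); nothing like this was
found in the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the enum tail before and after the patch,
 * using the values from the ABI report above. */
enum old_event_type {
	OLD_EVENT_FLOW_AGED = 10,
	OLD_EVENT_MAX = 11
};

enum new_event_type {
	NEW_EVENT_FLOW_AGED = 10,
	NEW_EVENT_ERR_RECOVERING = 11,
	NEW_EVENT_RECOVERED = 12,
	NEW_EVENT_MAX = 13
};

/* A table sized with the MAX value seen when the binary was built. */
unsigned int event_count[OLD_EVENT_MAX];

/* Count one event. Returns 0 on success, or -1 if the value is outside
 * the table, as a new event would be for a binary built with the old MAX.
 * Without this bounds check, the new events would index past the array. */
int count_event(int event)
{
	size_t slots = sizeof(event_count) / sizeof(event_count[0]);

	if (event < 0 || (size_t)event >= slots)
		return -1;
	event_count[event]++;
	return 0;
}
```

With the bounds check, the two new events are simply rejected by the older
build; the risk would only arise for MAX-sized storage indexed without such
a check.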

-- 
Regards, Ray K
