Re: [dpdk-dev] [PATCH v7 1/4] ethdev: support device reset and recovery events

Ray Kinsella Tue, 24 May 2022 08:11:30 -0700


fengchengwen <[email protected]> writes:


> Hi all,
>
>   This patch lasts for a long time. Are we waiting for 22.11 to deal with it?

That was my read, as can't reliably change the value of _MAX at this
stage without it having impact elsewhere. 


>   We have the same requirements for the reset or recovery mechanism, but 
> there are differences:
>
>     APP                                    PMD
>      |                                      |
>      |                                  detect error
>      |     <---report error event---        |
>      |                                      |
> do error stats                              |
> and report                                  |
>      |      ---start recover-->             |
>      |                                  do recover
>      |     <---report recover result        |
>      |                                      |
> if succ just log
> else may migrate
> service
>
> Can we generalize these processes(means that the implementation is at the 
> framework layer)? or only at PMD API?
>
>
> On 2022/2/15 0:06, Ray Kinsella wrote:
>> 
>> Thomas Monjalon <[email protected]> writes:
>> 
>>> 14/02/2022 11:16, Ray Kinsella:
>>>> Ray Kinsella <[email protected]> writes:
>>>>> Thomas Monjalon <[email protected]> writes:
>>>>>> 02/02/2022 12:44, Ray Kinsella:
>>>>>>> Ferruh Yigit <[email protected]> writes:
>>>>>>>> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>>>>>>>>       RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>>>>>>>       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>>>>>>>       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>>>>>>>> +     RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>> +                     /**< port recovering from an error
>>>>>>>>> +                      *
>>>>>>>>> +                      * PMD detected a FW reset or error condition.
>>>>>>>>> +                      * PMD will try to recover from the error.
>>>>>>>>> +                      * Data path may be quiesced and Control path 
>>>>>>>>> operations
>>>>>>>>> +                      * may fail at this time.
>>>>>>>>> +                      */
>>>>>>>>> +     RTE_ETH_EVENT_RECOVERED,
>>>>>>>>> +                     /**< port recovered from an error
>>>>>>>>> +                      *
>>>>>>>>> +                      * PMD has recovered from the error condition.
>>>>>>>>> +                      * Control path and Data path are up now.
>>>>>>>>> +                      * PMD re-configures the port to the state 
>>>>>>>>> prior to the error.
>>>>>>>>> +                      * Since the device has undergone a reset, flow 
>>>>>>>>> rules
>>>>>>>>> +                      * offloaded prior to reset may be lost and
>>>>>>>>> +                      * the application should recreate the rules 
>>>>>>>>> again.
>>>>>>>>> +                      */
>>>>>>>>>       RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>>>>>>>
>>>>>>>>
>>>>>>>> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed 
>>>>>>>> more people
>>>>>>>> to evaluate if it is a false positive:
>>>>>>>>
>>>>>>>>
>>>>>>>> 1 function with some indirect sub-type change:
>>>>>>>>   [C] 'function int rte_eth_dev_callback_register(uint16_t, 
>>>>>>>> rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 
>>>>>>>> has some indirect sub-type changes:
>>>>>>>>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type 
>>>>>>>> changes:
>>>>>>>>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, 
>>>>>>>> void*, void*)*' changed:
>>>>>>>>         in pointed to type 'function type int (typedef uint16_t, enum 
>>>>>>>> rte_eth_event_type, void*, void*)':
>>>>>>>>           parameter 2 of type 'enum rte_eth_event_type' has sub-type 
>>>>>>>> changes:
>>>>>>>>             type size hasn't changed
>>>>>>>>             2 enumerator insertions:
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value 
>>>>>>>> '11'
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>>>>>>>             1 enumerator change:
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' 
>>>>>>>> to '13' at rte_ethdev.h:3807:1
>>>>>>>
>>>>>>> I don't immediately see the problem that this would cause.
>>>>>>> There are no array sizes etc dependent on the value of MAX for instance.
>>>>>>>
>>>>>>> Looks safe?
>>>>>>
>>>>>> We never know how this enum will be used by the application.
>>>>>> The max value may be used for the size of an event array.
>>>>>> It looks a real ABI issue unfortunately.
>>>>>
>>>>> Right - but we only really care about it when an array size based on MAX
>>>>> is likely to be passed to DPDK, which doesn't apply in this case.
>>>
>>> I don't completely agree.
>>> A developer may assume an event will never exceed MAX value.
>>> However, after an upgrade of DPDK without app rebuild,
>>> a higher event value may be received in the app,
>>> breaking the assumption.
>>> Should we consider this case as an ABI breakage?
>> 
>> Nope - I think we should explicitly exclude MAX values from any
>> ABI guarantee, as being able to change them is key to our be able to
>> evolve DPDK while maintaining ABI stability. 
>> 
>> Consider what it means applying the ABI policy to a MAX value, you are
>> in effect saying that that no value can be added to this enumeration
>> until the next ABI version, for me this is very restrictive without a
>> solid reason. 
>> 
>>>
>>>>> I noted that some Linux folks explicitly mark similar MAX values as not
>>>>> part of the ABI.
>>>>>
>>>>> /usr/include/linux/perf_event.h
>>>>> 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>>>> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>>>> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>>>> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>>>> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>>>> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>>>> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>>>> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>>>>> non-ABI; internal use */
>>>>> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>>>> 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>>>>> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>>>> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>>>> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>>>> 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>>>>
>>>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>>> We could also add a section in the ABI Policy to make it explicit _MAX
>>>> enum values are not part of the ABI - what do folks think?
>>>
>>> Interesting. I am not sure it is always ABI-safe though.
>> 
>> 


-- 
Regards, Ray K

Re: [dpdk-dev] [PATCH v7 1/4] ethdev: support device reset and recovery events

Reply via email to