fengchengwen <[email protected]> writes:
> Hi all,
>
> This patch has been pending for a long time. Are we waiting for 22.11 to deal with it?

That was my read, as we can't reliably change the value of _MAX at this stage without it having an impact elsewhere.

> We have the same requirements for the reset or recovery mechanism, but
> there are differences:
>
>        APP                              PMD
>         |                                |
>         |                           detect error
>         | <---report error event---      |
>         |                                |
>   do error stats                         |
>     and report                           |
>         | ---start recover-->            |
>         |                           do recover
>         | <---report recover result      |
>         |                                |
>   if succ just log
>   else may migrate
>   service
>
> Can we generalize these processes (i.e., implement them at the framework
> layer), or only at the PMD API level?
>
>
> On 2022/2/15 0:06, Ray Kinsella wrote:
>>
>> Thomas Monjalon <[email protected]> writes:
>>
>>> 14/02/2022 11:16, Ray Kinsella:
>>>> Ray Kinsella <[email protected]> writes:
>>>>> Thomas Monjalon <[email protected]> writes:
>>>>>> 02/02/2022 12:44, Ray Kinsella:
>>>>>>> Ferruh Yigit <[email protected]> writes:
>>>>>>>> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>>>>>>>>  	RTE_ETH_EVENT_DESTROY, /**< port is released */
>>>>>>>>>  	RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */
>>>>>>>>>  	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>>>>>>>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>> +	/**< port recovering from an error
>>>>>>>>> +	 *
>>>>>>>>> +	 * PMD detected a FW reset or error condition.
>>>>>>>>> +	 * PMD will try to recover from the error.
>>>>>>>>> +	 * Data path may be quiesced and Control path operations
>>>>>>>>> +	 * may fail at this time.
>>>>>>>>> +	 */
>>>>>>>>> +	RTE_ETH_EVENT_RECOVERED,
>>>>>>>>> +	/**< port recovered from an error
>>>>>>>>> +	 *
>>>>>>>>> +	 * PMD has recovered from the error condition.
>>>>>>>>> +	 * Control path and Data path are up now.
>>>>>>>>> +	 * PMD re-configures the port to the state prior to the error.
>>>>>>>>> +	 * Since the device has undergone a reset, flow rules
>>>>>>>>> +	 * offloaded prior to reset may be lost and
>>>>>>>>> +	 * the application should recreate the rules again.
>>>>>>>>> +	 */
>>>>>>>>>  	RTE_ETH_EVENT_MAX /**< max value of this enum */
>>>>>>>>
>>>>>>>>
>>>>>>>> Also, the ABI check complains about the 'RTE_ETH_EVENT_MAX' value change;
>>>>>>>> cc'ed more people to evaluate whether it is a false positive:
>>>>>>>>
>>>>>>>>
>>>>>>>> 1 function with some indirect sub-type change:
>>>>>>>>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>>>>>>>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>>>>>>>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>>>>>>>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>>>>>>>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>>>>>>>             type size hasn't changed
>>>>>>>>             2 enumerator insertions:
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>>>>>>>             1 enumerator change:
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>>>>>>
>>>>>>> I don't immediately see the problem that this would cause.
>>>>>>> There are no array sizes etc. dependent on the value of MAX, for instance.
>>>>>>>
>>>>>>> Looks safe?
>>>>>>
>>>>>> We never know how this enum will be used by the application.
>>>>>> The max value may be used for the size of an event array.
>>>>>> It looks like a real ABI issue, unfortunately.
>>>>>
>>>>> Right - but we only really care about it when an array size based on MAX
>>>>> is likely to be passed to DPDK, which doesn't apply in this case.
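Thomas's point that the MAX value may size an application-side event array can be made concrete with a short C sketch. It assumes an application compiled against the old header, where MAX was 11 (per the abidiff output), that counts events in an array of that size. `app_event_type`, `app_count_event()` and `app_event_total()` are hypothetical names, not DPDK API, and the event value is taken as a plain `int` so that an out-of-range value from a newer library can be passed in.

```c
/* Hypothetical copy of the enum as the application saw it at build time:
 * only events 0..10 existed, so the MAX sentinel was 11. */
enum app_event_type {
	APP_EVENT_FIRST = 0,
	/* ... events 1..10 elided for brevity ... */
	APP_EVENT_MAX = 11 /* value baked into the application binary */
};

/* Per-event counters sized with the build-time MAX value. */
static unsigned long event_count[APP_EVENT_MAX];

/* Callback body: 'event' comes from the (possibly newer) library, e.g. 12
 * for RTE_ETH_EVENT_RECOVERED after an upgrade without a rebuild.
 * Returns 0 if the event was counted, -1 if it is outside the array. */
int app_count_event(int event)
{
	/* Without this check, event 11 or 12 would write past the end of
	 * event_count[], which is exactly the broken assumption. */
	if (event < 0 || event >= APP_EVENT_MAX)
		return -1;
	event_count[event]++;
	return 0;
}

/* Read back a counter; unknown events report zero. */
unsigned long app_event_total(int event)
{
	if (event < 0 || event >= APP_EVENT_MAX)
		return 0;
	return event_count[event];
}
```

The range check is what lets such an application tolerate events added after it was built; without it, delivering value 12 would write past the end of the 11-element array.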
>>>
>>> I don't completely agree.
>>> A developer may assume an event will never exceed the MAX value.
>>> However, after an upgrade of DPDK without an app rebuild,
>>> a higher event value may be received in the app,
>>> breaking the assumption.
>>> Should we consider this case as an ABI breakage?
>>
>> Nope - I think we should explicitly exclude MAX values from any
>> ABI guarantee, as being able to change them is key to our being able to
>> evolve DPDK while maintaining ABI stability.
>>
>> Consider what applying the ABI policy to a MAX value means: you are
>> in effect saying that no value can be added to this enumeration
>> until the next ABI version. For me, this is very restrictive without a
>> solid reason.
>>
>>>
>>>>> I noted that some Linux folks explicitly mark similar MAX values as not
>>>>> part of the ABI.
>>>>>
>>>>> /usr/include/linux/perf_event.h
>>>>> 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>>>> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>>>> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>>>> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>>>> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>>>> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>>>> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>>>> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY = 1ULL << 63, /* non-ABI; internal use */
>>>>> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>>>> 267:    PERF_TXN_MAX = (1 << 8),                /* non-ABI */
>>>>> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>>>> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>>>> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>>>> 1087:   PERF_BPF_EVENT_MAX,                     /* non-ABI */
>>>>
>>>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>>> We could also add a section to the ABI Policy making it explicit that _MAX
>>>> enum values are not part of the ABI - what do folks think?
>>>
>>> Interesting. I am not sure it is always ABI-safe though.
>>
>> --
>> Regards, Ray K

