22/07/2021 15:50, fengchengwen: > Hi, all > > I notice ethdev support dev_reset ops, which could be used to recover from > errors, and only 13+ drivers support this function. > And also there is event for reset: RTE_ETH_EVENT_INTR_RESET, and only 6 > drivers support it (most of them are VF). > > This provides users with two ways to handle hardware errors: > a. driver report RTE_ETH_EVENT_INTR_RESET, and application do reset ops. > b. application detect errors (the detection method is unclear), and call > reset ops to recover. > > According to the design of this API, error handling is assigned to the > application, and the driver is only responsible for reporting events. This > simplifies the driver design (for example, the driver does not need to > maintain > mutex locks). > > As we know, many modern NICs come with firmware, have PCIE interfaces, > support SR-IOV, the hardware errors can have: firmware reboot/PF reset/ > VF reset/FLR, but these errors(particularly firmware/PF) are not addressed in > most drivers. > > Question 1: what do we think of these errors(particularly firmware/PF)? Do > we think that the probability is very low and that there is no need to deal > with > them?
Even rare errors must be managed. > Question 2: I prefer to put error handling in the application layer, > because > doing it in the driver can make the driver complex, but there is no app to > register the INTR_RESET event handler. I think we can build a standard handler > in testpmd, What do you think? Absolutely. As any ethdev API, it must be tested with testpmd.