On 17/09/24 13:33, Ghimiray, Himal Prasad wrote:
>
>
> On 17-09-2024 12:08, Raag Jadav wrote:
>> On Tue, Sep 17, 2024 at 10:11:05AM +0530, Ghimiray, Himal Prasad wrote:
>>> On 17-09-2024 09:32, Raag Jadav wrote:
>>>> This was previously attempted as xe specific reset uevent but dropped
>>>> in commit 77a0d4d1cea2 ("drm/xe/uapi: Remove reset uevent for now")
>>>> as part of refactoring.
>>>>
>>>> Now that we have device wedged event supported by DRM core, make use
>>>> of it. With this in place userspace will be notified of wedged device,
>>>> on the basis of which, userspace may take respective action to recover
>>>> the device.
>>>
>>>
>>> As per earlier discussions, the UAPI was also supposed to provide the reason
>>> for wedging( which is supposedly used by L0). IS that requirement nomore in
>>> place ?
>>
>> Wondering how does that contribute to the usecase?
>
>
> ZES_EVENT_TYPE_FLAG_DEVICE_RESET_REQUIRED uses zesDeviceGetState
>
> "Get information about the state of the device - if a reset is required,
> reasons for the reset and if the device has been repaired. "
>
> https://spec.oneapi.io/level-zero/latest/sysman/api.html#zes__api_8h_1aec73230b938f08ad632d0b7817b66183

L0 doesn't read this uevent to know the reason, this uevent is for L0 to
know that reset is required
https://spec.oneapi.io/level-zero/latest/sysman/api.html#_CPPv4N21zes_event_type_flag_t41ZES_EVENT_TYPE_FLAG_DEVICE_RESET_REQUIREDE
The reason comes from a different API, zesDeviceGetState:
https://spec.oneapi.io/level-zero/latest/sysman/api.html#zesdevicegetstate
For that, they can issue any IOCTL, which will fail with -ECANCELED when
the device is wedged, and from that they can know the reason.

Thanks,
Aravind.

>
>>
>> Raag
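
For illustration only, here is a rough userspace sketch of the check
described above: issue an IOCTL on the device fd and treat ECANCELED as
"device is wedged". The only behaviour taken from this thread is that
IOCTLs fail with -ECANCELED on a wedged device. The names wedge_probe,
classify_ioctl_result() and probe_wedged() are hypothetical, and the
choice of request and argument is left to the caller. This is not how L0
actually implements zesDeviceGetState.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical result codes for this sketch. */
enum wedge_probe {
	PROBE_OK = 0,           /* IOCTL succeeded, device is responding */
	PROBE_WEDGED = 1,       /* IOCTL failed with ECANCELED */
	PROBE_OTHER_ERROR = -1, /* IOCTL failed for an unrelated reason */
};

/* Classify the return value and errno of an ioctl() call. */
enum wedge_probe classify_ioctl_result(int ret, int err)
{
	if (ret >= 0)
		return PROBE_OK;
	if (err == ECANCELED)
		return PROBE_WEDGED;
	return PROBE_OTHER_ERROR;
}

/*
 * Issue one IOCTL on an already opened device fd and classify the result.
 * Assumption: the caller passes a request/arg pair that is valid for the
 * driver, so that any other failure really is unrelated to wedging.
 */
enum wedge_probe probe_wedged(int fd, unsigned long request, void *arg)
{
	int ret = ioctl(fd, request, arg);

	return classify_ioctl_result(ret, ret < 0 ? errno : 0);
}
```

A listener would call probe_wedged() after receiving the uevent and only
report "reset required" to its clients when it gets PROBE_WEDGED back.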