On Tue, Sep 10, 2024 at 06:53:19PM GMT, Raag Jadav wrote:
On Mon, Sep 09, 2024 at 03:01:50PM -0500, Lucas De Marchi wrote:
On Sun, Sep 08, 2024 at 11:08:39PM GMT, Asahi Lina wrote:
> On 9/8/24 12:07 AM, Lucas De Marchi wrote:
> > On Sat, Sep 07, 2024 at 08:38:30PM GMT, Asahi Lina wrote:
> > > On 9/6/24 6:42 PM, Raag Jadav wrote:
> > > > Introduce device wedged event, which will notify userspace of wedged
> > > > (hanged/unusable) state of the DRM device through a uevent. This is
> > > > useful especially in cases where the device is in unrecoverable state
> > > > and requires userspace intervention for recovery.
> > > >
> > > > Purpose of this implementation is to be vendor agnostic. Userspace
> > > > consumers (sysadmin) can define udev rules to parse this event and
> > > > take respective action to recover the device.
> > > >
> > > > Consumer expectations:
> > > > ----------------------
> > > > 1) Unbind driver
> > > > 2) Reset bus device
> > > > 3) Re-bind driver
> > >
> > > Is this supposed to be normative? For drm/asahi we have a "wedged"
> > > concept (firmware crashed), but the only possible recovery action is a
> > > full system reboot (which might still be desirable to allow userspace to
> > > trigger automatically in some scenarios) since there is no bus-level
> > > reset and no firmware reload possible.
> >
> > maybe let drivers hint possible/supported recovery mechanisms and then
> > sysadmin chooses what to do?
>
> How would we do this? A textual value for the event or something like
> that? ("WEDGED=bus-reset" vs "WEDGED=reboot"?)

If there's a need for more than one, then I think exposing the supported
ones sorted by "side effect" in sysfs would be good. Something like:

        $ cat /sys/class/drm/card0/device/wedge_recover
        rebind
        bus-reset
        reboot

How do we expect the drivers to flag supported ones? Extra hooks?

See the comment above: wedge_recover would be a sysfs file exposed by the
driver to userspace, listing the supported mechanisms.

WEDGED=<mechanism> (which is also crafted by the driver or with explicit
functions in drm) would report to userspace the minimum mechanism
needed for recovery.
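
To make the idea concrete, here is a rough userspace sketch in Python of how
a consumer might combine the two pieces. Everything in it is an assumption:
neither wedge_recover nor WEDGED=<mechanism> exists today, and the helper
names (parse_supported, parse_wedged, escalation_path) are made up for
illustration. It assumes the sysfs file lists one mechanism per line, least
side effect first, and that the uevent environment arrives as KEY=VALUE
lines.

```python
# Sketch only: "wedge_recover" and "WEDGED=<mechanism>" are proposals from
# this thread, not an existing kernel interface. All names are hypothetical.


def parse_supported(sysfs_text):
    """Return the mechanisms listed one per line, in file order.

    Assumption: the driver sorts them by side effect, mildest first.
    """
    return [line.strip() for line in sysfs_text.splitlines() if line.strip()]


def parse_wedged(uevent_env):
    """Return the value of WEDGED from KEY=VALUE lines, or None if absent."""
    for line in uevent_env.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "WEDGED":
            return value.strip()
    return None


def escalation_path(supported, minimum):
    """Return the mechanisms worth trying, starting at the reported minimum.

    Anything milder than the minimum is skipped, since the driver has
    already said it would not be enough.
    """
    if minimum not in supported:
        raise ValueError(f"driver did not advertise {minimum!r}")
    return supported[supported.index(minimum):]


if __name__ == "__main__":
    # Sample data mirroring the example in this mail; a real consumer would
    # read the sysfs file and receive the uevent environment from udev.
    supported = parse_supported("rebind\nbus-reset\nreboot\n")
    minimum = parse_wedged("ACTION=change\nWEDGED=bus-reset\n")
    for mechanism in escalation_path(supported, minimum):
        print("would try:", mechanism)
```

Whether userspace should escalate past the reported minimum on its own is a
policy question for the sysadmin; the sketch only shows that the sorted list
makes that choice easy to express.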

Lucas De Marchi
