On Fri, 2021-04-02 at 18:11 +0100, Mark Brown wrote:
> On Thu, Mar 11, 2021 at 12:22:36PM +0200, Matti Vaittinen wrote:
> > +   if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
> > +           if (d->die)
> > +                   ret = d->die(rid);
> > +           else
> > +                   BUG();
> > +
> > +           /*
> > +            * If the 'last resort' IC recovery failed we will have
> > +            * nothing else left to do...
> > +            */
> > +           BUG_ON(ret);
> 
> This isn't good...  we should be trying to provide more system level
> handling of this, if nothing else it's quite possibly not a software bug
> here but rather a hardware failure.  An explicit message about what
> happened would be more likely to be understood as a hardware failure,

I do agree. I'll add a print in the next version.

> and something which allows handling such as initiating a system shutdown
> would be good as well - I'm not sure if there's any existing mechanism
> to plumb userspace into, or perhaps some sort of policy configurable via
> sysfs.

I like the idea but don't know of any such existing mechanism. The input
system's power-key event is the closest thing that comes to mind, but I
don't think that would be quite right. Additionally, I am unsure what
level of user-space functionality can be expected to work. Maybe the
severity of the configured notifications should be used to decide whether
to do in-kernel handling or to alert user-space. Anyway, that is something
that requires further pondering - I'd propose improving this later.
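
To make the severity idea a bit more concrete, here is a rough user-space
sketch in C, not kernel code and not part of the patch. The severity
levels, the action names, the functions choose_action() and
handle_failed_recovery(), and the threshold at which handling moves into
the kernel are all hypothetical and only illustrate the kind of policy
that might be considered.

```c
#include <stdio.h>

/* Hypothetical severity levels; these names are not from the patch. */
enum notif_severity {
	SEVERITY_WARNING,	/* assumed recoverable: user-space may decide */
	SEVERITY_ERROR,		/* assumed: hardware may already be degraded */
	SEVERITY_PROTECTION,	/* assumed: damage imminent, act immediately */
};

/* Hypothetical outcomes of the policy decision. */
enum notif_action {
	ACTION_ALERT_USERSPACE,
	ACTION_IN_KERNEL,
};

/*
 * Pick the handler from the configured severity alone. The threshold
 * (ERROR and above handled in-kernel) is an assumption for illustration.
 */
enum notif_action choose_action(enum notif_severity sev)
{
	return sev >= SEVERITY_ERROR ? ACTION_IN_KERNEL
				     : ACTION_ALERT_USERSPACE;
}

/*
 * Print an explicit message naming a possible hardware failure instead
 * of silently crashing, then return the action that was chosen.
 */
enum notif_action handle_failed_recovery(const char *name,
					 enum notif_severity sev)
{
	enum notif_action act = choose_action(sev);

	fprintf(stderr, "%s: IC recovery failed, possible hardware failure (%s)\n",
		name,
		act == ACTION_IN_KERNEL ? "handling in kernel"
					: "alerting user-space");
	return act;
}
```

With this split, handle_failed_recovery("buck1", SEVERITY_WARNING) would
only alert user-space, while the two higher levels would be handled
without relying on user-space still being functional.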

Best Regards
        Matti Vaittinen
