Re: Abuse of warnings for unhandled errors and programming errors

Markus Armbruster Thu, 11 Sep 2025 23:52:18 -0700

Daniel P. Berrangé writes:

> On Fri, Aug 08, 2025 at 11:30:32AM +0200, Markus Armbruster wrote:
>> In "[PATCH 00/12] Error reporting cleanup, a fix, and &error_warn
>> removal", I challenged the use of warnings in a few places.  I think the
>> topic deserves a wider audience than the one a rather pedestrian cleanup
>> series draws.
>> 
>> 
>> To make my case, I need to start with errors.  We distinguish between
>> ordinary errors (for lack of a better word) and programming errors.
>> 
>> Ordinary errors are things like nonsensical user requests, unavailable
>> resources, and so forth.  A correct program is prepared for such
>> failures, detects them, and reports them to the user.  The user can then
>> fix their request, try again when resources are available, and so forth.
>> 
>> Tools for reporting ordinary errors are error_report(),
>> error_report_err(), &error_fatal, and friends.
>
> The thing about nonsense user requests, unavailable resources, etc.
> is that almost none of them should imply exiting QEMU, except if they
> occur in the context of system startup before the VM starts executing.
> Once running we should do everything in our power to not let the user's
> workload die.
>
> From that POV, I tend to wish that error_fatal did not exist and that
> we instead propagated all fatal errors up until reaching main(), so
> we were not at risk of using error_fatal in runtime scenarios. We're
> largely stuck with what we've got though, due to our need to retrofit
> error reporting into our existing codebase design.


Yes, &error_fatal is almost always wrong after the guest starts.

When it isn't wrong, it's quite convenient.
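
To illustrate the split between the two situations, here is a minimal
sketch in plain C.  It is not QEMU code: MiniError is a simplified
stand-in for the real Error object, and open_backend() and its two
callers are hypothetical names made up for this example.  The startup
caller may exit on failure, which is the convenience &error_fatal
provides; the runtime caller hands the failure back to its own caller so
the guest keeps running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object; not the real type. */
typedef struct {
    char msg[128];
} MiniError;

/* Hypothetical operation that can fail for ordinary reasons. */
bool open_backend(const char *path, MiniError *err)
{
    assert(path && err);

    FILE *f = fopen(path, "r");
    if (!f) {
        /* Record the failure for the caller instead of printing here. */
        snprintf(err->msg, sizeof(err->msg),
                 "cannot open backend '%s'", path);
        return false;
    }
    fclose(f);
    return true;
}

/* Startup path: no guest is running yet, so exiting is acceptable. */
void open_backend_at_startup(const char *path)
{
    MiniError err;

    if (!open_backend(path, &err)) {
        fprintf(stderr, "%s\n", err.msg);
        exit(1);
    }
}

/*
 * Runtime path (say, a hot-plug request): pass the failure up and let
 * the caller report it.  The process must not exit here.
 */
bool open_backend_at_runtime(const char *path, MiniError *err)
{
    return open_backend(path, err);
}
```

In the sketch, only the startup wrapper is allowed to terminate the
process; everything below it merely records what went wrong.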

> I do try to push back in review any time we introduce new code that
> doesn't propagate errors as high up the stack as possible/practical.
>
>> Programming errors are bugs.  A developer needs to fix the program.
>> Unlike ordinary errors, programming errors are *unexpected*.
>> 
>> Programming errors are commonly not recoverable.  The proper tool for
>> unrecoverable ones is assertions.  &error_abort can be a convenient way
>> to assert "this can't fail".
>
> We could have called it &error_assert but that's bike shed colouring :-)

Yup :)

>> On to warnings.
>> 
>> When some failure doesn't prevent satisfying some request, an ordinary
>> error can be misleading.  We make it a warning instead then.
>> 
>> What if it's a programming error we recover from?
>> 
>> Aside: trying to recover in a buggy program is risky, but that's not the
>> debate I want to have here.
>> 
>> How do we want such recoverable programming errors reported?
>> 
>> Warning?  We seem to be abusing warnings this way, and I hate it.  What
>> we have to report is a *bug*, and we should make that crystal clear.
>> "warning: FunctionYouNeverHeardAbout() failed" does not.  It could be
>> anything, and you likely need to look at the source to find out.
>> 
>> Ordinary error reporting with "internal error: " prefix, so the user
>> understands this is a bug, and all they can do about it is report it?
>> 
>> Log the bug somehow?
>> 
>> Thoughts?
>
> I don't see 'warnings' as something directly actionable for a user.
> Rather they are messages that I would want to see included in a log
> file that a user attaches to a bug report if they find some behavioural
> problem. If the user understands the warning great, but that isn't a
> requirement.
>
> IOW, while informative warnings are of course better than not, as long
> as the warning message contains sufficient info for the maintainer to
> understand what happened the minimum quality bar is satisfied IMHO.

That's a low quality bar indeed.  Here's mine:

1. A warning should make perfectly clear whether this is a bug that
should be reported, or an issue with usage, resources, etc. that can be
ignored, but may help understand future trouble, if any (typically an
ordinary error that wasn't fully handled).  Our errors make bug
vs. ordinary error clear.

2. A warning of the former kind (bug) should provide information
developers need to start debugging.  For errors, we give them a core
dump, source file and line number.  For warnings, we currently give them
grep and warm wishes.

3. A warning of the latter kind (not a bug) should at least try to
provide hints that help users diagnose and correct / work around what's
wrong.  "warning: failed to WSAEventSelect()" doesn't.  "warning:
trouble initializing slirp for user mode networking" might.
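
The three points above could be sketched roughly as follows.  This is a
self-contained illustration, not a proposal for an actual QEMU
interface: WarnKind and format_warning() are hypothetical names, and the
exact message wording is an assumption.  Bug warnings say they are bugs
and carry a source location; ordinary warnings carry an optional hint
for the user.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification; QEMU has no such enum. */
typedef enum {
    WARN_USAGE,   /* ordinary failure not fully handled; can be ignored */
    WARN_BUG,     /* programming error we recovered from; please report */
} WarnKind;

/*
 * Format a warning into buf and return what snprintf() returns.
 * WARN_BUG: marks the message as a bug and appends file:line (point 2).
 * WARN_USAGE: appends an optional hint for the user (point 3).
 */
int format_warning(char *buf, size_t size, WarnKind kind,
                   const char *file, int line,
                   const char *msg, const char *hint)
{
    assert(buf && size > 0 && msg);

    if (kind == WARN_BUG) {
        assert(file);
        /* Keep only the file name, not the directory part. */
        const char *base = strrchr(file, '/');
        base = base ? base + 1 : file;
        return snprintf(buf, size,
                        "warning: internal error, please report this bug: "
                        "%s (%s:%d)", msg, base, line);
    }
    if (hint) {
        return snprintf(buf, size, "warning: %s (hint: %s)", msg, hint);
    }
    return snprintf(buf, size, "warning: %s", msg);
}

/* Convenience wrapper that captures the call site of a bug warning. */
#define FORMAT_BUG_WARNING(buf, size, msg) \
    format_warning(buf, size, WARN_BUG, __FILE__, __LINE__, msg, NULL)
```

With such a split, a message about a failed internal call can no longer
be mistaken for a usage problem, and the developer gets a starting
point better than grep.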
