Halil Pasic <pa...@linux.ibm.com> writes:

> On Wed, 10 May 2023 08:32:12 +0200
> Markus Armbruster <arm...@redhat.com> wrote:
>
>> Halil Pasic <pa...@linux.ibm.com> writes:
>> 
>> > On Mon, 08 May 2023 11:01:55 +0200
>> > Cornelia Huck <coh...@redhat.com> wrote:
>> >  
>> >> On Mon, May 08 2023, Markus Armbruster <arm...@redhat.com> wrote:
> [..]
>> > and we do check for availability and cover that via -ENOSYS.  
>> 
>> Yes, kvm_s390_flic_realize() checks and sets ->clear_io_supported
>> accordingly, and kvm_s390_clear_io_flic() returns -ENOSYS when it's
>> false.
>> 
>> Doc on the actual set:
>
> Right. Sorry for the misinformation.

No problem!  With the clue you provided, the exact match was easy to
find :)
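
For readers joining here, a rough sketch of the shape I mean.  Not the
actual code (that lives in hw/intc/s390_flic_kvm.c and differs in
detail); the function name is mine, and I'm assuming the probe uses
KVM_HAS_DEVICE_ATTR:

    #include <stdbool.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>   /* on s390 this pulls in KVM_DEV_FLIC_* */

    /*
     * Sketch only: realize() probes whether the FLIC device knows
     * KVM_DEV_FLIC_CLEAR_IO_IRQ and stores the answer; when it is
     * false, the clear operation reports -ENOSYS instead of issuing
     * the ioctl at all.
     */
    static bool probe_clear_io_irq(int flic_fd)
    {
        struct kvm_device_attr attr = {
            .group = KVM_DEV_FLIC_CLEAR_IO_IRQ,
        };

        /* KVM_HAS_DEVICE_ATTR returns 0 iff the attribute is known */
        return ioctl(flic_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0;
    }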

>>   4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
>>   --------------------------------------------
>> 
>>   :Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
>>                KVM_CAP_VCPU_ATTRIBUTES for vcpu device
>>                KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device (no set)
>>   :Type: device ioctl, vm ioctl, vcpu ioctl
>>   :Parameters: struct kvm_device_attr
>>   :Returns: 0 on success, -1 on error
>> 
>>   Errors:
>> 
>>     =====   =============================================================
>>     ENXIO   The group or attribute is unknown/unsupported for this device
>>             or hardware support is missing.
>>     EPERM   The attribute cannot (currently) be accessed this way
>>             (e.g. read-only attribute, or attribute that only makes
>>             sense when the device is in a different state)
>>     =====   =============================================================
>> 
>>     Other error conditions may be defined by individual device types.
>> 
>>   Gets/sets a specified piece of device configuration and/or state.  The
>>   semantics are device-specific.  See individual device documentation in
>>   the "devices" directory.  As with ONE_REG, the size of the data
>>   transferred is defined by the particular attribute.
>> 
>>   ::
>> 
>>     struct kvm_device_attr {
>>           __u32      flags;          /* no flags currently defined */
>>           __u32      group;          /* device-defined */
>>           __u64      attr;           /* group-defined */
>>           __u64      addr;           /* userspace address of attr data */
>>     };
>> 
>> 
>> kvm_s390_flic_realize() sets ->fd to refer to the KVM_DEV_TYPE_FLIC
>> it creates.  I guess that means ENXIO and EPERM should never happen.
>
> I agree.
>
>> 
>> > For KVM_DEV_FLIC_CLEAR_IO_IRQ, just the following error code is
>> > documented in linux/Documentation/virt/kvm/devices/s390_flic.rst,
>> > which is to my knowledge the most authoritative source.
>> > """
>> > .. note:: The KVM_DEV_FLIC_CLEAR_IO_IRQ ioctl will return EINVAL in case a
>> >           zero schid is specified
>> > """
>> > but a look in the code will tell us that -EFAULT is also possible if the
>> > supplied address is broken.  
>> 
>> Common behavior.
>>
>
> Makes sense, it's just that I did not find it in the interface
> description/documentation.
>  
>> > To sum it up, nothing should go wrong with the given operation, and
>> > to my best knowledge seeing an error code on the ioctl would indicate
>> > either a programming error on the client side (QEMU messed it up) or
>> > something wrong with the kernel.
>> 
>> Abort on "QEMU messed up" is proper.  Abort on "something wrong with the
>> kernel" less so.  More on that below.
>> 
>
> I think I understand where you are coming from. IMHO it boils down
> to how broken the kernel is.
>
>> >> > Is the error condition fatal, i.e. continuing would be unsafe?  
>> >
>> > If the kernel is broken, probably. It is certainly unexpected.
>> >  
>> >> >
>> >> > If it's a fatal programming error, then abort() is appropriate.
>> >> >
>> >> > If it's fatal, but not a programming error, we should exit(1) instead.  
>> >
>> > It might not be a QEMU programming error. I really see no reason why
>> > a combination of a sane QEMU and a sane kernel would give us any
>> > error code other than -ENOSYS.
>> >  
>> >> >
>> >> > If it's a survivable programming error, use of abort() is a matter of
>> >> > taste.    
>> >
>> > The fact that we might have failed to clear some interrupts that the
>> > s390 architecture obligates us to clean up is not expected to have
>> > grave consequences.
>> 
>> Good to know.
>> 
>> >> From what I remember, this was introduced to clean up a potentially
>> >> queued interrupt that is not supposed to be delivered, so the worst
>> >> thing that could happen on failure is a spurious interrupt (same as what
>> >> could happen if the kernel flic doesn't provide this function in the
>> >> first place.) My main worry would be changes/breakages on the kernel
>> >> side (while the QEMU side remains unchanged).  
>> >
>> > Agreed. And I hope anybody changing the kernel would test the new error
>> > code and notice that QEMU crashes. This was my intention in the first
>> > place.
>> >  
>> >> So, I think we should continue to log the error in any case; but I don't
>> >> have a strong opinion as to whether we should use exit(1) (as I wouldn't
>> >> consider it a programming error) or just continue. Halil, your choice :)  
>> >
>> > Neither do I have a strong opinion. I think a hard crash is easier to
>> > spot than a warning message (I mean both in CI and in case of manual
>> > testing). But it is a trade-off. Just carrying on without checking error
>> > codes is in my opinion not really likely to get us into a pickle either.
>> > I don't think the function performed is essential, especially not for a
>> > Linux guest. For me this is really an 'assert' situation. Is there a
>> > QEMU way of dealing with that?  
>> 
>> My concern is unnecessary hard crashes in production, risking data loss.
>> 
>
> Nod.
>
>> When continuing would be unsafe, we should terminate the process.
>> 
>> When termination is due to programming error in the program itself, we
>> should use abort() to tell the user they need to get developers
>> involved, and also to produce a core dump for developers.
>> 
>
> So let us assume we see a -EFAULT in that switch-case. It might be
> a broken QEMU or a broken kernel that is the root cause. I guess
> we would decide it is QEMU.
>
>> When it's due to programming error outside the program itself, say in
>> the kernel, I'd rather exit(1).  Reasonable people can differ in opinion
>> there.
>
> How is it better? Wouldn't both abruptly terminate the guest?

Note the context: when continuing would be unsafe.

>> What to do when a programming error is recoverable?
>> 
>> Developers testing the program may prefer to abort().  Users using the
>> program will very much prefer to recover.
>> 
>> assert() lets you code both behaviors controlled by macro NDEBUG.
>> Requires disciplined use of assert().  
>
> Exactly the reason I said it is an 'assert' situation. But IMHO assert does
> not fit because with NDEBUG it does literally nothing and thus remains
> silent.

Yes.  More on that below.

>> Not the case for QEMU; code
>> routinely (and tacitly!) relies on NDEBUG being off.  The consensus
>> among QEMU developers is that friends don't let friends compile QEMU
>> with NDEBUG.
>
> Can you expand on this? I do understand the words, but I don't
> understand the background or the reason why.

If you're confident your release builds are basically free of errors,
checking assertions would be a waste, so you compile with NDEBUG.  We
are not confident, so we don't.
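
Concretely (illustrative only, the function is made up):

    #include <assert.h>

    /*
     * Built normally, a failed check here complains to stderr and
     * aborts.  Built with -DNDEBUG, assert() expands to nothing: the
     * check and anything inside it silently vanish, which is exactly
     * what code that tacitly relies on the check cannot afford.
     */
    static void apply_state(int rc)
    {
        assert(rc == 0);    /* gone entirely under -DNDEBUG */
    }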

You seem to have a different thing in mind, namely something like

    if (oopsie(!good)) {
        recovery code
    }

where oopsie() always logs, and optionally crashes[*].  This can be
useful, although testing the recovery code is hard, which doesn't
exactly inspire confidence.  Anyway, we're not using this technique in
QEMU.
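
For concreteness, such a helper could look roughly like this.  Purely
hypothetical: the names oopsie and OOPSIE_STRICT are invented, and QEMU
has nothing of the sort:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Always log; crash only when strict checking is compiled in. */
    #ifdef OOPSIE_STRICT
    #define OOPSIE_CRASH() abort()
    #else
    #define OOPSIE_CRASH() ((void)0)
    #endif

    #define oopsie(bad)                                                 \
        ((bad)                                                          \
         ? (fprintf(stderr, "oopsie at %s:%d\n", __FILE__, __LINE__),   \
            OOPSIE_CRASH(), true)                                       \
         : false)

Developers would build with -DOOPSIE_STRICT and get the core dump;
production builds would log and fall through into the recovery code.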

>> My final argument against aborting on this particular error is local
>> consistency.  I see 14 calls of ioctl() in this file.
>
> This file is hw/intc/s390_flic_kvm.c, right?

Yes.  Should've said that.

>                                              But the lipsticked
> abort() is called in hw/s390x/css.c in function css_clear_io_interrupt().

Right.  One step up a call chain.

>> One handles all errors silently: kvm_s390_flic_realize()'s probe of
>> KVM_DEV_FLIC_CLEAR_IO_IRQ.  Okay.
>> 
>> Two complain to stderr and continue: flic_enable_pfault(),
>> flic_disable_wait_pfault().  One complains via trace and continues:
>> kvm_s390_flic_reset().
>> 
>> Aside: why the difference?  Also, consider warn_report() or
>> info_report() for complaining to stderr, and consider showing
>> strerror(errno).
>> 
>> One reports an error to stderr and fails: kvm_flic_ais_pre_save().
>> Okay.
>> 
>> Aside: I suspect reporting to stderr is wrong, because some of its
>> callers use the Error API.  Not this code's fault, the callback you're
>> implementing doesn't have an Error ** parameter.
>> 
>> One sets an Error and fails: kvm_s390_flic_realize().  Okay.
>> 
>> The remainder return -errno.  I lack the time to follow their call chain
>> to examine how failure is handled.  But one of them caught my eye, or
>> rather its caller did:
>> 
>>     static int __get_all_irqs(KVMS390FLICState *flic,
>>                               void **buf, int len)
>>     {
>>         int r;
>> 
>>         do {
>>             /* returns -ENOMEM if buffer is too small and number
>>              * of queued interrupts on success */
>>             r = flic_get_all_irqs(flic, *buf, len);
>>             if (r >= 0) {
>>                 break;
>>             }
>>             len *= 2;
>>             *buf = g_try_realloc(*buf, len);
>>             if (!buf) {
>>                 return -ENOMEM;
>>             }
>>         } while (r == -ENOMEM && len <= KVM_S390_FLIC_MAX_BUFFER);
>> 
>>         return r;
>>     }
>> 
>> This treats *all* ioctl() errors as if they were -ENOMEM.  Feels sloppy
>> to me.
>> 
>> Back to my consistency argument.  None of the other ioctl() failures I
>> inspected leads to abort().  
>
> I believe a failure of fsc->register_io_adapter() can lead to an abort
> in css_register_io_adapters(), as the errp passed in is &error_abort.

Yes, one of its two call chains passes &error_abort.

>> The failures that seem most similar to me
>> complain to stderr or trace, then continue.  Shouldn't this one, too?
>
> My intention was to reify the assumption that the member clear_io_irq()
> of any implementation of S390FLICStateClass either tells us that
> this operation is not available by returning -ENOSYS, or it does
> the job. This 'if not available, return -ENOSYS' is to my best knowledge
> locally consistent.
>
> The function kvm_s390_clear_io_flic() in hw/intc/s390_flic_kvm.c handles
> unexpected errors (which, as we discussed, are most likely due to a
> programming error in either QEMU or the kernel) by returning the -errno.
> Which is IMHO locally consistent.
>
> I did not check in detail, but from what I remember I do agree that most
> of the call chains will most likely not lead to an abort on an unexpected
> failure of the ioctl().

Plausible.

> So in the end I do buy your argument that css_clear_io_interrupt() is
> not consistent with its environment. But is the more prevalent practice
> actually better? If we indeed ever happened to hit the case in
> question, I would prefer to have a dump, and try to figure out
> based on that whether QEMU messed up, over not having one, unless it
> comes at the cost of an important workload. As I have said, I have no
> strong opinion.

"Always crash & dump core" is the current behavior for this particular
unexpected error, and possibly some related ones.

> Just logging a warning and carrying on is from my perspective
> also a sane thing to do. I even agree that, if this situation were to happen
> in a production environment, I would prefer not to bring down the guest.

Current behavior for other related unexpected errors.

> But on the other hand, that abort() can be seen as a means of reducing
> the probability of seeing the situation in production by increasing the
> chance to catch the hypothetical bug before it ships. In the end I'm
> fine either way. 
>
> In my opinion the best way to deal with such situations would be to
> abort() in test/development and log a warning in production. Of course

Understand, but...

> assert() wouldn't give me that, and it wouldn't be locally consistent at
> all.

... nothing behaves like that so far.

Let's try to come to a conclusion.  We can either keep the current
behavior, i.e. abort().  Or we change it to just print something.

If we want the latter: fprintf() to stderr, warn_report(), or trace
point?
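
For instance, the warn_report() variant could look roughly like this
(hypothetical shape only; the helper name and calling convention are
made up, it is not a tested patch):

    #include "qemu/osdep.h"
    #include "qemu/error-report.h"

    /*
     * Sketch of "warn and continue" for css_clear_io_interrupt():
     * tolerate -ENOSYS (the FLIC backend may legitimately lack the
     * attribute), complain about anything else, and carry on.
     */
    static void report_clear_io_irq_failure(int r)
    {
        if (r == 0 || r == -ENOSYS) {
            return;
        }
        warn_report("KVM_DEV_FLIC_CLEAR_IO_IRQ failed: %s",
                    strerror(-r));
    }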

You are the maintainer, so the decision is yours.

I could stick a patch into a series of error-related cleanup patches I'm
working on.


[*] I'm rather fond of the trick to have oopsie() fork & crash.
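
Something like this, purely illustrative (the helper name is made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Always log, then fork and let the child abort: developers get a
     * core dump, the parent carries on serving the guest. */
    static void oopsie_fork_and_crash(const char *msg)
    {
        fprintf(stderr, "oopsie: %s\n", msg);
        if (fork() == 0) {
            abort();    /* child dumps core */
        }
    }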

