On Thu, May 27, 2021 at 11:44 AM Kevin O'Connor <ke...@koconnor.net> wrote:
>
> The purpose of this code is to restore the NMI_DISABLE_BIT to what it
> was prior to call32_prep().  If something calls the bios without the
> NMI_DISABLE_BIT set, it's this code that makes sure SeaBIOS returns to
> that calling code with NMI_DISABLE_BIT also not set.
>
> If you've run into some bug, I think it would help if you could
> further describe that bug.
>
> Cheers,
> -Kevin

Hi Kevin,

Thank you for the explanation! Sorry, it seems I misunderstood this part of the
code: I thought every access to PORT_CMOS_DATA had to be performed with NMIs
disabled. Giving some background on the issue may help me understand this a bit
better.

This was originally reported by some Ubuntu users running specific VMs on
older versions of seabios, where they would occasionally see KVM emulation
failures and VMs going into the "PAUSED" state (unable to resume without a
full VM reboot afterwards). Inspecting the ASM dumps [0] on those VMs revealed
that the last actions performed were accesses to PORT_CMOS_DATA, and those
seemed to be caused by rtc_mask(). Since these were on old versions of seabios,
the failures looked like a result of our builds missing patch 3156b71a535e (rtc:
Disable NMI in rtc_mask()) [1], which we tried to address initially.

After providing new packages of seabios with the rtc_mask() patch, some users
noticed that a few VMs still continued to present similar symptoms, but with a
different ASM dump this time. This was also seen on "newer" versions of our
seabios packages based on upstream 1.10.2, which should already include the
rtc_mask() patches by default (git describe --contains reports this patch being
introduced with rel-1.9.0~47). These new failed instances led us to believe
that call32_post() was the culprit, since the trapping instruction was still the
same access to PORT_CMOS_DATA which was "unguarded" by an NMI_DISABLE_BIT. We
then provided another package for testing, implementing the patch I've proposed
originally in this thread, and our users reported no further KVM emulation
failures.

Unfortunately, I'm not entirely sure what originally causes the KVM emulation
failures, as I've been unable to reproduce these issues in test VMs. Our users
reported that the commands below can trigger the emulation failures, but I have
no details on the exact platform those are running on or any details of the
specific file system in use:

root@vsfo-2[]:/root> date; fsfreeze --freeze /flash
root@vsfo-2[]:/root> date; dd if=/dev/zero of=/flash/test bs=1 count=0 seek=1G

In any case, I'm somewhat puzzled by CMOS port accesses causing KVM emulation
failures. Could it be that an NMI comes in between outb/inb and we end up trying
to read from a nonsense CMOS index?
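
To make the question concrete, here is a minimal sketch of the interleaving I
have in mind. It is a user-space model, not SeaBIOS code: cmos_ram,
cmos_index_latch, between_hook, cmos_sim_init() and
nmi_handler_touching_cmos() are hypothetical names I made up for illustration,
and the outb()/inb() here are mocks that only mimic the argument order of the
real helpers. It only shows how a handler running between the index write and
the data read could redirect the read; whether that has anything to do with the
KVM emulation failure is exactly what I don't know.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_CMOS_INDEX 0x70
#define PORT_CMOS_DATA  0x71
#define NMI_DISABLE_BIT 0x80

/* Hypothetical stand-in for the RTC/CMOS device: 128 bytes behind an index latch. */
static uint8_t cmos_ram[128];
static uint8_t cmos_index_latch;

/* Mock port write: only the index port is modeled. */
static void outb(uint8_t val, uint16_t port)
{
    assert(port == PORT_CMOS_INDEX);
    cmos_index_latch = val;
}

/* Mock port read: returns the byte selected by the low 7 bits of the latch. */
static uint8_t inb(uint16_t port)
{
    assert(port == PORT_CMOS_DATA);
    return cmos_ram[cmos_index_latch & 0x7f];
}

/* Hypothetical hook standing in for an NMI arriving between outb and inb. */
static void (*between_hook)(void);

/* Two-step CMOS read; the hook only fires when the NMI bit is clear. */
static uint8_t cmos_read(uint8_t index)
{
    outb(index, PORT_CMOS_INDEX);
    if (between_hook && !(cmos_index_latch & NMI_DISABLE_BIT))
        between_hook();
    return inb(PORT_CMOS_DATA);
}

/* A handler that itself selects a different CMOS register. */
static void nmi_handler_touching_cmos(void)
{
    outb(0x0f, PORT_CMOS_INDEX);
}

/* Fill each register with its own index so a misdirected read is visible. */
static void cmos_sim_init(void)
{
    for (int i = 0; i < 128; i++)
        cmos_ram[i] = (uint8_t)i;
    between_hook = 0;
}
```

In this model, after cmos_sim_init() and with between_hook set to
nmi_handler_touching_cmos, cmos_read(0x0a) returns the contents of register
0x0f, while cmos_read(0x0a | NMI_DISABLE_BIT) returns register 0x0a as intended.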

If I understand it correctly, my proposed patch effectively turns off NMIs
unconditionally, which sounds like it should cause horrible breakage. Could you
help me understand why that doesn't happen with the rtc_read/write/mask
functions in src/hw/rtc.c as well?
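
For reference, this is how I currently picture the difference between the two
code paths, written as a small stand-alone model rather than the real code. The
names index_latch, model_rtc_select(), model_prep(), model_post() and
nmi_masked() are hypothetical, and the logic is my simplified reading of your
description above (restore the bit to what it was before call32_prep()), so
please correct me if the model is wrong.

```c
#include <assert.h>
#include <stdint.h>

#define NMI_DISABLE_BIT 0x80

/* Hypothetical model of the CMOS index latch; bit 7 masks NMIs. */
static uint8_t index_latch;

/* Modeled on the rtc helpers as I read them: always set the bit, never clear it. */
static void model_rtc_select(uint8_t index)
{
    assert(index < 0x80); /* register index is 7 bits in this model */
    index_latch = index | NMI_DISABLE_BIT;
}

/* Modeled on call32_prep(): remember the caller's latch, then mask NMIs. */
static uint8_t model_prep(void)
{
    uint8_t saved = index_latch;
    index_latch = saved | NMI_DISABLE_BIT;
    return saved;
}

/* Modeled on call32_post(): if the caller had NMIs enabled, put its value back. */
static void model_post(uint8_t saved)
{
    if (!(saved & NMI_DISABLE_BIT))
        index_latch = saved;
}

/* Nonzero when the model considers NMIs masked. */
static int nmi_masked(void)
{
    return (index_latch & NMI_DISABLE_BIT) != 0;
}
```

In this model a caller that enters with the bit clear gets it back clear after
model_post(), whereas model_rtc_select() leaves NMIs masked once it has run.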

Hopefully the above helps contextualize the issue a bit better, Kevin. Apologies
for asking so many questions, but would you have any suggestions on how we could
try to get more information on the seabios side of this?

Many thanks for the help!
Heitor

[0] https://pastebin.ubuntu.com/p/4dYFCqPpxb/
[1] https://review.coreboot.org/plugins/gitiles/seabios/+/3156b71a535e661%5E%21/#F0
