On Thu, Apr 18, 2024 at 2:24 PM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 08.04.2024 18:26, Ross Lagerwall wrote:
> > In a test, OVMF reported an error initializing the RTC without
> > indicating the precise nature of the error. The only plausible
> > explanation I can find is as follows:
> >
> > As part of the initialization, OVMF reads register C and then reads
> > register A repatedly until the UIP flag is not set. If this takes longer
> > than 100 ms, OVMF fails and reports an error. This may happen with the
> > following sequence of events:
> >
> > At guest time=0s, rtc_init() calls check_update_timer() which schedules
> > update_timer for t=(1 - 244us).
> >
> > At t=1s, the update_timer function happens to have been called >= 244us
> > late.
>
> Any theory on why this timer runs so late? (It needs dealing with anyway,
> but I'm still curious.)

I don't know specifically why our testcase spotted it. I reproduced the
issue by repeatedly running "xl debug-key h" so that a lot of time was
spent writing to the serial console. That caused timer interrupts to
run late but I suppose there could be a lot of reasons why timer
interrupts occasionally run late (e.g. a live patch critical region).

>
> > In the timer callback, it sets the UIP flag and schedules
> > update_timer2 for t=1s.
> >
> > Before update_timer2 runs, the guest reads register C which calls
> > check_update_timer(). check_update_timer() stops the scheduled
> > update_timer2 and since the guest time is now outside of the update
> > cycle, it schedules update_timer for t=(2 - 244us).
> >
> > The UIP flag will therefore be set for a whole second from t=1 to t=2
> > while the guest repeatedly reads register A waiting for the UIP flag to
> > clear. Fix it by clearing the UIP flag when scheduling update_timer.
>
> I can accept this as a workaround (perhaps with a tweak, see below), but
> I wonder whether we couldn't do this in a less ad hoc way. If stop_timer()
> returned whether the timer was actually stopped, couldn't the clearing of
> UIP be made conditional upon that?

I think that would work though I'd need to spend some time convincing
myself that it doesn't introduce any other race conditions. I'm not
convinced it is really any better than just unconditionally clearing
the flag though.

>
> > --- a/xen/arch/x86/hvm/rtc.c
> > +++ b/xen/arch/x86/hvm/rtc.c
> > @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
> >          }
> >          else
> >          {
> > +            s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
> >              next_update_time = (USEC_PER_SEC - guest_usec - 244) * 
> > NS_PER_USEC;
> >              expire_time = NOW() + next_update_time;
> >              s->next_update_time = expire_time;
>
> Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling
> here? Hmm, yes, it is; the question is rather why the function calls
> check_update_timer() when all that'll do (due to UF now being set) is stop
> the other timer (in case it's also running) and clear ->use_timer.
>

Are you suggesting something like this?

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 4bb1c7505534..eb4901bf129a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -240,7 +240,8 @@ static void cf_check rtc_update_timer2(void *opaque)
         s->hw.cmos_data[RTC_REG_C] |= RTC_UF;
         s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
         rtc_update_irq(s);
-        check_update_timer(s);
+        stop_timer(&s->update_timer);
+        s->use_timer = 0;
     }
     spin_unlock(&s->lock);
 }

That may indeed be an improvement but I'm not sure it is really related
to this patch?

Thanks,
Ross

Reply via email to