> >
> > > + for i in range(0, attempts):
> > > + try:
> > > + obj = self.qmp_monitor.cmd_obj(msg)
> > > +
> > > + if obj and "return" in obj and not obj["return"]:
> > > + break
> > > +
> > > + except Exception as e: # pylint: disable=W0718
> > > + print(f"Command: {command}")
> > > + print(f"Failed to inject error: {e}.")
> > > + obj = None
> > > +
> > > + if attempts > 1:
> > > + print(f"Error inject attempt {i + 1}/{attempts} failed.")
> > > +
> > > + if i + 1 < attempts:
> > > + sleep(0.1)
>
> ... and here, we sleep for 0.1 seconds.
>
> >
> > Do we care about a sleep at the end? Feels like a micro optimization that
> > isn't needed.
>
> This is not a micro-optimization. It is more to ensure that we won't
> respin it too fast.
>
> What happens is that the QMP interface asks the BIOS to send an async
> message to OSPM, clearing an ack register. When the OSPM reads the
> error, it writes 1 to the ack register.
>
> If we send messages too fast, the logic at ghes.c will detect that
> the ack didn't happen, immediately returning an error code.
>
> In such a case, we sleep for 100ms before trying again.
I was suggesting the opposite. Just sleep one more time at the end
before timing out.
So instead of
    if i + 1 < attempts:
        sleep(0.1)
simply
    sleep(0.1)
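
To make the suggestion concrete, here is a rough sketch of the loop with the
unconditional sleep. It is not the actual patch: inject_with_retries, send_cmd
(a stand-in for self.qmp_monitor.cmd_obj), delay and sleep_fn are hypothetical
names used only for illustration.

```python
from time import sleep


def inject_with_retries(send_cmd, msg, attempts=3, delay=0.1, sleep_fn=sleep):
    """Retry send_cmd(msg) until it returns an empty "return" payload.

    send_cmd is a hypothetical stand-in for self.qmp_monitor.cmd_obj;
    sleep_fn is injectable only so the loop can be exercised without waiting.
    """
    for i in range(attempts):
        try:
            obj = send_cmd(msg)
            # An empty "return" payload is treated as success, as in the patch.
            if obj and "return" in obj and not obj["return"]:
                return obj
        except Exception as e:  # pylint: disable=W0718
            print(f"Failed to inject error: {e}.")

        if attempts > 1:
            print(f"Error inject attempt {i + 1}/{attempts} failed.")

        # Unconditional sleep: it also runs after the last failed attempt,
        # which costs one extra delay before giving up.
        sleep_fn(delay)

    return None
```

The only cost is one extra 100ms wait on the path where every attempt fails,
in exchange for dropping the i + 1 < attempts check.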
>
> In practice, on my Ryzen 9 machines with QEMU emulating ARM,
> even under massive error injection, 99% of the time no retries
> happen. The worst-case scenario I got here is that sometimes the
> kernel got stuck and took between 5s and 10s to accept the error
> submission.
>
> >
> > > +
> > > + if not obj:
> > > return None
> >
> >
>