> >   
> > > +        for i in range(0, attempts):
> > > +            try:
> > > +                obj = self.qmp_monitor.cmd_obj(msg)
> > > +
> > > +                if obj and "return" in obj and not obj["return"]:
> > > +                    break
> > > +
> > > +            except Exception as e:                     # pylint: disable=W0718
> > > +                print(f"Command: {command}")
> > > +                print(f"Failed to inject error: {e}.")
> > > +                obj = None
> > > +
> > > +            if attempts > 1:
> > > +                print(f"Error inject attempt {i + 1}/{attempts} failed.")
> > > +
> > > +            if i + 1 < attempts:
> > > +                sleep(0.1)  
> 
> ... and here, we sleep for 0.1 seconds.
> 
> > 
> > Do we care about a sleep at the end?  Feels like a micro optimization that
> > isn't needed.  
> 
> This is not a micro-optimization. It is more to ensure that we won't
> respin it too fast.
> 
> What happens is that the QMP interface asks the BIOS to send an async
> message to OSPM, clearing an ack register. When the OSPM reads the
> error, it writes 1 to the ack register.
> 
> If we send messages too fast, the logic in ghes.c will detect that
> the ack didn't happen, immediately returning an error code.
> 
> In such a case, we sleep for 100ms before trying again.

I was suggesting the opposite.  Just sleep one more time at the end
before timing out.

So instead of

        if i + 1 < attempts:
                sleep(0.1)

simply

        sleep(0.1)
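
To make the suggestion concrete, here is a rough standalone sketch of the
loop with the unconditional sleep. The helper name send_with_retries and
its send/delay parameters are made up for illustration and are not part of
the patch; send stands in for the self.qmp_monitor.cmd_obj(msg) call, and
the success check is copied from the quoted hunk.

```python
from time import sleep


def send_with_retries(send, attempts=3, delay=0.1):
    """Hypothetical helper: call send() up to `attempts` times.

    Sleeps `delay` seconds after every failed attempt, including the
    last one, so there is no special case for the final iteration.
    Returns the reply on success, or None if every attempt failed.
    """
    for i in range(attempts):
        try:
            obj = send()

            # Same success test as the patch: an empty "return" member.
            if obj and "return" in obj and not obj["return"]:
                return obj

        except Exception as e:  # pylint: disable=W0718
            print(f"Failed to inject error: {e}.")

        if attempts > 1:
            print(f"Error inject attempt {i + 1}/{attempts} failed.")

        # Unconditional: costs one extra delay after the final failure.
        sleep(delay)

    return None
```

The only cost is one extra 100ms delay on the path where every attempt
has already failed.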



> 
> In practice, on my Ryzen 9 machines with QEMU emulating ARM,
> even under massive error injection, 99% of the time no retries
> happen. The worst-case scenario I saw here is that sometimes the
> kernel got stuck and took between 5s and 10s to accept the error
> submission.
> 
> >   
> > > +
> > > +        if not obj:
> > >              return None  
> > 
> >   
> 