On Tue, Oct 11, 2022 at 2:06 PM Jason A. Donenfeld <ja...@zx2c4.com> wrote:
>
> On Tue, Oct 11, 2022 at 09:46:01AM +0300, Pavel Dovgalyuk wrote:
> > On 10.10.2022 18:32, Peter Maydell wrote:
> > > On Mon, 10 Oct 2022 at 16:21, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> > >>
> > >> On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
> > >>> The error is essentially the record-and-replay subsystem saying "the
> > >>> replay just asked for a random number at point when the recording
> > >>> did not ask for one, and so there's no 'this is what the number was'
> > >>> info in the record".
> > >>>
> > >>> I have had a quick look, and I think the reason for this is that
> > >>> load_snapshot() ("reset the VM state to the snapshot state stored in the
> > >>> disk image or migration stream") does a system reset. The replay
> > >>> process involves a lot of "load state from a snapshot and play
> > >>> forwards from there" operations. It doesn't expect that load_snapshot()
> > >>> would result in something reading random data, but now that we are
> > >>> calling qemu_guest_getrandom() in a reset hook, that happens.
> > >>
> > >> Hmm... so this seems like a bug in the replay code then? Shouldn't that
> > >> reset handler get hit during both passes, so the entry should be in
> > >> each?
> > >
> > > No, because record is just
> > > "reset the system, record all the way to the end stop",
> > > but replay is
> > > "set the system to the point we want to start at by using
> > > load_snapshot, play from there", and depending on the actions
> > > you do in the debugger like reverse-continue we might repeatedly
> > > do "reload that snapshot (implying a system reset) and play from there"
> > > multiple times.
> >
> > The idea of the patches is fdt randomization during reset, right?
> > But reset is used not only for real reboot, but also for restoring the
> > snapshots.
> > In the latter case it is like "just clear the hw registers to simplify
> > the initialization".
> > Therefore no other virtual hardware tried to read external data yet. And
> > random numbers are external to the machine, they come from the outer world.
> >
> > It means that this is completely new reset case and new solution should
> > be found for it.
> > Do you have any proposals for that?

Okay I've actually read your message like 6 times now and think I may
have come up with something. Initial testing indicates it works well.
I'll send a new series shortly.

Jason
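
The failure mode Peter describes in the quoted thread can be illustrated
with a toy model. This is only a sketch and not QEMU code: ToyRecordReplay,
external(), the event log layout, and the simplified load_snapshot() and
system_reset() below are all hypothetical stand-ins, and the ordering of
"restore position, then reset" is an assumption made to keep the example
short. It shows a reset hook asking for a random number at a replay
position where the recording holds a different kind of event.

```python
import random


class ReplayMismatch(Exception):
    """Replay asked for an external value the recording does not hold."""


class ToyRecordReplay:
    """Toy event log (hypothetical; not QEMU's replay implementation)."""

    def __init__(self):
        self.events = []        # recorded (kind, value) pairs
        self.pos = 0            # replay cursor into self.events
        self.replaying = False

    def external(self, kind, produce):
        """Fetch one external value, logging it or reading it back."""
        if not self.replaying:
            value = produce()
            self.events.append((kind, value))
            return value
        if self.pos >= len(self.events) or self.events[self.pos][0] != kind:
            raise ReplayMismatch(
                f"replay asked for {kind!r} at event {self.pos}, "
                "but the recording has no such entry there"
            )
        value = self.events[self.pos][1]
        self.pos += 1
        return value


def system_reset(rr):
    # Hypothetical reset hook that reads guest randomness.
    return rr.external("rand", lambda: random.getrandbits(32))


def load_snapshot(rr, snapshot_pos):
    # In this model, restoring a snapshot implies a system reset.
    rr.pos = snapshot_pos
    return system_reset(rr)


def demo():
    rr = ToyRecordReplay()
    # Record: one reset at boot, then two external inputs.
    seed = system_reset(rr)
    rr.external("input", lambda: "a")
    rr.external("input", lambda: "b")
    # Replay from a snapshot taken after the first input (event index 2).
    rr.replaying = True
    try:
        load_snapshot(rr, snapshot_pos=2)
    except ReplayMismatch as exc:
        return seed, str(exc)
    return seed, None
```

Calling demo() returns the recorded seed together with the mismatch
message, since the reset triggered by load_snapshot() requests a "rand"
entry where the log holds an "input" entry. Loading a snapshot at
position 0 succeeds in this model, because the recording did log a
random number there.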