Re: [PATCH] replay: synchronize on every virtual timer callback
On 21.05.2020 16:22, Paolo Bonzini wrote:
> On 06/05/20 10:17, Pavel Dovgalyuk wrote:
>> Sometimes virtual timer callbacks depend on order
>> of virtual timer processing and warping of virtual clock.
>> Therefore every callback should be logged to make replay deterministic.
>> This patch creates a checkpoint before every virtual timer callback.
>> With these checkpoints virtual timers processing and clock warping
>> events order is completely deterministic.
>>
>> Signed-off-by: Pavel Dovgalyuk
>> ---
>>  util/qemu-timer.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>> index d548d3c1ad..47833f338f 100644
>> --- a/util/qemu-timer.c
>> +++ b/util/qemu-timer.c
>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>
>>          progress = true;
>> +        /*
>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>> +         * for the virtual timers.
>> +         */
>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>
> You need to check replay_mode != REPLAY_MODE_NONE, either here or in the
> "if (need_replay_checkpoint)" above. If you choose the latter, you can
> remove the other "if (replay_mode != REPLAY_MODE_NONE)".

I forgot about the changes that prohibited event processing for the
virtual clock checkpoint. This allowed me to make this part simpler,
please check the new version.

However, event processing still waits for refactoring. I'll do it after
upstreaming the tests for the record/replay to prevent regression.

> Also, there is a comment that says that checkpointing "must only be done
> once since the clock value stays the same". Is that actually a "can"
> rather than a "must"? Should the central replay logic have something
> like a checkpoint count, that prevents adding back-to-back equal
> checkpoints?

I don't really think that this happens very often, but it is worth
implementing. I'll try it later.

Pavel Dovgalyuk
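For readers following the "checkpoint count" idea, here is a rough sketch of what suppressing back-to-back equal checkpoints could look like. It is not QEMU code: `ReplayLog`, `log_event()` and `log_checkpoint_dedup()` are hypothetical names, and the real replay log has a different format. The only point illustrated is that a checkpoint is skipped when the most recent logged event is the same checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAPACITY 64

/* Hypothetical in-memory event log; QEMU's replay log is not laid out like this. */
typedef struct {
    int events[LOG_CAPACITY];
    size_t len;
} ReplayLog;

/* Append any event (a checkpoint or something else) to the log. */
void log_event(ReplayLog *log, int event)
{
    assert(log->len < LOG_CAPACITY);
    log->events[log->len++] = event;
}

/*
 * Append a checkpoint unless the last logged event is the same checkpoint.
 * Returns true when the checkpoint was actually written.
 */
bool log_checkpoint_dedup(ReplayLog *log, int checkpoint)
{
    if (log->len > 0 && log->events[log->len - 1] == checkpoint) {
        return false;   /* back-to-back duplicate: nothing happened in between */
    }
    log_event(log, checkpoint);
    return true;
}
```

In a scheme like this the replay side would have to apply the same rule when consuming the log, otherwise record and replay would disagree about how many checkpoints to expect.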
Re: [PATCH] replay: synchronize on every virtual timer callback
On 06/05/20 10:17, Pavel Dovgalyuk wrote:
> Sometimes virtual timer callbacks depend on order
> of virtual timer processing and warping of virtual clock.
> Therefore every callback should be logged to make replay deterministic.
> This patch creates a checkpoint before every virtual timer callback.
> With these checkpoints virtual timers processing and clock warping
> events order is completely deterministic.
>
> Signed-off-by: Pavel Dovgalyuk
> ---
>  util/qemu-timer.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
> index d548d3c1ad..47833f338f 100644
> --- a/util/qemu-timer.c
> +++ b/util/qemu-timer.c
> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>          qemu_mutex_lock(&timer_list->active_timers_lock);
>
>          progress = true;
> +        /*
> +         * Callback may insert new checkpoints, therefore add new checkpoint
> +         * for the virtual timers.
> +         */
> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;

You need to check replay_mode != REPLAY_MODE_NONE, either here or in the
"if (need_replay_checkpoint)" above. If you choose the latter, you can
remove the other "if (replay_mode != REPLAY_MODE_NONE)".

Also, there is a comment that says that checkpointing "must only be done
once since the clock value stays the same". Is that actually a "can"
rather than a "must"? Should the central replay logic have something
like a checkpoint count, that prevents adding back-to-back equal
checkpoints?

Thanks,

Paolo

>      }
>      qemu_mutex_unlock(&timer_list->active_timers_lock);
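A small standalone model of the first review comment may help. It is only a sketch: the enums are simplified stand-ins for the QEMU ones named in the thread, and `needs_replay_checkpoint()` and `count_checkpoints()` are hypothetical helpers, not functions from util/qemu-timer.c. It shows the flag being re-armed before each callback, with the replay_mode test folded into the same condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the QEMU enums named in the thread. */
typedef enum { REPLAY_MODE_NONE, REPLAY_MODE_RECORD, REPLAY_MODE_PLAY } ReplayMode;
typedef enum { QEMU_CLOCK_REALTIME, QEMU_CLOCK_VIRTUAL, QEMU_CLOCK_HOST } QEMUClockType;

/* Hypothetical predicate that keeps both conditions in one place. */
bool needs_replay_checkpoint(ReplayMode mode, QEMUClockType type)
{
    return mode != REPLAY_MODE_NONE && type == QEMU_CLOCK_VIRTUAL;
}

/*
 * Model of the timer loop: returns how many checkpoints would be taken
 * while running `expired` callbacks. This is not timerlist_run_timers().
 */
int count_checkpoints(ReplayMode mode, QEMUClockType type, int expired)
{
    int checkpoints = 0;
    bool need = needs_replay_checkpoint(mode, type);

    assert(expired >= 0);
    for (int i = 0; i < expired; i++) {
        if (need) {
            checkpoints++;  /* stands in for the replay checkpoint call */
        }
        /* the timer callback would run here */
        need = needs_replay_checkpoint(mode, type);  /* re-armed for the next callback */
    }
    return checkpoints;
}
```

With record/replay disabled the model never takes a checkpoint, which is the behaviour the review comment asks the patch to guarantee.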
Re: [PATCH] replay: synchronize on every virtual timer callback
On 20.05.2020 10:18, Philippe Mathieu-Daudé wrote:
> +Cleber
>
> On 5/20/20 8:54 AM, Pavel Dovgalyuk wrote:
>> On 19.05.2020 18:42, Philippe Mathieu-Daudé wrote:
>>> On 5/19/20 12:38 PM, Pavel Dovgalyuk wrote:
>>>> On 19.05.2020 13:32, Alex Bennée wrote:
>>>>> Pavel Dovgalyuk writes:
>>>>>> On 19.05.2020 11:11, Alex Bennée wrote:
>>>>>>> Pavel Dovgalyuk writes:
>>>>>>>> On 18.05.2020 18:56, Alex Bennée wrote:
>>>>>>>>> Philippe Mathieu-Daudé writes:
>>>>>>>>>> + Alex
>>>>>>>>>>
>>>>>>>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>>>>>>>> Sometimes virtual timer callbacks depend on order
>>>>>>>>>>> of virtual timer processing and warping of virtual clock.
>>>>>>>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>>>>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>>>>>>>> With these checkpoints virtual timers processing and clock warping
>>>>>>>>>>> events order is completely deterministic.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>>>>>>>> ---
>>>>>>>>>>>  util/qemu-timer.c | 5 +++++
>>>>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>>>>>>>> index d548d3c1ad..47833f338f 100644
>>>>>>>>>>> --- a/util/qemu-timer.c
>>>>>>>>>>> +++ b/util/qemu-timer.c
>>>>>>>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>>>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>>>>>>>
>>>>>>>>>>>          progress = true;
>>>>>>>>>>> +        /*
>>>>>>>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>>>>>>>> +         * for the virtual timers.
>>>>>>>>>>> +         */
>>>>>>>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>>>>>>>      }
>>>>>>>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>>>>>>>> So the problem I have with this as with all the record/replay stuff I
>>>>>>>>> need/want to review is it's very hard to see things in action. I added a
>>>>>>>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>>>>>>>> won't exercise any of this code because no timers get fired. I'm
>>>>>>>>> assuming the sort of tests that is really needed is something that not
>>>>>>>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>>>>>>>> events and ensure that things don't get confused in the process.
>>>>>>>> I encounter most of the bugs in different OS boot scenarios.
>>>>>>>>
>>>>>>>> We also have internal tests that include some computational, disk, and
>>>>>>>> network interaction tasks.
>>>>>>>>
>>>>>>>> Is it possible to add a test like booting a "real" OS and replaying it?
>>>>>>> Yes - for these bigger more complex setups we should use the acceptance
>>>>>>> tests that run under Avocado. See "make check-acceptance".
>>>>>> I've installed avocado and avocado-framework, but got the following error:
>>>>>>
>>>>>> venv/bin/python: No module named avocado
>>>>> Hmm make check-acceptance should automatically set up local copies of
>>>>> avocado using virtualenv. You shouldn't need to install the system
>>>>> version.
>>>> What should I try then?
>>> My workflow running selected tests is:
>>>
>>> $ git clone qemu
>>> $ mkdir qemu/build
>>> $ cd qemu/build
>>> qemu/build$ ../configure
>>> qemu/build$ make arm-softmmu/all
>>> qemu/build$ make check-venv
>>> qemu/build$ tests/venv/bin/python -m avocado \
>>>     --show=app,console -t machine:virt \
>>>     run tests/acceptance/
>>>
>>> 'make check-acceptance' runs all the tests for the available QEMU
>>> targets built. It should call check-venv automatically.
>> Thanks. Download has started with these command lines.
> Good news!
>> But usually I run configure directly from the source directory.
>> Could it be the cause of the failure?
> To be honest last time I ran ./configure from source directory was more
> than 2 years ago. The acceptance CI testing uses out-of-tree builds.
>
> I'm surprised it didn't work as expected for you, because when Cleber
> implemented it, he was using in-tree builds. Maybe it has bit-rotted
> since.
>
> I'm not interested in trying/debugging/maintaining it, but if you think
> it is worthwhile, I'll first simply add a job testing in-tree
> acceptance, shoot it to Travis and see.

Maybe that copy has some garbage. I started with a freshly cloned
repository and an in-tree build, and everything was ok.
Re: [PATCH] replay: synchronize on every virtual timer callback
+Cleber

On 5/20/20 8:54 AM, Pavel Dovgalyuk wrote:
> On 19.05.2020 18:42, Philippe Mathieu-Daudé wrote:
>> On 5/19/20 12:38 PM, Pavel Dovgalyuk wrote:
>>> On 19.05.2020 13:32, Alex Bennée wrote:
>>>> Pavel Dovgalyuk writes:
>>>>> On 19.05.2020 11:11, Alex Bennée wrote:
>>>>>> Pavel Dovgalyuk writes:
>>>>>>> On 18.05.2020 18:56, Alex Bennée wrote:
>>>>>>>> Philippe Mathieu-Daudé writes:
>>>>>>>>> + Alex
>>>>>>>>>
>>>>>>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>>>>>>> Sometimes virtual timer callbacks depend on order
>>>>>>>>>> of virtual timer processing and warping of virtual clock.
>>>>>>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>>>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>>>>>>> With these checkpoints virtual timers processing and clock warping
>>>>>>>>>> events order is completely deterministic.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>>>>>>> ---
>>>>>>>>>>  util/qemu-timer.c | 5 +++++
>>>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>>>>>>> index d548d3c1ad..47833f338f 100644
>>>>>>>>>> --- a/util/qemu-timer.c
>>>>>>>>>> +++ b/util/qemu-timer.c
>>>>>>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>>>>>>
>>>>>>>>>>          progress = true;
>>>>>>>>>> +        /*
>>>>>>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>>>>>>> +         * for the virtual timers.
>>>>>>>>>> +         */
>>>>>>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>>>>>>      }
>>>>>>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>>>>>>> So the problem I have with this as with all the record/replay stuff I
>>>>>>>> need/want to review is it's very hard to see things in action. I added a
>>>>>>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>>>>>>> won't exercise any of this code because no timers get fired. I'm
>>>>>>>> assuming the sort of tests that is really needed is something that not
>>>>>>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>>>>>>> events and ensure that things don't get confused in the process.
>>>>>>> I encounter most of the bugs in different OS boot scenarios.
>>>>>>>
>>>>>>> We also have internal tests that include some computational, disk, and
>>>>>>> network interaction tasks.
>>>>>>>
>>>>>>> Is it possible to add a test like booting a "real" OS and replaying it?
>>>>>> Yes - for these bigger more complex setups we should use the acceptance
>>>>>> tests that run under Avocado. See "make check-acceptance".
>>>>> I've installed avocado and avocado-framework, but got the following error:
>>>>>
>>>>> venv/bin/python: No module named avocado
>>>> Hmm make check-acceptance should automatically set up local copies of
>>>> avocado using virtualenv. You shouldn't need to install the system
>>>> version.
>>> What should I try then?
>> My workflow running selected tests is:
>>
>> $ git clone qemu
>> $ mkdir qemu/build
>> $ cd qemu/build
>> qemu/build$ ../configure
>> qemu/build$ make arm-softmmu/all
>> qemu/build$ make check-venv
>> qemu/build$ tests/venv/bin/python -m avocado \
>>     --show=app,console -t machine:virt \
>>     run tests/acceptance/
>>
>> 'make check-acceptance' runs all the tests for the available QEMU
>> targets built. It should call check-venv automatically.
> Thanks. Download has started with these command lines.

Good news!

> But usually I run configure directly from the source directory.
> Could it be the cause of the failure?

To be honest last time I ran ./configure from source directory was more
than 2 years ago. The acceptance CI testing uses out-of-tree builds.

I'm surprised it didn't work as expected for you, because when Cleber
implemented it, he was using in-tree builds. Maybe it has bit-rotted
since.

I'm not interested in trying/debugging/maintaining it, but if you think
it is worthwhile, I'll first simply add a job testing in-tree
acceptance, shoot it to Travis and see.

> Pavel Dovgalyuk
Re: [PATCH] replay: synchronize on every virtual timer callback
On 19.05.2020 18:42, Philippe Mathieu-Daudé wrote:
> On 5/19/20 12:38 PM, Pavel Dovgalyuk wrote:
>> On 19.05.2020 13:32, Alex Bennée wrote:
>>> Pavel Dovgalyuk writes:
>>>> On 19.05.2020 11:11, Alex Bennée wrote:
>>>>> Pavel Dovgalyuk writes:
>>>>>> On 18.05.2020 18:56, Alex Bennée wrote:
>>>>>>> Philippe Mathieu-Daudé writes:
>>>>>>>> + Alex
>>>>>>>>
>>>>>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>>>>>> Sometimes virtual timer callbacks depend on order
>>>>>>>>> of virtual timer processing and warping of virtual clock.
>>>>>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>>>>>> With these checkpoints virtual timers processing and clock warping
>>>>>>>>> events order is completely deterministic.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>>>>>> ---
>>>>>>>>>  util/qemu-timer.c | 5 +++++
>>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>>>>>> index d548d3c1ad..47833f338f 100644
>>>>>>>>> --- a/util/qemu-timer.c
>>>>>>>>> +++ b/util/qemu-timer.c
>>>>>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>>>>>
>>>>>>>>>          progress = true;
>>>>>>>>> +        /*
>>>>>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>>>>>> +         * for the virtual timers.
>>>>>>>>> +         */
>>>>>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>>>>>      }
>>>>>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>>>>>> So the problem I have with this as with all the record/replay stuff I
>>>>>>> need/want to review is it's very hard to see things in action. I added a
>>>>>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>>>>>> won't exercise any of this code because no timers get fired. I'm
>>>>>>> assuming the sort of tests that is really needed is something that not
>>>>>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>>>>>> events and ensure that things don't get confused in the process.
>>>>>> I encounter most of the bugs in different OS boot scenarios.
>>>>>>
>>>>>> We also have internal tests that include some computational, disk, and
>>>>>> network interaction tasks.
>>>>>>
>>>>>> Is it possible to add a test like booting a "real" OS and replaying it?
>>>>> Yes - for these bigger more complex setups we should use the acceptance
>>>>> tests that run under Avocado. See "make check-acceptance".
>>>> I've installed avocado and avocado-framework, but got the following error:
>>>>
>>>> venv/bin/python: No module named avocado
>>> Hmm make check-acceptance should automatically set up local copies of
>>> avocado using virtualenv. You shouldn't need to install the system
>>> version.
>> What should I try then?
> My workflow running selected tests is:
>
> $ git clone qemu
> $ mkdir qemu/build
> $ cd qemu/build
> qemu/build$ ../configure
> qemu/build$ make arm-softmmu/all
> qemu/build$ make check-venv
> qemu/build$ tests/venv/bin/python -m avocado \
>     --show=app,console -t machine:virt \
>     run tests/acceptance/
>
> 'make check-acceptance' runs all the tests for the available QEMU
> targets built. It should call check-venv automatically.

Thanks. Download has started with these command lines.

But usually I run configure directly from the source directory.
Could it be the cause of the failure?

Pavel Dovgalyuk
Re: [PATCH] replay: synchronize on every virtual timer callback
On 5/19/20 12:38 PM, Pavel Dovgalyuk wrote:
> On 19.05.2020 13:32, Alex Bennée wrote:
>> Pavel Dovgalyuk writes:
>>> On 19.05.2020 11:11, Alex Bennée wrote:
>>>> Pavel Dovgalyuk writes:
>>>>> On 18.05.2020 18:56, Alex Bennée wrote:
>>>>>> Philippe Mathieu-Daudé writes:
>>>>>>> + Alex
>>>>>>>
>>>>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>>>>> Sometimes virtual timer callbacks depend on order
>>>>>>>> of virtual timer processing and warping of virtual clock.
>>>>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>>>>> With these checkpoints virtual timers processing and clock warping
>>>>>>>> events order is completely deterministic.
>>>>>>>>
>>>>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>>>>> ---
>>>>>>>>  util/qemu-timer.c | 5 +++++
>>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>>>>> index d548d3c1ad..47833f338f 100644
>>>>>>>> --- a/util/qemu-timer.c
>>>>>>>> +++ b/util/qemu-timer.c
>>>>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>>>>
>>>>>>>>          progress = true;
>>>>>>>> +        /*
>>>>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>>>>> +         * for the virtual timers.
>>>>>>>> +         */
>>>>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>>>>      }
>>>>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>>>>> So the problem I have with this as with all the record/replay stuff I
>>>>>> need/want to review is it's very hard to see things in action. I added a
>>>>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>>>>> won't exercise any of this code because no timers get fired. I'm
>>>>>> assuming the sort of tests that is really needed is something that not
>>>>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>>>>> events and ensure that things don't get confused in the process.
>>>>> I encounter most of the bugs in different OS boot scenarios.
>>>>>
>>>>> We also have internal tests that include some computational, disk, and
>>>>> network interaction tasks.
>>>>>
>>>>> Is it possible to add a test like booting a "real" OS and replaying it?
>>>> Yes - for these bigger more complex setups we should use the acceptance
>>>> tests that run under Avocado. See "make check-acceptance".
>>> I've installed avocado and avocado-framework, but got the following error:
>>>
>>> venv/bin/python: No module named avocado
>> Hmm make check-acceptance should automatically set up local copies of
>> avocado using virtualenv. You shouldn't need to install the system
>> version.
> What should I try then?

My workflow running selected tests is:

$ git clone qemu
$ mkdir qemu/build
$ cd qemu/build
qemu/build$ ../configure
qemu/build$ make arm-softmmu/all
qemu/build$ make check-venv
qemu/build$ tests/venv/bin/python -m avocado \
    --show=app,console -t machine:virt \
    run tests/acceptance/

'make check-acceptance' runs all the tests for the available QEMU
targets built. It should call check-venv automatically.
Re: [PATCH] replay: synchronize on every virtual timer callback
On 19.05.2020 13:32, Alex Bennée wrote:
> Pavel Dovgalyuk writes:
>> On 19.05.2020 11:11, Alex Bennée wrote:
>>> Pavel Dovgalyuk writes:
>>>> On 18.05.2020 18:56, Alex Bennée wrote:
>>>>> Philippe Mathieu-Daudé writes:
>>>>>> + Alex
>>>>>>
>>>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>>>> Sometimes virtual timer callbacks depend on order
>>>>>>> of virtual timer processing and warping of virtual clock.
>>>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>>>> With these checkpoints virtual timers processing and clock warping
>>>>>>> events order is completely deterministic.
>>>>>>>
>>>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>>>> ---
>>>>>>>  util/qemu-timer.c | 5 +++++
>>>>>>>  1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>>>> index d548d3c1ad..47833f338f 100644
>>>>>>> --- a/util/qemu-timer.c
>>>>>>> +++ b/util/qemu-timer.c
>>>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>>>
>>>>>>>          progress = true;
>>>>>>> +        /*
>>>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>>>> +         * for the virtual timers.
>>>>>>> +         */
>>>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>>>      }
>>>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>>>> So the problem I have with this as with all the record/replay stuff I
>>>>> need/want to review is it's very hard to see things in action. I added a
>>>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>>>> won't exercise any of this code because no timers get fired. I'm
>>>>> assuming the sort of tests that is really needed is something that not
>>>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>>>> events and ensure that things don't get confused in the process.
>>>> I encounter most of the bugs in different OS boot scenarios.
>>>>
>>>> We also have internal tests that include some computational, disk, and
>>>> network interaction tasks.
>>>>
>>>> Is it possible to add a test like booting a "real" OS and replaying it?
>>> Yes - for these bigger more complex setups we should use the acceptance
>>> tests that run under Avocado. See "make check-acceptance".
>> I've installed avocado and avocado-framework, but got the following error:
>>
>> venv/bin/python: No module named avocado
> Hmm make check-acceptance should automatically set up local copies of
> avocado using virtualenv. You shouldn't need to install the system
> version.

What should I try then?
Re: [PATCH] replay: synchronize on every virtual timer callback
Pavel Dovgalyuk writes:

> On 19.05.2020 11:11, Alex Bennée wrote:
>> Pavel Dovgalyuk writes:
>>
>>> On 18.05.2020 18:56, Alex Bennée wrote:
>>>> Philippe Mathieu-Daudé writes:
>>>>
>>>>> + Alex
>>>>>
>>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>>> Sometimes virtual timer callbacks depend on order
>>>>>> of virtual timer processing and warping of virtual clock.
>>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>>> With these checkpoints virtual timers processing and clock warping
>>>>>> events order is completely deterministic.
>>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>>> ---
>>>>>>  util/qemu-timer.c | 5 +++++
>>>>>>  1 file changed, 5 insertions(+)
>>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>>> index d548d3c1ad..47833f338f 100644
>>>>>> --- a/util/qemu-timer.c
>>>>>> +++ b/util/qemu-timer.c
>>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>>          progress = true;
>>>>>> +        /*
>>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>>> +         * for the virtual timers.
>>>>>> +         */
>>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>>      }
>>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>>> So the problem I have with this as with all the record/replay stuff I
>>>> need/want to review is it's very hard to see things in action. I added a
>>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>>> won't exercise any of this code because no timers get fired. I'm
>>>> assuming the sort of tests that is really needed is something that not
>>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>>> events and ensure that things don't get confused in the process.
>>> I encounter most of the bugs in different OS boot scenarios.
>>>
>>> We also have internal tests that include some computational, disk, and
>>> network interaction tasks.
>>>
>>> Is it possible to add a test like booting a "real" OS and replaying
>>> it?
>> Yes - for these bigger more complex setups we should use the acceptance
>> tests that run under Avocado. See "make check-acceptance".
>
> I've installed avocado and avocado-framework, but got the following error:
>
> venv/bin/python: No module named avocado

Hmm make check-acceptance should automatically set up local copies of
avocado using virtualenv. You shouldn't need to install the system
version.

>>>> If I read up the file I just get more questions than answers. For
>>>> example why do we release the qemu_timers lock before processing the
>>>> replay event? Is it that the replay event could cause another timer to
>>> We release the lock, because accessing the replay module may process
>>> some events and add more timers.
>> OK. I guess the adding of the timer is a side effect of processing the
>> event rather than something that gets added directly?
>
> Right.
>
>
> Pavel Dovgalyuk

--
Alex Bennée
Re: [PATCH] replay: synchronize on every virtual timer callback
On 19.05.2020 11:11, Alex Bennée wrote:
> Pavel Dovgalyuk writes:
>
>> On 18.05.2020 18:56, Alex Bennée wrote:
>>> Philippe Mathieu-Daudé writes:
>>>
>>>> + Alex
>>>>
>>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>>> Sometimes virtual timer callbacks depend on order
>>>>> of virtual timer processing and warping of virtual clock.
>>>>> Therefore every callback should be logged to make replay deterministic.
>>>>> This patch creates a checkpoint before every virtual timer callback.
>>>>> With these checkpoints virtual timers processing and clock warping
>>>>> events order is completely deterministic.
>>>>>
>>>>> Signed-off-by: Pavel Dovgalyuk
>>>>> ---
>>>>>  util/qemu-timer.c | 5 +++++
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>>> index d548d3c1ad..47833f338f 100644
>>>>> --- a/util/qemu-timer.c
>>>>> +++ b/util/qemu-timer.c
>>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>>
>>>>>          progress = true;
>>>>> +        /*
>>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>>> +         * for the virtual timers.
>>>>> +         */
>>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>>      }
>>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>>> So the problem I have with this as with all the record/replay stuff I
>>> need/want to review is it's very hard to see things in action. I added a
>>> *very* basic record/replay test to the aarch64 softmmu tests but they
>>> won't exercise any of this code because no timers get fired. I'm
>>> assuming the sort of tests that is really needed is something that not
>>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>>> events and ensure that things don't get confused in the process.
>> I encounter most of the bugs in different OS boot scenarios.
>>
>> We also have internal tests that include some computational, disk, and
>> network interaction tasks.
>>
>> Is it possible to add a test like booting a "real" OS and replaying it?
> Yes - for these bigger more complex setups we should use the acceptance
> tests that run under Avocado. See "make check-acceptance".

I've installed avocado and avocado-framework, but got the following error:

venv/bin/python: No module named avocado

>>> If I read up the file I just get more questions than answers. For
>>> example why do we release the qemu_timers lock before processing the
>>> replay event? Is it that the replay event could cause another timer to
>> We release the lock, because accessing the replay module may process
>> some events and add more timers.
> OK. I guess the adding of the timer is a side effect of processing the
> event rather than something that gets added directly?

Right.

Pavel Dovgalyuk
Re: [PATCH] replay: synchronize on every virtual timer callback
Pavel Dovgalyuk writes:

> On 18.05.2020 18:56, Alex Bennée wrote:
>> Philippe Mathieu-Daudé writes:
>>
>>> + Alex
>>>
>>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>>> Sometimes virtual timer callbacks depend on order
>>>> of virtual timer processing and warping of virtual clock.
>>>> Therefore every callback should be logged to make replay deterministic.
>>>> This patch creates a checkpoint before every virtual timer callback.
>>>> With these checkpoints virtual timers processing and clock warping
>>>> events order is completely deterministic.
>>>>
>>>> Signed-off-by: Pavel Dovgalyuk
>>>> ---
>>>>  util/qemu-timer.c | 5 +++++
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>>> index d548d3c1ad..47833f338f 100644
>>>> --- a/util/qemu-timer.c
>>>> +++ b/util/qemu-timer.c
>>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>>
>>>>          progress = true;
>>>> +        /*
>>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>>> +         * for the virtual timers.
>>>> +         */
>>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>>      }
>>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>> So the problem I have with this as with all the record/replay stuff I
>> need/want to review is it's very hard to see things in action. I added a
>> *very* basic record/replay test to the aarch64 softmmu tests but they
>> won't exercise any of this code because no timers get fired. I'm
>> assuming the sort of tests that is really needed is something that not
>> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
>> events and ensure that things don't get confused in the process.
>
> I encounter most of the bugs in different OS boot scenarios.
>
> We also have internal tests that include some computational, disk, and
> network interaction tasks.
>
> Is it possible to add a test like booting a "real" OS and replaying
> it?

Yes - for these bigger more complex setups we should use the acceptance
tests that run under Avocado. See "make check-acceptance".

>> If I read up the file I just get more questions than answers. For
>> example why do we release the qemu_timers lock before processing the
>> replay event? Is it that the replay event could cause another timer to
>
> We release the lock, because accessing the replay module may process
> some events and add more timers.

OK. I guess the adding of the timer is a side effect of processing the
event rather than something that gets added directly?

--
Alex Bennée
Re: [PATCH] replay: synchronize on every virtual timer callback
On 18.05.2020 18:56, Alex Bennée wrote:
> Philippe Mathieu-Daudé writes:
>
>> + Alex
>>
>> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>>> Sometimes virtual timer callbacks depend on order
>>> of virtual timer processing and warping of virtual clock.
>>> Therefore every callback should be logged to make replay deterministic.
>>> This patch creates a checkpoint before every virtual timer callback.
>>> With these checkpoints virtual timers processing and clock warping
>>> events order is completely deterministic.
>>>
>>> Signed-off-by: Pavel Dovgalyuk
>>> ---
>>>  util/qemu-timer.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>>> index d548d3c1ad..47833f338f 100644
>>> --- a/util/qemu-timer.c
>>> +++ b/util/qemu-timer.c
>>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>>
>>>          progress = true;
>>> +        /*
>>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>>> +         * for the virtual timers.
>>> +         */
>>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>>      }
>>>      qemu_mutex_unlock(&timer_list->active_timers_lock);
>
> So the problem I have with this as with all the record/replay stuff I
> need/want to review is it's very hard to see things in action. I added a
> *very* basic record/replay test to the aarch64 softmmu tests but they
> won't exercise any of this code because no timers get fired. I'm
> assuming the sort of tests that is really needed is something that not
> only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
> events and ensure that things don't get confused in the process.

I encounter most of the bugs in different OS boot scenarios.

We also have internal tests that include some computational, disk, and
network interaction tasks.

Is it possible to add a test like booting a "real" OS and replaying it?

> If I read up the file I just get more questions than answers. For
> example why do we release the qemu_timers lock before processing the
> replay event? Is it that the replay event could cause another timer to

We release the lock, because accessing the replay module may process
some events and add more timers.

> be consumed? That seems suspect to me given we should only be expiring
> timers in the run loop. Could the code be re-factored to use
> QEMU_LOCK_GUARD? It's hard to know and I really wouldn't want to try
> that re-factoring without some sort of confidence we were properly
> exercising the semantics of record/replay and alive to potential
> regressions.

QEMU_LOCK_GUARD looks nice. But we'll still need unlock/lock pairs
around checkpoint and timer callback.

> Please realise I do like the concept of record/replay and I'd love to
> get more features merged (like for example the reverse debug patches).
> However by its very nature it gets its fingers deeply intertwined with
> the main run loop and we really need to better exercise the code in our
> tests. FWIW you can have an:
>
> Acked-by: Alex Bennée

Thanks.

> which means it doesn't look obviously broken to me and it doesn't seem
> to break the non-record/replay cases because that's all I can really
> test.
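The reason for the unlock/lock pair around the callback can be shown with a toy model. This is a sketch under stated assumptions: `FakeMutex`, `TimerList`, `timer_add()` and `run_timers()` are hypothetical stand-ins, not the QEMU types or functions, and the fake mutex asserts where a real non-recursive mutex would deadlock. A callback that arms another timer needs the list lock itself, so the run loop has to drop the lock before calling it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a non-recursive mutex: asserts instead of deadlocking. */
typedef struct { bool held; } FakeMutex;

void fake_lock(FakeMutex *m)   { assert(!m->held); m->held = true; }
void fake_unlock(FakeMutex *m) { assert(m->held);  m->held = false; }

#define MAX_TIMERS 8

typedef struct TimerList TimerList;
typedef void (*TimerCb)(TimerList *list);

struct TimerList {
    FakeMutex lock;
    TimerCb pending[MAX_TIMERS];
    size_t count;
    int fired;
};

/* Adding a timer takes the list lock, as code arming a timer would. */
void timer_add(TimerList *list, TimerCb cb)
{
    fake_lock(&list->lock);
    assert(list->count < MAX_TIMERS);
    list->pending[list->count++] = cb;
    fake_unlock(&list->lock);
}

/* A callback that does nothing further. */
void cb_plain(TimerList *list)
{
    (void)list;
}

/* A callback that arms another timer, so it needs the list lock itself. */
void cb_rearm(TimerList *list)
{
    timer_add(list, cb_plain);
}

/* Run all pending timers, dropping the lock around each callback. */
void run_timers(TimerList *list)
{
    fake_lock(&list->lock);
    while (list->count > 0) {
        TimerCb cb = list->pending[--list->count];

        fake_unlock(&list->lock);   /* the callback may call timer_add() */
        cb(list);
        list->fired++;
        fake_lock(&list->lock);
    }
    fake_unlock(&list->lock);
}
```

If `run_timers()` kept the lock held across `cb(list)`, the `assert(!m->held)` in `fake_lock()` would fire as soon as `cb_rearm()` ran; a scope-based guard alone does not remove the need for that explicit unlock/lock pair.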
Re: [PATCH] replay: synchronize on every virtual timer callback
Philippe Mathieu-Daudé writes:

> + Alex
>
> On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
>> Sometimes virtual timer callbacks depend on order
>> of virtual timer processing and warping of virtual clock.
>> Therefore every callback should be logged to make replay deterministic.
>> This patch creates a checkpoint before every virtual timer callback.
>> With these checkpoints virtual timers processing and clock warping
>> events order is completely deterministic.
>> Signed-off-by: Pavel Dovgalyuk
>> ---
>>  util/qemu-timer.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
>> index d548d3c1ad..47833f338f 100644
>> --- a/util/qemu-timer.c
>> +++ b/util/qemu-timer.c
>> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>>          qemu_mutex_lock(&timer_list->active_timers_lock);
>>          progress = true;
>> +        /*
>> +         * Callback may insert new checkpoints, therefore add new checkpoint
>> +         * for the virtual timers.
>> +         */
>> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>>      }
>>      qemu_mutex_unlock(&timer_list->active_timers_lock);

So the problem I have with this as with all the record/replay stuff I
need/want to review is it's very hard to see things in action. I added a
*very* basic record/replay test to the aarch64 softmmu tests but they
won't exercise any of this code because no timers get fired. I'm
assuming the sort of tests that is really needed is something that not
only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
events and ensure that things don't get confused in the process.

If I read up the file I just get more questions than answers. For
example why do we release the qemu_timers lock before processing the
replay event? Is it that the replay event could cause another timer to
be consumed? That seems suspect to me given we should only be expiring
timers in the run loop. Could the code be re-factored to use
QEMU_LOCK_GUARD? It's hard to know and I really wouldn't want to try
that re-factoring without some sort of confidence we were properly
exercising the semantics of record/replay and alive to potential
regressions.

Please realise I do like the concept of record/replay and I'd love to
get more features merged (like for example the reverse debug patches).
However by its very nature it gets its fingers deeply intertwined with
the main run loop and we really need to better exercise the code in our
tests. FWIW you can have an:

Acked-by: Alex Bennée

which means it doesn't look obviously broken to me and it doesn't seem
to break the non-record/replay cases because that's all I can really
test.

--
Alex Bennée
Re: [PATCH] replay: synchronize on every virtual timer callback
+ Alex

On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
> Sometimes virtual timer callbacks depend on order
> of virtual timer processing and warping of virtual clock.
> Therefore every callback should be logged to make replay deterministic.
> This patch creates a checkpoint before every virtual timer callback.
> With these checkpoints virtual timers processing and clock warping
> events order is completely deterministic.
>
> Signed-off-by: Pavel Dovgalyuk
> ---
>  util/qemu-timer.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
> index d548d3c1ad..47833f338f 100644
> --- a/util/qemu-timer.c
> +++ b/util/qemu-timer.c
> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>          qemu_mutex_lock(&timer_list->active_timers_lock);
>
>          progress = true;
> +        /*
> +         * Callback may insert new checkpoints, therefore add new checkpoint
> +         * for the virtual timers.
> +         */
> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>      }
>      qemu_mutex_unlock(&timer_list->active_timers_lock);
Re: [PATCH] replay: synchronize on every virtual timer callback
ping

On 06.05.2020 11:17, Pavel Dovgalyuk wrote:
> Sometimes virtual timer callbacks depend on order
> of virtual timer processing and warping of virtual clock.
> Therefore every callback should be logged to make replay deterministic.
> This patch creates a checkpoint before every virtual timer callback.
> With these checkpoints virtual timers processing and clock warping
> events order is completely deterministic.
>
> Signed-off-by: Pavel Dovgalyuk
> ---
>  util/qemu-timer.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/util/qemu-timer.c b/util/qemu-timer.c
> index d548d3c1ad..47833f338f 100644
> --- a/util/qemu-timer.c
> +++ b/util/qemu-timer.c
> @@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>          qemu_mutex_lock(&timer_list->active_timers_lock);
>
>          progress = true;
> +        /*
> +         * Callback may insert new checkpoints, therefore add new checkpoint
> +         * for the virtual timers.
> +         */
> +        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
>      }
>      qemu_mutex_unlock(&timer_list->active_timers_lock);