Hi,

This is the current state of my fixes for icount based record and
replay. It doesn't completely fix the problem (hence the RFC status)
but improves it to the point that I have been able to record and
replay the boot of a vexpress kernel.

The first 3 patches touch the helper scripts I've been using during
my debugging. The first is the only real fix; the following 2 should
probably be dropped from any pull request as they introduce new
features rather than fix something.

We then have another BQL fix for i386. I haven't had a chance to
replicate it myself so far but the fix looks perfectly sane to me.

Finally the fixes for icount:

  cpus: remove icount handling from qemu_tcg_cpu_thread_fn

  Simple clean-up as we don't do icount for MTTCG

  cpus: check cpu->running in cpu_get_icount_raw()

  I'm not sure the race actually happens, and once outside of
  cpu->running the icount counters should be zero anyway. However it
  seems a sensible precaution.

  cpus: move icount preparation out of tcg_exec_cpu

  This is a light re-factoring that stops the icount work getting in
  the way of the main bit of tcg_exec_cpu. It also removes some
  redundant assignments and replaces them with asserts for now.

  cpus: don't credit executed instructions before they have run

  This is the main fix. It ensures we never jump forward in time and
  that cpu_get_icount_raw() remains consistent.

  replay: gracefully handle backward time events

  This is the most hand-wavey patch. It glosses over the disparity in
  time between the vCPU thread and the main-loop by jumping forward to
  the most current time value. However it is not really deterministic
  and runs into potential problems with sequencing of log events.

  I think a better fix would be to extend replay_lock() so all related
  log events are serialised and we don't end up with interleaved
  events from the vCPU thread and the main-loop.
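
To illustrate the serialisation idea, here is a toy Python sketch. It
is not QEMU code: ReplayLog, write_group() and the event kinds are
hypothetical names made up for the example, and a plain
threading.Lock stands in for replay_lock(). The only point it makes
is that holding the lock across a whole group of related events keeps
each group contiguous in the log.

```python
import threading


class ReplayLog:
    """Toy event log; a stand-in for the replay log, not QEMU's real API."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of replay_lock()
        self.events = []

    def write_group(self, source, kinds):
        # Hold the lock across the whole group so that related events
        # from one thread cannot be interleaved with another thread's.
        with self._lock:
            for kind in kinds:
                self.events.append((source, kind))


def run_demo(groups_per_thread=100):
    """Let a 'vcpu' and a 'main-loop' thread write groups concurrently."""
    log = ReplayLog()
    # Hypothetical event kinds, chosen only for illustration.
    group = ("clock", "instruction-count", "interrupt")

    def worker(source):
        for _ in range(groups_per_thread):
            log.write_group(source, group)

    threads = [threading.Thread(target=worker, args=(name,))
               for name in ("vcpu", "main-loop")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.events, group


def groups_are_contiguous(events, group):
    """True if the log is a sequence of whole, un-interleaved groups."""
    n = len(group)
    if len(events) % n:
        return False
    for i in range(0, len(events), n):
        chunk = events[i:i + n]
        same_source = len({src for src, _ in chunk}) == 1
        if not same_source or tuple(kind for _, kind in chunk) != group:
            return False
    return True
```

Calling run_demo() and passing the result to groups_are_contiguous()
checks that no group was split between the two threads. The order in
which whole groups land in the log still depends on scheduling, so
this only addresses interleaving within a group, not determinism
across groups.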

I think the cpus: patches should probably go into the next
pull-request while we see if we can come up with a better final
solution for fixing record/replay. However given how long this
regression has run during the release candidate process I wanted to
update everyone on the current status and get feedback ASAP.

Cheers,


Alex Bennée (9):
  scripts/qemugdb/mtree.py: fix up mtree dump
  scripts/qemu-gdb/timers.py: new helper to dump timer state
  scripts/replay-dump.py: replay log dumper
  target/i386/misc_helper: wrap BQL around another IRQ generator
  cpus: remove icount handling from qemu_tcg_cpu_thread_fn
  cpus: check cpu->running in cpu_get_icount_raw()
  cpus: move icount preparation out of tcg_exec_cpu
  cpus: don't credit executed instructions before they have run
  replay: gracefully handle backward time events

 cpus.c                    |  94 +++++++++++-----
 include/qom/cpu.h         |   1 +
 replay/replay-internal.c  |   7 ++
 replay/replay.c           |   9 +-
 scripts/qemu-gdb.py       |   3 +-
 scripts/qemugdb/mtree.py  |  12 +-
 scripts/qemugdb/timers.py |  54 +++++++++
 scripts/replay-dump.py    | 272 ++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/misc_helper.c |   3 +
 9 files changed, 423 insertions(+), 32 deletions(-)
 create mode 100644 scripts/qemugdb/timers.py
 create mode 100755 scripts/replay-dump.py

-- 
2.11.0

