Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 19/01/2015 14:10, Pavel Dovgaluk wrote: Because 'A' is written only inside some of the replay_run_event callbacks. It depends on type of the event and it's processing function inside the QEMU core. There could be no 'A' at all. Why can't that code write the 'E' as well? Because such callbacks do not know that they are called from record/replay event. They may be called from record/replay code and from other parts of QEMU. And they may write save something low-level like timer request. I do not understand. Can you make an example with actual function names and data that is written? I haven't found any actual examples. Probably they occurred in previous version which wrote virtual timers into the log. Anyway I will add the mutex then. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 19/01/2015 13:03, Pavel Dovgaluk wrote: It will work for protecting the events list (I've already did this). But that will not work for protecting the log file. replay_run_event can write some data to the log. And also some other function like replay_checkpoint can also write to the log simultaneously (if we will remove the global mutex). That's why we cannot simply replace the global mutex with some kind of local one. That's why you have to document very well the architecture and the locking policy. For example, why can't replay_run_event (or something that it calls) take the replay lock locally, when it writes to the log? If you do not document the architecture, we run in circles with me asking possibly stupid questions and you thinking that my questions are stupid. :) Paolo
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 19/01/2015 13:03, Pavel Dovgaluk wrote: It will work for protecting the events list (I've already did this). But that will not work for protecting the log file. replay_run_event can write some data to the log. And also some other function like replay_checkpoint can also write to the log simultaneously (if we will remove the global mutex). That's why we cannot simply replace the global mutex with some kind of local one. That's why you have to document very well the architecture and the locking policy. Ok. For example, why can't replay_run_event (or something that it calls) take the replay lock locally, when it writes to the log? replay_run_event can take the lock. Suppose that it writes data 'A'. replay_run_event itself corresponds to some event 'E'. We expect that the following sequence of the events should occur: 'E', 'A'. But if something will be written to the log between 'E' and 'A' then replay_run_event in replay mode will stuck, because it will not see its data 'A'. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 19/01/2015 13:43, Pavel Dovgaluk wrote: For example, why can't replay_run_event (or something that it calls) take the replay lock locally, when it writes to the log? replay_run_event can take the lock. Suppose that it writes data 'A'. replay_run_event itself corresponds to some event 'E'. We expect that the following sequence of the events should occur: 'E', 'A'. But if something will be written to the log between 'E' and 'A' then replay_run_event in replay mode will stuck, because it will not see its data 'A'. It would be easier if you pointed me to actual code in the series. But this doesn't seem impossible to fix by atomically writing the 'E' and 'A' in the same critical section. Because 'A' is written only inside some of the replay_run_event callbacks. It depends on type of the event and it's processing function inside the QEMU core. There could be no 'A' at all. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 19/01/2015 14:01, Pavel Dovgaluk wrote: It would be easier if you pointed me to actual code in the series. But this doesn't seem impossible to fix by atomically writing the 'E' and 'A' in the same critical section. Because 'A' is written only inside some of the replay_run_event callbacks. It depends on type of the event and it's processing function inside the QEMU core. There could be no 'A' at all. Why can't that code write the 'E' as well? Because such callbacks do not know that they are called from record/replay event. They may be called from record/replay code and from other parts of QEMU. And they may write save something low-level like timer request. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 19/01/2015 14:10, Pavel Dovgaluk wrote: Because 'A' is written only inside some of the replay_run_event callbacks. It depends on type of the event and it's processing function inside the QEMU core. There could be no 'A' at all. Why can't that code write the 'E' as well? Because such callbacks do not know that they are called from record/replay event. They may be called from record/replay code and from other parts of QEMU. And they may write save something low-level like timer request. I do not understand. Can you make an example with actual function names and data that is written? Paolo
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 19/01/2015 13:43, Pavel Dovgaluk wrote: For example, why can't replay_run_event (or something that it calls) take the replay lock locally, when it writes to the log? replay_run_event can take the lock. Suppose that it writes data 'A'. replay_run_event itself corresponds to some event 'E'. We expect that the following sequence of the events should occur: 'E', 'A'. But if something will be written to the log between 'E' and 'A' then replay_run_event in replay mode will stuck, because it will not see its data 'A'. It would be easier if you pointed me to actual code in the series. But this doesn't seem impossible to fix by atomically writing the 'E' and 'A' in the same critical section. This is for example how QMP events are thread-safe: it just adds the JSON string corresponding to the QMP event atomically in monitor_puts. Paolo
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 19/01/2015 14:01, Pavel Dovgaluk wrote: It would be easier if you pointed me to actual code in the series. But this doesn't seem impossible to fix by atomically writing the 'E' and 'A' in the same critical section. Because 'A' is written only inside some of the replay_run_event callbacks. It depends on type of the event and it's processing function inside the QEMU core. There could be no 'A' at all. Why can't that code write the 'E' as well? Paolo
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 16/01/2015 09:03, Pavel Dovgaluk wrote: There is one problem with protecting log file inside the replay code. We probably should have the following sequence of calls: lock_replay_mutex replay_save_events replay_run_event unlock_replay_mutex But replay_run_event can also generate some log events and we will have deadlock here. That is why we rely on global mutex. I think replay_run_event should run with the lock _not_ taken, that is: qemu_mutex_lock(lock); while (!QTAILQ_EMPTY(events_list)) { Event *event = QTAILQ_FIRST(events_list); QTAILQ_REMOVE(events_list, event, events); qemu_mutex_unlock(lock); replay_run_event(event); g_free(event); qemu_mutex_lock(lock); } qemu_mutex_unlock(lock); It will work for protecting the events list (I've already did this). But that will not work for protecting the log file. replay_run_event can write some data to the log. And also some other function like replay_checkpoint can also write to the log simultaneously (if we will remove the global mutex). That's why we cannot simply replace the global mutex with some kind of local one. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 16/01/2015 09:03, Pavel Dovgaluk wrote: There is one problem with protecting log file inside the replay code. We probably should have the following sequence of calls: lock_replay_mutex replay_save_events replay_run_event unlock_replay_mutex But replay_run_event can also generate some log events and we will have deadlock here. That is why we rely on global mutex. I think replay_run_event should run with the lock _not_ taken, that is: qemu_mutex_lock(lock); while (!QTAILQ_EMPTY(events_list)) { Event *event = QTAILQ_FIRST(events_list); QTAILQ_REMOVE(events_list, event, events); qemu_mutex_unlock(lock); replay_run_event(event); g_free(event); qemu_mutex_lock(lock); } qemu_mutex_unlock(lock); Paolo
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 13/01/2015 10:21, Pavel Dovgaluk wrote: +/*! Reads next clock event from the input. */ +int64_t replay_read_clock(unsigned int kind) +{ +if (kind = REPLAY_CLOCK_COUNT) { +fprintf(stderr, invalid clock ID %d for replay\n, kind); +exit(1); +} + +replay_exec_instructions(); + +if (replay_file) { +if (skip_async_events(EVENT_CLOCK + kind)) { +replay_read_next_clock(kind); +} +int64_t ret = replay_state.cached_clock[kind]; + +return ret; +} + +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__); +exit(1); +} Is this thread safe? It is, because order of main_loop and cpu_exec executions is protected by global mutex. Please document exactly which globals are protected by the rr QemuMutex, and which by the global mutex. But I think as many variables as possible should be protected by the rr QemuMutex, for two reasons: 1) people are working on threaded TCG. While SMP record/replay is a whole different story, even UP TCG will be multithreaded. 2) in the end it makes reviewing the code simpler. There is one problem with protecting log file inside the replay code. We probably should have the following sequence of calls: lock_replay_mutex replay_save_events replay_run_event unlock_replay_mutex But replay_run_event can also generate some log events and we will have deadlock here. That is why we rely on global mutex. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 12/01/2015 13:00, Pavel Dovgalyuk wrote: +/*! Reads next clock event from the input. */ +int64_t replay_read_clock(unsigned int kind) +{ +if (kind = REPLAY_CLOCK_COUNT) { +fprintf(stderr, invalid clock ID %d for replay\n, kind); +exit(1); +} + +replay_exec_instructions(); + +if (replay_file) { +if (skip_async_events(EVENT_CLOCK + kind)) { +replay_read_next_clock(kind); +} +int64_t ret = replay_state.cached_clock[kind]; + +return ret; +} + +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__); +exit(1); +} Is this thread safe? It is, because order of main_loop and cpu_exec executions is protected by global mutex. Please introduce the record/replay QemuMutex in patch 4, and all along the series document which variables it protects. record/replay only uses mutex for events queue, because events may be added to the queue by the different threads. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 13/01/2015 10:21, Pavel Dovgaluk wrote: +/*! Reads next clock event from the input. */ +int64_t replay_read_clock(unsigned int kind) +{ +if (kind = REPLAY_CLOCK_COUNT) { +fprintf(stderr, invalid clock ID %d for replay\n, kind); +exit(1); +} + +replay_exec_instructions(); + +if (replay_file) { +if (skip_async_events(EVENT_CLOCK + kind)) { +replay_read_next_clock(kind); +} +int64_t ret = replay_state.cached_clock[kind]; + +return ret; +} + +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__); +exit(1); +} Is this thread safe? It is, because order of main_loop and cpu_exec executions is protected by global mutex. Please document exactly which globals are protected by the rr QemuMutex, and which by the global mutex. But I think as many variables as possible should be protected by the rr QemuMutex, for two reasons: 1) people are working on threaded TCG. While SMP record/replay is a whole different story, even UP TCG will be multithreaded. 2) in the end it makes reviewing the code simpler. Paolo
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
From: Paolo Bonzini [mailto:pbonz...@redhat.com] On 12/01/2015 13:00, Pavel Dovgalyuk wrote: diff --git a/replay/replay.h b/replay/replay.h index 90a949b..1c18c0e 100755 --- a/replay/replay.h +++ b/replay/replay.h @@ -16,6 +16,16 @@ #include stdint.h #include qapi-types.h +/* replay clock kinds */ +/* rdtsc */ +#define REPLAY_CLOCK_REAL_TICKS 0 +/* host_clock */ +#define REPLAY_CLOCK_HOST 1 +/* virtual_rt_clock */ +#define REPLAY_CLOCK_VIRTUAL_RT 2 + +#define REPLAY_CLOCK_COUNT 3 + extern ReplayMode replay_mode; extern char *replay_image_suffix; @@ -47,6 +57,19 @@ bool replay_interrupt(void); Returns true, when interrupt request is pending */ bool replay_has_interrupt(void); +/* Processing clocks and other time sources */ + +/*! Save the specified clock */ +int64_t replay_save_clock(unsigned int kind, int64_t clock); +/*! Read the specified clock from the log or return cached data */ +int64_t replay_read_clock(unsigned int kind); +/*! Saves or reads the clock depending on the current replay mode. */ +#define REPLAY_CLOCK(clock, value) \ +(replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock)) \ +: replay_mode == REPLAY_MODE_RECORD \ +? replay_save_clock((clock), (value)) \ +: (value)) + Inline functions please, not macros. Macro is required here, because I do not want the value to be computed in replay mode at all. Pavel Dovgalyuk
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 12/01/2015 13:00, Pavel Dovgalyuk wrote: +/*! Reads next clock event from the input. */ +int64_t replay_read_clock(unsigned int kind) +{ +if (kind = REPLAY_CLOCK_COUNT) { +fprintf(stderr, invalid clock ID %d for replay\n, kind); +exit(1); +} + +replay_exec_instructions(); + +if (replay_file) { +if (skip_async_events(EVENT_CLOCK + kind)) { +replay_read_next_clock(kind); +} +int64_t ret = replay_state.cached_clock[kind]; + +return ret; +} + +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__); +exit(1); +} Is this thread safe? Please introduce the record/replay QemuMutex in patch 4, and all along the series document which variables it protects. Paolo diff --git a/replay/replay.h b/replay/replay.h index 90a949b..1c18c0e 100755 --- a/replay/replay.h +++ b/replay/replay.h @@ -16,6 +16,16 @@ #include stdint.h #include qapi-types.h +/* replay clock kinds */ +/* rdtsc */ +#define REPLAY_CLOCK_REAL_TICKS 0 +/* host_clock */ +#define REPLAY_CLOCK_HOST 1 +/* virtual_rt_clock */ +#define REPLAY_CLOCK_VIRTUAL_RT 2 + +#define REPLAY_CLOCK_COUNT 3 + extern ReplayMode replay_mode; extern char *replay_image_suffix; @@ -47,6 +57,19 @@ bool replay_interrupt(void); Returns true, when interrupt request is pending */ bool replay_has_interrupt(void); +/* Processing clocks and other time sources */ + +/*! Save the specified clock */ +int64_t replay_save_clock(unsigned int kind, int64_t clock); +/*! Read the specified clock from the log or return cached data */ +int64_t replay_read_clock(unsigned int kind); +/*! Saves or reads the clock depending on the current replay mode. */ +#define REPLAY_CLOCK(clock, value) \ +(replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock)) \ +: replay_mode == REPLAY_MODE_RECORD \ +? replay_save_clock((clock), (value)) \ +: (value)) + Inline functions please, not macros.
Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks
On 12/01/2015 13:43, Pavel Dovgaluk wrote: +#define REPLAY_CLOCK(clock, value) \ +(replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock)) \ +: replay_mode == REPLAY_MODE_RECORD \ +? replay_save_clock((clock), (value)) \ +: (value)) + Inline functions please, not macros. Macro is required here, because I do not want the value to be computed in replay mode at all. Indeed you're right. Paolo