Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 19/01/2015 14:10, Pavel Dovgaluk wrote:
   
Because 'A' is written only inside some of the replay_run_event 
callbacks.
It depends on type of the event and it's processing function inside 
the QEMU core.
There could be no 'A' at all.
  
   Why can't that code write the 'E' as well?
  Because such callbacks do not know that they are called from record/replay 
  event.
  They may be called from record/replay code and from other parts of QEMU.
  And they may write save something low-level like timer request.
 
 I do not understand.  Can you make an example with actual function names
 and data that is written?

I haven't found any actual examples. Probably they occurred in previous version 
which wrote
virtual timers into the log.
Anyway I will add the mutex then.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Paolo Bonzini
On 19/01/2015 13:03, Pavel Dovgaluk wrote:
 It will work for protecting the events list (I've already did this).
 But that will not work for protecting the log file.
 replay_run_event can write some data to the log. And also some other function 
 like replay_checkpoint
 can also write to the log simultaneously (if we will remove the global mutex).
 That's why we cannot simply replace the global mutex with some kind of local 
 one.

That's why you have to document very well the architecture and the
locking policy.  For example, why can't replay_run_event (or something
that it calls) take the replay lock locally, when it writes to the log?

If you do not document the architecture, we run in circles with me
asking possibly stupid questions and you thinking that my questions are
stupid. :)

Paolo



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 19/01/2015 13:03, Pavel Dovgaluk wrote:
  It will work for protecting the events list (I've already did this).
  But that will not work for protecting the log file.
  replay_run_event can write some data to the log. And also some other 
  function like
 replay_checkpoint
  can also write to the log simultaneously (if we will remove the global 
  mutex).
  That's why we cannot simply replace the global mutex with some kind of 
  local one.
 
 That's why you have to document very well the architecture and the
 locking policy.  

Ok.

 For example, why can't replay_run_event (or something
 that it calls) take the replay lock locally, when it writes to the log?

replay_run_event can take the lock. Suppose that it writes data 'A'.
replay_run_event itself corresponds to some event 'E'.
We expect that the following sequence of the events should occur: 'E', 'A'.
But if something will be written to the log between 'E' and 'A' then
replay_run_event in replay mode will stuck, because it will not see its data 
'A'.


Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 19/01/2015 13:43, Pavel Dovgaluk wrote:
   For example, why can't replay_run_event (or something
   that it calls) take the replay lock locally, when it writes to the log?
 
  replay_run_event can take the lock. Suppose that it writes data 'A'.
  replay_run_event itself corresponds to some event 'E'.
  We expect that the following sequence of the events should occur: 'E', 'A'.
  But if something will be written to the log between 'E' and 'A' then
  replay_run_event in replay mode will stuck, because it will not see its 
  data 'A'.
 
 It would be easier if you pointed me to actual code in the series.  But
 this doesn't seem impossible to fix by atomically writing the 'E' and
 'A' in the same critical section.

Because 'A' is written only inside some of the replay_run_event callbacks.
It depends on type of the event and it's processing function inside the QEMU 
core.
There could be no 'A' at all.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 19/01/2015 14:01, Pavel Dovgaluk wrote:
   It would be easier if you pointed me to actual code in the series.  But
   this doesn't seem impossible to fix by atomically writing the 'E' and
   'A' in the same critical section.
 
  Because 'A' is written only inside some of the replay_run_event callbacks.
  It depends on type of the event and it's processing function inside the 
  QEMU core.
  There could be no 'A' at all.
 
 Why can't that code write the 'E' as well?

Because such callbacks do not know that they are called from record/replay 
event.
They may be called from record/replay code and from other parts of QEMU.
And they may write save something low-level like timer request.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Paolo Bonzini


On 19/01/2015 14:10, Pavel Dovgaluk wrote:
  
   Because 'A' is written only inside some of the replay_run_event 
   callbacks.
   It depends on type of the event and it's processing function inside the 
   QEMU core.
   There could be no 'A' at all.
  
  Why can't that code write the 'E' as well?
 Because such callbacks do not know that they are called from record/replay 
 event.
 They may be called from record/replay code and from other parts of QEMU.
 And they may write save something low-level like timer request.

I do not understand.  Can you make an example with actual function names
and data that is written?

Paolo



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Paolo Bonzini


On 19/01/2015 13:43, Pavel Dovgaluk wrote:
  For example, why can't replay_run_event (or something
  that it calls) take the replay lock locally, when it writes to the log?

 replay_run_event can take the lock. Suppose that it writes data 'A'.
 replay_run_event itself corresponds to some event 'E'.
 We expect that the following sequence of the events should occur: 'E', 'A'.
 But if something will be written to the log between 'E' and 'A' then
 replay_run_event in replay mode will stuck, because it will not see its data 
 'A'.

It would be easier if you pointed me to actual code in the series.  But
this doesn't seem impossible to fix by atomically writing the 'E' and
'A' in the same critical section.

This is for example how QMP events are thread-safe: it just adds the
JSON string corresponding to the QMP event atomically in monitor_puts.

Paolo



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Paolo Bonzini


On 19/01/2015 14:01, Pavel Dovgaluk wrote:
  It would be easier if you pointed me to actual code in the series.  But
  this doesn't seem impossible to fix by atomically writing the 'E' and
  'A' in the same critical section.

 Because 'A' is written only inside some of the replay_run_event callbacks.
 It depends on type of the event and it's processing function inside the QEMU 
 core.
 There could be no 'A' at all.

Why can't that code write the 'E' as well?

Paolo



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 16/01/2015 09:03, Pavel Dovgaluk wrote:
  There is one problem with protecting log file inside the replay code.
  We probably should have the following sequence of calls:
 
  lock_replay_mutex
  replay_save_events
  replay_run_event
  unlock_replay_mutex
 
  But replay_run_event can also generate some log events and we will have 
  deadlock here.
  That is why we rely on global mutex.
 
 I think replay_run_event should run with the lock _not_ taken, that is:
 
 qemu_mutex_lock(lock);
 while (!QTAILQ_EMPTY(events_list)) {
 Event *event = QTAILQ_FIRST(events_list);
 QTAILQ_REMOVE(events_list, event, events);
 qemu_mutex_unlock(lock);
 replay_run_event(event);
 g_free(event);
 qemu_mutex_lock(lock);
 }
 qemu_mutex_unlock(lock);

It will work for protecting the events list (I've already did this).
But that will not work for protecting the log file.
replay_run_event can write some data to the log. And also some other function 
like replay_checkpoint
can also write to the log simultaneously (if we will remove the global mutex).
That's why we cannot simply replace the global mutex with some kind of local 
one.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-19 Thread Paolo Bonzini


On 16/01/2015 09:03, Pavel Dovgaluk wrote:
 There is one problem with protecting log file inside the replay code.
 We probably should have the following sequence of calls:
 
 lock_replay_mutex
 replay_save_events
 replay_run_event
 unlock_replay_mutex
 
 But replay_run_event can also generate some log events and we will have 
 deadlock here.
 That is why we rely on global mutex.

I think replay_run_event should run with the lock _not_ taken, that is:

qemu_mutex_lock(lock);
while (!QTAILQ_EMPTY(events_list)) {
Event *event = QTAILQ_FIRST(events_list);
QTAILQ_REMOVE(events_list, event, events);
qemu_mutex_unlock(lock);
replay_run_event(event);
g_free(event);
qemu_mutex_lock(lock);
}
qemu_mutex_unlock(lock);

Paolo



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-16 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 13/01/2015 10:21, Pavel Dovgaluk wrote:
  +/*! Reads next clock event from the input. */
+int64_t replay_read_clock(unsigned int kind)
+{
+if (kind = REPLAY_CLOCK_COUNT) {
+fprintf(stderr, invalid clock ID %d for replay\n, kind);
+exit(1);
+}
+
+replay_exec_instructions();
+
+if (replay_file) {
+if (skip_async_events(EVENT_CLOCK + kind)) {
+replay_read_next_clock(kind);
+}
+int64_t ret = replay_state.cached_clock[kind];
+
+return ret;
+}
+
+fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__);
+exit(1);
+}
  
   Is this thread safe?
  It is, because order of main_loop and cpu_exec executions is protected
  by global mutex.
 
 
 Please document exactly which globals are protected by the rr QemuMutex,
 and which by the global mutex.  But I think as many variables as
 possible should be protected by the rr QemuMutex, for two reasons:
 
 1) people are working on threaded TCG.  While SMP record/replay is a
 whole different story, even UP TCG will be multithreaded.
 
 2) in the end it makes reviewing the code simpler.

There is one problem with protecting log file inside the replay code.
We probably should have the following sequence of calls:

lock_replay_mutex
replay_save_events
replay_run_event
unlock_replay_mutex

But replay_run_event can also generate some log events and we will have 
deadlock here.
That is why we rely on global mutex.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-13 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 12/01/2015 13:00, Pavel Dovgalyuk wrote:
  +/*! Reads next clock event from the input. */
  +int64_t replay_read_clock(unsigned int kind)
  +{
  +if (kind = REPLAY_CLOCK_COUNT) {
  +fprintf(stderr, invalid clock ID %d for replay\n, kind);
  +exit(1);
  +}
  +
  +replay_exec_instructions();
  +
  +if (replay_file) {
  +if (skip_async_events(EVENT_CLOCK + kind)) {
  +replay_read_next_clock(kind);
  +}
  +int64_t ret = replay_state.cached_clock[kind];
  +
  +return ret;
  +}
  +
  +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__);
  +exit(1);
  +}
 
 Is this thread safe?  

It is, because order of main_loop and cpu_exec executions is protected
by global mutex.

 Please introduce the record/replay QemuMutex in
 patch 4, and all along the series document which variables it protects.

record/replay only uses mutex for events queue, because events may be 
added to the queue by the different threads.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-13 Thread Paolo Bonzini


On 13/01/2015 10:21, Pavel Dovgaluk wrote:
 +/*! Reads next clock event from the input. */
   +int64_t replay_read_clock(unsigned int kind)
   +{
   +if (kind = REPLAY_CLOCK_COUNT) {
   +fprintf(stderr, invalid clock ID %d for replay\n, kind);
   +exit(1);
   +}
   +
   +replay_exec_instructions();
   +
   +if (replay_file) {
   +if (skip_async_events(EVENT_CLOCK + kind)) {
   +replay_read_next_clock(kind);
   +}
   +int64_t ret = replay_state.cached_clock[kind];
   +
   +return ret;
   +}
   +
   +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__);
   +exit(1);
   +}
  
  Is this thread safe?  
 It is, because order of main_loop and cpu_exec executions is protected
 by global mutex.
 

Please document exactly which globals are protected by the rr QemuMutex,
and which by the global mutex.  But I think as many variables as
possible should be protected by the rr QemuMutex, for two reasons:

1) people are working on threaded TCG.  While SMP record/replay is a
whole different story, even UP TCG will be multithreaded.

2) in the end it makes reviewing the code simpler.

Paolo



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-12 Thread Pavel Dovgaluk
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 12/01/2015 13:00, Pavel Dovgalyuk wrote:
  diff --git a/replay/replay.h b/replay/replay.h
  index 90a949b..1c18c0e 100755
  --- a/replay/replay.h
  +++ b/replay/replay.h
  @@ -16,6 +16,16 @@
   #include stdint.h
   #include qapi-types.h
 
  +/* replay clock kinds */
  +/* rdtsc */
  +#define REPLAY_CLOCK_REAL_TICKS 0
  +/* host_clock */
  +#define REPLAY_CLOCK_HOST   1
  +/* virtual_rt_clock */
  +#define REPLAY_CLOCK_VIRTUAL_RT 2
  +
  +#define REPLAY_CLOCK_COUNT  3
  +
   extern ReplayMode replay_mode;
   extern char *replay_image_suffix;
 
  @@ -47,6 +57,19 @@ bool replay_interrupt(void);
   Returns true, when interrupt request is pending */
   bool replay_has_interrupt(void);
 
  +/* Processing clocks and other time sources */
  +
  +/*! Save the specified clock */
  +int64_t replay_save_clock(unsigned int kind, int64_t clock);
  +/*! Read the specified clock from the log or return cached data */
  +int64_t replay_read_clock(unsigned int kind);
  +/*! Saves or reads the clock depending on the current replay mode. */
  +#define REPLAY_CLOCK(clock, value)  \
  +(replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))   \
  +: replay_mode == REPLAY_MODE_RECORD \
  +? replay_save_clock((clock), (value))   \
  +: (value))
  +
 
 Inline functions please, not macros.

Macro is required here, because I do not want the value to be computed in
replay mode at all.

Pavel Dovgalyuk




Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-12 Thread Paolo Bonzini


On 12/01/2015 13:00, Pavel Dovgalyuk wrote:
 +/*! Reads next clock event from the input. */
 +int64_t replay_read_clock(unsigned int kind)
 +{
 +if (kind = REPLAY_CLOCK_COUNT) {
 +fprintf(stderr, invalid clock ID %d for replay\n, kind);
 +exit(1);
 +}
 +
 +replay_exec_instructions();
 +
 +if (replay_file) {
 +if (skip_async_events(EVENT_CLOCK + kind)) {
 +replay_read_next_clock(kind);
 +}
 +int64_t ret = replay_state.cached_clock[kind];
 +
 +return ret;
 +}
 +
 +fprintf(stderr, REPLAY INTERNAL ERROR %d\n, __LINE__);
 +exit(1);
 +}

Is this thread safe?  Please introduce the record/replay QemuMutex in
patch 4, and all along the series document which variables it protects.

Paolo

 diff --git a/replay/replay.h b/replay/replay.h
 index 90a949b..1c18c0e 100755
 --- a/replay/replay.h
 +++ b/replay/replay.h
 @@ -16,6 +16,16 @@
  #include stdint.h
  #include qapi-types.h
  
 +/* replay clock kinds */
 +/* rdtsc */
 +#define REPLAY_CLOCK_REAL_TICKS 0
 +/* host_clock */
 +#define REPLAY_CLOCK_HOST   1
 +/* virtual_rt_clock */
 +#define REPLAY_CLOCK_VIRTUAL_RT 2
 +
 +#define REPLAY_CLOCK_COUNT  3
 +
  extern ReplayMode replay_mode;
  extern char *replay_image_suffix;
  
 @@ -47,6 +57,19 @@ bool replay_interrupt(void);
  Returns true, when interrupt request is pending */
  bool replay_has_interrupt(void);
  
 +/* Processing clocks and other time sources */
 +
 +/*! Save the specified clock */
 +int64_t replay_save_clock(unsigned int kind, int64_t clock);
 +/*! Read the specified clock from the log or return cached data */
 +int64_t replay_read_clock(unsigned int kind);
 +/*! Saves or reads the clock depending on the current replay mode. */
 +#define REPLAY_CLOCK(clock, value)  \
 +(replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))   \
 +: replay_mode == REPLAY_MODE_RECORD \
 +? replay_save_clock((clock), (value))   \
 +: (value))
 +

Inline functions please, not macros.



Re: [Qemu-devel] [RFC PATCH v7 12/21] replay: recording and replaying clock ticks

2015-01-12 Thread Paolo Bonzini


On 12/01/2015 13:43, Pavel Dovgaluk wrote:
   +#define REPLAY_CLOCK(clock, value) 
\
   +(replay_mode == REPLAY_MODE_PLAY ? replay_read_clock((clock))  
\
   +: replay_mode == REPLAY_MODE_RECORD
\
   +? replay_save_clock((clock), (value))  
\
   +: (value))
   +
  
  Inline functions please, not macros.
 Macro is required here, because I do not want the value to be computed in
 replay mode at all.

Indeed you're right.

Paolo