Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9

Alex Bennée Fri, 31 Mar 2017 06:15:07 -0700

Alex Bennée <alex.ben...@linaro.org> writes:

> Pavel Dovgalyuk <dovga...@ispras.ru> writes:
>
>>> From: mttcg-requ...@listserver.greensocs.com 
>>> [mailto:mttcg-requ...@listserver.greensocs.com]
>>> Pavel Dovgalyuk <dovga...@ispras.ru> writes:
>>> >> From: mttcg-requ...@listserver.greensocs.com [mailto:mttcg-
>>> requ...@listserver.greensocs.com]
>>> >> Pavel Dovgalyuk <dovga...@ispras.ru> writes:
>>> >> >> From: mttcg-requ...@listserver.greensocs.com [mailto:mttcg-
>>> >> requ...@listserver.greensocs.com]
>>> >> >> Pavel Dovgalyuk <dovga...@ispras.ru> writes:
>>> <snip>
>>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.
>>> >> >>
>>> >> >> Thanks for that. I now have a test case that I can reproduce failures 
>>> >> >> on
>>> >> >> without needing graphics.
>>> >> >>
>>> >> >> I've been investigating if there are any problems with the timer
>>> >> >> processing now they have been moved into the TCG thread. The record
>>> >> >> stage seems to work fine but I'm having difficulty figuring out why
>>> >> >> playback freezes. It seems we get to a point where we are stuck 
>>> >> >> waiting
>>> >> >> for a suspiciously exact timer deadline:
>>> >> >
>>> >> > I've encountered the following behavior at replay stage:
>>> >> > - replay takes some instructions to execute (qemu_icount += counter)
>>> >> > - virtual timer is fired
>>> >>
>>> >> This is the virtual timer based on icount not the virtual rt timer? So
>>> >
>>> > Regular virtual timer. It's value is based on icount.
>>> > virtual_rt is used for internal icount purposes.
>>>
>>> And this is where the clock warps come in? The bias brings the
>>> virtual_rt time forward because execution is waiting for some external
>>> event to fire (e.g. a timer IRQ)?
>>
>> I guess so. But bias is not updated when the vCPU works.
>> vCPU thread updates only qemu_icount which is used for virtual clock 
>> calculation.
>>
>>> >> under the new scheme of being processed in the vCPU loop we should only
>>> >> fire this one once all execution is done (although you may exit the loop
>>> >> early before icount is exhausted).
>>> >
>>> > We should stop the vCPU before processing virtual timers because:
>>> > - virtual clock depends on instruction counter
>>> > - virtual timers may change virtual hardware state
>>>
>>> If we do the processing in the top of main vCPU loop this should be
>>> equivalent. The instruction counter cannot run because we haven't
>>> entered tcg_exec_cpu. We also process virtual timers in this thread
>>> outside the loop so nothing else can be poking the hardware state.
>>
>> This is how qemu worked in older version - it switched between
>> processing tasks (vCPU and timers) in one thread.
>
> The only real difference is the sequencing in the old case was protected
> by the BQL - now its my program order.
>
>> But how we'll join this behavior with the current design and MTTCG?
>
> Ignore MTTCG for now. We don't even try and run multiple-threads when
> icount is enabled. However the change in the BQL locking is what has
> triggered the failures.
>
> Anyway I think I'm getting closer to narrowing it down. On record I see
> replay_current_step jump backwards with this:
>
> /* A common event print, called when reading or saving an event */
> static void print_event(uint8_t event)
> {
>     static int event_count;
>     static uint64_t last_step;
>     uint64_t this_step = replay_get_current_step();
>
>     fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
>             event_count, event, this_step);
>
>     if (this_step < last_step) {
>         fprintf(stderr,"%s: !!! step %d went backwards 
> %#"PRIx64"=>%#"PRIx64"!!!\n",
>                 __func__, event_count, last_step, this_step);
>         abort();
>     }
>     last_step = this_step;
>     event_count++;
> }
>
> void replay_put_event(uint8_t event)
> {
>     assert(event < EVENT_COUNT);
>     replay_put_byte(event);
>     print_event(event);
> }
>
> The jump back is fairly consistent in my case (event 66 in the vexpress
> run) but I'm fairly sure that is the bug. I can't see any reason for the
> step count to go backwards - indeed that breaks the premise of .
>
> Unfortunately when I start putting break and watchpoints in to see how
> this jump back occurs the problem goes away until I disable them. So
> this very much looks like a race condition corrupting the icount data.


So this is a symptom of cpu_get_icount_raw(void) only accounting for in
progress instructions when in the vCPU context and the fact
timers_state.qemu_icount is "in credit" while the vCPU is running.

--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9

Reply via email to