Frederic Barrat <fbar...@linux.ibm.com> writes:
> On 29/06/2022 00:17, Alex Bennée wrote:
>> If you run the sync-profiler (via the HMP "sync-profile on") you can
>> then get a breakdown of which mutexes are being held and for how long
>> ("info sync-profile").
>
> Alex, a huge thank you!
>
> For the record, the "info sync-profile" showed:
>
> Type       Object          Call site                     Wait Time (s)     Count  Average (us)
> ----------------------------------------------------------------------------------------------
> BQL mutex  0x55eb89425540  accel/tcg/cpu-exec.c:744           96.31578  73589937          1.31
> BQL mutex  0x55eb89425540  target/ppc/helper_regs.c:207        0.00150      1178          1.27
>
> And it points to a lock in the interrupt delivery path, in
> cpu_handle_interrupt().
>
> I now understand the root cause. The interrupt signal for the
> decrementer interrupt remains set because the interrupt is not being
> delivered, per the config. I'm not quite sure what the proper fix is
> yet (there seem to be several implementations of the decrementer on
> ppc), but at least I understand why we are so slow.

That sounds like a bug in the interrupt controller emulation. It should
not even be attempting to cpu_exit() and set cpu->interrupt_request
(which are TCG internals) unless the IRQ is unmasked. Usually, when an
emulated IRQ controller is updated, you re-calculate its state and
decide whether an interrupt needs to be asserted to QEMU (see the
sketch at the end of this mail).

> With a quick hack, I could verify that by moving that signal out of
> the way, the decompression time of the kernel is now peanuts, no
> matter the number of CPUs. Even with one CPU, the 15 seconds measured
> before was already a huge waste, so it was not really a multi-CPU
> problem. Multiple CPUs were just highlighting it.
>
> Thanks again!
>
> Fred

-- 
Alex Bennée
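
The shape I mean is something like the sketch below. The device struct
and the update helper are made up for illustration (this is not the
actual ppc decrementer code), but qemu_set_irq() is the real plumbing a
device model would use to drive its output line:

#include "qemu/osdep.h"
#include "hw/irq.h"

/* Hypothetical interrupt controller state, for illustration only. */
typedef struct MyIntcState {
    qemu_irq irq;      /* output line wired towards the CPU */
    uint32_t pending;  /* latched interrupt sources */
    uint32_t mask;     /* sources currently enabled by the guest */
} MyIntcState;

/* Call after any write that changes pending or mask state. The line is
 * only asserted while an unmasked source is pending, so a masked
 * decrementer never sets cpu->interrupt_request or forces a cpu_exit()
 * in the first place. */
static void my_intc_update_irq(MyIntcState *s)
{
    qemu_set_irq(s->irq, (s->pending & s->mask) != 0);
}

The important property is that masking is resolved at the device, and
the TCG internals are only poked when the computed level actually says
an interrupt should be delivered.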