If you're interested in how I found this problem, I used 'perf report -a -g' and flamegraphs. This is the flamegraph of qemu (on the host) while the guest is running the parallel compile:
http://oirase.annexia.org/tmp/qemu-riscv.svg

If you click into 'CPU_0/TCG' at the bottom left (all the vCPUs basically act alike), and then go to 'cpu_get_tb_cpu_state', you can see the call to 'object_dynamic_cast_assert' taking considerable time.

If you zoom out, hit Ctrl-F and type 'object_dynamic_cast_assert' into the search box, then the flamegraph will tell you this call takes about 6.6% of total time (not all, but most, attributable to the call from 'cpu_get_tb_cpu_state' -> 'object_dynamic_cast_assert').

There are several other issues in the flamegraph which I'm trying to address, but this was the simplest one.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html