On Thu, Jul 31, 2014 at 5:15 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> Il 31/07/2014 10:59, Ming Lei ha scritto:
>>> > No guesses please. Actually that's also my guess, but since you are
>>> > submitting the patch you must do better and show profiles where stack
>>> > switching disappears after the patches.
>>
>> Below are the hardware events reported by 'perf stat' when running the
>> fio randread benchmark for 2 minutes in the VM (single vq, 2 jobs):
>>
>> sudo ~/bin/perf stat -e \
>>   L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses \
>>   ./nqemu-start-mq 4 1
>>
>> 1) Without bypassing the coroutine (forcing 's->raw_format' to false;
>>    see patch 5/15):
>>
>> - throughput: 95K
>>   232,564,905,115 instructions
>>   161.991075781 seconds time elapsed
>>
>> 2) With bypassing the coroutine:
>>
>> - throughput: 115K
>>   255,526,629,881 instructions
>>   162.333465490 seconds time elapsed
>
> OK, so you are saving 10% of the instructions per IOP: before,
> 232G / 95K = 2.45M instructions/IOP; after, 255G / 115K = 2.22M
> instructions/IOP.
>
> That's not small, and it's a good thing for CPU utilization even if you
> were not increasing IOPS. On top of this, can you provide the stack
> traces to see the difference in the profiles?
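(Paolo's per-IOP arithmetic above can be sanity-checked with a quick
script. The instruction counts and IOPS figures are copied from the
'perf stat' output quoted in the mail; the helper function is purely
illustrative:)

```python
# Re-check of the instructions/IOP arithmetic quoted above.
# Instruction totals and IOPS come from the mail; dividing the total
# retired instructions by the fio-reported IOPS rate reproduces
# Paolo's per-IOP figures.

def insns_per_iop(instructions, iops):
    """Total retired instructions divided by the reported IOPS rate."""
    return instructions / iops

before = insns_per_iop(232_564_905_115, 95_000)   # coroutine path
after = insns_per_iop(255_526_629_881, 115_000)   # bypass path

print(f"before: {before / 1e6:.2f}M insns/IOP")   # ~2.45M
print(f"after:  {after / 1e6:.2f}M insns/IOP")    # ~2.22M
print(f"saving: {1 - after / before:.1%}")        # ~9.2%, i.e. roughly the 10% Paolo cites
```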
Here is the 'perf report' result on the cycles event, with and without
bypassing the coroutine: http://pastebin.com/ae0vnQ6V

From the profiling result, it looks like bdrv_co_do_preadv() is a bit
slow without the coroutine bypass.

Thanks,