On Tue, Feb 14, 2012 at 13:17, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Tue, Feb 14, 2012 at 11:38 AM, Alex Barcelo <abarc...@ac.upc.edu> wrote:
>> On Tue, Feb 14, 2012 at 09:33, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>> On Mon, Feb 13, 2012 at 04:11:15PM +0100, Alex Barcelo wrote:
>>>> This new implementation... well, it seems to work (I have done an
>>>> Ubuntu installation with a cdrom and a qcow drive, which seems to use
>>>> quite a lot of coroutines). Of course I have run the coroutine-test
>>>> and it was OK. But... I wasn't confident enough to propose it as a
>>>> "mature alternative". And I don't have any performance benchmark,
>>>> which would be interesting. So I thought that the best option would
>>>> be to send this patch to the developers as an alternative to ucontext.
>>>
>>> As a starting point, I suggest looking at
>>> test-coroutine.c:perf_lifecycle(). It's a simple create-and-then-enter
>>> benchmark which measures the latency of doing this. I expect you will
>>> find performance is identical to the ucontext version because the
>>> coroutine should be pooled and created using sigaltstack only once.
>>>
>>> The interesting thing would be to benchmark ucontext coroutine creation
>>> against sigaltstack. Even then it may not matter much as long as pooled
>>> coroutines are used most of the time.
>>
>> I didn't see the performance mode for test-coroutine. Now a benchmark
>> test is easy (it's half-done). The lifecycle test is not a good
>> benchmark, because sigaltstack is only called once. (As you said, the
>> timing changes by less than 1%.)
>>
>> I thought that it would be interesting to add a performance test for
>> nesting (which can be coroutine-creation intensive). So I did it. I
>> will send it as a patch; it is simple, but it works for this.
>>
>> The preliminary results are:
>> ucontext (traditional) method:
>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.452988 s
>>
>> sigaltstack (new) method:
>> MSG: Nesting 1000000 iterations of 100000 depth each: 0.689649 s
>
> Please run the tests with more iterations. The execution time should
> be several seconds to reduce any scheduler impact or other hiccups. I
> suggest scaling iterations up to around 10 seconds.
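For reference, here is a minimal sketch of a nesting benchmark of the kind discussed above, together with the "scale iterations up to around 10 seconds" advice. It uses plain POSIX ucontext directly; it is NOT the test-coroutine.c code and does not use QEMU's coroutine API. Every name in it (`NestCo`, `nest_create_and_enter`, `nesting_benchmark`, `calibrate_iterations`) and the 64 KiB stack size are hypothetical. Each nesting level allocates a fresh stack and context with no pooling, so the loop is dominated by creation cost, which is exactly what a ucontext vs sigaltstack comparison would need to expose. The sigaltstack backend is not shown. Note that the ucontext functions were dropped from POSIX.1-2008 but remain available in glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024) /* assumed per-coroutine stack size */

typedef struct {
    ucontext_t ctx;     /* saved context of this coroutine */
    void *stack;        /* freshly allocated stack (no pooling) */
    unsigned remaining; /* levels still to create below this one */
} NestCo;

static unsigned long created; /* coroutines entered so far */
static NestCo *pending;       /* hands a new coroutine its descriptor */

static void nest_create_and_enter(unsigned remaining, ucontext_t *caller);

/* Entry point: each level creates and enters one child, then returns,
 * which resumes uc_link (the parent). */
static void nest_entry(void)
{
    NestCo *self = pending;
    created++;
    if (self->remaining > 0) {
        nest_create_and_enter(self->remaining - 1, &self->ctx);
    }
}

/* Create a coroutine, switch to it, and free its stack once it finishes. */
static void nest_create_and_enter(unsigned remaining, ucontext_t *caller)
{
    NestCo co;
    co.stack = malloc(STACK_SIZE);
    if (co.stack == NULL || getcontext(&co.ctx) != 0) {
        abort();
    }
    co.ctx.uc_stack.ss_sp = co.stack;
    co.ctx.uc_stack.ss_size = STACK_SIZE;
    co.ctx.uc_link = caller;  /* resume the caller when nest_entry returns */
    co.remaining = remaining;
    makecontext(&co.ctx, nest_entry, 0);
    pending = &co;
    if (swapcontext(caller, &co.ctx) != 0) { /* save caller, run child */
        abort();
    }
    free(co.stack);
}

/* Run `iterations` nestings of `depth` coroutines; return elapsed seconds. */
static double nesting_benchmark(unsigned iterations, unsigned depth)
{
    struct timespec t0, t1;
    ucontext_t main_ctx;
    assert(depth > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iterations; i++) {
        nest_create_and_enter(depth - 1, &main_ctx);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Double the iteration count until one run lasts at least target_s. */
static unsigned calibrate_iterations(unsigned depth, double target_s)
{
    unsigned iterations = 1;
    while (nesting_benchmark(iterations, depth) < target_s &&
           iterations <= UINT_MAX / 2) {
        iterations *= 2;
    }
    return iterations;
}
```

One would call `calibrate_iterations(depth, 10.0)` once and then time `nesting_benchmark(n, depth)` for each backend with the same `n`. Doubling keeps calibration cheap (roughly twice the target time in total), at the cost of overshooting the target by up to 2x.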
Ok, 10.2 s vs 10.5 s (the traditional ucontext still wins, but the difference no longer seems relevant).

>> The sigaltstack version is worse (well, it doesn't surprise me: it's more
>> complicated, does more jumps and has a more erratic code flow). But
>> a loss in efficiency in coroutines should not be important (how many
>> coroutines are created in a typical qemu-system execution? I'm
>> thinking "one"). Also, as you said ;) pooled coroutines are used most
>> of the time in real qemu-system execution.
>
> No, a lot of coroutines are created - each parallel disk I/O request
> involves a coroutine. Coroutines are also being used in other
> subsystems (e.g. virtfs).
>
> Hopefully the number of active coroutines is still <100 but it's
> definitely >1.

I put a "Hello world, look, I'm in a coroutine" printf inside the coroutine creation function, and I have only seen it twice in a normal qemu-system execution. That's why I had my doubts.
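Whether a printf in "the coroutine creation function" fires once per request probably depends on where it sits relative to the pool lookup: if it is on the path that allocates a new stack, reused pooled coroutines never reach it. A minimal sketch of instrumentation that separates the two cases and also tracks the active count (to check the "<100 active" guess) could look like this. The hook names and counters are hypothetical, not part of QEMU; one would have to call them from wherever the backend actually creates and terminates coroutines.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical instrumentation counters, not part of QEMU. */
static atomic_ulong co_requested; /* every create request, pooled or not */
static atomic_ulong co_allocated; /* requests that needed a fresh stack/context */
static atomic_ulong co_active;    /* coroutines currently alive */
static atomic_ulong co_peak;      /* high-water mark of co_active */

/* Call once per create request; from_pool says whether a pooled
 * coroutine was reused instead of allocating a new one. */
static void stats_on_create(int from_pool)
{
    unsigned long now, peak;

    atomic_fetch_add(&co_requested, 1);
    if (!from_pool) {
        atomic_fetch_add(&co_allocated, 1);
    }
    now = atomic_fetch_add(&co_active, 1) + 1;
    peak = atomic_load(&co_peak);
    /* Raise the peak; a failed CAS reloads `peak`, so just retry. */
    while (now > peak &&
           !atomic_compare_exchange_weak(&co_peak, &peak, now)) {
    }
}

/* Call when a coroutine terminates (returned to the pool or freed). */
static void stats_on_terminate(void)
{
    assert(atomic_load(&co_active) > 0);
    atomic_fetch_sub(&co_active, 1);
}

static void stats_print(FILE *out)
{
    fprintf(out, "coroutines: %lu requested, %lu allocated, %lu active, peak %lu\n",
            atomic_load(&co_requested), atomic_load(&co_allocated),
            atomic_load(&co_active), atomic_load(&co_peak));
}
```

Printing these at exit (e.g. `stats_print(stderr)`) during a disk-heavy guest workload would show whether "seen only twice" reflects two allocations that were then recycled by the pool, rather than two coroutines in total.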