On Tue, Feb 14, 2012 at 11:38 AM, Alex Barcelo <abarc...@ac.upc.edu> wrote:
> On Tue, Feb 14, 2012 at 09:33, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Mon, Feb 13, 2012 at 04:11:15PM +0100, Alex Barcelo wrote:
>>> This new implementation... well, it seems to work (I have done an
>>> ubuntu installation with a cdrom and a qcow drive, which seems to use
>>> quite a lot of coroutines). Of course I have done the coroutine-test
>>> and it was OK. But... I wasn't confident enough to propose it as a
>>> "mature alternative". And I don't have any performance benchmark,
>>> which would be interesting. So, I thought that the better option would
>>> be to send this patch to the developers as an alternative to ucontext.
>>
>> As a starting point, I suggest looking at
>> test-coroutine.c:perf_lifecycle(). It's a simple create-and-then-enter
>> benchmark which measures the latency of doing this. I expect you will
>> find performance is identical to the ucontext version because the
>> coroutine should be pooled and created using sigaltstack only once.
>>
>> The interesting thing would be to benchmark ucontext coroutine creation
>> against sigaltstack. Even then it may not matter much as long as pooled
>> coroutines are used most of the time.
>
> Didn't see the performance mode for test-coroutine. Now a benchmark
> test it's easy (it's half-done). The lifecycle is not a good
> benchmark, because sigaltstack is only called once. (As you said, the
> timing change in less than 1%).
>
> I thought that it would be interesting to add a performance test for
> nesting (which can be coroutine creation intensive). So I did it. I
> will send as a patch, is simple but it works for this.
>
> The preliminary results are:
> ucontext (traditional) method:
> MSG: Nesting 1000000 iterations of 100000 depth each: 0.452988 s
>
> sigaltstack (new) method:
> MSG: Nesting 1000000 iterations of 100000 depth each: 0.689649 s

Please run the tests with more iterations. The execution time should
be several seconds to reduce any scheduler impact or other hiccups. I
suggest scaling the iteration count until a run takes around 10 seconds.

> The sigaltstack is worse (well, it doesn't surprise me, it's more
> complicated and does more jumps and is a code flow more erratic). But
> a loss in efficiency in coroutines should not be important (how many
> coroutines are created in a typical qemu-system execution? I'm
> thinking "one"). Also as you said ;) pooled coroutines are used most
> of the time, in real qemu-system execution.

No, a lot of coroutines are created - each parallel disk I/O request
involves a coroutine. Coroutines are also being used in other
subsystems (e.g. virtfs). Hopefully the number of active coroutines is
still <100 but it's definitely >1.

Stefan
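
For illustration, here is a minimal sketch of how the iteration count could be scaled until one run lasts long enough to smooth out scheduler noise. It is not code from test-coroutine.c: the names bench_fn, now_seconds, time_run, calibrate_iterations and spin_workload are all hypothetical, and spin_workload is only a stand-in for the real nesting benchmark. It assumes a POSIX system with clock_gettime() and CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical benchmark body: performs `iterations` rounds of work. */
typedef void (*bench_fn)(uint64_t iterations);

/* Monotonic clock reading in seconds (assumes POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time a single run of `fn` with the given iteration count. */
static double time_run(bench_fn fn, uint64_t iterations)
{
    double start = now_seconds();
    fn(iterations);
    return now_seconds() - start;
}

/*
 * Double the iteration count until one run takes at least
 * `target_seconds`, and return that count. Growth stops at
 * `max_iterations` so the loop always terminates.
 */
static uint64_t calibrate_iterations(bench_fn fn, double target_seconds,
                                     uint64_t max_iterations)
{
    uint64_t n = 1;

    assert(fn != NULL);
    while (time_run(fn, n) < target_seconds) {
        if (n > max_iterations / 2) {
            return max_iterations;  /* cap reached, also avoids overflow */
        }
        n *= 2;
    }
    return n;
}

/* Stand-in workload (NOT the coroutine test): a simple busy loop. */
static void spin_workload(uint64_t iterations)
{
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < iterations; i++) {
        sink += i;
    }
}
```

Calling calibrate_iterations(spin_workload, 10.0, UINT64_C(1) << 40) would return an iteration count whose run takes roughly 10 to 20 seconds on the machine at hand. Both implementations should then be measured with that same count so the numbers stay comparable.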