On 28/11/2014 12:32, Peter Lieven wrote:
> On 28.11.2014 at 12:23, Paolo Bonzini wrote:
>>
>> On 28/11/2014 12:21, Peter Lieven wrote:
>>> On 28.11.2014 at 12:14, Paolo Bonzini wrote:
>>>>> master:
>>>>> Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns
>>>>> per coroutine
>>>>>
>>>>> paolo:
>>>>> Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns
>>>>> per coroutine
>>>> Nice. :)
>>>>
>>>> Can you please try "coroutine: Use __thread … " together, too? I still
>>>> see 11% time spent in pthread_getspecific, and I get ~10% more indeed if
>>>> I apply it here (my times are 191/160/145).
>>> indeed:
>>>
>>> Run operation 40000000 iterations 10.138684 s, 3945K operations/s, 253ns
>>> per coroutine
>> Your perf_master2 uses the ring buffer unconditionally, right? I wonder
>> if we can use a similar algorithm but with arrays instead of lists...
>
> Why do you set pool_size = 0 in the create path?
>
> When I do the following:
> diff --git a/qemu-coroutine.c b/qemu-coroutine.c
> index 6bee354..c79ee78 100644
> --- a/qemu-coroutine.c
> +++ b/qemu-coroutine.c
> @@ -44,7 +44,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
>           * and the actual size of alloc_pool. But it is just a heuristic,
>           * it does not need to be perfect.
>           */
> -        pool_size = 0;
> +        atomic_dec(&pool_size);
>          QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
>          co = QSLIST_FIRST(&alloc_pool);
>
>
> I get:
> Run operation 40000000 iterations 9.883958 s, 4046K operations/s, 247ns per
> coroutine

Because pool_size is the (approximate) number of coroutines in the pool.
It is zero after QSLIST_MOVE_ATOMIC has NULL-ed out release_pool.slh_first.

Paolo
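
To make the point about the counter concrete, here is a minimal standalone sketch of the idea. It is not the code from qemu-coroutine.c: Node, pool_release, pool_take_all and list_length are hypothetical names, and a plain C11 atomic_exchange stands in for QSLIST_MOVE_ATOMIC. It only shows that once the whole list has been taken in one step, the shared list is empty, so zero is the matching count, whereas a single decrement would leave the counter at n - 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a pooled coroutine; not QEMU's Coroutine. */
typedef struct Node {
    struct Node *next;
} Node;

/* Shared pool: a lock-free LIFO list plus an approximate element count. */
_Atomic(Node *) release_pool;
atomic_uint pool_size;

/* Release path: push one node onto the shared list, then bump the count. */
void pool_release(Node *n)
{
    assert(n != NULL);
    Node *old = atomic_load(&release_pool);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&release_pool, &old, n));
    atomic_fetch_add(&pool_size, 1);
}

/*
 * Create path: take the entire shared list in one atomic step (the role
 * QSLIST_MOVE_ATOMIC plays in the patch). The exchange leaves the shared
 * head NULL, so the count that describes it is 0, not "one less".
 * Concurrent releases can still skew the count slightly; it is a heuristic.
 */
Node *pool_take_all(void)
{
    atomic_store(&pool_size, 0);
    return atomic_exchange(&release_pool, NULL);
}

/* Helper for inspection: length of a private (already taken) list. */
unsigned list_length(const Node *n)
{
    unsigned len = 0;
    for (; n != NULL; n = n->next) {
        len++;
    }
    return len;
}
```

With three nodes released and then taken, the caller owns a private list of length three, the shared head is NULL, and pool_size reads zero. Decrementing instead would report two coroutines in a list that is in fact empty.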