Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool

Peter Lieven Fri, 28 Nov 2014 03:34:40 -0800

On 28.11.2014 at 12:23, Paolo Bonzini wrote:
>
> On 28/11/2014 12:21, Peter Lieven wrote:
>> On 28.11.2014 at 12:14, Paolo Bonzini wrote:
>>>> master:
>>>> Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns per coroutine
>>>>
>>>> paolo:
>>>> Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns per coroutine
>>> Nice. :)
>>>
>>> Can you please try "coroutine: Use __thread … " together, too?  I still
>>> see 11% time spent in pthread_getspecific, and I get ~10% more indeed if
>>> I apply it here (my times are 191/160/145).
>> indeed:
>>
>> Run operation 40000000 iterations 10.138684 s, 3945K operations/s, 253ns per coroutine
> Your perf_master2 uses the ring buffer unconditionally, right?  I wonder
> if we can use a similar algorithm but with arrays instead of lists...


Why do you set pool_size = 0 in the create path?

When I do the following:
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 6bee354..c79ee78 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -44,7 +44,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
                  * and the actual size of alloc_pool.  But it is just a heuristic,
                  * it does not need to be perfect.
                  */
-                pool_size = 0;
+                atomic_dec(&pool_size);
                 QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
                 co = QSLIST_FIRST(&alloc_pool);


I get:
Run operation 40000000 iterations 9.883958 s, 4046K operations/s, 247ns per coroutine
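
To make the difference between the two variants concrete, here is a minimal
standalone sketch. It is not the QEMU code: Pool, Node, pool_release,
pool_steal_reset and pool_steal_dec are hypothetical names, and C11
<stdatomic.h> stands in for QEMU's atomic helpers and the QSLIST macros. It
only shows what happens to the counter when the whole release list is moved
in one step: resetting it to zero matches the now-empty list, while
decrementing it by one leaves it at the old size minus one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not QEMU types. */
typedef struct Node {
    struct Node *next;
} Node;

typedef struct Pool {
    _Atomic(Node *) release_head;  /* shared list of released entries */
    atomic_uint release_count;     /* heuristic size of that list */
} Pool;

static void pool_init(Pool *p)
{
    atomic_init(&p->release_head, NULL);
    atomic_init(&p->release_count, 0);
}

/* Push one entry onto the shared list and bump the counter. */
static void pool_release(Pool *p, Node *n)
{
    assert(n != NULL);
    Node *old = atomic_load(&p->release_head);
    do {
        n->next = old;  /* old is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak(&p->release_head, &old, n));
    atomic_fetch_add(&p->release_count, 1);
}

/* Variant A: take the whole list and reset the counter to zero. */
static Node *pool_steal_reset(Pool *p)
{
    Node *all = atomic_exchange(&p->release_head, NULL);
    atomic_store(&p->release_count, 0);
    return all;
}

/* Variant B: take the whole list but only decrement the counter by one. */
static Node *pool_steal_dec(Pool *p)
{
    Node *all = atomic_exchange(&p->release_head, NULL);
    atomic_fetch_sub(&p->release_count, 1);
    return all;
}

/* Count the entries of a list that is no longer shared. */
static unsigned list_len(const Node *n)
{
    unsigned len = 0;
    for (; n != NULL; n = n->next) {
        len++;
    }
    return len;
}
```

With four released entries, variant A leaves the counter at 0 and variant B
leaves it at 3, although the shared list is empty in both cases. Whether that
drift explains the benchmark numbers above is not something this sketch can
show.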
