Kevin Wolf <kw...@redhat.com> writes:

> On 07.08.2020 at 15:29, Markus Armbruster wrote:
>> This is just a sketch.  It needs comments and a real commit message.
>> 
>> As is, it goes on top of Kevin's series.  It is meant to be squashed
>> into PATCH 06.
>> 
>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> ---
>>  include/qemu/coroutine.h     |  4 ++++
>>  include/qemu/coroutine_int.h |  2 ++
>>  monitor/monitor.c            | 36 +++++++++++++++---------------------
>>  util/qemu-coroutine.c        | 20 ++++++++++++++++++++
>>  4 files changed, 41 insertions(+), 21 deletions(-)
>> 
>> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
>> index dfd261c5b1..11da47092c 100644
>> --- a/include/qemu/coroutine.h
>> +++ b/include/qemu/coroutine.h
>> @@ -65,6 +65,10 @@ typedef void coroutine_fn CoroutineEntry(void *opaque);
>>   */
>>  Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque);
>>  
>> +Coroutine *qemu_coroutine_create_with_storage(CoroutineEntry *entry,
>> +                                              void *opaque, size_t storage);
>> +void *qemu_coroutine_local_storage(Coroutine *co);
>> +
>>  /**
>>   * Transfer control to a coroutine
>>   */
>> diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
>> index bd6b0468e1..7d7865a02f 100644
>> --- a/include/qemu/coroutine_int.h
>> +++ b/include/qemu/coroutine_int.h
>> @@ -41,6 +41,8 @@ struct Coroutine {
>>      void *entry_arg;
>>      Coroutine *caller;
>>  
>> +    void *coroutine_local_storage;
>> +
>>      /* Only used when the coroutine has terminated.  */
>>      QSLIST_ENTRY(Coroutine) pool_next;
>
> This increases the size of Coroutine objects typically by 8 bytes and
> shifts the following fields by the same amount. On my x86_64 build, we
> have exactly those 8 bytes left in CoroutineUContext until a new
> cacheline would start. With different CONFIG_* settings, it could be the
> change that increases the size to a new cacheline. No idea what this
> looks like on other architectures.
>
> Does this or the shifting of fields matter for performance? I don't
> know. It might even be unlikely. But cache effects are hard to predict
> and not wanting to do the work of proving that it's indeed harmless is
> one of the reasons why for the slow paths in question I preferred a
> solution that doesn't touch the coroutine core at all.

Point taken.

Possible mitigation: add at the end rather than in the middle.
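
To illustrate what "at the end" would buy us, here is a minimal sketch
using mock structs.  The struct names and the helper functions are
hypothetical, and the member list is only loosely modelled on the patch
context, not on the real struct Coroutine.  It shows that appending
keeps the offsets of the existing members, while inserting in the
middle shifts everything after the new member.  It does nothing about
the total size growing.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the layout before the patch (hypothetical, simplified). */
struct CoBefore {
    void *entry;
    void *entry_arg;
    void *caller;
    void *pool_next;
};

/* New member inserted in the middle, as in the sketch patch. */
struct CoMiddle {
    void *entry;
    void *entry_arg;
    void *caller;
    void *coroutine_local_storage;
    void *pool_next;
};

/* New member appended after the existing members instead. */
struct CoEnd {
    void *entry;
    void *entry_arg;
    void *caller;
    void *pool_next;
    void *coroutine_local_storage;
};

/* Nonzero if pool_next keeps its old offset when we append. */
static int end_variant_keeps_offsets(void)
{
    return offsetof(struct CoEnd, pool_next)
        == offsetof(struct CoBefore, pool_next);
}

/* Nonzero if pool_next moves by one pointer when we insert mid-struct. */
static int middle_variant_shifts_offsets(void)
{
    return offsetof(struct CoMiddle, pool_next)
        == offsetof(struct CoBefore, pool_next) + sizeof(void *);
}
```

Both variants end up the same size, so this only addresses the shifting
of fields, not the possible spill into a new cacheline.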

>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> index 50fb5b20d3..047a8fb380 100644
>> --- a/monitor/monitor.c
>> +++ b/monitor/monitor.c
>> @@ -82,38 +82,32 @@ bool qmp_dispatcher_co_shutdown;
>>   */
>>  bool qmp_dispatcher_co_busy;
>>  
>> -/*
>> - * Protects mon_list, monitor_qapi_event_state, coroutine_mon,
>> - * monitor_destroyed.
>> - */
>> +/* Protects mon_list, monitor_qapi_event_state, monitor_destroyed. */
>>  QemuMutex monitor_lock;
>>  static GHashTable *monitor_qapi_event_state;
>> -static GHashTable *coroutine_mon; /* Maps Coroutine* to Monitor* */
>>  
>>  MonitorList mon_list;
>>  int mon_refcount;
>>  static bool monitor_destroyed;
>>  
>> +static Monitor **monitor_curp(Coroutine *co)
>> +{
>> +    static __thread Monitor *global_cur_mon;
>> +
>> +    if (co == qmp_dispatcher_co) {
>> +        return qemu_coroutine_local_storage(co);
>> +    }
>> +    return &global_cur_mon;
>> +}
>
> Like the other patch, this needs to be extended for HMP. global_cur_mon
> is never meant to be set.

It is, for OOB commands.

> The solution fails as soon as we have more than a single monitor
> coroutine running at the same time because it relies on
> qmp_dispatcher_co.

Yes, but pretty much everything below handle_qmp_command() falls apart
then.  Remembering to update monitor_curp() would be the least of my
worries :)

>                    In this respect, it makes the same assumptions as the
> simple hack.
>
> Only knowing that qmp_dispatcher_co is always created with storage
> containing a Monitor** makes this safe.

Correct.

>>  Monitor *monitor_cur(void)
>>  {
>> -    Monitor *mon;
>> -
>> -    qemu_mutex_lock(&monitor_lock);
>> -    mon = g_hash_table_lookup(coroutine_mon, qemu_coroutine_self());
>> -    qemu_mutex_unlock(&monitor_lock);
>> -
>> -    return mon;
>> +    return *monitor_curp(qemu_coroutine_self());
>>  }
>>  
>>  void monitor_set_cur(Coroutine *co, Monitor *mon)
>>  {
>> -    qemu_mutex_lock(&monitor_lock);
>> -    if (mon) {
>> -        g_hash_table_replace(coroutine_mon, co, mon);
>> -    } else {
>> -        g_hash_table_remove(coroutine_mon, co);
>> -    }
>> -    qemu_mutex_unlock(&monitor_lock);
>> +    *monitor_curp(co) = mon;
>>  }
>>  
>>  /**
>> @@ -666,14 +660,14 @@ void monitor_init_globals_core(void)
>>  {
>>      monitor_qapi_event_init();
>>      qemu_mutex_init(&monitor_lock);
>> -    coroutine_mon = g_hash_table_new(NULL, NULL);
>>  
>>      /*
>>       * The dispatcher BH must run in the main loop thread, since we
>>       * have commands assuming that context.  It would be nice to get
>>       * rid of those assumptions.
>>       */
>> -    qmp_dispatcher_co = qemu_coroutine_create(monitor_qmp_dispatcher_co, NULL);
>> +    qmp_dispatcher_co = qemu_coroutine_create_with_storage(
>> +        monitor_qmp_dispatcher_co, NULL, sizeof(Monitor **));
>>      atomic_mb_set(&qmp_dispatcher_co_busy, true);
>>      aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
>>  }
>> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
>> index c3caa6c770..87bf7f0fc0 100644
>> --- a/util/qemu-coroutine.c
>> +++ b/util/qemu-coroutine.c
>> @@ -81,8 +81,28 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
>>      return co;
>>  }
>>  
>> +Coroutine *qemu_coroutine_create_with_storage(CoroutineEntry *entry,
>> +                                              void *opaque, size_t storage)
>> +{
>> +    Coroutine *co = qemu_coroutine_create(entry, opaque);
>> +
>> +    if (!co) {
>> +        return NULL;
>> +    }
>> +
>> +    co->coroutine_local_storage = g_malloc0(storage);
>> +    return co;
>> +}
>
> As the code above shows, this interface is only useful if you can
> identify the coroutine. It cannot be used in code that didn't create the
> current coroutine because then it can't know whether or not the
> coroutine has coroutine local storage, and if it has, what its structure
> is.
>
> For a supposedly generic solution, I think this is a bit weak.

Yes, that's fair.

The solution Daniel proposed makes the weakness more explicit:
instead of relying on "coroutine was created with this coroutine-local
storage", we'd rely on "coroutine_getspecific(key) does not fail".  It
can fail only if coroutine_setspecific(key, ...) was not called.  Not
much better in practice.
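
A rough sketch of what such an interface could look like, to make the
comparison concrete.  Everything here is hypothetical: MockCoroutine,
the CoKey enum and the mock_coroutine_*specific() functions are made up
for illustration and are not Daniel's actual proposal or anything in
the tree.  I assume "dumbed down" keys means a fixed, compile-time set
rather than dynamically created ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed key set: one slot per known user. */
typedef enum {
    CO_KEY_MONITOR,
    CO_KEY__MAX,
} CoKey;

/* Mock coroutine; the real struct Coroutine has no such member. */
typedef struct MockCoroutine {
    void *specific[CO_KEY__MAX];
} MockCoroutine;

/* Associate @value with @key for this coroutine. */
static void mock_coroutine_setspecific(MockCoroutine *co, CoKey key,
                                       void *value)
{
    assert(key < CO_KEY__MAX);
    co->specific[key] = value;
}

/* Return the value for @key, or NULL if none was ever set. */
static void *mock_coroutine_getspecific(MockCoroutine *co, CoKey key)
{
    assert(key < CO_KEY__MAX);
    return co->specific[key];
}
```

The caller still has to cope with a NULL result, which is the
"was not called" case above made visible in the return value.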

> Effectively, this might be a one-off solution in disguise because
> it's a big restriction on the possible use cases.

Daniel's solution is basically pthread_getspecific() for coroutines,
with the keys dumbed down.

If pthread_getspecific() was good enough for pthreads...

Well, it wasn't, or rather it was only because something better could
not be had with just a library, without toolchain support.  And that's
where we are with coroutines.

>> +void *qemu_coroutine_local_storage(Coroutine *co)
>> +{
>> +    return co->coroutine_local_storage;
>> +}
>> +
>>  static void coroutine_delete(Coroutine *co)
>>  {
>> +    g_free(co->coroutine_local_storage);
>> +    co->coroutine_local_storage = NULL;
>>      co->caller = NULL;
>>  
>>      if (CONFIG_COROUTINE_POOL) {
>
> Your list of pros/cons didn't mention coroutine creation/deletion as a
> hot path at all (which it is, we have one coroutine per request).

I did not expect coroutine creation / deletion to be a hot path.

It is not a hot path for QMP, because QMP is not a hot path.

I'm ready to accept the proposition that it's a hot path elsewhere.

> You leave qemu_coroutine_create() untouched (except indirectly by a
> larger g_malloc0() in the non-pooled case, which is negligible) and I
> assume that g_free(NULL) is cheap, so at least this is probably as good
> as it gets for something integrated in the coroutine core. Maybe an
> explicit if (co->coroutine_local_storage) would improve it slightly.
>
> Kevin

