"Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote: > * Juan Quintela (quint...@redhat.com) wrote: >> We create new channels for each new thread created. We only send through >> them a character to be sure that we are creating the channels in the >> right order. > > That text is out of date isn't it?
oops, fixed. >> +gboolean multifd_new_channel(QIOChannel *ioc) >> +{ >> + int thread_count = migrate_multifd_threads(); >> + MultiFDRecvParams *p = g_new0(MultiFDRecvParams, 1); >> + MigrationState *s = migrate_get_current(); >> + char string[MULTIFD_UUID_MSG]; >> + char string_uuid[UUID_FMT_LEN]; >> + char *uuid; >> + int id; >> + >> + qio_channel_read(ioc, string, sizeof(string), &error_abort); >> + sscanf(string, "%s multifd %03d", string_uuid, &id); >> + >> + if (qemu_uuid_set) { >> + uuid = qemu_uuid_unparse_strdup(&qemu_uuid); >> + } else { >> + uuid = g_strdup(multifd_uuid); >> + } >> + if (strcmp(string_uuid, uuid)) { >> + error_report("multifd: received uuid '%s' and expected uuid '%s'", >> + string_uuid, uuid); > > probably worth adding the channel id as well so we can see > when it fails. Done. >> + migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, >> + MIGRATION_STATUS_FAILED); >> + terminate_multifd_recv_threads(); >> + return FALSE; >> + } >> + g_free(uuid); >> + >> + if (multifd_recv_state->params[id] != NULL) { >> + error_report("multifd: received id '%d' is already setup'", id); >> + migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, >> + MIGRATION_STATUS_FAILED); >> + terminate_multifd_recv_threads(); >> + return FALSE; >> + } >> + qemu_mutex_init(&p->mutex); >> + qemu_sem_init(&p->sem, 0); >> + p->quit = false; >> + p->id = id; >> + p->c = ioc; >> + atomic_set(&multifd_recv_state->params[id], p); > > Can you explain why this is quite so careful about ordering ? Is there > something that could look at params or try and take the mutex before > the count is incremented? what happened to me in the middle stages of the patches (yes, doing asynchronously was painful) was that: I created the threads (at the beggining I did the multifd_recv_state->params[id] == p inside the thread, that makes things really, really racy. I *think* that now we could probably do this as you state. > I think it's safe to do: > p->quit = false; > p->id = id; > p->c = ioc; > &multifd_recv_state->params[id] = p; > qemu_sem_init(&p->sem, 0); > qemu_mutex_init(&p->mutex); > qemu_thread_create(...) > atomic_inc(&multifd_recv_state->count); <-- I'm not sure if this > needs to be atomic We only change it on the main thread, so it should be enough. The split that I want to do is: we do the listen asynchronously when something arrives, we just read it (main thread) we then read <uuid> <string> <arguments> and then after checking that uuid is right, we call whatever function we have for "string", in our case "multifd", with <arguments> as one string parameters. This should make it easier to create new "channels" for other purposes. So far so good. But then it appears what are the responsabilities, At the beggining, I read the string on the reception thread for that channel, that created a race because I received the 1st message for that channel before the channel was fully created (yes, it only happened sometimes, easy to understand after debugging). This is the main reason that I changed to an array of pointers to structs instead of one array of structs. Then, I had to ve very careful to know when I had created all the channels threads, because otherwise I ended having races left and right. I will try to test the ordering that you suggested. >> + qemu_thread_create(&p->thread, "multifd_recv", multifd_recv_thread, p, >> + QEMU_THREAD_JOINABLE); > > You've lost the nice numbered thread names you had created in the > previous version of this that you're removing. I could get them back, but they really were not showing at gdb, where do they show? ps? >> + multifd_recv_state->count++; >> + >> + /* We need to return FALSE for the last channel */ >> + if (multifd_recv_state->count == thread_count) { >> + return FALSE; >> + } else { >> + return TRUE; >> + } > > return multifd_recv_state->count != thread_count; ? For other reasons I change this functions and now they use a different way of setting/checking if we have finished. Look at the new series. I didn't do as you said because I feel it weird that we return a bool when we expert a gboolean, but ..... Thanks, Juan.