Fabiano Rosas <faro...@suse.de> wrote:
> Juan Quintela <quint...@redhat.com> writes:
>
>> Fabiano Rosas <faro...@suse.de> wrote:
>>> Juan Quintela <quint...@redhat.com> writes:
>>>
>>
>> This is a common pattern for concurrency.  To avoid keeping the mutex
>> locked for too long, you add a variable (that can only be tested or
>> changed with the lock held) to indicate that the "channel" is busy,
>> while the struct that the lock protects is not (see how we make sure
>> that the channel doesn't use any variable of the struct without
>> holding the lock).
>
> Sure, but what is the purpose of marking the channel as busy?  The
> migration thread cannot access p->packet anyway.  From
> multifd_send_pages()'s perspective, as soon as the channel releases
> the lock to start the IO, the packet has been sent.  It could start
> preparing the next pages struct while the channel is doing IO.  No?
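Let me first spell out the idiom in isolation, because everything below
builds on it.  This is only a sketch with made-up names (plain pthreads,
nothing from the real multifd structs); the point is that the lock is
held just long enough to test/flip the flag, never across the slow I/O:

#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    bool busy;                  /* only read/written with lock held */
    /* ... payload that the worker consumes while busy is set ... */
} ToyChannel;

/* producer: hand work to the channel only if it is idle */
static bool try_submit(ToyChannel *c)
{
    bool submitted = false;

    pthread_mutex_lock(&c->lock);
    if (!c->busy) {
        /* fill in the payload here, still under the lock */
        c->busy = true;
        submitted = true;
    }
    pthread_mutex_unlock(&c->lock);
    return submitted;
}

/* worker: the slow send happens without the lock; only afterwards is the
 * flag cleared so the producer can reuse this channel */
static void worker_iteration(ToyChannel *c)
{
    /* ... do the I/O here, lock not held ... */

    pthread_mutex_lock(&c->lock);
    c->busy = false;
    pthread_mutex_unlock(&c->lock);
}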
OK, we remove the pending.  Then we are sending that packet.  But see
what happens in multifd_send_pages():

channels_ready is 0.
The channel sending that packet is channel 1, and next_channel == 1.
Channel 0 gets ready, so it increases channels_ready.

static int multifd_send_pages(QEMUFile *f)
{
    qemu_sem_wait(&multifd_send_state->channels_ready); // we pass this
    next_channel %= migrate_multifd_channels();
    for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i]; // remember that i == 1
        qemu_mutex_lock(&p->mutex);
        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }
        if (!p->pending_job) {
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        qemu_mutex_unlock(&p->mutex);
        // Without pending_job we choose channel 1 and send the packet
        // through it.
        // channel 1 is busy.
        // channel 0 is idle but receives no work.
    }
    ...
}

So the variable is there to differentiate busy channels from idle ones,
so that the work goes to an idle channel.

> We don't touch p after the IO aside from p->pending_jobs-- and we
> already distribute the load uniformly by incrementing next_channel.

I know.  After multifd_send_thread() releases the mutex, it will only
touch ->pending_job (taking the mutex first).

> I'm not saying this would be more performant, just wondering if it
> would be possible.

Yeah, but as said before, it would be quite suboptimal.

>> As said, we don't want that, because channels can go at different
>> speeds due to factors outside of our control.
>>
>> If the semaphore bothers you, you can change it to a condition
>> variable, but you just move the complexity from one side to the
>> other.  (The initial implementation had a condition variable, but
>> Paolo said that the semaphore is more efficient, so he won.)
>
> Oh, it doesn't bother me.  I'm just trying to unequivocally understand
> its effects.  And if it logically follows that it's not necessary,
> only then remove it.

Both channels_ready and pending_job make the scheme more performant.
Without them it will not fail, it will just work way slower.

In the example I just showed you, if we always started the search for
an idle channel from channel 0, we would do even worse (that would be
an actual error):

We start with channels_ready == 0.
Channel 1 gets ready, so it increases channels_ready.

static int multifd_send_pages(QEMUFile *f)
{
    qemu_sem_wait(&multifd_send_state->channels_ready); // we pass this

    for (i = 0;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i]; // remember that i == 0
        qemu_mutex_lock(&p->mutex);
        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }
        if (!p->pending_job) {
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        // As there is no test to see if this channel is idle, we put
        // the page here
        qemu_mutex_unlock(&p->mutex);
        // We put the page info here
    }
    ...
}

Channel 2 gets ready, so it increases channels_ready.

static int multifd_send_pages(QEMUFile *f)
{
    qemu_sem_wait(&multifd_send_state->channels_ready); // we pass this

    for (i = 0;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i]; // remember that i == 0
        qemu_mutex_lock(&p->mutex);
        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }
        if (!p->pending_job) {
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        // As there is no test to see if this channel is idle, we put
        // the page here
        qemu_mutex_unlock(&p->mutex);
        // We put the page info here
        // channel 0 is still transmitting the 1st page,
        // and we have just overwritten the previous page info
    }
    ...
}

In this particular case, using next_channel round robin would have
saved us.

When you hand info to a channel to consume asynchronously, you need
some way to mark that the channel has finished using the data before
ordering it to do more work.  We can change pending_job to a bool if
you prefer.

I think we have nailed all the off-by-one errors by now (famous last
words).

Later, Juan.
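PS: in case it is useful, here is a self-contained toy of the whole
handshake.  This is not the real multifd code (no packets, no error
handling, and the names send_job()/channel_thread()/ToyChannel are
invented); it only mimics how multifd_send_pages() and
multifd_send_thread() cooperate through channels_ready (the semaphore
counting idle channels) and pending_job (the per-channel flag under the
mutex).  As in QEMU, there is a single producer, and the slow I/O is
always done without the lock held:

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_CHANNELS 2
#define NUM_JOBS     8

typedef struct {
    pthread_mutex_t mutex;
    sem_t sem;           /* producer kicks this channel's thread */
    bool pending_job;    /* protected by mutex: channel is busy */
    bool quit;           /* protected by mutex */
    int job;             /* the "payload", protected by mutex */
} ToyChannel;

static ToyChannel channels[NUM_CHANNELS];
static sem_t channels_ready;    /* counts channels that went idle */
static int next_channel;

/* The multifd_send_pages() side: find an idle channel, queue the job. */
static void send_job(int job)
{
    sem_wait(&channels_ready);          /* at least one channel is idle */
    for (int i = next_channel;; i = (i + 1) % NUM_CHANNELS) {
        ToyChannel *c = &channels[i];
        pthread_mutex_lock(&c->mutex);
        if (!c->pending_job) {          /* this one really is idle */
            c->pending_job = true;
            c->job = job;
            next_channel = (i + 1) % NUM_CHANNELS;
            pthread_mutex_unlock(&c->mutex);
            sem_post(&c->sem);          /* wake the channel thread */
            return;
        }
        pthread_mutex_unlock(&c->mutex); /* busy, try the next one */
    }
}

/* The multifd_send_thread() side: wait for work, "send" it, go idle. */
static void *channel_thread(void *opaque)
{
    ToyChannel *c = opaque;

    for (;;) {
        sem_wait(&c->sem);
        pthread_mutex_lock(&c->mutex);
        if (!c->pending_job) {
            /* no work queued: the only other reason we get kicked is quit */
            bool quit = c->quit;
            pthread_mutex_unlock(&c->mutex);
            if (quit) {
                return NULL;
            }
            continue;
        }
        int job = c->job;
        pthread_mutex_unlock(&c->mutex);

        printf("channel %d sends job %d\n", (int)(c - channels), job);
        usleep(1000);                   /* the slow I/O, lock not held */

        pthread_mutex_lock(&c->mutex);
        c->pending_job = false;         /* only now are we reusable */
        pthread_mutex_unlock(&c->mutex);
        sem_post(&channels_ready);      /* tell the producer we are idle */
    }
}

int main(void)
{
    pthread_t threads[NUM_CHANNELS];

    sem_init(&channels_ready, 0, 0);
    for (int i = 0; i < NUM_CHANNELS; i++) {
        pthread_mutex_init(&channels[i].mutex, NULL);
        sem_init(&channels[i].sem, 0, 0);
        pthread_create(&threads[i], NULL, channel_thread, &channels[i]);
        sem_post(&channels_ready);      /* every channel starts out idle */
    }

    for (int j = 0; j < NUM_JOBS; j++) {
        send_job(j);
    }

    for (int i = 0; i < NUM_CHANNELS; i++) {
        pthread_mutex_lock(&channels[i].mutex);
        channels[i].quit = true;
        pthread_mutex_unlock(&channels[i].mutex);
        sem_post(&channels[i].sem);
        pthread_join(threads[i], NULL);
    }
    return 0;
}

Compile it with something like "gcc -pthread toy.c" and you can watch
the jobs land on whichever channel happens to be idle at that moment.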