* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> 13.03.2018 13:30, Dr. David Alan Gilbert wrote:
> > * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> > > 12.03.2018 18:30, Dr. David Alan Gilbert wrote:
> > > > * Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> > > > > There would be savevm states (dirty-bitmap) which can migrate only in
> > > > > postcopy stage. The corresponding pending is introduced here.
> > > > >
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> > > > > ---
> > > [...]
> > >
> > > > >   static MigIterateState migration_iteration_run(MigrationState *s)
> > > > >   {
> > > > > -    uint64_t pending_size, pend_post, pend_nonpost;
> > > > > +    uint64_t pending_size, pend_pre, pend_compat, pend_post;
> > > > >       bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
> > > > > -    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size,
> > > > > -                              &pend_nonpost, &pend_post);
> > > > > -    pending_size = pend_nonpost + pend_post;
> > > > > +    qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
> > > > > +                              &pend_compat, &pend_post);
> > > > > +    pending_size = pend_pre + pend_compat + pend_post;
> > > > >       trace_migrate_pending(pending_size, s->threshold_size,
> > > > > -                          pend_post, pend_nonpost);
> > > > > +                          pend_pre, pend_compat, pend_post);
> > > > >       if (pending_size && pending_size >= s->threshold_size) {
> > > > >           /* Still a significant amount to transfer */
> > > > >           if (migrate_postcopy() && !in_postcopy &&
> > > > > -            pend_nonpost <= s->threshold_size &&
> > > > > -            atomic_read(&s->start_postcopy)) {
> > > > > +            pend_pre <= s->threshold_size &&
> > > > > +            (atomic_read(&s->start_postcopy) ||
> > > > > +             (pend_pre + pend_compat <= s->threshold_size)))
> > > > This change does something different from the description;
> > > > it causes a postcopy_start even if the user never ran the postcopy-start
> > > > command; so sorry,
> > > > we can't do that; because postcopy for RAM is
> > > > something that users can enable but only switch into when they've given
> > > > up on it completing normally.
> > > >
> > > > However, I guess that leaves you with a problem; which is what happens
> > > > to the system when you've run out of pend_pre+pend_compat but can't
> > > > complete because pend_post is non-0; so I don't know the answer to that.
> > > >
> > > >
> > > Hmm. Here, we go to postcopy only if "pend_pre + pend_compat <=
> > > s->threshold_size". Pre-patch, in this case we will go to
> > > migration_completion(). So, precopy stage is finishing anyway.
> > Right.
> >
> > > So, we want
> > > in this case to finish ram migration like it was finished by
> > > migration_completion(), and then, run postcopy, which will handle only
> > > dirty bitmaps, yes?
> > It's a bit tricky; the first important thing is that we can't change the
> > semantics of the migration without the 'dirty bitmaps'.
> >
> > So then there's the question of how a migration with both
> > postcopy-ram+dirty bitmaps should work; again I don't think we should
> > enter the postcopy-ram phase until start-postcopy is issued.
> >
> > Then there's the 3rd case; dirty-bitmaps but no postcopy-ram; in that
> > case I worry less about the semantics of how you want to do it.
>
> I have an idea:
>
> in postcopy_start(), in ram_has_postcopy() (and may be some other places?),
> check atomic_read(&s->start_postcopy) instead of migrate_postcopy_ram()

We've got to use migrate_postcopy_ram() to decide whether we should do
ram specific things, e.g. send the ram discard data.
I'm wanting to make sure that if we have another full postcopy device
(like RAM, maybe storage say) that we'll just add that in with
migrate_postcopy_whatever().

> then:
>
> 1. behavior without dirty-bitmaps is not changed, as currently we can't go
> into postcopy_start and ram_has_postcopy without s->start_postcopy
> 2. dirty-bitmaps+ram: if the user doesn't set s->start_postcopy,
> postcopy_start() will operate as if migration capability was not enabled,
> so ram should complete its migration
> 3. only dirty-bitmaps: again, postcopy_start() will operate as if migration
> capability was not enabled, so ram should complete its migration

Why can't we just remove the change to the trigger condition in this
patch? Then I think everything works as long as the management layer
does eventually call migrate-start-postcopy?
(It might waste some bandwidth at the point where there's otherwise
nothing left to send).

Even with the use of migrate-start-postcopy, you're going to need to be
careful about the higher level story; you need to document when to do it
and what the higher levels should do after a migration failure - at the
moment they know that once postcopy starts migration is irrecoverable
if it fails; I suspect that's not true with your dirty bitmaps.

IMHO this still comes back to my original observation from ~18 months ago
that in many ways this isn't very postcopy like; in the sense that all
the semantics are quite different from RAM.

Dave

> >
> > > Hmm2. Looked through migration_completion(), I don't understand, how it
> > > finishes ram migration without postcopy. It calls
> > > qemu_savevm_state_complete_precopy(), which skips states with
> > > has_postcopy=true, which is ram...
> > Because savevm_state_complete_precopy only skips has_postcopy=true in
> > the in_postcopy case:
> >
> >             (in_postcopy && se->ops->has_postcopy &&
> >              se->ops->has_postcopy(se->opaque)) ||
> >
> > so when we call it in migration_completion(), if we've not entered
> > postcopy yet, then that test doesn't trigger.
> >
> > (Apologies for not spotting this earlier; but I thought this patch was
> > a nice easy one just adding the postcopy_only_pending - I didn't realise it
> > changed existing semantics until I spotted that)
>
> oh, yes, I was inattentive :(
>
> >
> > Dave
> >
> > > --
> > > Best regards,
> > > Vladimir
> > >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
>
> --
> Best regards,
> Vladimir
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
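
For anyone following the thread, the two conditions under discussion can be modelled in isolation. The following is only a rough sketch, not QEMU code: `struct mig_model` and the three function names are hypothetical stand-ins introduced here, and the predicates are transcribed from the quoted hunk and from the quoted skip test, on the assumption that they are evaluated in the "pending_size >= threshold_size" branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few MigrationState fields the
 * conditions read; this is NOT QEMU's real structure. */
struct mig_model {
    bool postcopy_enabled;   /* models the result of migrate_postcopy() */
    bool in_postcopy;        /* state == MIGRATION_STATUS_POSTCOPY_ACTIVE */
    bool start_postcopy;     /* set once the user asks to start postcopy */
    uint64_t threshold_size;
};

/* Pre-patch trigger from the removed lines of the hunk:
 * postcopy starts only after an explicit user request. */
static bool start_postcopy_prepatch(const struct mig_model *s,
                                    uint64_t pend_nonpost)
{
    return s->postcopy_enabled && !s->in_postcopy &&
           pend_nonpost <= s->threshold_size &&
           s->start_postcopy;
}

/* Trigger from the added lines of the hunk: postcopy may also start
 * without a user request once pend_pre + pend_compat is small. */
static bool start_postcopy_patched(const struct mig_model *s,
                                   uint64_t pend_pre, uint64_t pend_compat)
{
    return s->postcopy_enabled && !s->in_postcopy &&
           pend_pre <= s->threshold_size &&
           (s->start_postcopy ||
            pend_pre + pend_compat <= s->threshold_size);
}

/* Skip test quoted from the complete_precopy path: a state is skipped
 * only when we are already in postcopy AND it reports has_postcopy. */
static bool skipped_in_complete_precopy(bool in_postcopy, bool has_postcopy)
{
    return in_postcopy && has_postcopy;
}
```

With postcopy enabled, no user request, a threshold of 100, pend_pre = 10 and pend_compat = 20, the pre-patch predicate is false while the patched one is true, which is the semantic change objected to above. Likewise, `skipped_in_complete_precopy(false, true)` is false, matching the explanation of why RAM still completes in migration_completion() when postcopy was never entered.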