Prasad Pandit <[email protected]> writes:

> On Tue, 13 Jan 2026 at 01:15, Fabiano Rosas <[email protected]> wrote:
>> There are failures that happen _because_ we cancelled. As I've mentioned
>> somewhere else before, the cancellation is not "informed" to all threads
>> running migration code; there are some code paths that will simply fail
>> as a result of migration_cancel(). We need to allow cancelling to work
>> in a possibly stuck thread (such as a blocked recv in the return path),
>> but this means we end up calling qemu_file_shutdown indiscriminately.
>> In these cases, parts of the code would set FAILED, but that failure is
>> a result of cancelling. We've determined that migrate-cancel should
>> always lead to CANCELLED and a new migration should always be possible.
>
> * I see.
>
>> This is ok, call it an error and done.
>>
>> > OTOH, if we cancel while processing an error/failure, the end user
>> > may not see that error because we report - migration was cancelled.
>> >
>>
>> This is interesting. I _think_ it wouldn't be possible to cancel while
>> handling an error, because with the BQL held the migrate-cancel wouldn't
>> be issued while migration_cleanup is ongoing. However, I don't think we
>> ever tested this scenario in particular. Maybe you could try to catch
>> something by modifying the /migration/cancel tests, if you're willing.
>
> * I have made a note of looking at it at a later time.
>
>> Aside from the QAPI states, there are some internal states we already
>> track with separate flags, e.g.:
>>
>> rp_thread_created, start_postcopy, migration_thread_running,
>> switchover_acked, postcopy_package_loaded, fault_thread_quit,
>> preempt_thread_status, load_threads_abort.
>>
>> A bit array could maybe cover all of these and more.
>>
>> ---
>>
>> Could you send a PoC patch with your idea fixing this FAILING bug? We'd
>> need a trigger for migrate, set_caps, etc. and the failed event.
>>
>> If that new patch doesn't get consensus then we merge this one and work
>> on a new design as time permits.
>
> * Considering the above wider coverage area, I think it is best to
> first fix the issue at hand and then move to this new change. For now
> I'll try to rebase my current patch on your v3: cleanup early
> connection code series. Once that is through, I'll take the states
> change patch. Hope that's okay.
>
Ok, go ahead.

> Thank you.
> ---
> - Prasad
