On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
> [add migration maintainers]
> 
> On 24.09.24 15:56, Andrey Drobyshev wrote:
>> [...]
> 
> I doubt that this a correct way to go.
> 
> As far as I understand, "inactive" actually means that "storage is not
> belong to qemu, but to someone else (another qemu process for example),
> and may be changed transparently". In turn this means that Qemu should
> do nothing with inactive disks. So the problem is that nobody called
> bdrv_activate_all on target, and we shouldn't ignore that.
> 
> Hmm, I see in process_incoming_migration_bh() we do call
> bdrv_activate_all(), but only in some scenarios. May be, the condition
> should be less strict here.
> 
> Why we need any condition here at all? Don't we want to activate
> block-layer on target after migration anyway?
> 

Hmm I'm not sure about the unconditional activation, since we at least
have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it
in such a case).  In current libvirt upstream I see such code:

> /* Migration capabilities which should always be enabled as long as they      
>   
>  * are supported by QEMU. If the capability is supposed to be enabled on both 
>   
>  * sides of migration, it won't be enabled unless both sides support it.      
>   
>  */                                                                           
>   
> static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = 
> {                                                                             
>     
>     {QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER,                              
>   
>      QEMU_MIGRATION_SOURCE},                                                  
>   
>                                                                               
>   
>     {QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE,                                  
>   
>      QEMU_MIGRATION_DESTINATION},                                             
>   
> };

which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set.

The code from process_incoming_migration_bh() you're referring to:

>     /* If capability late_block_activate is set:                              
>   
>      * Only fire up the block code now if we're going to restart the          
>   
>      * VM, else 'cont' will do it.                                            
>   
>      * This causes file locking to happen; so we don't want it to happen      
>   
>      * unless we really are starting the VM.                                  
>   
>      */                                                                       
>   
>     if (!migrate_late_block_activate() ||                                     
>   
>          (autostart && (!global_state_received() ||                           
>   
>             runstate_is_live(global_state_get_runstate())))) {                
>   
>         /* Make sure all file formats throw away their mutable metadata.      
>                                                                               
>    
>          * If we get an error here, just don't restart the VM yet. */         
>   
>         bdrv_activate_all(&local_err);                                        
>   
>         if (local_err) {                                                      
>   
>             error_report_err(local_err);                                      
>   
>             local_err = NULL;                                                 
>   
>             autostart = false;                                                
>   
>         }                                                                     
>   
>     }

It states explicitly that we're either going to start VM right at this
point if (autostart == true), or we wait till "cont" command happens.
None of this is going to happen if we start another migration while
still being in PAUSED state.  So I think it seems reasonable to take
such case into account.  For instance, this patch does prevent the crash:

> diff --git a/migration/migration.c b/migration/migration.c
> index ae2be31557..3222f6745b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque)
>       */
>      if (!migrate_late_block_activate() ||
>           (autostart && (!global_state_received() ||
> -            runstate_is_live(global_state_get_runstate())))) {
> +            runstate_is_live(global_state_get_runstate()))) ||
> +         (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) {
>          /* Make sure all file formats throw away their mutable metadata.
>           * If we get an error here, just don't restart the VM yet. */
>          bdrv_activate_all(&local_err);

What are your thoughts on it?

Andrey

Reply via email to