vanquyen020920 commented on PR #13319:
URL: https://github.com/apache/cloudstack/pull/13319#issuecomment-4610312674

   Thanks @bernardodemarco for testing and for the detailed feedback.
   
   You were right. My initial patch was incomplete: it allowed the 
API/data-motion flow to continue and fixed the `srcData == null` issue, but it 
did not update the active libvirt domain disk source. As a result, CloudStack 
volume metadata could be updated while the running VM was still using the old 
source disk.
   
   I reworked the KVM agent implementation locally so that the regular live 
migration path uses libvirt block copy + pivot rather than relying only on 
`copyPhysicalDisk()`.
   
   The updated flow I tested is:
   
   1. Detect regular live migration for an attached/running KVM volume.
   2. Prepare the destination disk on the destination storage pool.
   3. Create the destination physical disk if it does not already exist.
   4. Generate the destination libvirt disk XML.
   5. Run libvirt `blockCopy`.
   6. Wait for the block job to complete.
   7. Pivot the running disk to the destination.
   8. Verify that the active and inactive libvirt domain XML point to the 
destination disk before returning success.
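
Steps 6 and 7 above can be sketched roughly as follows. This is only an illustration in Python, not the Java code from the KVM agent: it assumes a domain object with `blockJobInfo` and `blockJobAbort` methods in the style of libvirt-python's `virDomain`, and that a block copy has already been started on the disk. The function name `wait_and_pivot`, its parameters, and the timeout values are hypothetical.

```python
import time

# Numeric value of libvirt's VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag.
# With libvirt-python installed, prefer libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.
PIVOT_FLAG = 2


def wait_and_pivot(dom, disk, timeout_s=3600.0, poll_s=1.0, sleep=time.sleep):
    """Poll the block job on `disk` until it is fully mirrored, then pivot.

    `dom` is assumed to behave like a libvirt-python virDomain:
    blockJobInfo() returns a dict with 'cur' and 'end' (empty if no job).
    """
    waited = 0.0
    while waited <= timeout_s:
        info = dom.blockJobInfo(disk, 0)
        if not info:
            # The job disappeared (for example, it failed or was cancelled).
            raise RuntimeError(f"no active block job on disk {disk}")
        # A copy job is ready to pivot once cur has caught up with a non-zero end.
        if info["end"] > 0 and info["cur"] == info["end"]:
            dom.blockJobAbort(disk, PIVOT_FLAG)
            return
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"block copy on disk {disk} not ready after {timeout_s}s")
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays. A real implementation would also need to abort the job on timeout and clean up the destination disk, which this sketch leaves out.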
   
   Test environment:
   
   * Apache CloudStack 4.20.0.0
   * Ubuntu 22.04
   * KVM
   * Source primary storage: NFS / file-based primary storage
   * Destination primary storage: Ceph RBD
   
   ### Test 1: system VM / virtual router root disk migration
   
   Before migration, the running VM was using the NFS/file-based source:
   
   ```text
   file disk vda 
/mnt/0d464e3f-5176-3ba8-8c5f-01b8f4da5a2d/cbe98e70-0ec8-4dfa-8888-42b38a763672
   ```
   
   During migration, the agent created the destination RBD volume and completed 
the live block copy:
   
   ```text
   Destination disk [cbe98e70-0ec8-4dfa-8888-42b38a763672] does not exist on 
pool [a15210c6-c858-3174-a390-183f4ed25096]. Creating it before live block copy.
   Attempting to create volume cbe98e70-0ec8-4dfa-8888-42b38a763672 (RBD) in 
pool a15210c6-c858-3174-a390-183f4ed25096 with size (4.88 GB) 5242880000
   Block copy has started for regular volume vda : 
cbe98e70-0ec8-4dfa-8888-42b38a763672
   Block copy completed for the volume vda : 
cbe98e70-0ec8-4dfa-8888-42b38a763672
   ```
   
   After migration, `virsh domblklist` shows that the active disk source was 
pivoted to Ceph RBD:
   
   ```text
   network disk vda CMC-CLOUDSTACK/cbe98e70-0ec8-4dfa-8888-42b38a763672
   ```
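
For anyone who wants to script this check, here is a minimal sketch that parses `virsh domblklist <domain> --details` output and confirms the target's source. It assumes the usual four-column layout (Type, Device, Target, Source); the helper names `parse_domblklist` and `disk_is_on` are hypothetical and are not part of the patch.

```python
def parse_domblklist(output):
    """Parse `virsh domblklist --details` output into {target: {...}}."""
    disks = {}
    for line in output.splitlines():
        # Split into at most four fields so a source containing spaces stays whole.
        fields = line.split(None, 3)
        # Skip the header row, the dashed separator, and blank lines.
        if len(fields) < 4 or fields[0] == "Type":
            continue
        disk_type, device, target, source = fields
        disks[target] = {
            "type": disk_type,
            "device": device,
            "source": source.strip(),
        }
    return disks


def disk_is_on(disks, target, expected_type, expected_source):
    """Return True if `target` exists with the expected type and source."""
    disk = disks.get(target)
    return (
        disk is not None
        and disk["type"] == expected_type
        and disk["source"] == expected_source
    )
```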
   
   I also verified both active and inactive libvirt XML, and both now point to 
the RBD destination:
   
   ```xml
   <disk type='network' device='disk'>
     <driver name='qemu' type='raw' cache='none'/>
     <source protocol='rbd' 
name='CMC-CLOUDSTACK/cbe98e70-0ec8-4dfa-8888-42b38a763672'>
       <host name='10.14.5.55'/>
       <host name='10.14.5.56'/>
       <host name='10.14.5.57'/>
       <auth username='admin'>
         <secret type='ceph' uuid='a15210c6-c858-3174-a390-183f4ed25096'/>
       </auth>
     </source>
     <backingStore/>
     <target dev='vda' bus='virtio'/>
   </disk>
   ```
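
The active/inactive comparison can be automated along these lines. This is a sketch under assumptions, not the check implemented in the agent: it takes the two XML documents as strings (for example from `virsh dumpxml <domain>` and `virsh dumpxml --inactive <domain>`), and the function names `disk_source` and `both_point_to_rbd` are hypothetical.

```python
import xml.etree.ElementTree as ET


def disk_source(domain_xml, target_dev):
    """Return (disk type, protocol, source name/path) for the given target dev.

    Returns None if the disk or its <source> element is missing.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != target_dev:
            continue
        source = disk.find("source")
        if source is None:
            return None
        # Network disks use 'name'; file and block disks use 'file' or 'dev'.
        location = source.get("name") or source.get("file") or source.get("dev")
        return (disk.get("type"), source.get("protocol"), location)
    return None


def both_point_to_rbd(active_xml, inactive_xml, target_dev, rbd_name):
    """True only if both definitions use the expected RBD image for target_dev."""
    expected = ("network", "rbd", rbd_name)
    return (
        disk_source(active_xml, target_dev) == expected
        and disk_source(inactive_xml, target_dev) == expected
    )
```

Checking both definitions matters because a pivot that only changes the live definition would be lost on the next cold start of the VM.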
   
   ### Test 2: user VM root disk migration
   
   I also tested with a normal user VM.
   
   Before migration, the VM root disk was on NFS/file-based storage:
   
   ```text
   file disk vda 
/mnt/0d464e3f-5176-3ba8-8c5f-01b8f4da5a2d/65ca9af4-ccb8-43a2-96f1-c91202cd192f
   ```
   <img width="1107" height="850" alt="image" src="https://github.com/user-attachments/assets/4bff61a1-8032-496d-bd3e-bc2e8b9a5f42" />
   <img width="1280" height="700" alt="image" src="https://github.com/user-attachments/assets/58707519-4d21-4601-ae5b-6b71066b2f9b" />
   <img width="1280" height="680" alt="image" src="https://github.com/user-attachments/assets/3c9fc1ed-ad9b-4372-99b7-a2a6e68ccd9d" />
   <img width="1280" height="640" alt="image" src="https://github.com/user-attachments/assets/7dc3f750-ead0-4daa-9c96-286fb3b1d901" />
   
   The guest had a test file before migration:
   
   ```text
   cat test.txt
   Test Migrate Disk
   ```
   
   After live migration to Ceph RBD, the CloudStack UI shows the root volume 
on `CPM-CEPH`.
   
   The agent log shows the destination RBD volume was created and block copy 
completed:
   
   ```text
   Using live block copy path for regular volume migration. VM [i-2-213-VM], 
source path [65ca9af4-ccb8-43a2-96f1-c91202cd192f], destination path 
[65ca9af4-ccb8-43a2-96f1-c91202cd192f], destination pool 
[a15210c6-c858-3174-a390-183f4ed25096].
   Preparing destination disk for regular live volume migration. Destination 
path [65ca9af4-ccb8-43a2-96f1-c91202cd192f], destination pool 
[a15210c6-c858-3174-a390-183f4ed25096].
   Destination disk [65ca9af4-ccb8-43a2-96f1-c91202cd192f] was not found on 
pool [a15210c6-c858-3174-a390-183f4ed25096]. It will be created before live 
block copy.
   Destination disk [65ca9af4-ccb8-43a2-96f1-c91202cd192f] does not exist on 
pool [a15210c6-c858-3174-a390-183f4ed25096]. Creating it before live block copy.
   Attempting to create volume 65ca9af4-ccb8-43a2-96f1-c91202cd192f (RBD) in 
pool a15210c6-c858-3174-a390-183f4ed25096 with size (10.00 GB) 10737418240
   Block copy has started for regular volume vda : 
65ca9af4-ccb8-43a2-96f1-c91202cd192f
   Block copy completed for the volume vda : 
65ca9af4-ccb8-43a2-96f1-c91202cd192f
   ```
   
   After migration, I verified the guest remained running and the test file was 
still readable:
   
   ```text
   cat test.txt
   Test Migrate Disk
   Final
   ```
   
   <img width="1280" height="705" alt="image" src="https://github.com/user-attachments/assets/a447a84f-1680-46d4-a893-8020834676f4" />
   <img width="1280" height="668" alt="image" src="https://github.com/user-attachments/assets/368da1c3-7dd5-4651-a3fc-bec42e02442d" />
   <img width="1280" height="799" alt="image" src="https://github.com/user-attachments/assets/497be0ed-f512-4b07-8f47-07c8338c18fb" />
   <img width="1280" height="225" alt="image" src="https://github.com/user-attachments/assets/c4718366-a5b0-40df-ba96-694776f0cb38" />
   <img width="1201" height="828" alt="image" src="https://github.com/user-attachments/assets/b735e7ee-cd5c-49a6-a685-bb450646135d" />
   
   This addresses the specific issue you found where only CloudStack metadata 
changed while the VM domain XML remained unchanged. With the updated 
implementation, the running libvirt domain is pivoted to the destination 
storage.
   
   I agree that the broader feature still needs careful validation for 
additional edge cases, including:
   
   * backing chains with multiple deltas;
   * incremental volume snapshots;
   * VM snapshots;
   * file-based to file-based migrations;
   * RBD to file-based migrations;
   * multi-disk VMs;
   * reboot validation after migration.
   
   I will push the updated implementation and include this validation evidence 
so it can be reviewed and tested further.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
