vanquyen020920 commented on PR #13319:
URL: https://github.com/apache/cloudstack/pull/13319#issuecomment-4610312674
Thanks @bernardodemarco for testing and for the detailed feedback.
You were right. My initial patch was incomplete: it allowed the
API/data-motion flow to continue and fixed the `srcData == null` issue, but it
did not update the active libvirt domain disk source. As a result, CloudStack
volume metadata could be updated while the running VM was still using the old
source disk.
I reworked the KVM agent implementation locally so that the regular live
migration path uses libvirt block copy and pivot instead of relying only on
`copyPhysicalDisk()`.
The updated flow I tested is:
1. Detect regular live migration for an attached/running KVM volume.
2. Prepare the destination disk on the destination storage pool.
3. Create the destination physical disk if it does not already exist.
4. Generate the destination libvirt disk XML.
5. Run libvirt `blockCopy`.
6. Wait for the block job to complete.
7. Pivot the running disk to the destination.
8. Verify that the active and inactive libvirt domain XML point to the
destination disk before returning success.
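For reviewers who want to reason about steps 5 to 7, here is a minimal sketch of the copy, wait, and pivot sequence. It is written in Python against the libvirt-python method names (`blockCopy`, `blockJobInfo`, `blockJobAbort`) only for brevity; it is not the agent code, which is Java. The function name `copy_and_pivot`, the polling interval, the timeout, and the `end > 0` guard are illustrative assumptions.

```python
import time

# Numeric value of libvirt's VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag.
# With the real binding, use libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT instead.
PIVOT_FLAG = 2


def copy_and_pivot(domain, disk, dest_xml, poll_interval=1.0,
                   timeout=3600.0, sleep=time.sleep):
    """Start a block copy, wait until the mirror is in sync, then pivot.

    `domain` is expected to behave like a libvirt-python virDomain.
    `disk` is the target device (for example "vda") and `dest_xml` is the
    destination <disk> element. Hypothetical helper, not CloudStack code.
    """
    domain.blockCopy(disk, dest_xml)

    waited = 0.0
    while True:
        info = domain.blockJobInfo(disk, 0)
        if not info:
            # An empty result means no job is active any more.
            raise RuntimeError("block job for %s ended unexpectedly" % disk)
        # Assumption: the mirror is ready once cur has caught up with end.
        if info["end"] > 0 and info["cur"] == info["end"]:
            break
        if waited >= timeout:
            # Aborting without the pivot flag keeps the VM on the source disk.
            domain.blockJobAbort(disk, 0)
            raise TimeoutError("block copy for %s did not finish" % disk)
        sleep(poll_interval)
        waited += poll_interval

    # Switch the running domain to the destination disk.
    domain.blockJobAbort(disk, PIVOT_FLAG)
```

The important property is that the pivot only happens after the job reports it is in sync, and that a timeout aborts without pivoting, so the VM keeps running on the source disk.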
Test environment:
* Apache CloudStack 4.20.0.0
* Ubuntu 22.04
* KVM
* Source primary storage: NFS / file-based primary storage
* Destination primary storage: Ceph RBD
### Test 1: system VM / virtual router root disk migration
Before migration, the running VM was using the NFS/file-based source:
```text
file disk vda
/mnt/0d464e3f-5176-3ba8-8c5f-01b8f4da5a2d/cbe98e70-0ec8-4dfa-8888-42b38a763672
```
During migration, the agent created the destination RBD volume and completed
the live block copy:
```text
Destination disk [cbe98e70-0ec8-4dfa-8888-42b38a763672] does not exist on
pool [a15210c6-c858-3174-a390-183f4ed25096]. Creating it before live block copy.
Attempting to create volume cbe98e70-0ec8-4dfa-8888-42b38a763672 (RBD) in
pool a15210c6-c858-3174-a390-183f4ed25096 with size (4.88 GB) 5242880000
Block copy has started for regular volume vda :
cbe98e70-0ec8-4dfa-8888-42b38a763672
Block copy completed for the volume vda :
cbe98e70-0ec8-4dfa-8888-42b38a763672
```
After migration, `virsh domblklist` shows that the active disk source was
pivoted to Ceph RBD:
```text
network disk vda CMC-CLOUDSTACK/cbe98e70-0ec8-4dfa-8888-42b38a763672
```
I also verified both active and inactive libvirt XML, and both now point to
the RBD destination:
```xml
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source protocol='rbd'
name='CMC-CLOUDSTACK/cbe98e70-0ec8-4dfa-8888-42b38a763672'>
<host name='10.14.5.55'/>
<host name='10.14.5.56'/>
<host name='10.14.5.57'/>
<auth username='admin'>
<secret type='ceph' uuid='a15210c6-c858-3174-a390-183f4ed25096'/>
</auth>
</source>
<backingStore/>
<target dev='vda' bus='virtio'/>
</disk>
```
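The check from step 8 can be expressed as a small parser over the domain XML. The sketch below is only an illustration in Python using the standard library; the helper names `disk_source` and `pivoted_to` are hypothetical. With libvirt-python, the two XML strings would typically come from `XMLDesc(0)` and `XMLDesc(libvirt.VIR_DOMAIN_XML_INACTIVE)`.

```python
import xml.etree.ElementTree as ET


def disk_source(domain_xml, target_dev):
    """Return (disk type, protocol, location) for the disk with this target.

    Returns None when no disk with that target device exists.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != target_dev:
            continue
        source = disk.find("source")
        if source is None:
            return (disk.get("type"), None, None)
        # Network disks use name=..., file disks file=..., block disks dev=...
        location = source.get("name") or source.get("file") or source.get("dev")
        return (disk.get("type"), source.get("protocol"), location)
    return None


def pivoted_to(active_xml, inactive_xml, target_dev, expected_location):
    """True only if both the active and inactive XML use the expected source."""
    for xml_text in (active_xml, inactive_xml):
        found = disk_source(xml_text, target_dev)
        if found is None or found[2] != expected_location:
            return False
    return True
```

For the XML above, `disk_source(xml, "vda")` would yield the `network` type, the `rbd` protocol, and the `CMC-CLOUDSTACK/cbe98e70-0ec8-4dfa-8888-42b38a763672` image name.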
### Test 2: user VM root disk migration
I also tested with a normal user VM.
Before migration, the VM root disk was on NFS/file-based storage:
```text
file disk vda
/mnt/0d464e3f-5176-3ba8-8c5f-01b8f4da5a2d/65ca9af4-ccb8-43a2-96f1-c91202cd192f
```
<img width="1107" height="850" alt="image"
src="https://github.com/user-attachments/assets/4bff61a1-8032-496d-bd3e-bc2e8b9a5f42"
/>
<img width="1280" height="700" alt="image"
src="https://github.com/user-attachments/assets/58707519-4d21-4601-ae5b-6b71066b2f9b"
/>
<img width="1280" height="680" alt="image"
src="https://github.com/user-attachments/assets/3c9fc1ed-ad9b-4372-99b7-a2a6e68ccd9d"
/>
<img width="1280" height="640" alt="image"
src="https://github.com/user-attachments/assets/7dc3f750-ead0-4daa-9c96-286fb3b1d901"
/>
The guest had a test file before migration:
```text
cat test.txt
Test Migrate Disk
```
After live migration to Ceph RBD, the CloudStack UI shows the root volume on
`CPM-CEPH`.
The agent log shows the destination RBD volume was created and block copy
completed:
```text
Using live block copy path for regular volume migration. VM [i-2-213-VM],
source path [65ca9af4-ccb8-43a2-96f1-c91202cd192f], destination path
[65ca9af4-ccb8-43a2-96f1-c91202cd192f], destination pool
[a15210c6-c858-3174-a390-183f4ed25096].
Preparing destination disk for regular live volume migration. Destination
path [65ca9af4-ccb8-43a2-96f1-c91202cd192f], destination pool
[a15210c6-c858-3174-a390-183f4ed25096].
Destination disk [65ca9af4-ccb8-43a2-96f1-c91202cd192f] was not found on
pool [a15210c6-c858-3174-a390-183f4ed25096]. It will be created before live
block copy.
Destination disk [65ca9af4-ccb8-43a2-96f1-c91202cd192f] does not exist on
pool [a15210c6-c858-3174-a390-183f4ed25096]. Creating it before live block copy.
Attempting to create volume 65ca9af4-ccb8-43a2-96f1-c91202cd192f (RBD) in
pool a15210c6-c858-3174-a390-183f4ed25096 with size (10.00 GB) 10737418240
Block copy has started for regular volume vda :
65ca9af4-ccb8-43a2-96f1-c91202cd192f
Block copy completed for the volume vda :
65ca9af4-ccb8-43a2-96f1-c91202cd192f
```
After migration, I verified the guest remained running and the test file was
still readable:
```text
cat test.txt
Test Migrate Disk
Final
```
<img width="1280" height="705" alt="image"
src="https://github.com/user-attachments/assets/a447a84f-1680-46d4-a893-8020834676f4"
/>
<img width="1280" height="668" alt="image"
src="https://github.com/user-attachments/assets/368da1c3-7dd5-4651-a3fc-bec42e02442d"
/>
<img width="1280" height="799" alt="image"
src="https://github.com/user-attachments/assets/497be0ed-f512-4b07-8f47-07c8338c18fb"
/>
<img width="1280" height="225" alt="image"
src="https://github.com/user-attachments/assets/c4718366-a5b0-40df-ba96-694776f0cb38"
/>
<img width="1201" height="828" alt="image"
src="https://github.com/user-attachments/assets/b735e7ee-cd5c-49a6-a685-bb450646135d"
/>
This addresses the specific issue you found where only CloudStack metadata
changed while the VM domain XML remained unchanged. With the updated
implementation, the running libvirt domain is pivoted to the destination
storage.
I agree that the broader feature still needs careful validation for
additional edge cases, including:
* backing chains with multiple deltas;
* incremental volume snapshots;
* VM snapshots;
* file-based to file-based migrations;
* RBD to file-based migrations;
* multi-disk VMs;
* reboot validation after migration.
I will push the updated implementation and include this validation evidence
so it can be reviewed and tested further.