StepBee opened a new issue, #9408:
URL: https://github.com/apache/cloudstack/issues/9408

   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and main branch are affected too.
   Always add information AFTER of these HTML comments, but no need to delete 
the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Improvement Request
    * Enhancement Request
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   Snapshot with copy to secondary storage
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on main 
branch.
   -->
   
   ~~~
   4.19
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, 
advanced networking, etc.  N/A otherwise
   -->
   Primary Storage based on Ceph / RBD
   Secondary Storage based on Ceph / NFS via Ganesha Gateway
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   CloudStack Agent on Ubuntu
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   When creating snapshots of volumes located on Ceph/RBD with the setting snapshot.backup.to.secondary = true:
   1. an RBD snapshot is created on the primary storage
   2. on the hypervisor, "qemu-img convert -O qcow2 ..... /mnt/<uuid-secondary-storage>/snapshots/../<snapshot_id>" is executed
   
   The first step runs fast, as expected, since it is only an RBD snapshot.
   The second step, however, is unbelievably slow.
   
   The slow qemu-img convert causes timeouts and failed snapshots/backups for larger volumes, even after increasing the wait timeouts etc.
   
   I compared the performance with raw output:
   qemu-img convert -O raw ..... /mnt/<uuid-secondary-storage>/snapshots/../<snapshot_id>
   
   and with
   rbd export (which exports the snapshot to a raw file)
   
   Example numbers from a test:
   qemu-img convert -O qcow2 = 100 Mbit/s
   qemu-img convert -O raw = 5 Gbit/s
   rbd export (raw output file) = 8 Gbit/s
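   For anyone wanting to reproduce the comparison, the three copy paths can be lined up in a small harness like the sketch below. The pool, image, snapshot and mount-point names are placeholders for this example; the script only prints the candidate commands, since running them needs a live Ceph cluster (each printed line can then be run under `time`).

   ```shell
   #!/usr/bin/env bash
   # Sketch: build the three copy commands side by side for timing.
   # POOL, IMAGE, SNAP and DEST are hypothetical placeholders for this example.
   POOL="rbd-pool"; IMAGE="rbd-image"; SNAP="snapshot"; DEST="/mnt/uuid"
   SRC="rbd:${POOL}/${IMAGE}@${SNAP}"

   CMD_QCOW2="qemu-img convert -O qcow2 -U ${SRC} ${DEST}/image-backup.qcow2"
   CMD_RAW="qemu-img convert -O raw -U ${SRC} ${DEST}/image-backup.raw"
   CMD_EXPORT="rbd -c /etc/ceph/ceph.conf export ${POOL}/${IMAGE}@${SNAP} ${DEST}/rbd-export-backup.raw"

   # Print each candidate; prefix with `time` when running against a real cluster.
   for cmd in "$CMD_QCOW2" "$CMD_RAW" "$CMD_EXPORT"; do
       echo "time ${cmd}"
   done
   ```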
   
   I am aware that a qcow2 file has benefits when parts of the image are "empty", resulting in smaller image files - but the performance difference, especially for large, filled-up disks, is enormous.
   
   From the code in
   
   https://github.com/apache/cloudstack/blob/main/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java#L1010
   
   I see that the output format is hardcoded to qcow2, and I read that this was changed from raw to qcow2 at some point in the past.
   
   My questions:
   - What are other people's experiences with qemu-img performance from RBD primary storage to NFS secondary storage?
   - Is there a specific reason why the output format is not a configurable setting?
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal 
test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   
   <!-- Paste example playbooks or commands between quotes below -->
   ~~~
   1. create a large volume on a RBD based primary storage
   2. fill up the disk
   3. set snapshot.backup.to.secondary = true
   4. create a snapshot of the volume
   
   Instead of step 4, you can run the commands manually:
   
   QEMU-IMG output qcow2
   qemu-img convert -O qcow2 -U 
"rbd:<rbd-pool>/<rbd-image>@<snapshot>:mon_host=1xxxxx\:6789:auth_supported=cephx:id=xxxxx:key=xxxxx:rbd_default_format=2:client_mount_timeout=30"
 /mnt/<uuid>/image-backup.qcow2
   
   QEMU-IMG output raw
   qemu-img convert -O raw -U 
"rbd:<rbd-pool>/<rbd-image>@<snapshot>:mon_host=1xxxxx\:6789:auth_supported=cephx:id=xxxxx:key=xxxxx:rbd_default_format=2:client_mount_timeout=30"
 /mnt/<uuid>/image-backup.raw
   
   RBD export
   rbd -c /etc/ceph/ceph.conf --id xxxx export 
<rbd-pool>/<rbd-image>@<snapshot> /mnt/<uuid>/rbd-export-backup.raw
   
   ~~~
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   
   ~~~
   Snapshots should be copied to secondary storage with a more performant option than qemu-img -O qcow2.
   One idea could be to expose the qemu-img output format as a configurable setting, as raw output alone would already speed up the process.
   Another option would be to use "rbd export" with raw output.
   
   Both raw options will probably use more backup space than the qcow2 option.
   But considering the enormous performance difference, I'd rather provide more space than have snapshots fail constantly.
   ~~~
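   As a middle ground (not current CloudStack behaviour, just a sketch with placeholder names), the fast raw export could be followed by a local raw-to-qcow2 conversion on secondary storage, so the slow qcow2 writer at least reads from the NFS mount instead of over librbd. The script below only prints the two steps, since executing them needs a live cluster:

   ```shell
   #!/usr/bin/env bash
   # Sketch of a two-step alternative: fast raw export from RBD first,
   # then an optional local raw->qcow2 conversion on secondary storage.
   # POOL, IMAGE, SNAP and DEST are hypothetical placeholders.
   POOL="rbd-pool"; IMAGE="rbd-image"; SNAP="snapshot"; DEST="/mnt/uuid"
   RAW="${DEST}/${SNAP}.raw"; QCOW2="${DEST}/${SNAP}.qcow2"

   STEP1="rbd -c /etc/ceph/ceph.conf export ${POOL}/${IMAGE}@${SNAP} ${RAW}"
   # The conversion reads from secondary storage, not through librbd, and
   # regains the qcow2 space benefit; the raw file is removed afterwards.
   STEP2="qemu-img convert -O qcow2 ${RAW} ${QCOW2} && rm -f ${RAW}"

   printf '%s\n' "$STEP1" "$STEP2"
   ```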
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   
   <!-- Paste verbatim command output between quotes below -->
   ~~~
   Currently, most of our snapshots of larger volumes fail due to timeouts, which result from the very poor performance of qemu-img -O qcow2.
   ~~~
   

