StepBee opened a new issue, #9408:
URL: https://github.com/apache/cloudstack/issues/9408
<!--
Verify first that your issue/request is not already reported on GitHub.
Also test if the latest release and main branch are affected too.
Always add information AFTER these HTML comments; there is no need to
delete the comments.
-->
##### ISSUE TYPE
<!-- Pick one below and delete the rest -->
* Improvement Request
* Enhancement Request
##### COMPONENT NAME
<!--
Categorize the issue, e.g. API, VR, VPN, UI, etc.
-->
~~~
Snapshot with copy to secondary storage
~~~
##### CLOUDSTACK VERSION
<!--
New line separated list of affected versions, commit ID for issues on main
branch.
-->
~~~
4.19
~~~
##### CONFIGURATION
<!--
Information about the configuration if relevant, e.g. basic network,
advanced networking, etc. N/A otherwise
-->
Primary Storage based on Ceph / RBD
Secondary Storage based on Ceph / NFS via Ganesha Gateway
##### OS / ENVIRONMENT
<!--
Information about the environment if relevant, N/A otherwise
-->
CloudStack Agent on Ubuntu
##### SUMMARY
<!-- Explain the problem/feature briefly -->
When creating snapshots of volumes located on Ceph/RBD with the setting
snapshot.backup.to.secondary = true, two steps are executed:
1. An RBD snapshot is created on the primary storage.
2. On the hypervisor, "qemu-img convert -O qcow2 .....
/mnt/<uuid-secondary-storage>/snapshots/../<snapshot_id>" is executed.
The first step runs fast, as expected, since it is only an RBD snapshot.
The second step, however, is unbelievably slow. The slow qemu-img convert
results in timeouts and failed snapshots/backups for larger volumes, even
when increasing the wait timeouts etc.
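For illustration, the two steps correspond roughly to the following
commands (a minimal sketch with placeholder pool/image/snapshot names and
<ceph-options> standing in for the mon_host/id/key option string; the exact
invocations from my environment are listed under STEPS TO REPRODUCE below):
~~~
# Step 1: copy-on-write snapshot on the Ceph primary storage (near-instant)
rbd -c /etc/ceph/ceph.conf --id <cephx-id> snap create <rbd-pool>/<rbd-image>@<snapshot>

# Step 2: full copy of the snapshot to NFS secondary storage (the slow part)
qemu-img convert -O qcow2 -U \
    "rbd:<rbd-pool>/<rbd-image>@<snapshot>:<ceph-options>" \
    /mnt/<uuid-secondary-storage>/snapshots/../<snapshot_id>
~~~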
I compared the performance with raw output:
qemu-img convert -O raw .....
/mnt/<uuid-secondary-storage>/snapshots/../<snapshot_id>
and with rbd export (which exports the snapshot to a raw file).
Example numbers from a test:
- qemu-img convert -O qcow2 = 100 Mbit/s
- qemu-img convert -O raw = 5 Gbit/s
- rbd export (raw output file) = 8 Gbit/s
I am aware that a qcow2 file has benefits when parts of the image are
"empty", since that results in smaller image files. But the performance
difference, specifically for large, filled-up disks, is enormous.
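For anyone who wants to reproduce the comparison, a minimal timing sketch
(placeholder names as above; throughput is the image size divided by the
elapsed wall-clock time):
~~~
SRC="rbd:<rbd-pool>/<rbd-image>@<snapshot>:<ceph-options>"

time qemu-img convert -O qcow2 -U "$SRC" /mnt/<uuid>/bench.qcow2    # ~100 Mbit/s in my test
time qemu-img convert -O raw   -U "$SRC" /mnt/<uuid>/bench.raw      # ~5 Gbit/s
time rbd -c /etc/ceph/ceph.conf --id <cephx-id> export \
    <rbd-pool>/<rbd-image>@<snapshot> /mnt/<uuid>/bench-export.raw  # ~8 Gbit/s
~~~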
From the code in
https://github.com/apache/cloudstack/blob/main/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java#L1010
I see that the output format is hardcoded to qcow2, and I read that this
was changed from raw to qcow2 somewhere in the past.
My questions:
- What are other people's experiences with the qemu-img performance from
RBD primary storage to NFS secondary storage?
- Is there a specific reason why the output format is not a configuration
setting?
##### STEPS TO REPRODUCE
<!--
For bugs, show exactly how to reproduce the problem, using a minimal
test-case. Use Screenshots if accurate.
For new features, show how the feature would be used.
-->
<!-- Paste example playbooks or commands between quotes below -->
~~~
1. Create a large volume on an RBD-based primary storage
2. Fill up the disk
3. Set snapshot.backup.to.secondary = true (see the CloudMonkey example
   after this block)
4. Create a snapshot of the volume
Instead of step 4, you can run the commands manually:
QEMU-IMG output qcow2
qemu-img convert -O qcow2 -U
"rbd:<rbd-pool>/<rbd-image>@<snapshot>:mon_host=1xxxxx\:6789:auth_supported=cephx:id=xxxxx:key=xxxxx:rbd_default_format=2:client_mount_timeout=30"
/mnt/<uuid>/image-backup.qcow2
QEMU-IMG output raw
qemu-img convert -O raw -U
"rbd:<rbd-pool>/<rbd-image>@<snapshot>:mon_host=1xxxxx\:6789:auth_supported=cephx:id=xxxxx:key=xxxxx:rbd_default_format=2:client_mount_timeout=30"
/mnt/<uuid>/image-backup.raw
RBD export
rbd -c /etc/ceph/ceph.conf --id xxxx export
<rbd-pool>/<rbd-image>@<snapshot> /mnt/<uuid>/rbd-export-backup.raw
~~~
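For step 3, the global setting can be changed in the UI (Global Settings)
or via the updateConfiguration API, for example with CloudMonkey:
~~~
cmk update configuration name=snapshot.backup.to.secondary value=true
~~~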
<!-- You can also paste gist.github.com links for larger files -->
##### EXPECTED RESULTS
<!-- What did you expect to happen when running the steps above? -->
~~~
Snapshots should be copied to secondary storage with a more performant
option than qemu-img -O qcow2.
It could be an idea to make the qemu-img output format a configurable
setting, as raw output alone would already speed up the process.
Alternatively, "rbd export" with raw output could be used.
Both raw options will probably use more backup space than the qcow2 option,
but considering the enormous performance difference, I'd rather provide
more space than have constantly failing snapshots.
~~~
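On the space concern: as far as I know, qemu-img writes raw output sparse
by default (zeroed ranges are skipped on the target filesystem), and an
rbd export can be streamed through a compressor to trade CPU time for
space. A sketch, assuming zstd is available on the agent host:
~~~
# Sparse raw copy: zero ranges are detected and skipped, so the raw file
# on NFS only consumes space for actually allocated data
qemu-img convert -O raw -U \
    "rbd:<rbd-pool>/<rbd-image>@<snapshot>:<ceph-options>" \
    /mnt/<uuid>/image-backup.raw

# Or stream the export through zstd for a compressed raw backup
rbd -c /etc/ceph/ceph.conf --id <cephx-id> export \
    <rbd-pool>/<rbd-image>@<snapshot> - | zstd > /mnt/<uuid>/image-backup.raw.zst
~~~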
##### ACTUAL RESULTS
<!-- What actually happened? -->
<!-- Paste verbatim command output between quotes below -->
~~~
Currently, most of our snapshots of larger volumes are failing because of
timeouts, which result from the very poor performance of qemu-img convert
-O qcow2.
~~~