Hi Sage and Jason,

My company is building a backup system based on the rbd export-diff and
import-diff commands.

However, in a recent test we found some strange behavior in export-diff.
In short: when we repeatedly execute

    rbd export-diff --from-snap snap1 image@snap2 - | md5sum

md5sum sometimes returns different values.
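
A minimal sketch of that repeated check, in Python. The helper names
(md5_of_command, distinct_digests) and the run count of 5 are only
illustrative assumptions, not part of our backup scripts; "image", "snap1"
and "snap2" are the placeholder names used above.

```python
import hashlib
import subprocess


def md5_of_command(cmd):
    """Run cmd and return the MD5 hex digest of its stdout, read in chunks."""
    digest = hashlib.md5()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # Read 1 MiB at a time so a large diff never has to fit in memory.
    for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
        digest.update(chunk)
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("command failed: %r" % (cmd,))
    return digest.hexdigest()


def distinct_digests(cmd, runs=5):
    """Run cmd `runs` times and return the set of distinct MD5 digests."""
    return {md5_of_command(cmd) for _ in range(runs)}


if __name__ == "__main__":
    # Placeholder image and snapshot names; replace with real ones.
    cmd = ["rbd", "export-diff", "--from-snap", "snap1", "image@snap2", "-"]
    digests = distinct_digests(cmd)
    if len(digests) == 1:
        print("stable")
    else:
        print("UNSTABLE: %s" % sorted(digests))
```

If the diff between two fixed snaps is deterministic, the set should
contain exactly one digest no matter how many times the command is run.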

The details are:

We use two Ceph RBD clusters: A for online VM usage and B for backup.

The VM image in question is cloned from a parent image. Initially our
backup system does a full backup with the rbd export/import commands.
Then every day we do an incremental backup with the rbd
export-diff/import-diff commands.

To make sure the data is consistent, we also compare the md5 of the online
VM image@snapN with the md5 of the backup VM image@snapN.

Our tests found that sometimes, for some VM images, the md5 check fails:
the online VM image@snapN doesn't match the backup VM image@snapN.

To narrow this down, we manually generated the incremental file with
rbd export-diff between the specific snaps and found that its md5 didn't
match the file generated by the backup scripts.

Comparing the two binary files, we found only a small difference: a few
bytes are not the same.
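
For anyone who wants to repeat this comparison, here is a rough Python
sketch that lists the byte ranges where two diff files differ. The function
name differing_ranges and the command-line usage are assumptions made for
illustration; this is not the tool we actually used.

```python
import sys


def differing_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return (start, end) byte ranges, end exclusive, where the files differ.

    If one file is longer, the extra tail is reported as a differing range.
    """
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    offset = 0    # file offset of the chunk being compared
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: close a run left open by the previous one.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                common = min(len(a), len(b))
                for i in range(common):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
                # One file ended early: everything after that point differs.
                if len(a) != len(b) and start is None:
                    start = offset + common
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset))
    return ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    for begin, end in differing_ranges(sys.argv[1], sys.argv[2]):
        print("bytes %d..%d differ" % (begin, end - 1))
```

Knowing the offsets makes it possible to map each difference back to a
position in the diff stream instead of only seeing that the md5 changed.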

Could this be an export-diff bug? As far as I know, once two snaps have
been created, the diff between them should always be the same. So why does
export-diff not work as expected and return different md5 sums? Is there
some corner case that is not handled well, or has anyone else had the same
experience? BTW, we ran an fio I/O workload in the VMs for 24 hours during
the backup test.


