Re: [PATCH 1/3] qemu-img: Add checksum command
On Mon, Nov 7, 2022 at 12:20 PM Hanna Reitz wrote:

> On 30.10.22 18:37, Nir Soffer wrote:
> > On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz wrote:
> >
> > On 01.09.22 16:32, Nir Soffer wrote:
> [...]
> > > ---
> > >  docs/tools/qemu-img.rst |  22 +
> > >  meson.build             |  10 ++-
> > >  meson_options.txt       |   2 +
> > >  qemu-img-cmds.hx        |   8 ++
> > >  qemu-img.c              | 191
> > >  5 files changed, 232 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > > index 85a6e05b35..8be9c45cbf 100644
> > > --- a/docs/tools/qemu-img.rst
> > > +++ b/docs/tools/qemu-img.rst
> > > @@ -347,20 +347,42 @@ Command description:
> > >      Check completed, image is corrupted
> > >    3
> > >      Check completed, image has leaked clusters, but is not corrupted
> > >    63
> > >      Checks are not supported by the image format
> > >
> > >    If ``-r`` is specified, exit codes representing the image state refer to the
> > >    state after (the attempt at) repairing it. That is, a successful ``-r all``
> > >    will yield the exit code 0, independently of the image state before.
> > >
> > > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> > > +
> > > +  Print a checksum for image *FILENAME* guest visible content.
> >
> > Why not say which kind of checksum it is?
> >
> >
> > Do you mean the algorithm used? This may be confusing, for example we write
> >
> >    Print a sha256 checksum ...
> >
> > User will expect to get the same result from "sha256sum disk.img". How about
> >
> >    Print a blkhash checksum ...
> >
> > And add a link to the blkhash project?
>
> I did mean sha256, but if it isn’t pure sha256, then a link to any
> description how it is computed would be good, I think.
>

Ok, will link to https://gitlab.com/nirs/blkhash

[...]

> > > +  The checksum is not compatible with other tools such as *sha256sum*.
> >
> > Why not? I can see it differs even for raw images, but why? I would
> > have very much assumed that this gives me exactly what sha256sum in the
> > guest on the guest device would yield.
> >
> >
> > The blkhash is a construction based on other cryptographic hash
> > functions (e.g. sha256).
> > The way the hash is constructed is explained here:
> > https://gitlab.com/nirs/blkhash/-/blob/master/blkhash.py#L52
> >
> > We can provide a very slow version using a single thread and no zero
> > optimization
> > that will create the same hash as sha256sum for raw image.
>
> Ah, right. Yes, especially zero optimization is likely to make a huge
> difference. Thanks for the explanation!
>
> Maybe that could be mentioned here as a side note, though? E.g. “The
> checksum is not compatible with other tools such as *sha256sum* for
> optimization purposes (to allow multithreading and optimized handling of
> zero areas).”?
>

Ok, I will improve the text in the next version.

[...]

> > In blksum I do not allow changing the block size.
> >
> > I'll add an assert in the next version to keeps this default optimal.
>
> Thanks! (Static assert should work, right?)
>

I think it should

Nir
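
For readers following the thread, the idea discussed here (a digest per fixed-size block, an image digest over the block digests, and cheap handling of zero blocks) can be illustrated with a simplified sketch. This is not the blkhash algorithm, and its output will not match `qemu-img checksum`, blksum, or `sha256sum`. The function name `block_checksum` and the 64 KiB block size are assumptions made only for illustration.

```python
import hashlib

# Hypothetical block size chosen for illustration; not taken from blkhash.
BLOCK_SIZE = 64 * 1024


def block_checksum(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Return a hex digest computed over the sha256 digests of each block."""
    zero_block = bytes(block_size)
    # Hash the all-zero block once and reuse that digest for every full
    # zero block, so zero areas cost a comparison instead of a hash.
    zero_digest = hashlib.sha256(zero_block).digest()

    outer = hashlib.sha256()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if block == zero_block:
            outer.update(zero_digest)
        else:
            # A short trailing block is always hashed directly.
            outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()


if __name__ == "__main__":
    image = b"data" * 1024 + bytes(4 * BLOCK_SIZE)
    print(block_checksum(image))
```

Because the outer hash consumes block digests rather than the raw bytes, the result differs from a plain sha256 of the same data even for a raw image, which is the incompatibility raised in the review. Independent block digests are also what would allow the work to be spread across threads.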
Re: [PATCH 1/3] qemu-img: Add checksum command
On 30.10.22 18:37, Nir Soffer wrote:
> On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz wrote:
> > On 01.09.22 16:32, Nir Soffer wrote:
> > > The checksum command compute a checksum for disk image content using the
> > > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > > available via copr[2].
> > >
> > > Example run:
> > >
> > >     $ ./qemu-img checksum -p fedora-35.qcow2
> > >     6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5 fedora-35.qcow2
> > >
> > > The block checksum is constructed by splitting the image to fixed sized
> > > blocks and computing a digest of every block. The image checksum is the
> > > digest of the all block digests.
> > >
> > > The checksum uses internally the "sha256" algorithm but it cannot be
> > > compared with checksums created by other tools such as `sha256sum`.
> > >
> > > The blkhash library supports sparse images, zero detection, and
> > > optimizes zero block hashing (they are practically free). The library
> > > uses multiple threads to speed up the computation.
> > >
> > > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > > faster, depending on the amount of data in the image:
> > >
> > >     $ ./qemu-img info /scratch/50p.raw
> > >     file format: raw
> > >     virtual size: 6 GiB (6442450944 bytes)
> > >     disk size: 2.91 GiB
> > >
> > >     $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
> > >                                      "sha256sum /scratch/50p.raw"
> > >     Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> > >       Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
> > >       Range (min … max):    1.813 s …  1.908 s    5 runs
> > >
> > >     Benchmark 2: sha256sum /scratch/50p.raw
> > >       Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
> > >       Range (min … max):   14.501 s … 14.697 s    5 runs
> > >
> > >     Summary
> > >       './qemu-img checksum /scratch/50p.raw' ran
> > >         7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> > >
> > > The new command is available only when `blkhash` is available during
> > > build. To test the new command please install the `blkhash-devel`
> > > package:
> > >
> > >     $ dnf copr enable nsoffer/blkhash
> > >     $ sudo dnf install blkhash-devel
> > >
> > > [1] https://gitlab.com/nirs/blkhash
> > > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> > >     sha256sum (estimate): 17,749s
> > >
> > > Signed-off-by: Nir Soffer
> > > ---
> > >  docs/tools/qemu-img.rst |  22 +
> > >  meson.build             |  10 ++-
> > >  meson_options.txt       |   2 +
> > >  qemu-img-cmds.hx        |   8 ++
> > >  qemu-img.c              | 191
> > >  5 files changed, 232 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > > index 85a6e05b35..8be9c45cbf 100644
> > > --- a/docs/tools/qemu-img.rst
> > > +++ b/docs/tools/qemu-img.rst
> > > @@ -347,20 +347,42 @@ Command description:
> > >      Check completed, image is corrupted
> > >    3
> > >      Check completed, image has leaked clusters, but is not corrupted
> > >    63
> > >      Checks are not supported by the image format
> > >
> > >    If ``-r`` is specified, exit codes representing the image state refer to the
> > >    state after (the attempt at) repairing it. That is, a successful ``-r all``
> > >    will yield the exit code 0, independently of the image state before.
> > >
> > > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> > > +
> > > +  Print a checksum for image *FILENAME* guest visible content.
> >
> > Why not say which kind of checksum it is?
>
> Do you mean the algorithm used? This may be confusing, for example we write
>
>    Print a sha256 checksum ...
>
> User will expect to get the same result from "sha256sum disk.img". How about
>
>    Print a blkhash checksum ...
>
> And add a link to the blkhash project?

I did mean sha256, but if it isn’t pure sha256, then a link to any
description how it is computed would be good, I think.

> > > Images with
> > > +  different format or settings wil have the same checksum.
> >
> > s/wil/will/
>
> Fixing
>
> > > +
> > > +  The format is probed unless you specify it by ``-f``.
> > > +
> > > +  The checksum is computed for guest visible content. Allocated areas full of
> > > +  zeroes, zero clusters, and unallocated areas are read as zeros so they will
> > > +  have the same checksum. Images with single or multiple files or backing files
> > > +  will have the same checksums if the guest will see the same content when
> > > +  reading the image.
> > > +
> > > +  Image metadata that is not visible to the guest such as dirty bit
Re: [PATCH 1/3] qemu-img: Add checksum command
On Wed, Oct 26, 2022 at 4:00 PM Hanna Reitz wrote:

> On 01.09.22 16:32, Nir Soffer wrote:
> > The checksum command compute a checksum for disk image content using the
> > blkhash library[1]. The blkhash library is not packaged yet, but it is
> > available via copr[2].
> >
> > Example run:
> >
> >     $ ./qemu-img checksum -p fedora-35.qcow2
> >     6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5 fedora-35.qcow2
> >
> > The block checksum is constructed by splitting the image to fixed sized
> > blocks and computing a digest of every block. The image checksum is the
> > digest of the all block digests.
> >
> > The checksum uses internally the "sha256" algorithm but it cannot be
> > compared with checksums created by other tools such as `sha256sum`.
> >
> > The blkhash library supports sparse images, zero detection, and
> > optimizes zero block hashing (they are practically free). The library
> > uses multiple threads to speed up the computation.
> >
> > Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> > faster, depending on the amount of data in the image:
> >
> >     $ ./qemu-img info /scratch/50p.raw
> >     file format: raw
> >     virtual size: 6 GiB (6442450944 bytes)
> >     disk size: 2.91 GiB
> >
> >     $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
> >                                      "sha256sum /scratch/50p.raw"
> >     Benchmark 1: ./qemu-img checksum /scratch/50p.raw
> >       Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
> >       Range (min … max):    1.813 s …  1.908 s    5 runs
> >
> >     Benchmark 2: sha256sum /scratch/50p.raw
> >       Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
> >       Range (min … max):   14.501 s … 14.697 s    5 runs
> >
> >     Summary
> >       './qemu-img checksum /scratch/50p.raw' ran
> >         7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
> >
> > The new command is available only when `blkhash` is available during
> > build. To test the new command please install the `blkhash-devel`
> > package:
> >
> >     $ dnf copr enable nsoffer/blkhash
> >     $ sudo dnf install blkhash-devel
> >
> > [1] https://gitlab.com/nirs/blkhash
> > [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> > [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
> >     sha256sum (estimate): 17,749s
> >
> > Signed-off-by: Nir Soffer
> > ---
> >  docs/tools/qemu-img.rst |  22 +
> >  meson.build             |  10 ++-
> >  meson_options.txt       |   2 +
> >  qemu-img-cmds.hx        |   8 ++
> >  qemu-img.c              | 191
> >  5 files changed, 232 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> > index 85a6e05b35..8be9c45cbf 100644
> > --- a/docs/tools/qemu-img.rst
> > +++ b/docs/tools/qemu-img.rst
> > @@ -347,20 +347,42 @@ Command description:
> >      Check completed, image is corrupted
> >    3
> >      Check completed, image has leaked clusters, but is not corrupted
> >    63
> >      Checks are not supported by the image format
> >
> >    If ``-r`` is specified, exit codes representing the image state refer to the
> >    state after (the attempt at) repairing it. That is, a successful ``-r all``
> >    will yield the exit code 0, independently of the image state before.
> >
> > +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> > +
> > +  Print a checksum for image *FILENAME* guest visible content.
>
> Why not say which kind of checksum it is?
>

Do you mean the algorithm used? This may be confusing, for example we write

   Print a sha256 checksum ...

User will expect to get the same result from "sha256sum disk.img". How about

   Print a blkhash checksum ...

And add a link to the blkhash project?

> > Images with
> > +  different format or settings wil have the same checksum.
>
> s/wil/will/
>

Fixing

> > +
> > +  The format is probed unless you specify it by ``-f``.
> > +
> > +  The checksum is computed for guest visible content. Allocated areas full of
> > +  zeroes, zero clusters, and unallocated areas are read as zeros so they will
> > +  have the same checksum. Images with single or multiple files or backing files
> > +  will have the same checksums if the guest will see the same content when
> > +  reading the image.
> > +
> > +  Image metadata that is not visible to the guest such as dirty bitmaps does
> > +  not affect the checksum.
> > +
> > +  Computing a checksum requires a read-only image. You cannot compute a
> > +  checksum of an active image used by a guest,
>
> Makes me ask: Why not? Other subcommands have the -U flag for this.
>

The text is not precise enough, the issue is not active image but having a read only imag
Re: [PATCH 1/3] qemu-img: Add checksum command
On 01.09.22 16:32, Nir Soffer wrote:
> The checksum command compute a checksum for disk image content using the
> blkhash library[1]. The blkhash library is not packaged yet, but it is
> available via copr[2].
>
> Example run:
>
>     $ ./qemu-img checksum -p fedora-35.qcow2
>     6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5 fedora-35.qcow2
>
> The block checksum is constructed by splitting the image to fixed sized
> blocks and computing a digest of every block. The image checksum is the
> digest of the all block digests.
>
> The checksum uses internally the "sha256" algorithm but it cannot be
> compared with checksums created by other tools such as `sha256sum`.
>
> The blkhash library supports sparse images, zero detection, and
> optimizes zero block hashing (they are practically free). The library
> uses multiple threads to speed up the computation.
>
> Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
> faster, depending on the amount of data in the image:
>
>     $ ./qemu-img info /scratch/50p.raw
>     file format: raw
>     virtual size: 6 GiB (6442450944 bytes)
>     disk size: 2.91 GiB
>
>     $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
>                                      "sha256sum /scratch/50p.raw"
>     Benchmark 1: ./qemu-img checksum /scratch/50p.raw
>       Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
>       Range (min … max):    1.813 s …  1.908 s    5 runs
>
>     Benchmark 2: sha256sum /scratch/50p.raw
>       Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
>       Range (min … max):   14.501 s … 14.697 s    5 runs
>
>     Summary
>       './qemu-img checksum /scratch/50p.raw' ran
>         7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'
>
> The new command is available only when `blkhash` is available during
> build. To test the new command please install the `blkhash-devel`
> package:
>
>     $ dnf copr enable nsoffer/blkhash
>     $ sudo dnf install blkhash-devel
>
> [1] https://gitlab.com/nirs/blkhash
> [2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
> [3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
>     sha256sum (estimate): 17,749s
>
> Signed-off-by: Nir Soffer
> ---
>  docs/tools/qemu-img.rst |  22 +
>  meson.build             |  10 ++-
>  meson_options.txt       |   2 +
>  qemu-img-cmds.hx        |   8 ++
>  qemu-img.c              | 191
>  5 files changed, 232 insertions(+), 1 deletion(-)
>
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 85a6e05b35..8be9c45cbf 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -347,20 +347,42 @@ Command description:
>      Check completed, image is corrupted
>    3
>      Check completed, image has leaked clusters, but is not corrupted
>    63
>      Checks are not supported by the image format
>
>    If ``-r`` is specified, exit codes representing the image state refer to the
>    state after (the attempt at) repairing it. That is, a successful ``-r all``
>    will yield the exit code 0, independently of the image state before.
>
> +.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
> +
> +  Print a checksum for image *FILENAME* guest visible content.

Why not say which kind of checksum it is?

> Images with
> +  different format or settings wil have the same checksum.

s/wil/will/

> +
> +  The format is probed unless you specify it by ``-f``.
> +
> +  The checksum is computed for guest visible content. Allocated areas full of
> +  zeroes, zero clusters, and unallocated areas are read as zeros so they will
> +  have the same checksum. Images with single or multiple files or backing files
> +  will have the same checksums if the guest will see the same content when
> +  reading the image.
> +
> +  Image metadata that is not visible to the guest such as dirty bitmaps does
> +  not affect the checksum.
> +
> +  Computing a checksum requires a read-only image. You cannot compute a
> +  checksum of an active image used by a guest,

Makes me ask: Why not? Other subcommands have the -U flag for this.

> but you can compute a checksum
> +  of a guest during pull mode incremental backup using NBD URL.
> +
> +  The checksum is not compatible with other tools such as *sha256sum*.

Why not? I can see it differs even for raw images, but why? I would have
very much assumed that this gives me exactly what sha256sum in the guest on
the guest device would yield.

> +
>  .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
>
>    Commit the changes recorded in *FILENAME* in its base image or backing file.
>    If the backing file is smaller than the snapshot, then the backing file will be
>    resized to be the same size as the snapshot. If the snapshot is smaller than
>    the backing file,
[PATCH 1/3] qemu-img: Add checksum command
The checksum command compute a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

    $ ./qemu-img checksum -p fedora-35.qcow2
    6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5 fedora-35.qcow2

The block checksum is constructed by splitting the image to fixed sized
blocks and computing a digest of every block. The image checksum is the
digest of the all block digests.

The checksum uses internally the "sha256" algorithm but it cannot be
compared with checksums created by other tools such as `sha256sum`.

The blkhash library supports sparse images, zero detection, and
optimizes zero block hashing (they are practically free). The library
uses multiple threads to speed up the computation.

Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

    $ ./qemu-img info /scratch/50p.raw
    file format: raw
    virtual size: 6 GiB (6442450944 bytes)
    disk size: 2.91 GiB

    $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
                                     "sha256sum /scratch/50p.raw"
    Benchmark 1: ./qemu-img checksum /scratch/50p.raw
      Time (mean ± σ):      1.849 s ±  0.037 s    [User: 7.764 s, System: 0.962 s]
      Range (min … max):    1.813 s …  1.908 s    5 runs

    Benchmark 2: sha256sum /scratch/50p.raw
      Time (mean ± σ):     14.585 s ±  0.072 s    [User: 13.537 s, System: 1.003 s]
      Range (min … max):   14.501 s … 14.697 s    5 runs

    Summary
      './qemu-img checksum /scratch/50p.raw' ran
        7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

    $ dnf copr enable nsoffer/blkhash
    $ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
    sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer
---
 docs/tools/qemu-img.rst |  22 +
 meson.build             |  10 ++-
 meson_options.txt       |   2 +
 qemu-img-cmds.hx        |   8 ++
 qemu-img.c              | 191
 5 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 85a6e05b35..8be9c45cbf 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,42 @@ Command description:
     Check completed, image is corrupted
   3
     Check completed, image has leaked clusters, but is not corrupted
   63
     Checks are not supported by the image format
 
   If ``-r`` is specified, exit codes representing the image state refer to the
   state after (the attempt at) repairing it. That is, a successful ``-r all``
   will yield the exit code 0, independently of the image state before.
 
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME
+
+  Print a checksum for image *FILENAME* guest visible content. Images with
+  different format or settings wil have the same checksum.
+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are read as zeros so they will
+  have the same checksum. Images with single or multiple files or backing files
+  will have the same checksums if the guest will see the same content when
+  reading the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest, but you can compute a checksum
+  of a guest during pull mode incremental backup using NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum*.
+
 .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
 
   Commit the changes recorded in *FILENAME* in its base image or backing file.
   If the backing file is smaller than the snapshot, then the backing file will be
   resized to be the same size as the snapshot. If the snapshot is smaller than
   the backing file, the backing file will not be truncated. If you want the
   backing file to match the size of the smaller snapshot, you can safely truncate
   it yourself once the commit operation successfully completes.
 
   The image *FILENAME* is emptied after the operation has succeeded. If you do
diff --git a/meson.build b/meson.build
index 20fddbd707..56b648d8a7 100644
--- a/meson.build
+++ b/meson.build
@@ -727,20 +727,24 @@ if not get_option('curl').auto() or have_block
 kwargs: static_kwargs)
 endif
 l