Re: [Libguestfs] Checksums and other verification

2023-03-02 Thread Nir Soffer
On Thu, Mar 2, 2023 at 10:46 AM Richard W.M. Jones  wrote:
>
> On Mon, Feb 27, 2023 at 07:09:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  
> > wrote:
> > > I think it would be more useful if (or in addition) it could compute
> > > the checksum of a stream which is being converted with 'qemu-img
> > > convert'.  Extra points if it can compute the checksum over either the
> > > input or output stream.
> >
> > I thought about this, it could be a filter that you add in the graph
> > that gives you checksum as a side effect of copying. But this requires
> > disabling unordered writes, which is pretty bad for performance.
> >
> > But even if you compute the checksum during a transfer, you want to
> > verify it by reading the transferred data from storage. Once you computed
> > the checksum you can keep it for verifying the same image in the future.
>
> The use-case I have in mind is being able to verify a download when
> you already know the checksum and are copying / converting the image
> in flight.
>
> eg: You are asked to download https://example.com/distro-cloud.qcow2
> with some published checksum and you will on the fly download and
> convert this to raw, but want to verify the checksum (of the qcow2)
> during the conversion step.  (Or at some point, but during the convert
> avoids having to spool the image locally.)

I'm thinking about the same flow. I think the best way to verify is:

1. The remote server publishes a block-checksum of the image
2. The system gets the block-checksum from the server (from an HTTP header?)
3. The system pulls data from the server and pushes it to the target disk
   in the wanted format
4. The system computes a checksum of the target disk

This way you verify the entire pipeline, including the storage. If we
compute a checksum during the conversion, we verify only that we got
the correct data from the server.

If we care only about verifying the transfer from the server, we can
compute the checksum during the download, which is likely to be
sequential (so it is easy to integrate with blkhash).
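A sequential download makes linear hashing trivial: each chunk can be fed
to the hash as it arrives, before being written out. A minimal Python
sketch of the idea (the function name and chunk size are illustrative,
not part of blkhash or any existing tool):

```python
import hashlib
import urllib.request

def download_and_hash(url, out_path, algorithm="sha256", chunk_size=1 << 20):
    """Stream an image to disk, hashing each chunk as it arrives."""
    h = hashlib.new(algorithm)
    with urllib.request.urlopen(url) as src, open(out_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)   # hash the bytes exactly as the server sent them
            dst.write(chunk)
    return h.hexdigest()
```

Note this only verifies the transfer; verifying the storage still needs a
separate read-back of the target disk, as described above.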

If we want to validate nbdcopy, it will be much harder to compute a checksum
inside nbdcopy because it does not stream the data in order.

Nir

___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] Checksums and other verification

2023-03-02 Thread Richard W.M. Jones
On Mon, Feb 27, 2023 at 07:09:33PM +0200, Nir Soffer wrote:
> On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  wrote:
> > I think it would be more useful if (or in addition) it could compute
> > the checksum of a stream which is being converted with 'qemu-img
> > convert'.  Extra points if it can compute the checksum over either the
> > input or output stream.
> 
> I thought about this, it could be a filter that you add in the graph
> that gives you checksum as a side effect of copying. But this requires
> disabling unordered writes, which is pretty bad for performance.
> 
> But even if you compute the checksum during a transfer, you want to
> verify it by reading the transferred data from storage. Once you computed
> the checksum you can keep it for verifying the same image in the future.

The use-case I have in mind is being able to verify a download when
you already know the checksum and are copying / converting the image
in flight.

eg: You are asked to download https://example.com/distro-cloud.qcow2
with some published checksum and you will on the fly download and
convert this to raw, but want to verify the checksum (of the qcow2)
during the conversion step.  (Or at some point, but during the convert
avoids having to spool the image locally.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v


Re: [Libguestfs] Checksums and other verification

2023-02-28 Thread Nir Soffer
On Tue, Feb 28, 2023 at 4:13 PM Laszlo Ersek  wrote:
>
> On 2/28/23 12:39, Richard W.M. Jones wrote:
> > On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
> >> On 2/27/23 17:44, Richard W.M. Jones wrote:
> >>> On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
>  Or intentionally choose a hash that can be computed out-of-order,
>  such as a Merkle Tree.  But we'd need a standard setup for all
>  parties to agree on how the hash is to be computed and checked, if
>  it is going to be anything more than just a linear hash of the
>  entire guest-visible contents.
> >>>
> >>> Unfortunately I suspect that by far the easiest way for people who
> >>> host images to compute checksums is to run 'shaXXXsum' on them or
> >>> sign them with a GPG signature, rather than engaging in a novel hash
> >>> function.  Indeed that's what is happening now:
> >>>
> >>> https://alt.fedoraproject.org/en/verify.html
> >>
> >> If the output is produced with unordered writes, but the complete
> >> output needs to be verified with a hash *chain*, that still allows
> >> for some level of asynchrony. The start of the hashing need not be
> >> delayed until after the end of output, only after the start of
> >> output.
> >>
> >> For example, nbdcopy could maintain the highest offset up to which
> >> the output is contiguous, and on a separate thread, it could be
> >> hashing the output up to that offset.
> >>
> >> Considering a gigantic output, as yet unassembled blocks could likely
> >> not be buffered in memory (that's why the writes are unordered in the
> >> first place!), so the hashing thread would have to re-read the output
> >> via NBD. Whether that would cause performance to improve or to
> >> deteriorate is undecided IMO. If the far end of the output network
> >> block device can accommodate a reader that is independent of the
> >> writers, then this level of overlap is beneficial. Otherwise, this
> >> extra reader thread would just add more thrashing, and we'd be better
> >> off with a separate read-through once writing is complete.
> >
> > In my mind I'm wondering if there's any mathematical result that lets
> > you combine each hash(block_i) into the final hash(block[1..N])
> > without needing to compute the hash of each block in order.
>
> I've now checked:
>
> https://en.wikipedia.org/wiki/SHA-2
> https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction
> https://en.wikipedia.org/wiki/One-way_compression_function#Construction_from_block_ciphers
> https://en.wikipedia.org/wiki/One-way_compression_function#Davies%E2%80%93Meyer
>
> Consider the following order of steps:
>
> - precompute hash(block[n]), with some initial IV
> - throw away block[n]
> - wait until block[n-1] is processed, providing the actual IV for
>   hashing block[n]
> - mix the new IV into hash(block[n]) without having access to block[n]
>
> If such a method existed, it would break the security (i.e., the
> original design) of the hash, IMO, as it would separate the IV from
> block[n]. In a way, it would make the "mix" and "concat" operators (of
> the underlying block cipher's chaining method) distributive. I believe
> then you could generate a bunch of *valid* hash(block[n]) values as a
> mere function of the IV, without having access to block[n]. You could
> perhaps use that for probing against other hash(block[m]) values, and
> maybe determine repeating patterns in the plaintext. I'm not a
> cryptographer so I can't exactly show what security property is broken
> by separating the IV from block[n].
>
> > (This is what blkhash solves, but unfortunately the output isn't
> > compatible with standard hashes.)
>
> Assuming blkhash is a Merkle Tree implementation, blkhash solves a
> different problem IMO.

blkhash uses a flat Merkle tree, described here:
https://www.researchgate.net/publication/323243320_Foundations_of_Applied_Cryptography_and_Cybersecurity
Section 3.9.4, "2lMT: the Flat Merkle Tree construction"

To support parallel hashing it uses a more complex construction, but this
can be simplified to a single flat Merkle tree.
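As a rough illustration of the flat construction (the block size and use
of SHA-256 are arbitrary choices here, not blkhash's actual parameters or
output format): hash each block independently, then hash the
concatenation of the per-block digests.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # illustrative; blkhash's real block size differs

def flat_merkle_hash(data, algorithm="sha256"):
    """Toy flat Merkle tree: hash each block, then hash the digest list.

    Each leaf depends only on its own block, so leaves can be computed
    in parallel or out of order; only the final combining step is
    sequential, and it consumes digests, not image data.
    """
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    leaves = [hashlib.new(algorithm, b).digest() for b in blocks]
    root = hashlib.new(algorithm)
    for leaf in leaves:
        root.update(leaf)
    return root.hexdigest()
```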

Nir



Re: [Libguestfs] Checksums and other verification

2023-02-28 Thread Laszlo Ersek
On 2/28/23 12:39, Richard W.M. Jones wrote:
> On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
>> On 2/27/23 17:44, Richard W.M. Jones wrote:
>>> On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
 Or intentionally choose a hash that can be computed out-of-order,
 such as a Merkle Tree.  But we'd need a standard setup for all
 parties to agree on how the hash is to be computed and checked, if
 it is going to be anything more than just a linear hash of the
 entire guest-visible contents.
>>>
>>> Unfortunately I suspect that by far the easiest way for people who
>>> host images to compute checksums is to run 'shaXXXsum' on them or
>>> sign them with a GPG signature, rather than engaging in a novel hash
>>> function.  Indeed that's what is happening now:
>>>
>>> https://alt.fedoraproject.org/en/verify.html
>>
>> If the output is produced with unordered writes, but the complete
>> output needs to be verified with a hash *chain*, that still allows
>> for some level of asynchrony. The start of the hashing need not be
>> delayed until after the end of output, only after the start of
>> output.
>>
>> For example, nbdcopy could maintain the highest offset up to which
>> the output is contiguous, and on a separate thread, it could be
>> hashing the output up to that offset.
>>
>> Considering a gigantic output, as yet unassembled blocks could likely
>> not be buffered in memory (that's why the writes are unordered in the
>> first place!), so the hashing thread would have to re-read the output
>> via NBD. Whether that would cause performance to improve or to
>> deteriorate is undecided IMO. If the far end of the output network
>> block device can accommodate a reader that is independent of the
>> writers, then this level of overlap is beneficial. Otherwise, this
>> extra reader thread would just add more thrashing, and we'd be better
>> off with a separate read-through once writing is complete.
>
> In my mind I'm wondering if there's any mathematical result that lets
> you combine each hash(block_i) into the final hash(block[1..N])
> without needing to compute the hash of each block in order.

I've now checked:

https://en.wikipedia.org/wiki/SHA-2
https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction
https://en.wikipedia.org/wiki/One-way_compression_function#Construction_from_block_ciphers
https://en.wikipedia.org/wiki/One-way_compression_function#Davies%E2%80%93Meyer

Consider the following order of steps:

- precompute hash(block[n]), with some initial IV
- throw away block[n]
- wait until block[n-1] is processed, providing the actual IV for
  hashing block[n]
- mix the new IV into hash(block[n]) without having access to block[n]

If such a method existed, it would break the security (i.e., the
original design) of the hash, IMO, as it would separate the IV from
block[n]. In a way, it would make the "mix" and "concat" operators (of
the underlying block cipher's chaining method) distributive. I believe
then you could generate a bunch of *valid* hash(block[n]) values as a
mere function of the IV, without having access to block[n]. You could
perhaps use that for probing against other hash(block[m]) values, and
maybe determine repeating patterns in the plaintext. I'm not a
cryptographer so I can't exactly show what security property is broken
by separating the IV from block[n].

> (This is what blkhash solves, but unfortunately the output isn't
> compatible with standard hashes.)

Assuming blkhash is a Merkle Tree implementation, blkhash solves a
different problem IMO. In your above notation, hash(block[1..N]) is a
hash of the concatenated plaintext blocks, and that's not what a Merkle
Tree describes. The "mix" and "concat" operators remain
non-distributive; it's the operator trees that differ. With a Merkle
Tree, there are sub-trees that can be evaluated independently of each
other. With SHA256, we have a fully imbalanced operator tree, one where
the tree depth is maximal.

Laszlo



Re: [Libguestfs] Checksums and other verification

2023-02-28 Thread Richard W.M. Jones
On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
> On 2/27/23 17:44, Richard W.M. Jones wrote:
> > On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
> >> Or intentionally choose a hash that can be computed out-of-order, such
> >> as a Merkle Tree.  But we'd need a standard setup for all parties to
> >> agree on how the hash is to be computed and checked, if it is going to
> >> be anything more than just a linear hash of the entire guest-visible
> >> contents.
> > 
> > Unfortunately I suspect that by far the easiest way for people who
> > host images to compute checksums is to run 'shaXXXsum' on them or sign
> > them with a GPG signature, rather than engaging in a novel hash
> > function.  Indeed that's what is happening now:
> > 
> > https://alt.fedoraproject.org/en/verify.html
> 
> If the output is produced with unordered writes, but the complete output
> needs to be verified with a hash *chain*, that still allows for some
> level of asynchrony. The start of the hashing need not be delayed until
> after the end of output, only after the start of output.
> 
> For example, nbdcopy could maintain the highest offset up to which the
> output is contiguous, and on a separate thread, it could be hashing the
> output up to that offset.
> 
> Considering a gigantic output, as yet unassembled blocks could likely
> not be buffered in memory (that's why the writes are unordered in the
> first place!), so the hashing thread would have to re-read the output
> via NBD. Whether that would cause performance to improve or to
> deteriorate is undecided IMO. If the far end of the output network block
> device can accommodate a reader that is independent of the writers, then
> this level of overlap is beneficial. Otherwise, this extra reader thread
> would just add more thrashing, and we'd be better off with a separate
> read-through once writing is complete.

In my mind I'm wondering if there's any mathematical result that lets
you combine each hash(block_i) into the final hash(block[1..N])
without needing to compute the hash of each block in order.

(This is what blkhash solves, but unfortunately the output isn't
compatible with standard hashes.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



Re: [Libguestfs] Checksums and other verification

2023-02-28 Thread Laszlo Ersek
On 2/27/23 17:44, Richard W.M. Jones wrote:
> On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
>> Or intentionally choose a hash that can be computed out-of-order, such
>> as a Merkle Tree.  But we'd need a standard setup for all parties to
>> agree on how the hash is to be computed and checked, if it is going to
>> be anything more than just a linear hash of the entire guest-visible
>> contents.
> 
> Unfortunately I suspect that by far the easiest way for people who
> host images to compute checksums is to run 'shaXXXsum' on them or sign
> them with a GPG signature, rather than engaging in a novel hash
> function.  Indeed that's what is happening now:
> 
> https://alt.fedoraproject.org/en/verify.html

If the output is produced with unordered writes, but the complete output
needs to be verified with a hash *chain*, that still allows for some
level of asynchrony. The start of the hashing need not be delayed until
after the end of output, only after the start of output.

For example, nbdcopy could maintain the highest offset up to which the
output is contiguous, and on a separate thread, it could be hashing the
output up to that offset.
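The watermark idea can be sketched in a few lines of Python. This is an
editorial illustration, not nbdcopy code; a real integration would
re-read completed extents from the destination over NBD rather than
buffer them in memory:

```python
import hashlib

class WatermarkHasher:
    """Feed a linear hash as out-of-order writes become contiguous."""

    def __init__(self, algorithm="sha256"):
        self.hash = hashlib.new(algorithm)
        self.watermark = 0   # everything below this offset is already hashed
        self.pending = {}    # offset -> bytes, completed but not yet contiguous

    def write_completed(self, offset, data):
        self.pending[offset] = data
        # Absorb any runs that are now contiguous with the watermark.
        while self.watermark in self.pending:
            chunk = self.pending.pop(self.watermark)
            self.hash.update(chunk)
            self.watermark += len(chunk)

    def digest(self):
        assert not self.pending, "gaps remain below the highest write"
        return self.hash.hexdigest()
```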

Considering a gigantic output, as yet unassembled blocks could likely
not be buffered in memory (that's why the writes are unordered in the
first place!), so the hashing thread would have to re-read the output
via NBD. Whether that would cause performance to improve or to
deteriorate is undecided IMO. If the far end of the output network block
device can accommodate a reader that is independent of the writers, then
this level of overlap is beneficial. Otherwise, this extra reader thread
would just add more thrashing, and we'd be better off with a separate
read-through once writing is complete.

Laszlo



Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Nir Soffer
On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones  wrote:
>
> On Mon, Feb 27, 2023 at 04:24:33PM +0200, Nir Soffer wrote:
> > On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  
> > wrote:
> > >
> > >
> > > https://github.com/kubevirt/containerized-data-importer/issues/1520
> > >
> > > Hi Eric,
> > >
> > > We had a question from the Kubevirt team related to the above issue.
> > > The question is roughly if it's possible to calculate the checksum of
> > > an image as an nbdkit filter and/or in the qemu block layer.
> > >
> > > Supplemental #1: could qemu-img convert calculate a checksum as it goes
> > > along?
> > >
> > > Supplemental #2: could we detect various sorts of common errors, such as
> > > a webserver that is incorrectly configured and serves up an error page
> > > containing "<html>"; or something which is supposed to be a disk image
> > > but does not "look like" (in some ill-defined sense) a disk image,
> > > eg. it has no partition table.
> > >
> > > I'm not sure if qemu has any existing features covering the above (and
> > > I know for sure that nbdkit doesn't).
> > >
> > > One issue is that calculating a checksum involves a linear scan of the
> > > image, although we can at least skip holes.
> >
> > Kubevirt can use blksum
> > https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/
> >
> > But we need to package it for Fedora/CentOS Stream.
> >
> > I also work on "qemu-img checksum"; getting more reviews on this would help.
> > Latest version:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
> > Last reviews are here:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-12/
> >
> > More work is needed on the testing framework changes.
>
> I think it would be more useful if (or in addition) it could compute
> the checksum of a stream which is being converted with 'qemu-img
> convert'.  Extra points if it can compute the checksum over either the
> input or output stream.

I thought about this; it could be a filter that you add in the graph
that gives you a checksum as a side effect of copying. But this requires
disabling unordered writes, which is pretty bad for performance.

But even if you compute the checksum during a transfer, you want to
verify it by reading the transferred data from storage. Once you computed
the checksum you can keep it for verifying the same image in the future.

Nir



Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Richard W.M. Jones
On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
> Or intentionally choose a hash that can be computed out-of-order, such
> as a Merkle Tree.  But we'd need a standard setup for all parties to
> agree on how the hash is to be computed and checked, if it is going to
> be anything more than just a linear hash of the entire guest-visible
> contents.

Unfortunately I suspect that by far the easiest way for people who
host images to compute checksums is to run 'shaXXXsum' on them or sign
them with a GPG signature, rather than engaging in a novel hash
function.  Indeed that's what is happening now:

https://alt.fedoraproject.org/en/verify.html

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Richard W.M. Jones
On Mon, Feb 27, 2023 at 04:24:33PM +0200, Nir Soffer wrote:
> On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  wrote:
> >
> >
> > https://github.com/kubevirt/containerized-data-importer/issues/1520
> >
> > Hi Eric,
> >
> > We had a question from the Kubevirt team related to the above issue.
> > The question is roughly if it's possible to calculate the checksum of
> > an image as an nbdkit filter and/or in the qemu block layer.
> >
> > Supplemental #1: could qemu-img convert calculate a checksum as it goes
> > along?
> >
> > Supplemental #2: could we detect various sorts of common errors, such as
> > a webserver that is incorrectly configured and serves up an error page
> > containing "<html>"; or something which is supposed to be a disk image
> > but does not "look like" (in some ill-defined sense) a disk image,
> > eg. it has no partition table.
> >
> > I'm not sure if qemu has any existing features covering the above (and
> > I know for sure that nbdkit doesn't).
> >
> > One issue is that calculating a checksum involves a linear scan of the
> > image, although we can at least skip holes.
> 
> Kubevirt can use blksum
> https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/
> 
> But we need to package it for Fedora/CentOS Stream.
> 
> I also work on "qemu-img checksum"; getting more reviews on this would help.
> Latest version:
> https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
> Last reviews are here:
> https://lists.nongnu.org/archive/html/qemu-block/2022-12/
> 
> More work is needed on the testing framework changes.

I think it would be more useful if (or in addition) it could compute
the checksum of a stream which is being converted with 'qemu-img
convert'.  Extra points if it can compute the checksum over either the
input or output stream.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Eric Blake
On Mon, Feb 27, 2023 at 01:56:26PM +, Richard W.M. Jones wrote:
> 
> https://github.com/kubevirt/containerized-data-importer/issues/1520
> 
> Hi Eric,
> 
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.

In the qemu block layer - yes: see Nir's https://gitlab.com/nirs/blkhash

Note that there is a huge difference between a block-based checksum (a
checksum of the block data the guest will see) and a checksum of the
original file (bytes as visible on the source, although with non-raw
files, more than one image may hash to the same guest-visible contents
despite having different host checksums).

Also, it may prove to be more efficient to generate a Merkle Tree hash
of an image (an image is divided into smaller portions in a
binary-tree fanout, where the hash of the entire image is computed by
combining hashes of child nodes up to the root of the tree - which
allows downloading blocks out of order).  [You may be more familiar
with Merkle Trees than you realize - every git commit id is ultimately
a Merkle Tree hash of all prior commits]
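For illustration, a toy binary Merkle combine over per-block digests
might look like this. The odd-leaf duplication rule is one common
convention, not a standard all parties have agreed on, which is exactly
the standardization problem noted above:

```python
import hashlib

def merkle_root(leaves):
    """Combine per-block SHA-256 digests pairwise up to a single root.

    Leaves can be produced in any order (each depends only on its own
    block); only this combining pass is ordered, and it touches
    digests, not image data.
    """
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:            # odd count: duplicate the last node
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```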

As for nbdkit being able to do hashing as a filter, we don't have such
a filter now, but I think it would be technically possible to
implement one.  The trickiest part would be figuring out a way to
expose the checksum to the client once the client has finally read
through the entire image.  It would be easy to have nbdkit output the
resulting hash in a secondary file for consumption by the end client,
harder but potentially more useful would be extending the NBD protocol
itself to allow the NBD client to issue a query to the server to
provide the hash directly (or an indication that the hash is not yet
known because not all blocks have been hashed yet).

> 
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?

Nir's work on blkhash seems like that is doable.

> 
> Supplemental #2: could we detect various sorts of common errors, such as
> a webserver that is incorrectly configured and serves up an error page
> containing "<html>"; or something which is supposed to be a disk image
> but does not "look like" (in some ill-defined sense) a disk image,
> eg. it has no partition table.
> 
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).

Indeed.  But adding a filter that does a pre-read of the plugin's
first 1M during .prepare to look for an expected signature (what is
sufficient, seeing if there is a partition table?) and refuses to let
the client connect if the plugin is serving wrong data seems fairly
straightforward.
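A sketch of what such a check might test (the magic numbers are real
qcow2 and MBR signatures, but the function and the particular set of
checks are illustrative, not an existing nbdkit filter):

```python
def looks_like_disk_image(head):
    """Heuristic sanity check on the first bytes of a download.

    `head` is the first chunk (say 1 MiB) read during .prepare.
    """
    if head[:5].lower() in (b"<html", b"<!doc"):
        return False                      # webserver error page, not an image
    if head[:4] == b"QFI\xfb":
        return True                       # qcow2 magic
    if len(head) >= 512 and head[510:512] == b"\x55\xaa":
        return True                       # MBR/DOS boot-sector signature
    return False
```

A raw image with no partition table would fail this heuristic too, which
is why the set of "sufficient" checks is the open question.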

> 
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.

Or intentionally choose a hash that can be computed out-of-order, such
as a Merkle Tree.  But we'd need a standard setup for all parties to
agree on how the hash is to be computed and checked, if it is going to
be anything more than just a linear hash of the entire guest-visible
contents.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Libguestfs] Checksums and other verification

2023-02-27 Thread Nir Soffer
On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones  wrote:
>
>
> https://github.com/kubevirt/containerized-data-importer/issues/1520
>
> Hi Eric,
>
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.
>
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?
>
> Supplemental #2: could we detect various sorts of common errors, such as
> a webserver that is incorrectly configured and serves up an error page
> containing "<html>"; or something which is supposed to be a disk image
> but does not "look like" (in some ill-defined sense) a disk image,
> eg. it has no partition table.
>
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).
>
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.

Kubevirt can use blksum
https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/

But we need to package it for Fedora/CentOS Stream.

I also work on "qemu-img checksum"; getting more reviews on this would help.
Latest version:
https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
Last reviews are here:
https://lists.nongnu.org/archive/html/qemu-block/2022-12/

More work is needed on the testing framework changes.

Nir



[Libguestfs] Checksums and other verification

2023-02-27 Thread Richard W.M. Jones


https://github.com/kubevirt/containerized-data-importer/issues/1520

Hi Eric,

We had a question from the Kubevirt team related to the above issue.
The question is roughly if it's possible to calculate the checksum of
an image as an nbdkit filter and/or in the qemu block layer.

Supplemental #1: could qemu-img convert calculate a checksum as it goes
along?

Supplemental #2: could we detect various sorts of common errors, such as
a webserver that is incorrectly configured and serves up an error page
containing "<html>"; or something which is supposed to be a disk image
but does not "look like" (in some ill-defined sense) a disk image,
eg. it has no partition table.

I'm not sure if qemu has any existing features covering the above (and
I know for sure that nbdkit doesn't).

One issue is that calculating a checksum involves a linear scan of the
image, although we can at least skip holes.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit