On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote: > On 2/27/23 17:44, Richard W.M. Jones wrote: > > On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote: > >> Or intentionally choose a hash that can be computed out-of-order, such > >> as a Merkle Tree. But we'd need a standard setup for all parties to > >> agree on how the hash is to be computed and checked, if it is going to > >> be anything more than just a linear hash of the entire guest-visible > >> contents. > > > > Unfortunately I suspect that by far the easiest way for people who > > host images to compute checksums is to run 'shaXXXsum' on them or sign > > them with a GPG signature, rather than engaging in a novel hash > > function. Indeed that's what is happening now: > > > > https://alt.fedoraproject.org/en/verify.html > > If the output is produced with unordered writes, but the complete output > needs to be verified with a hash *chain*, that still allows for some > level of asynchrony. The start of the hashing need not be delayed until > after the end of output, only after the start of output. > > For example, nbdcopy could maintain the highest offset up to which the > output is contiguous, and on a separate thread, it could be hashing the > output up to that offset. > > Considering a gigantic output, as yet unassembled blocks could likely > not be buffered in memory (that's why the writes are unordered in the > first place!), so the hashing thread would have to re-read the output > via NBD. Whether that would cause performance to improve or to > deteriorate is undecided IMO. If the far end of the output network block > device can accommodate a reader that is independent of the writers, then > this level of overlap is beneficial. Otherwise, this extra reader thread > would just add more thrashing, and we'd be better off with a separate > read-through once writing is complete.
In my mind I'm wondering if there's any mathematical result that lets you combine each hash(block_i) into the final hash(block[1..N]) without needing to compute the hash of each block in order. (This is what blkhash solves, but unfortunately the output isn't compatible with standard hashes.) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs