Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Sat, Jun 12, 2021 at 12:23:06AM +0300, Nir Soffer wrote: > > Otherwise, you do have a point: "depth":1 in isolation is ambiguous > > between "not allocated anywhere in this 1-element chain" and > > "allocated at the first backing file in this chain of length 2 or > > more". At which point you can indeed use "qemu-img info" to determine > > the backing chain depth. How painful is that extra step? Does it > > justify the addition of a new optional "backing":true to any portion > > of the file that was beyond the end of the chain (and omit that line > > for all other regions, rather than printing "backing":false)? > > Dealing with depth: N + 1 is not that painful, but also not great. > > I think it is worth a little more effort, and it will save time in the long > term > for users and for developers. Better APIs need simpler and shorter > documentation and require less support. > > I'm not sure about backing: false, maybe absent: true to match libnbd? In the patch [1], I did "backing":true if the cluster was not found in the chain, and omitted the bool altogether when the cluster is present. If we like the name "absent":true better than "backing":true, that's an easy change. The libnbd change for nbdinfo to report 'absent' instead of 'unallocated' has not yet been released, so we have some leeway on naming choices. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg03067.html -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Fri, Jun 11, 2021 at 9:34 PM Eric Blake wrote: > > On Fri, Jun 11, 2021 at 08:35:01PM +0300, Nir Soffer wrote: > > On Fri, Jun 11, 2021 at 4:28 PM Eric Blake wrote: > > > > > > On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote: > > > > > Yes, that might work as well. But we didn't previously document > > > > > depth to be optional. Removing something from output risks breaking > > > > > more downstream tools that expect it to be non-optional, compared to > > > > > providing a new value. > > > > > > > > A negative value isn't any less unexpected than a missing key. I don't > > > > think any existing tool would be able to handle it. Encoding different > > > > meanings in a single value isn't very QAPI-like either. Usually strings > > > > that are parsed are the problem, but negative integers really isn't that > > > > much different. I don't really like this solution. > > > > > > > > Leaving out the depth feels like a better suggestion to me. > > > > > > > > But anyway, this seems to only happen at the end of the backing chain. > > > > So if the backing chain consistents of n images, why not report 'depth': > > > > n + 1? So, in the above example, you would get 1. I think this has the > > > > best chances of tools actually working correctly with the new output, > > > > even though it's still not unlikely to break something. > > > > > > Ooh, I like that. It is closer to reality - the file data really > > > comes from the next depth, even if we have no filename at that depth. > > > v2 of my patch coming up. > > > > How do you know the number of the layer? this info is not presented in > > qemu-img map output. ... > Otherwise, you do have a point: "depth":1 in isolation is ambiguous > between "not allocated anywhere in this 1-element chain" and > "allocated at the first backing file in this chain of length 2 or > more". At which point you can indeed use "qemu-img info" to determine > the backing chain depth. How painful is that extra step? Does it > justify the addition of a new optional "backing":true to any portion > of the file that was beyond the end of the chain (and omit that line > for all other regions, rather than printing "backing":false)? Dealing with depth: N + 1 is not that painful, but also not great. I think it is worth a little more effort, and it will save time in the long term for users and for developers. Better APIs need simpler and shorter documentation and require less support. I'm not sure about backing: false, maybe absent: true to match libnbd? Nir
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Fri, Jun 11, 2021 at 08:35:01PM +0300, Nir Soffer wrote: > On Fri, Jun 11, 2021 at 4:28 PM Eric Blake wrote: > > > > On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote: > > > > Yes, that might work as well. But we didn't previously document > > > > depth to be optional. Removing something from output risks breaking > > > > more downstream tools that expect it to be non-optional, compared to > > > > providing a new value. > > > > > > A negative value isn't any less unexpected than a missing key. I don't > > > think any existing tool would be able to handle it. Encoding different > > > meanings in a single value isn't very QAPI-like either. Usually strings > > > that are parsed are the problem, but negative integers really isn't that > > > much different. I don't really like this solution. > > > > > > Leaving out the depth feels like a better suggestion to me. > > > > > > But anyway, this seems to only happen at the end of the backing chain. > > > So if the backing chain consistents of n images, why not report 'depth': > > > n + 1? So, in the above example, you would get 1. I think this has the > > > best chances of tools actually working correctly with the new output, > > > even though it's still not unlikely to break something. > > > > Ooh, I like that. It is closer to reality - the file data really > > comes from the next depth, even if we have no filename at that depth. > > v2 of my patch coming up. > > How do you know the number of the layer? this info is not presented in > qemu-img map output. qemu-img map has two output formats. In --output=human, areas of the disk reading as zero are elided (and this happens to include ALL areas that were not allocated anywhere in the chain); all other areas list the filename of the element in the chain where the data was found. This mode also fails if compression or encryption prevents easy access to actual data. In other words, it's fragile, so no one uses it for anything programmatic, even though it's the default. In --output=json, no file names are output. Instead, "depth":N tells you how deep in the backing chain you must traverse to find the data. "depth":0 is obvious: the file you mapped (other than the bug that this patch is fixing where we mistakenly used "depth":0 also for unallocated regions). If you use "backing":null to force a 1-layer depth, then "depth":1 is unambiguous meaning the (non-present) backing file. Otherwise, you do have a point: "depth":1 in isolation is ambiguous between "not allocated anywhere in this 1-element chain" and "allocated at the first backing file in this chain of length 2 or more". At which point you can indeed use "qemu-img info" to determine the backing chain depth. How painful is that extra step? Does it justify the addition of a new optional "backing":true to any portion of the file that was beyond the end of the chain (and omit that line for all other regions, rather than printing "backing":false)? > > Users will have to run "qemu-img info --backing-chain" to understand the > output of qemu-img map. At any rate, it should be easy enough to output an additional field, followup patch coming soon... -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Fri, Jun 11, 2021 at 4:28 PM Eric Blake wrote: > > On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote: > > > Yes, that might work as well. But we didn't previously document > > > depth to be optional. Removing something from output risks breaking > > > more downstream tools that expect it to be non-optional, compared to > > > providing a new value. > > > > A negative value isn't any less unexpected than a missing key. I don't > > think any existing tool would be able to handle it. Encoding different > > meanings in a single value isn't very QAPI-like either. Usually strings > > that are parsed are the problem, but negative integers really isn't that > > much different. I don't really like this solution. > > > > Leaving out the depth feels like a better suggestion to me. > > > > But anyway, this seems to only happen at the end of the backing chain. > > So if the backing chain consistents of n images, why not report 'depth': > > n + 1? So, in the above example, you would get 1. I think this has the > > best chances of tools actually working correctly with the new output, > > even though it's still not unlikely to break something. > > Ooh, I like that. It is closer to reality - the file data really > comes from the next depth, even if we have no filename at that depth. > v2 of my patch coming up. How do you know the number of the layer? this info is not presented in qemu-img map output. Users will have to run "qemu-img info --backing-chain" to understand the output of qemu-img map.
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Fri, Jun 11, 2021 at 01:21:45PM +0200, Kevin Wolf wrote: > > Did you consider just add a new field? > > > > So, "depth" keeps its meaning "which level provides data". > > > > And we add additional optional field like > > > > absolutely-completely-absent: bool > > > > Which is true if data is nowhere in the backing chain. > > Or how about exposing BDRV_BLOCK_ALLOCATED as 'allocated': 'bool'? Which > I think is what the conclusion was already for NBD, so doing the same in > 'qemu-img map' would be consistent. > > This is, of course, almost the same as 'absolutely-completely-absent', > just without the negating the flag. If we want to bikeshed on a new name, I think "allocated" is going to cause more confusion than it solves. And "hole" is wrong. Better would be "backing":true for portions of the file that would derive from a backing file, if a backing file had been present. But that still feels like more work than just exposing n+1 in depth. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Fri, Jun 11, 2021 at 10:09:09AM +0200, Kevin Wolf wrote: > > Yes, that might work as well. But we didn't previously document > > depth to be optional. Removing something from output risks breaking > > more downstream tools that expect it to be non-optional, compared to > > providing a new value. > > A negative value isn't any less unexpected than a missing key. I don't > think any existing tool would be able to handle it. Encoding different > meanings in a single value isn't very QAPI-like either. Usually strings > that are parsed are the problem, but negative integers really isn't that > much different. I don't really like this solution. > > Leaving out the depth feels like a better suggestion to me. > > But anyway, this seems to only happen at the end of the backing chain. > So if the backing chain consistents of n images, why not report 'depth': > n + 1? So, in the above example, you would get 1. I think this has the > best chances of tools actually working correctly with the new output, > even though it's still not unlikely to break something. Ooh, I like that. It is closer to reality - the file data really comes from the next depth, even if we have no filename at that depth. v2 of my patch coming up. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
11.06.2021 14:21, Kevin Wolf wrote: Am 11.06.2021 um 10:14 hat Vladimir Sementsov-Ogievskiy geschrieben: 11.06.2021 11:09, Kevin Wolf wrote: Am 10.06.2021 um 22:46 hat Eric Blake geschrieben: On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: But: $ qemu-img map --output=json -f qcow2 json:'{"driver":"qcow2","backing":null, \ "file":{"driver":"file","filename":"top.qcow2"}}' [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false}] also reports the entire file at "depth":0, which is misleading, since we have just been arguing from the qemu:allocation-depth perspective (and also from bdrv_block_status) that the qcow2 image is NOT 100% allocated (in the sense where allocation == data comes locally). Perhaps it might be better if we tweaked the above qemu-img map to produce: [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": false}] It will be more consistent with "offset" to drop "depth" from output if we don't have it: [{ "start": 0, "length": 65536, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "zero": true, "data": false}] Yes, that might work as well. But we didn't previously document depth to be optional. Removing something from output risks breaking more downstream tools that expect it to be non-optional, compared to providing a new value. A negative value isn't any less unexpected than a missing key. I don't think any existing tool would be able to handle it. Encoding different meanings in a single value isn't very QAPI-like either. Usually strings that are parsed are the problem, but negative integers really isn't that much different. I don't really like this solution. Leaving out the depth feels like a better suggestion to me. But anyway, this seems to only happen at the end of the backing chain. So if the backing chain consistents of n images, why not report 'depth': n + 1? So, in the above example, you would get 1. I think this has the best chances of tools actually working correctly with the new output, even though it's still not unlikely to break something. Did you consider just add a new field? So, "depth" keeps its meaning "which level provides data". And we add additional optional field like absolutely-completely-absent: bool Which is true if data is nowhere in the backing chain. Or how about exposing BDRV_BLOCK_ALLOCATED as 'allocated': 'bool'? Which I think is what the conclusion was already for NBD, so doing the same in 'qemu-img map' would be consistent. "allocated" is historically ambiguous: we never know exactly does it mean "occupy space on disk" or "data (or zeroes) taken from this qcow2 image, not from backing". Eric recently sent related patch to libnbd: [libnbd PATCH] info: Avoid ambiguous 'allocated' terminology in mapping This is, of course, almost the same as 'absolutely-completely-absent', just without the negating the flag. Kevin -- Best regards, Vladimir
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
Am 11.06.2021 um 10:14 hat Vladimir Sementsov-Ogievskiy geschrieben: > 11.06.2021 11:09, Kevin Wolf wrote: > > Am 10.06.2021 um 22:46 hat Eric Blake geschrieben: > > > On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: > > > > > But: > > > > > > > > > > $ qemu-img map --output=json -f qcow2 > > > > > json:'{"driver":"qcow2","backing":null, \ > > > > >"file":{"driver":"file","filename":"top.qcow2"}}' > > > > > [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": > > > > > false}, > > > > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": > > > > > true, "offset": 327680}, > > > > > { "start": 131072, "length": 131072, "depth": 0, "zero": true, > > > > > "data": false}] > > > > > > > > > > also reports the entire file at "depth":0, which is misleading, since > > > > > we have just been arguing from the qemu:allocation-depth perspective > > > > > (and also from bdrv_block_status) that the qcow2 image is NOT 100% > > > > > allocated (in the sense where allocation == data comes locally). > > > > > Perhaps it might be better if we tweaked the above qemu-img map to > > > > > produce: > > > > > > > > > > [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": > > > > > false}, > > > > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": > > > > > true, "offset": 327680}, > > > > > { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": > > > > > false}, > > > > > { "start": 196608, "length": 65536, "depth": -1, "zero": true, > > > > > "data": false}] > > > > > > > > It will be more consistent with "offset" to drop "depth" from output > > > > if we don't have it: > > > > > > > > [{ "start": 0, "length": 65536, "zero": true, "data": false}, > > > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, > > > > "data": true, "offset": 327680}, > > > > { "start": 131072, "length": 65536, "depth": 0, "zero": true, > > > > "data": false}, > > > > { "start": 196608, "length": 65536, "zero": true, "data": false}] > > > > > > Yes, that might work as well. But we didn't previously document > > > depth to be optional. Removing something from output risks breaking > > > more downstream tools that expect it to be non-optional, compared to > > > providing a new value. > > > > A negative value isn't any less unexpected than a missing key. I don't > > think any existing tool would be able to handle it. Encoding different > > meanings in a single value isn't very QAPI-like either. Usually strings > > that are parsed are the problem, but negative integers really isn't that > > much different. I don't really like this solution. > > > > Leaving out the depth feels like a better suggestion to me. > > > > But anyway, this seems to only happen at the end of the backing chain. > > So if the backing chain consistents of n images, why not report 'depth': > > n + 1? So, in the above example, you would get 1. I think this has the > > best chances of tools actually working correctly with the new output, > > even though it's still not unlikely to break something. > > > > Did you consider just add a new field? > > So, "depth" keeps its meaning "which level provides data". > > And we add additional optional field like > > absolutely-completely-absent: bool > > Which is true if data is nowhere in the backing chain. Or how about exposing BDRV_BLOCK_ALLOCATED as 'allocated': 'bool'? Which I think is what the conclusion was already for NBD, so doing the same in 'qemu-img map' would be consistent. This is, of course, almost the same as 'absolutely-completely-absent', just without the negating the flag. Kevin
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
11.06.2021 12:05, Nir Soffer wrote: ב-11 ביוני 2021, בשעה 11:14, Vladimir Sementsov-Ogievskiy כתב/ה: 11.06.2021 11:09, Kevin Wolf wrote: Am 10.06.2021 um 22:46 hat Eric Blake geschrieben: On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: But: $ qemu-img map --output=json -f qcow2 json:'{"driver":"qcow2","backing":null, \ "file":{"driver":"file","filename":"top.qcow2"}}' [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false}] also reports the entire file at "depth":0, which is misleading, since we have just been arguing from the qemu:allocation-depth perspective (and also from bdrv_block_status) that the qcow2 image is NOT 100% allocated (in the sense where allocation == data comes locally). Perhaps it might be better if we tweaked the above qemu-img map to produce: [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": false}] It will be more consistent with "offset" to drop "depth" from output if we don't have it: [{ "start": 0, "length": 65536, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "zero": true, "data": false}] Yes, that might work as well. But we didn't previously document depth to be optional. Removing something from output risks breaking more downstream tools that expect it to be non-optional, compared to providing a new value. A negative value isn't any less unexpected than a missing key. I don't think any existing tool would be able to handle it. Encoding different meanings in a single value isn't very QAPI-like either. Usually strings that are parsed are the problem, but negative integers really isn't that much different. I don't really like this solution. Leaving out the depth feels like a better suggestion to me. But anyway, this seems to only happen at the end of the backing chain. So if the backing chain consistents of n images, why not report 'depth': n + 1? So, in the above example, you would get 1. I think this has the best chances of tools actually working correctly with the new output, even though it's still not unlikely to break something. Did you consider just add a new field? So, "depth" keeps its meaning "which level provides data". And we add additional optional field like absolutely-completely-absent: bool hole: bool? That messes-up with file-posix holes which are UNALLOCATED_ZERO.. I think, we should somehow start to honestly report backing chains and use "backing" concept in interfaces.. maybe nobacking: bool May be true only together with data=false and zero=true, and means that all layers refer to backing for this region, but last layer just don't have backing image currently and therefore returns zeroes. -- Best regards, Vladimir
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
> ב-11 ביוני 2021, בשעה 11:14, Vladimir Sementsov-Ogievskiy > כתב/ה: > > 11.06.2021 11:09, Kevin Wolf wrote: >> Am 10.06.2021 um 22:46 hat Eric Blake geschrieben: On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: >> But: >> >> $ qemu-img map --output=json -f qcow2 >> json:'{"driver":"qcow2","backing":null, \ >> "file":{"driver":"file","filename":"top.qcow2"}}' >> [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, >> { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": >> true, "offset": 327680}, >> { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": >> false}] >> >> also reports the entire file at "depth":0, which is misleading, since >> we have just been arguing from the qemu:allocation-depth perspective >> (and also from bdrv_block_status) that the qcow2 image is NOT 100% >> allocated (in the sense where allocation == data comes locally). >> Perhaps it might be better if we tweaked the above qemu-img map to >> produce: >> >> [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": >> false}, >> { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": >> true, "offset": 327680}, >> { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": >> false}, >> { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": >> false}] > > It will be more consistent with "offset" to drop "depth" from output > if we don't have it: > > [{ "start": 0, "length": 65536, "zero": true, "data": false}, > { "start": 65536, "length": 65536, "depth": 0, "zero": false, > "data": true, "offset": 327680}, > { "start": 131072, "length": 65536, "depth": 0, "zero": true, > "data": false}, > { "start": 196608, "length": 65536, "zero": true, "data": false}] >>> >>> Yes, that might work as well. But we didn't previously document >>> depth to be optional. Removing something from output risks breaking >>> more downstream tools that expect it to be non-optional, compared to >>> providing a new value. >> A negative value isn't any less unexpected than a missing key. I don't >> think any existing tool would be able to handle it. Encoding different >> meanings in a single value isn't very QAPI-like either. Usually strings >> that are parsed are the problem, but negative integers really isn't that >> much different. I don't really like this solution. >> Leaving out the depth feels like a better suggestion to me. >> But anyway, this seems to only happen at the end of the backing chain. >> So if the backing chain consistents of n images, why not report 'depth': >> n + 1? So, in the above example, you would get 1. I think this has the >> best chances of tools actually working correctly with the new output, >> even though it's still not unlikely to break something. > > Did you consider just add a new field? > > So, "depth" keeps its meaning "which level provides data". > > And we add additional optional field like > > absolutely-completely-absent: bool hole: bool? > > Which is true if data is nowhere in the backing chain. > > > -- > Best regards, > Vladimir
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
11.06.2021 11:09, Kevin Wolf wrote: Am 10.06.2021 um 22:46 hat Eric Blake geschrieben: On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: But: $ qemu-img map --output=json -f qcow2 json:'{"driver":"qcow2","backing":null, \ "file":{"driver":"file","filename":"top.qcow2"}}' [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false}] also reports the entire file at "depth":0, which is misleading, since we have just been arguing from the qemu:allocation-depth perspective (and also from bdrv_block_status) that the qcow2 image is NOT 100% allocated (in the sense where allocation == data comes locally). Perhaps it might be better if we tweaked the above qemu-img map to produce: [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": false}] It will be more consistent with "offset" to drop "depth" from output if we don't have it: [{ "start": 0, "length": 65536, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "zero": true, "data": false}] Yes, that might work as well. But we didn't previously document depth to be optional. Removing something from output risks breaking more downstream tools that expect it to be non-optional, compared to providing a new value. A negative value isn't any less unexpected than a missing key. I don't think any existing tool would be able to handle it. Encoding different meanings in a single value isn't very QAPI-like either. Usually strings that are parsed are the problem, but negative integers really isn't that much different. I don't really like this solution. Leaving out the depth feels like a better suggestion to me. But anyway, this seems to only happen at the end of the backing chain. So if the backing chain consistents of n images, why not report 'depth': n + 1? So, in the above example, you would get 1. I think this has the best chances of tools actually working correctly with the new output, even though it's still not unlikely to break something. Did you consider just add a new field? So, "depth" keeps its meaning "which level provides data". And we add additional optional field like absolutely-completely-absent: bool Which is true if data is nowhere in the backing chain. -- Best regards, Vladimir
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
Am 10.06.2021 um 22:46 hat Eric Blake geschrieben: > On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: > > > But: > > > > > > $ qemu-img map --output=json -f qcow2 > > > json:'{"driver":"qcow2","backing":null, \ > > > "file":{"driver":"file","filename":"top.qcow2"}}' > > > [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, > > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": > > > true, "offset": 327680}, > > > { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": > > > false}] > > > > > > also reports the entire file at "depth":0, which is misleading, since > > > we have just been arguing from the qemu:allocation-depth perspective > > > (and also from bdrv_block_status) that the qcow2 image is NOT 100% > > > allocated (in the sense where allocation == data comes locally). > > > Perhaps it might be better if we tweaked the above qemu-img map to > > > produce: > > > > > > [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, > > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": > > > true, "offset": 327680}, > > > { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": > > > false}, > > > { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": > > > false}] > > > > It will be more consistent with "offset" to drop "depth" from output > > if we don't have it: > > > > [{ "start": 0, "length": 65536, "zero": true, "data": false}, > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, > > "data": true, "offset": 327680}, > > { "start": 131072, "length": 65536, "depth": 0, "zero": true, > > "data": false}, > > { "start": 196608, "length": 65536, "zero": true, "data": false}] > > Yes, that might work as well. But we didn't previously document > depth to be optional. Removing something from output risks breaking > more downstream tools that expect it to be non-optional, compared to > providing a new value. A negative value isn't any less unexpected than a missing key. I don't think any existing tool would be able to handle it. Encoding different meanings in a single value isn't very QAPI-like either. Usually strings that are parsed are the problem, but negative integers really isn't that much different. I don't really like this solution. Leaving out the depth feels like a better suggestion to me. But anyway, this seems to only happen at the end of the backing chain. So if the backing chain consistents of n images, why not report 'depth': n + 1? So, in the above example, you would get 1. I think this has the best chances of tools actually working correctly with the new output, even though it's still not unlikely to break something. Kevin
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Thu, Jun 10, 2021 at 11:09:05PM +0300, Nir Soffer wrote: > > But: > > > > $ qemu-img map --output=json -f qcow2 > > json:'{"driver":"qcow2","backing":null, \ > > "file":{"driver":"file","filename":"top.qcow2"}}' > > [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, > > "offset": 327680}, > > { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": > > false}] > > > > also reports the entire file at "depth":0, which is misleading, since > > we have just been arguing from the qemu:allocation-depth perspective > > (and also from bdrv_block_status) that the qcow2 image is NOT 100% > > allocated (in the sense where allocation == data comes locally). > > Perhaps it might be better if we tweaked the above qemu-img map to > > produce: > > > > [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, > > "offset": 327680}, > > { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": > > false}, > > { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": > > false}] > > It will be more consistent with "offset" to drop "depth" from output > if we don't have it: > > [{ "start": 0, "length": 65536, "zero": true, "data": false}, > { "start": 65536, "length": 65536, "depth": 0, "zero": false, > "data": true, "offset": 327680}, > { "start": 131072, "length": 65536, "depth": 0, "zero": true, > "data": false}, > { "start": 196608, "length": 65536, "zero": true, "data": false}] Yes, that might work as well. But we didn't previously document depth to be optional. Removing something from output risks breaking more downstream tools that expect it to be non-optional, compared to providing a new value. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Thu, Jun 10, 2021 at 9:35 PM Eric Blake wrote: > > On Tue, Jun 08, 2021 at 07:38:10PM +0300, Nir Soffer wrote: > > The example I provided was not detailed enough, what we actually do is: > > > > qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": > > {"driver": "file", "filename": "top.qcow2"}}' > > > > So there is no backing chain and allocation depth is not relevant. > > - Allocated areas should be reported with flags 0 > > - Zero areas which are not holes should be reported as NBD_STATE_ZERO > > - Zero areas which are holes (not allocated in this image) should be > > reported as NBD_STATE_HOLE > > Thinking about this a bit more, here's something I noticed: > > $ qemu-img map --output=json -f raw base.raw > [{ "start": 0, "length": 196608, "depth": 0, "zero": false, "data": true, > "offset": 0}, > { "start": 196608, "length": 65536, "depth": 0, "zero": true, "data": false, > "offset": 196608}] > > which matches what I've said elsewhere in this thread: the entire > image is reported as "depth":0 because the raw file is responsible for > 100% of the content. > > But: > > $ qemu-img map --output=json -f qcow2 json:'{"driver":"qcow2","backing":null, > \ > "file":{"driver":"file","filename":"top.qcow2"}}' > [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, > "offset": 327680}, > { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false}] > > also reports the entire file at "depth":0, which is misleading, since > we have just been arguing from the qemu:allocation-depth perspective > (and also from bdrv_block_status) that the qcow2 image is NOT 100% > allocated (in the sense where allocation == data comes locally). > Perhaps it might be better if we tweaked the above qemu-img map to > produce: > > [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, > "offset": 327680}, > { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, > { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": false}] It will be more consistent with "offset" to drop "depth" from output if we don't have it: [{ "start": 0, "length": 65536, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "zero": true, "data": false}]
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Tue, Jun 08, 2021 at 07:38:10PM +0300, Nir Soffer wrote: > The example I provided was not detailed enough, what we actually do is: > > qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": > {"driver": "file", "filename": "top.qcow2"}}' > > So there is no backing chain and allocation depth is not relevant. > - Allocated areas should be reported with flags 0 > - Zero areas which are not holes should be reported as NBD_STATE_ZERO > - Zero areas which are holes (not allocated in this image) should be > reported as NBD_STATE_HOLE Thinking about this a bit more, here's something I noticed: $ qemu-img map --output=json -f raw base.raw [{ "start": 0, "length": 196608, "depth": 0, "zero": false, "data": true, "offset": 0}, { "start": 196608, "length": 65536, "depth": 0, "zero": true, "data": false, "offset": 196608}] which matches what I've said elsewhere in this thread: the entire image is reported as "depth":0 because the raw file is responsible for 100% of the content. But: $ qemu-img map --output=json -f qcow2 json:'{"driver":"qcow2","backing":null, \ "file":{"driver":"file","filename":"top.qcow2"}}' [{ "start": 0, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": false}] also reports the entire file at "depth":0, which is misleading, since we have just been arguing from the qemu:allocation-depth perspective (and also from bdrv_block_status) that the qcow2 image is NOT 100% allocated (in the sense where allocation == data comes locally). Perhaps it might be better if we tweaked the above qemu-img map to produce: [{ "start": 0, "length": 65536, "depth": -1, "zero": true, "data": false}, { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": true, "offset": 327680}, { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": false}, { "start": 196608, "length": 65536, "depth": -1, "zero": true, "data": false}] that is, use "depth":-1 to explicitly denote portions of a qcow2 file which are NOT provided locally, and which are not found anywhere in the backing chain. In other words, make it explicit in qemu-img map output what is possible with qemu:allocation-depth==0. Or tweak it slightly to mean that "depth":-1 corresponds to "cluster is not provided by the current layer, but we could not determine if it is provided by a particular backing layer or if it was unallocated overall". Then positive depth means we know which point in the backing chain we deferred to, 0 is local, and negative depth means that we defer to a backing layer (but could not report WHICH layer, if any). This tweak would make it easier for my thoughts of having qemu NBD clients automatically request qemu:allocation-depth without having to resort to x-dirty-bitmap hacks, and still be able to expose the information via qemu-img map. I'm off to another round of code hacking to see how it looks... -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Tue, Jun 8, 2021 at 9:46 PM Eric Blake wrote: > > On Tue, Jun 08, 2021 at 07:38:10PM +0300, Nir Soffer wrote: > > On Tue, Jun 8, 2021 at 12:22 AM Eric Blake wrote: > > > > > > On Mon, Jun 07, 2021 at 11:22:04PM +0300, Nir Soffer wrote: > > > > When zeroing a cluster in an image with backing file, qemu-img and > > > > qemu-nbd reported the area as a hole. This does not affect the guest > > > > since the area is read as zero, but breaks code trying to reconstruct > > > > the image chain based on qemu-img map or qemu-nbd block status response. > > > > > > Trying to reconstruct the image chain based on qemu-nbd block status > > > should not be attempted on just base:allocation data, but should also > > > take into account qemu:allocation-depth. > > > > This is correct when looking at the entire chain, but when we reconstruct > > image data, we copy each image in the layer *without* the backing chain. > > > > The example I provided was not detailed enough, what we actually do is: > > > > qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": > > {"driver": "file", "filename": "top.qcow2"}}' > > > > So there is no backing chain and allocation depth is not relevant. > > - Allocated areas should be reported with flags 0 > > - Zero areas which are not holes should be reported as NBD_STATE_ZERO > > - Zero areas which are holes (not allocated in this image) should be > > reported as NBD_STATE_HOLE > > Again, what you WANT is qemu:allocation-depth. > > $ ./qemu-nbd -r -t -f qcow2 -A 'json:{"driver":"qcow2", "backing":null, \ > "file":{"driver":"file", "filename":"top.qcow2"}}' > $ nbdinfo --map=qemu:allocation-depth nbd://localhost > 0 655360 unallocated > 65536 1310721 local > 196608 655360 unallocated > > $ nbdinfo --map nbd://localhost > 0 655363 hole,zero > 65536 655360 allocated > 131072 1310723 hole,zero > > You don't care whether the information reads as zero or not, but > whether top.qcow2 is responsible for the data at that cluster. > base:allocation does not answer that question. But > qemu:allocation-depth answers it perfectly. > > > > > > From the perspective of the > > > core NBD protocol, there is no backing file, so trying to guess what > > > the backing file contains without using qemu extensions is unlikely to > > > be correct, as shown in your example. The fact that you could abuse > > > it with qemu 5.2 but it broke in 6.0 > > > > I'm not abusing anything, I'm only using public APIs. qemu-nbd behavior > > should not change without good reason, and we did not have any good > > reason to change the behavior for qcow2 images. > > Ah, but we did. Exposing BDRV_BLOCK_ALLOCATED as server, but > consuming it as BDRV_BLOCK_DATA as client, was inconsistent. It was a > bug that we ever used BLOCK_ALLOCATED in the first place, when it has > _always_ been that the NBD semantics were supposed to be modeled on > our definition of BLOCK_DATA. That it took us a couple of years to > notice our bug is unfortunate, but we DO have a good reason for the > change - we were fixing an actual bug where we were reporting > incorrect information compared to what the NBD spec was documenting. > > > > > > is not necessarily the sign of a > > > regression in 6.0, but rather could be evidence that you have been > > > trying to use an undocumented implementation quirk rather than a > > > stable interface. > > > > I'm pretty convinced that this is a regression in qemu-nbd 6.0 since I > > created > > this regression :-) > > I understand that you were surprised by the ramifications of your > patch causing more changes than what you expected, but I still argue > that your patch was correct and that the decision to incorporate it > was intentional because it was the right thing to do. Papering over > the fallout for the sake of clients that should be using > qemu:allocation-depth instead does not seem like it is worth the > maintenance nightmare to me. > > > > > Since we started using qemu-nbd in 2018, qemu-nbd has always reported > > holes in qcow2 images, but not in raw files. We discussed this several > > times, > > and you explained that we have allocation information from qcow2, but not > > from raw format. > > > > My attempt to fix hole reporting in raw images has failed; reporting holes > > in > > raw images is nice to have, but it broke the behavior of qemu-nbd with qcow2 > > images, which is a critical issue for ovirt. > > Rather, ovirt had been relying on buggy behavior, and now that the bug > has been fixed, we are scrambling to figure out how to make ovirt > still play nicely. But my answer to that is to use > qemu:allocation-depth. It was introduced in 5.2, so it predates the > point where base:allocation behavior was fixed, and it provides the > answer to the question you are really asking (which parts of my image > came from the image directly, rather than a backing file), rather than >
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Tue, Jun 08, 2021 at 07:38:10PM +0300, Nir Soffer wrote: > On Tue, Jun 8, 2021 at 12:22 AM Eric Blake wrote: > > > > On Mon, Jun 07, 2021 at 11:22:04PM +0300, Nir Soffer wrote: > > > When zeroing a cluster in an image with backing file, qemu-img and > > > qemu-nbd reported the area as a hole. This does not affect the guest > > > since the area is read as zero, but breaks code trying to reconstruct > > > the image chain based on qemu-img map or qemu-nbd block status response. > > > > Trying to reconstruct the image chain based on qemu-nbd block status > > should not be attempted on just base:allocation data, but should also > > take into account qemu:allocation-depth. > > This is correct when looking at the entire chain, but when we reconstruct > image data, we copy each image in the layer *without* the backing chain. > > The example I provided was not detailed enough, what we actually do is: > > qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": > {"driver": "file", "filename": "top.qcow2"}}' > > So there is no backing chain and allocation depth is not relevant. > - Allocated areas should be reported with flags 0 > - Zero areas which are not holes should be reported as NBD_STATE_ZERO > - Zero areas which are holes (not allocated in this image) should be > reported as NBD_STATE_HOLE Again, what you WANT is qemu:allocation-depth. $ ./qemu-nbd -r -t -f qcow2 -A 'json:{"driver":"qcow2", "backing":null, \ "file":{"driver":"file", "filename":"top.qcow2"}}' $ nbdinfo --map=qemu:allocation-depth nbd://localhost 0 655360 unallocated 65536 1310721 local 196608 655360 unallocated $ nbdinfo --map nbd://localhost 0 655363 hole,zero 65536 655360 allocated 131072 1310723 hole,zero You don't care whether the information reads as zero or not, but whether top.qcow2 is responsible for the data at that cluster. base:allocation does not answer that question. But qemu:allocation-depth answers it perfectly. > > > From the perspective of the > > core NBD protocol, there is no backing file, so trying to guess what > > the backing file contains without using qemu extensions is unlikely to > > be correct, as shown in your example. The fact that you could abuse > > it with qemu 5.2 but it broke in 6.0 > > I'm not abusing anything, I'm only using public APIs. qemu-nbd behavior > should not change without good reason, and we did not have any good > reason to change the behavior for qcow2 images. Ah, but we did. Exposing BDRV_BLOCK_ALLOCATED as server, but consuming it as BDRV_BLOCK_DATA as client, was inconsistent. It was a bug that we ever used BLOCK_ALLOCATED in the first place, when it has _always_ been that the NBD semantics were supposed to be modeled on our definition of BLOCK_DATA. That it took us a couple of years to notice our bug is unfortunate, but we DO have a good reason for the change - we were fixing an actual bug where we were reporting incorrect information compared to what the NBD spec was documenting. > > > is not necessarily the sign of a > > regression in 6.0, but rather could be evidence that you have been > > trying to use an undocumented implementation quirk rather than a > > stable interface. > > I'm pretty convinced that this is a regression in qemu-nbd 6.0 since I created > this regression :-) I understand that you were surprised by the ramifications of your patch causing more changes than what you expected, but I still argue that your patch was correct and that the decision to incorporate it was intentional because it was the right thing to do. Papering over the fallout for the sake of clients that should be using qemu:allocation-depth instead does not seem like it is worth the maintenance nightmare to me. > > Since we started using qemu-nbd in 2018, qemu-nbd has always reported > holes in qcow2 images, but not in raw files. We discussed this several times, > and you explained that we have allocation information from qcow2, but not > from raw format. > > My attempt to fix hole reporting in raw images has failed; reporting holes in > raw images is nice to have, but it broke the behavior of qemu-nbd with qcow2 > images, which is a critical issue for ovirt. Rather, ovirt had been relying on buggy behavior, and now that the bug has been fixed, we are scrambling to figure out how to make ovirt still play nicely. But my answer to that is to use qemu:allocation-depth. It was introduced in 5.2, so it predates the point where base:allocation behavior was fixed, and it provides the answer to the question you are really asking (which parts of my image came from the image directly, rather than a backing file), rather than merely an indirect answer (how can I abuse the determination of which parts of the image are allocated or sparse to imply that those same portions must come from a backing image). There is nothing semantically wrong with a sparse cluster in the top
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Tue, Jun 8, 2021 at 12:22 AM Eric Blake wrote: > > On Mon, Jun 07, 2021 at 11:22:04PM +0300, Nir Soffer wrote: > > When zeroing a cluster in an image with backing file, qemu-img and > > qemu-nbd reported the area as a hole. This does not affect the guest > > since the area is read as zero, but breaks code trying to reconstruct > > the image chain based on qemu-img map or qemu-nbd block status response. > > Trying to reconstruct the image chain based on qemu-nbd block status > should not be attempted on just base:allocation data, but should also > take into account qemu:allocation-depth. This is correct when looking at the entire chain, but when we reconstruct image data, we copy each image in the layer *without* the backing chain. The example I provided was not detailed enough, what we actually do is: qemu-nbd .. 'json:{"driver": "qcow2", "backing": null, "file": {"driver": "file", "filename": "top.qcow2"}}' So there is no backing chain and allocation depth is not relevant. - Allocated areas should be reported with flags 0 - Zero areas which are not holes should be reported as NBD_STATE_ZERO - Zero areas which are holes (not allocated in this image) should be reported as NBD_STATE_HOLE > From the perspective of the > core NBD protocol, there is no backing file, so trying to guess what > the backing file contains without using qemu extensions is unlikely to > be correct, as shown in your example. The fact that you could abuse > it with qemu 5.2 but it broke in 6.0 I'm not abusing anything, I'm only using public APIs. qemu-nbd behavior should not change without good reason, and we did not have any good reason to change the behavior for qcow2 images. > is not necessarily the sign of a > regression in 6.0, but rather could be evidence that you have been > trying to use an undocumented implementation quirk rather than a > stable interface. I'm pretty convinced that this is a regression in qemu-nbd 6.0 since I created this regression :-) Since we started using qemu-nbd in 2018, qemu-nbd has always reported holes in qcow2 images, but not in raw files. We discussed this several times, and you explained that we have allocation information from qcow2, but not from raw format. My attempt to fix hole reporting in raw images has failed; reporting holes in raw images is nice to have, but it broke the behavior of qemu-nbd with qcow2 images, which is a critical issue for ovirt. The code using this was tested and released 3-4 month ago. This was added to support backup vendors using snapshot based backup, so they can move to use the NBD based pipeline, which is safer than the old way, uploading qcow2 images directly to storage. If I revert: commit 0da9856851dcca09222a1467e16ddd05dc66e460 Author: Nir Soffer Date: Fri Feb 19 18:07:52 2021 +0200 nbd: server: Report holes for raw images qemu-nbd reports zeroed areas in a useful way like it always did: $ ./qemu-nbd -r -t 'json:{"driver": "qcow2", "backing": null, "file": {"driver": "file", "filename": "top.qcow2"}}' & $ nbdinfo --map nbd://localhost 0 655363 hole,zero 65536 655360 allocated 131072 655362 zero 196608 655363 hole,zero There is no need to use allocation depth info, the base:allocation works fine for this use case, and the output makes sense. > > Here is simpler reproducer: > > > > # Create a qcow2 image with a raw backing file: > > $ qemu-img create base.raw $((4*64*1024)) > > $ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2 > > > > # Write to first 3 clusters of base: > > $ qemu-io -f raw -c "write -P 65 0 64k" base.raw > > $ qemu-io -f raw -c "write -P 66 64k 64k" base.raw > > $ qemu-io -f raw -c "write -P 67 128k 64k" base.raw > > > > # Write to second cluster of top, hiding second cluster of base: > > $ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2 > > > > # Write zeroes to third cluster of top, hiding third cluster of base: > > $ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2 > > > > This creates: > > > > top: -D0- > > base: ABC- > > > > How current qemu-img and qemu-nbd report the state: > > > > $ qemu-img map --output json top.qcow2 > > [{ "start": 0, "length": 65536, "depth": 1, "zero": false, "data": > > true, "offset": 0}, > > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": > > true, "offset": 327680}, > > { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": > > false}, > > { "start": 196608, "length": 65536, "depth": 1, "zero": true, "data": > > false, "offset": 196608}] > > Note how this one reports "depth":1 when the backing file is consulted... Yes, qemu-img includes enough info to tell about the status of the cluster, so we can keep it as is. > > $ qemu-nbd -r -t -f qcow2 top.qcow2 & > > $ qemu-img map --output json nbd://localhost > > [{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": >
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Mon, Jun 07, 2021 at 04:22:27PM -0500, Eric Blake wrote: [replying to myself] > > Here is simpler reproducer: > > > > # Create a qcow2 image with a raw backing file: > > $ qemu-img create base.raw $((4*64*1024)) > > $ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2 > > > > # Write to first 3 clusters of base: > > $ qemu-io -f raw -c "write -P 65 0 64k" base.raw > > $ qemu-io -f raw -c "write -P 66 64k 64k" base.raw > > $ qemu-io -f raw -c "write -P 67 128k 64k" base.raw > > > > # Write to second cluster of top, hiding second cluster of base: > > $ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2 > > > > # Write zeroes to third cluster of top, hiding third cluster of base: > > $ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2 Aha. While reproducing this locally, I typoed this as 'write -z 12k 64k', which absolutely changes the map produced... > > $ ./qemu-nbd -r -t -f qcow2 top.qcow2 -A > $ nbdinfo --map=qemu:allocation-depth nbd://localhost > 0 1310721 local > 131072 1310722 backing depth 2 > > However, _that_ output looks odd - it claims that clusters 0 and 1 are > local, and 2 and 3 come from a backing file. Without reading code, I > would have expected something closer to the qcow2 view, claiming that > clusters 1 and 2 are local, while 0 and 3 come from a backing file (3 > could also be reported as unallocated, but only if you use a qcow2 as > the backing file instead of raw, since we have no easy way to > determine which holes map to file system allocations in raw files). and totally explains my confusion here. > > /me goes to debug... I'll need to reply in a later email when I've > spent more time on that. > After recreating the file properly, by writing the zeroes at 128k instead of 12k, I now see: $ nbdinfo --map=qemu:allocation-depth nbd://localhost 0 655362 backing depth 2 65536 1310721 local 196608 655362 backing depth 2 which is EXACTLY what I expected. And sufficient for you to recreate your backing chain: Cluster 0 is backing depth 2 + allocated, so it comes from the backing file; nothing to write in your replacement top.qcow2. Cluster 1 is local + allocated, so it comes from top.qcow2 and consists of actual data, definitely write that one. Cluster 2 is local + hole,zero, so it reads as zero, but comes from top.qcow2 without any allocation; when building your replacement .qcow2 file, you MUST write this cluster to match the local allocation and override anything being inherited from the backing file, but it is up to you whether you write it as allocated zeroes or as an unallocated but reads-as-zero cluster. Cluster 3 is backing depth 2 + hole,zero, which means it was read from the backing file, and you can safely omit it from your replacement top.qcow2. > In short, I agree that the current situation is awkward, but I'm not > sure that this patch is right. Rather, I'm wondering if we have a bug > in qemu:allocation-depth, and where once that is fixed, you should be > using that alongside base:allocation when deciding how to guess on how > to reconstruct a qcow2 backing chain using only information learned > over NBD. And since the problem was in my command line transcription skills, and not in qemu proper, I don't think we want this patch; rather, I feel we are better served if you could fix your downstream tooling to start using qemu:allocation-depth if you are trying to recreate which portions of a qcow2 file MUST be written in order for that qcow2 file backed by a different image to provide the same data as seen over NBD. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [PATCH] qemu-{img,nbd}: Don't report zeroed cluster as a hole
On Mon, Jun 07, 2021 at 11:22:04PM +0300, Nir Soffer wrote: > When zeroing a cluster in an image with backing file, qemu-img and > qemu-nbd reported the area as a hole. This does not affect the guest > since the area is read as zero, but breaks code trying to reconstruct > the image chain based on qemu-img map or qemu-nbd block status response. Trying to reconstruct the image chain based on qemu-nbd block status should not be attempted on just base:allocation data, but should also take into account qemu:allocation-depth. From the perspective of the core NBD protocol, there is no backing file, so trying to guess what the backing file contains without using qemu extensions is unlikely to be correct, as shown in your example. The fact that you could abuse it with qemu 5.2 but it broke in 6.0 is not necessarily the sign of a regression in 6.0, but rather could be evidence that you have been trying to use an undocumented implementation quirk rather than a stable interface. > > Here is simpler reproducer: > > # Create a qcow2 image with a raw backing file: > $ qemu-img create base.raw $((4*64*1024)) > $ qemu-img create -f qcow2 -b base.raw -F raw top.qcow2 > > # Write to first 3 clusters of base: > $ qemu-io -f raw -c "write -P 65 0 64k" base.raw > $ qemu-io -f raw -c "write -P 66 64k 64k" base.raw > $ qemu-io -f raw -c "write -P 67 128k 64k" base.raw > > # Write to second cluster of top, hiding second cluster of base: > $ qemu-io -f qcow2 -c "write -P 69 64k 64k" top.qcow2 > > # Write zeroes to third cluster of top, hiding third cluster of base: > $ qemu-io -f qcow2 -c "write -z 128k 64k" top.qcow2 > > This creates: > > top: -D0- > base: ABC- > > How current qemu-img and qemu-nbd report the state: > > $ qemu-img map --output json top.qcow2 > [{ "start": 0, "length": 65536, "depth": 1, "zero": false, "data": true, > "offset": 0}, > { "start": 65536, "length": 65536, "depth": 0, "zero": false, "data": > true, "offset": 327680}, > { "start": 131072, "length": 65536, "depth": 0, "zero": true, "data": > false}, > { "start": 196608, "length": 65536, "depth": 1, "zero": true, "data": > false, "offset": 196608}] Note how this one reports "depth":1 when the backing file is consulted... > > $ qemu-nbd -r -t -f qcow2 top.qcow2 & > $ qemu-img map --output json nbd://localhost > [{ "start": 0, "length": 131072, "depth": 0, "zero": false, "data": true, > "offset": 0}, > { "start": 131072, "length": 131072, "depth": 0, "zero": true, "data": > false, "offset": 131072}] ...but since NBD has no notion of a backing file, there is nothing that qemu can do to report depth information itself. If you want to reconstruct the backing chain, you should be able to further query qemu:allocation-depth, and piece the two queries together to get what you need: $ ./qemu-nbd -r -t -f qcow2 top.qcow2 -A $ nbdinfo --map=qemu:allocation-depth nbd://localhost 0 1310721 local 131072 1310722 backing depth 2 However, _that_ output looks odd - it claims that clusters 0 and 1 are local, and 2 and 3 come from a backing file. Without reading code, I would have expected something closer to the qcow2 view, claiming that clusters 1 and 2 are local, while 0 and 3 come from a backing file (3 could also be reported as unallocated, but only if you use a qcow2 as the backing file instead of raw, since we have no easy way to determine which holes map to file system allocations in raw files). /me goes to debug... I'll need to reply in a later email when I've spent more time on that. [Oh, and that reminds me, I would love to patch nbdinfo to let --map query all available contexts, not just base:allocation, without having to explicitly name alternative --map=FOO... But it missed today's stable release of libnbd 1.8] [The same information can be obtained via qemu-img using x-dirty-bitmap and --image-opts, but is so much more of a hack that for now I will just refer to iotest 309 instead of spelling it out here] > > $ nbdinfo --map nbd://localhost > 0 1310720 allocated > 131072 1310723 hole,zero This faithfully reflects what qemu-img saw, which is all the more the NBD protocol lets us send without the use of extensions like qemu:allocation-depth. > > The third extents is reported as a hole in both cases. In qmeu-nbd the qemu > cluster is merged with forth cluster which is actually a hole. > > This is incorrect since if it was a hole, the third cluster would be > exposed to the guest. Programs using qemu-nbd output to reconstruct the > image chain on other storage would be confused and copy only the first 2 > cluster. The results of this copy will be an image exposing the third > cluster from the base image, corrupting the guest data. This is where I disagree - if the NBD protocol exposed the notion of a backing file, then reporting a local hole should indeed imply