On Thu, 01/22 14:14, Max Reitz wrote:
> On 2015-01-19 at 08:16, Jun Li wrote:
> >On Thu, 01/15 13:47, Max Reitz wrote:
> >>On 2015-01-03 at 07:23, Jun Li wrote:
> >>>On Fri, 11/21 11:56, Max Reitz wrote:
> >>>>So, as for what I think we do need to do when shrinking (and keep in
> >>>>mind: the offset given to qcow2_truncate() is the guest size! NOT the
> >>>>host image size!):
> >>>>
> >>>>(1) Determine the first L2 table and the first entry in the table
> >>>>which will lie beyond the new guest disk size.
> >>>
> >>>This is not always correct. Due to COW, using the offset to calculate
> >>>the first entry of the first L2 table will be incorrect.
> >>
> >>Again: This is *not* about the host disk size or the host offset of
> >>some cluster, but about the *guest* disk size.
> >>
> >>Let's make up an example. You have a 2 GB disk but you want to resize
> >>it to 1.25 GB. The cluster size is 64 kB, therefore we have
> >>2 GB / 64 kB = 32,768 data clusters (as long as there aren't any
> >>internal snapshots, which is a prerequisite for resizing qcow2 images).
> >>
> >>Every L2 table contains 65,536 / 8 = 8,192 entries; there are thus
> >>32,768 / 8,192 = 4 L2 tables.
> >>
> >>As you can see, one can directly derive the number of data clusters and
> >>L2 tables from the guest disk size (as long as there aren't any
> >>internal snapshots).
> >>
> >>So of course we can do the same for the target disk size:
> >>1.25 GB / 64 kB = 20,480 data clusters; 20,480 / 8,192 = 2.5 L2 tables,
> >>therefore we need three L2 tables but only half of the last one
> >>(4,096 entries).
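For illustration, here is the arithmetic above as a minimal C sketch
(assuming a 64 kB cluster size and 8-byte L2 entries; this is just the
calculation, not actual QEMU code):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t cluster_size = 65536;            /* 64 kB */
        uint64_t l2_entries   = cluster_size / 8; /* 8,192 entries per L2 table */
        uint64_t new_size     = 1342177280ULL;    /* 1.25 GB new guest size */

        /* data clusters needed for the new guest size (rounded up) */
        uint64_t data_clusters = (new_size + cluster_size - 1) / cluster_size;
        /* L2 tables needed to reference them (rounded up) */
        uint64_t l2_tables = (data_clusters + l2_entries - 1) / l2_entries;
        /* index of the first entry in the last L2 table we no longer need */
        uint64_t first_unused = data_clusters % l2_entries;

        printf("data clusters: %" PRIu64 "\n", data_clusters);     /* 20480 */
        printf("L2 tables:     %" PRIu64 "\n", l2_tables);         /* 3 */
        printf("first unused entry: %" PRIu64 "\n", first_unused); /* 4096 */
        return 0;
    }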
> >Sorry, last time was my misunderstanding. If we do not use
> >qcow2_truncate(), I think the above issue does not exist.
> >
> >What I originally wanted to say is: sometimes the second L2 table will
> >contain an entry whose pointer points to a cluster whose address is
> >larger than 1.25 GB.
> 
> Correct.
> 
> >So if we do not use qcow2_truncate(), we won't discard such a cluster
> >whose address is larger than 1.25 GB.

Sorry, I did not express my meaning clearly. What I want to say is:
some entry (call it entry1) will point to a cluster (call it cluster1)
whose address is larger than 1.25 GB, so if we use qcow2_truncate(),
this cluster1 will be discarded. entry1 will then be left dangling
after cluster1 has been discarded. If we do not use qcow2_truncate(),
cluster1 won't be discarded.

> I'm sorry, I can't really follow what you are trying to say here, so
> I'll just try to reply with things that may or may not be what you
> wanted to talk about.
> 
> If you are using qemu-img resize and thus subsequently qcow2_truncate()
> to shrink an image, you cannot expect the image to shrink to the
> specified file length, for several reasons.
> 
> First, if you shrink it to 1 GB, but only half of that is actually
> used, the image might of course very well have a length below 1 GB.
> 
> Second, there is metadata overhead. So if you are changing the guest
> disk size to 1 GB (all of which is occupied), the host file size will
> exceed 1 GB because of that overhead.
> 
> Third, I keep repeating myself here, but file length is not file size.
> So you may observe a file length of 10 GB or more because the clusters
> are spread all over the image file. This is something we'd have to
> combat with defragmentation; but the question is whether we really
> need to (see below for more on that). The point is that it doesn't
> matter whether the image has a file length of 10 GB; the file size
> will be around 1 GB anyway.
> 
> >But I still have another worry.
> >
> >Suppose "virtual size" and "disk size" are both 2G. After we resize it
> >to 1.25G, it seems we will get a "virtual size" of 1.25G but a "disk
> >size" that is still 2G
> 
> No, it won't. I can prove it to you:

Yes, you are right. I have double-checked my patch v5. It seems I do
not use qcow2_process_discards() (this function will call
bdrv_discard() to discard clusters on the host) in my patch v5. I will
submit a new version of the patch. Thanks.

Regards,
Jun Li

> $ qemu-img create -f qcow2 test.qcow2 64M
> $ qemu-io -c 'write 0 64M' test.qcow2
> $ qemu-img info test.qcow2
> ...
> disk size: 64M
> ...
> 
> Okay, so far it's just what we'd expect. Now let's implement my
> proposal for truncation: Let's assume the image should be shrunk to
> 32 MB, so we discard all clusters starting at 32 MB (guest offset)
> (which is 64 MB - 32 MB = 32 MB of data):
> 
> $ qemu-io -c 'discard 32M 32M' test.qcow2
> $ qemu-img info test.qcow2
> ...
> disk size: 32M
> ...
> 
> Great!
> 
> >if we do not use qcow2_truncate() to truncate the file (yes, I know
> >using qcow2_truncate() is not a solution). This seems strange, not so
> >perfect.
> 
> >>We know that every cluster referenced somewhere after that limit (that
> >>is, every entry in the fourth L2 table and every entry starting with
> >>index 4,096 in the third L2 table) is a data cluster with a guest
> >>offset somewhere beyond 1.25 GB, so we don't need it anymore.
> >>
> >>Thus, we simply discard all those data clusters and after that we can
> >>discard the fourth L2 table. That's it.
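If I understand the proposal correctly, the whole procedure would look
roughly like the sketch below. The struct and the helper names
(discard_data_cluster, free_l2_table) are made up for illustration;
they are not actual QEMU code:

    #include <inttypes.h>
    #include <stdio.h>

    /* Hypothetical image state, just enough for the cutoff arithmetic. */
    struct image {
        uint64_t cluster_size;  /* e.g. 65536 */
        uint64_t l2_entries;    /* cluster_size / 8 */
        uint64_t nb_clusters;   /* current number of guest data clusters */
        uint64_t nb_l2_tables;  /* current number of L2 tables */
    };

    /* Stand-in: the real code would punch a hole in the host file
     * (bdrv_discard()) and update the refcount table. */
    static void discard_data_cluster(struct image *img, uint64_t c)
    {
        (void)img; (void)c;
    }

    /* Stand-in: the real code would free the table's cluster and clear
     * its L1 entry. */
    static void free_l2_table(struct image *img, uint64_t t)
    {
        (void)img; (void)t;
    }

    static void shrink(struct image *img, uint64_t new_size)
    {
        /* first guest cluster lying beyond the new disk size */
        uint64_t cutoff = (new_size + img->cluster_size - 1) / img->cluster_size;
        uint64_t cutoff_table = cutoff / img->l2_entries;
        uint64_t cutoff_entry = cutoff % img->l2_entries;

        /* discard every data cluster with a guest offset >= new_size */
        for (uint64_t c = cutoff; c < img->nb_clusters; c++) {
            discard_data_cluster(img, c);
        }

        /* drop the L2 tables that no longer reference anything; a
         * partially used last table (cutoff_entry != 0) is kept */
        uint64_t first_free_table = cutoff_table + (cutoff_entry ? 1 : 0);
        for (uint64_t t = first_free_table; t < img->nb_l2_tables; t++) {
            free_l2_table(img, t);
        }

        img->nb_clusters = cutoff;
        img->nb_l2_tables = first_free_table;
    }

    int main(void)
    {
        struct image img = { 65536, 8192, 32768, 4 };  /* the 2 GB example */
        shrink(&img, 1342177280ULL);                   /* shrink to 1.25 GB */
        printf("clusters: %" PRIu64 ", L2 tables: %" PRIu64 "\n",
               img.nb_clusters, img.nb_l2_tables);     /* 20480 and 3 */
        return 0;
    }

For the 2 GB example this discards clusters 20,480 through 32,767 and
frees only the fourth L2 table, matching the description above.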
> >>
> >>If we really want to, we can calculate the highest cluster host offset
> >>in use and truncate the image accordingly. But that's optional, see
> >>the last point in my "problems with this approach" list (having
> >>discarded the clusters should save us all the space already).
> >>Furthermore, as I'm saying in that list, to really solve this issue,
> >>we'd need qcow2 defragmentation.
> >
> >Do we already have a "qcow2 defragmentation" implementation?
> 
> No, we don't. The only way to defragment a qcow2 image right now is
> using qemu-img convert to create a (defragmented) copy and then delete
> the old image, which has the disadvantage of temporarily requiring
> double the disk space and of being an offline operation.
> 
> So far, nobody has implemented online defragmentation, mainly for two
> reasons: First, it would probably be pretty complicated (it'd probably
> need to be a block job which links into a pretty low-level function
> provided by qcow2 (defragment_some_clusters or something)); and
> second, so far there has been little demand. Disk space is not an
> issue (as said before), because it doesn't really matter to a modern
> file system whether your file has a length of 100 MB or 100 GB; that's
> just some number. What really matters is how much of that space is
> actually used; and if all unused clusters are discarded, there won't
> be any space used for them (well, maybe there is some metadata
> overhead, but that should be negligible).
> 
> There are a couple of reasons why you'd want to defragment an image:
> 
> First, it makes you feel better. I can relate to that, but it's not a
> real reason.
> 
> Second, it may improve performance: The guest may expect consecutive
> reads to be fast; but if the clusters are sprinkled all over the host,
> consecutive guest reads no longer necessarily translate to consecutive
> reads on the host (same for writes, of course). Defragmentation would
> probably fix that, but if you want to rely on this, you'd better use
> preallocated image files.
> 
> Third, it looks better. People expect the file length to be a rough
> indicator of the file size. However, for me this is related to "it
> makes you feel better", because this also is not a really good reason.
> 
> Fourth, using a non-modern file system may let your file size explode
> because suddenly, file length is actually equal to the file size. But
> I think in this case you should just use a better file system.
> 
> I don't know whether "cp" copies holes in files; its manpage says it
> does create sparse files, but I don't know how well that works; I just
> assume it works well enough.
> 
> Max
> 
> >Jun Li
> 
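P.S.: The "file length is not file size" distinction above can be
checked directly with stat(2); here is a minimal sketch (st_size is
the file length, st_blocks counts the 512-byte units actually
allocated on Linux):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct stat st;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        if (stat(argv[1], &st) < 0) {
            perror("stat");
            return 1;
        }

        /* For a sparse image file, "space used" can be far below
         * "file length"; discarding clusters lowers only the former. */
        printf("file length: %lld bytes\n", (long long)st.st_size);
        printf("space used:  %lld bytes\n", (long long)st.st_blocks * 512);
        return 0;
    }

These are the same numbers that ls -l and du report, so running du on
the image file before and after the discard shows the space being
freed while the file length stays put.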