On 04/18/2017 02:52 PM, Alberto Garcia wrote:
> On Thu 13 Apr 2017 05:17:21 PM CEST, Denis V. Lunev wrote:
>> On 04/13/2017 06:04 PM, Alberto Garcia wrote:
>>> On Thu 13 Apr 2017 03:30:43 PM CEST, Denis V. Lunev wrote:
>>>> Yes, block size should be increased. I perfectly in agreement with
>>>> your. But I think that we could do that by plain increase of the
>>>> cluster size without any further dances. Sub-clusters as sub-clusters
>>>> will help if we are able to avoid COW. With COW I do not see much
>>>> difference.
>>> I'm trying to summarize your position, tell me if I got everything
>>> correctly:
>>>
>>> 1. We should try to reduce data fragmentation on the qcow2 file,
>>>    because it will have a long term effect on the I/O performance (as
>>>    opposed to an effect on the initial operations on the empty image).
>> yes
>>
>>> 2. The way to do that is to increase the cluster size (to 1MB or
>>>    more).
>> yes
>>
>>> 3. Benefit: increasing the cluster size also decreases the amount of
>>>    metadata (L2 and refcount).
>> yes
>>
>>> 4. Problem: L2 tables become too big and fill up the cache more
>>>    easily. To solve this the cache code should do partial reads
>>>    instead of complete L2 clusters.
>> yes. We can read full cluster as originally if L2 cache is empty.
>>
>>> 5. Problem: larger cluster sizes also mean more data to copy when
>>>    there's a COW. To solve this the COW code should be modified so it
>>>    goes from 5 OPs (read head, write head, read tail, write tail,
>>>    write data) to 2 OPs (read cluster, write modified cluster).
>> yes, with small tweak if head and tail are in different clusters. In
>> this case we will end up with 3 OPs.
>>
>>> 6. Having subclusters adds incompatible changes to the file format,
>>>    and they offer no benefit after allocation.
>> yes
>>
>>> 7. Subclusters are only really useful if they match the guest fs block
>>>    size (because you would avoid doing COW on allocation). Otherwise
>>>    the only thing that you get is a faster COW (because you move less
>>>    data), but the improvement is not dramatic and it's better if we do
>>>    what's proposed in point 5.
>> yes
>>
>>> 8. Even if the subcluster size matches the guest block size, you'll
>>>    get very fast initial allocation but also more chances to end up
>>>    with a very fragmented qcow2 image, which is worse in the long run.
>> yes
>>
>>> 9. Problem: larger clusters make a less efficient use of disk space,
>>>    but that's a drawback you're fine with considering all of the
>>>    above.
>> yes
>>
>>> Is that a fair summary of what you're trying to say? Anything else
>>> missing?
>> yes.
>>
>> 5a. Problem: initial cluster allocation without COW. Could be made
>>     cluster-size agnostic with the help of fallocate() call. Big
>>     clusters are even better as the amount of such allocations is
>>     reduced.
>>
>> Thank you very much for this cool summary! I am too tongue-tied.
> Hi Denis,
>
> I don't have the have data to verify all your claims here, but in
> general what you say makes sense.
>
> Although I'm not sure if I agree with everything (especially on whether
> any of this applies to SSD drives at all) it seems that we all agree
> that the COW algorithm can be improved, so perhaps I should start by
> taking a look at that.
>
> Regards,
>
> Berto

I understand. I just wanted to raise another possible (compatible)
approach, which could help.
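
To make the COW change from point 5 concrete, here is a rough Python
sketch. It is not QEMU code: CountingDevice, cow_split and cow_merged
are hypothetical names made up for illustration, the "device" is just
an in-memory buffer that counts requests, and the old and new clusters
are assumed to live in the same file. It only covers a write that
falls inside a single cluster.

```python
class CountingDevice:
    """In-memory stand-in for an image file that counts I/O requests."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.ops = 0

    def read(self, offset, length):
        self.ops += 1
        return bytes(self.buf[offset:offset + length])

    def write(self, offset, data):
        self.ops += 1
        self.buf[offset:offset + len(data)] = data


def cow_split(dev, old, new, cluster_size, off, data):
    """5 OPs: read head, write head, read tail, write tail, write data."""
    end = off + len(data)
    head = dev.read(old, off)                       # read head
    dev.write(new, head)                            # write head
    tail = dev.read(old + end, cluster_size - end)  # read tail
    dev.write(new + end, tail)                      # write tail
    dev.write(new + off, data)                      # write guest data


def cow_merged(dev, old, new, cluster_size, off, data):
    """2 OPs: read the whole cluster, patch it in memory, write it back."""
    cluster = bytearray(dev.read(old, cluster_size))  # read cluster
    cluster[off:off + len(data)] = data               # merge in memory
    dev.write(new, bytes(cluster))                    # write modified cluster


if __name__ == "__main__":
    cluster_size = 1 << 20            # 1MB cluster, as in point 2
    payload = b"\x55" * 512           # small guest write
    for cow in (cow_split, cow_merged):
        dev = CountingDevice(2 * cluster_size)
        dev.buf[:cluster_size] = b"\xaa" * cluster_size  # old cluster data
        cow(dev, 0, cluster_size, cluster_size, 4096, payload)
        print(cow.__name__, "requests:", dev.ops)
```

Both functions leave the same bytes in the new cluster; the only
difference is how many requests are issued. Whether fewer, larger
requests actually win on a given storage stack would still have to be
measured.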
Den