29.10.2019 14:55, Max Reitz wrote:
> On 29.10.19 12:48, Vladimir Sementsov-Ogievskiy wrote:
>> 29.10.2019 11:50, Max Reitz wrote:
>>> On 28.10.19 12:25, Vladimir Sementsov-Ogievskiy wrote:
>>>> 28.10.2019 14:04, Kevin Wolf wrote:
>>>>> Am 27.10.2019 um 13:35 hat Stefan Hajnoczi geschrieben:
>>>>>> On Fri, Oct 25, 2019 at 11:58:46AM +0200, Max Reitz wrote:
>>>
>>> [...]
>>>
>>>>>>> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>>>>>>>     To my knowledge I’m the only one who has provided any
>>>>>>>     benchmarks for this commit, and even then I was a bit
>>>>>>>     skeptical because it performs well in some cases and badly
>>>>>>>     in others.  I concluded that it’s probably worth it because
>>>>>>>     the “some cases” are more likely to occur.
>>>>>>>
>>>>>>>     Now we have this problem of corruption here (granted, due to
>>>>>>>     a bug in the XFS driver), and another report of massively
>>>>>>>     degraded performance on ppc64
>>>>>>>     (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry,
>>>>>>>     a private BZ; I hate that :-/  The report is about 40 % worse
>>>>>>>     performance for an in-guest fio write benchmark.)
>>>>>>>
>>>>>>>     So I have to ask what the justification for keeping
>>>>>>>     c8bb23cbdbe32f is.  How much does performance actually
>>>>>>>     increase with it?  (On non-(ppc64+XFS) machines, obviously.)
>>>>>>>
>>>>>>>     Advantages:
>>>>>>>     + Trivial
>>>>>>>     + No layering violations
>>>>>>>     + We wouldn’t need to keep track of whether the kernel bug
>>>>>>>       has been fixed or not
>>>>>>>     + Fixes the ppc64+XFS performance problem
>>>>>>>
>>>>>>>     Disadvantages:
>>>>>>>     - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>>>>>>>       levels, whatever that means
>>>>>>
>>>>>> My favorite, because it is clean and simple; but Vladimir has a
>>>>>> valid use case for requiring this performance optimization, so
>>>>>> reverting isn't an option.
>>>>>
>>>>> Vladimir also said that qcow2 subclusters would probably also solve
>>>>> his problem, so maybe reverting and applying the subcluster patches
>>>>> instead is a possible solution, too?
>>>>
>>>> I'm not sure about the ssd case; it may need the write-zero
>>>> optimization anyway.
>>>
>>> What exactly do you need?  Do you actually need to write zeroes (e.g.
>>> because you’re storing images on block devices) or would it be
>>> sufficient to just drop the COW areas when bdrv_has_zero_init() and
>>> bdrv_has_zero_init_truncate() are true?
>>
>> Hmm, what do you mean?  We need to zero COW areas.  So, the original
>> way is to write real zeroes, and the optimized way is fallocate.
>> What do you mean by “drop”?  Mark subclusters as zeroes in the
>> metadata?
>
> Why do you need to zero COW areas?  For normal files, any data will read
> as zero if you didn’t write anything there.

Hmm, but when allocating a new cluster in qcow2, it is not guaranteed to
read as zero: the allocation may reuse a previously allocated cluster that
has since been freed, so stale data would show through unless the COW areas
are zeroed explicitly.
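To make the comparison concrete, a minimal sketch of the two zeroing
strategies, using a plain Linux file descriptor rather than the real
handle_alloc_space()/block-layer path; the helper names and the 64 KiB
size are invented for illustration only:

    /* Two ways to zero a COW area of a data file.  A sketch only,
     * not the actual QEMU code path. */
    #define _GNU_SOURCE
    #include <fcntl.h>      /* fallocate(), FALLOC_FL_ZERO_RANGE */
    #include <string.h>
    #include <unistd.h>

    enum { COW_LEN = 64 * 1024 };       /* e.g. one cluster */

    /* "Original way": transfer an explicit buffer of zeroes. */
    static int zero_by_write(int fd, off_t off)
    {
        char buf[COW_LEN];
        memset(buf, 0, sizeof(buf));
        return pwrite(fd, buf, sizeof(buf), off) == COW_LEN ? 0 : -1;
    }

    /* "Optimized way": ask the filesystem to zero the range without
     * transferring any data -- but see Dave Chinner's point below
     * about this stalling concurrent aio=native writes. */
    static int zero_by_fallocate(int fd, off_t off)
    {
        return fallocate(fd, FALLOC_FL_ZERO_RANGE, off, COW_LEN);
    }

(In QEMU the second path is reached through bdrv_co_pwrite_zeroes() with
BDRV_REQ_NO_FALLBACK, if I remember the commit correctly.)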
>> But still we'll have COW areas with subclusters, and we'll need to zero
>> them directly.  And fallocate will most probably be faster in the ssd +
>> ext4 case.
>>
>>> I’m asking because Dave Chinner said
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1765547#c7) that
>>> fallocate() is always slow at least with aio=native because it needs to
>>> wait for all concurrent AIO writes to finish, and so it causes the AIO
>>> pipeline to stall.
>>>
>>> (He suggested using XFS extent size hints to get the same effect as
>>> write-zeroes for free, basically, but I don’t know whether that’s
>>> really useful to us, as unallocated areas on XFS read back as zero
>>> anyway.)
>>>
>>>>> We already have some cases where the existing handle_alloc_space()
>>>>> causes performance to actually become worse, and serialising
>>>>> requests as a workaround isn't going to make performance any
>>>>> better.  So even on these grounds, keeping commit c8bb23cbdbe32f is
>>>>> questionable.
>>>>
>>>> Could keeping handle_alloc_space() under some config option be an
>>>> option?
>>>
>>> Hm.  A config option is weird if you’re the only one who’s going to
>>> enable it.  But other than that I don’t have anything against it.
>>
>> It's just a bit easier for us to maintain a config option than an
>> out-of-tree patch.  On the other hand, it's not a real problem to
>> maintain this one patch separately.  It may return to the tree once
>> the XFS bug is fixed.
>
> We’ll still have the problem that fallocate() must stall aio=native
> requests.

Does that mean fallocate is bad in general?  Practice shows the opposite;
at least I have examples with qemu-img bench where the fallocate path
wins.  Can the stall be demonstrated with qemu-img bench or something
similar?
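For the record, a sketch of the kind of measurement I mean; the numbers
are arbitrary, the point is that every request allocates a fresh cluster
and leaves head/tail COW areas around its 4k of data:

    qemu-img create -f qcow2 -o cluster_size=64k test.qcow2 16G
    qemu-img bench -w -n -t none -f qcow2 -s 4k -S 64k -c 65536 -d 16 \
        test.qcow2

Running that with and without handle_alloc_space() (i.e. with
c8bb23cbdbe32f reverted) on the filesystems in question should show both
the win I see on ssd + ext4 and whatever stall aio=native suffers from
fallocate().

-- 
Best regards,
Vladimir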