Thanks Jeff for this too. Best wishes, Navid.
________________________________________ From: Jeff Law <jeffreya...@gmail.com> Sent: Monday, November 22, 2021 19:09 To: Richard Biener; Navid Rahimi Cc: gcc-patches@gcc.gnu.org Subject: Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd [You don't often get email from jeffreya...@gmail.com. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] On 11/10/2021 1:35 AM, Richard Biener via Gcc-patches wrote: > On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrah...@microsoft.com> wrote: >> Hi Richard, >> >> Thank you so much for your detailed feedback. I am attaching another version >> of the patch which does include all the changes you mentioned. >> >> Bellow you can see my response to your feedbacks: >> >>> the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as >>> constants come last - you can then remove the 'c' >> Fixed. I was not aware of the canonical order. >> >>> you can use INTEGRAL_TYPE_P (type). >> Fixed. Didn't know about "type" either. >> >>> this test is not necessary >> Fixed. >> >>> But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So >>> it looks like a conflict with the x * (1 + b) <-> x + x * b transform >>> (fold_plusminus_mult)? That said, the special case of one >>> definitely makes the result cheaper (one less operation). >> For this special case, it does remove operation indeed. But I was not able >> to reproduce it for any other constant [1]. If it was possible to do it with >> other constants I would've changed the pattern to have be more general like >> "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a >> win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to >> me when there is a "- y" at the end "x * (1 + b)", there is opportunity to >> optimize. Without that "- y" I was not able to make any operation more >> performant. Either direction, looks like same amount of computation. >> >> 1) >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcompiler-explorer.com%2Fz%2FdWsq7Tzf4&data=04%7C01%7Cnavidrahimi%40microsoft.com%7C1ddd355d86ee4a5d645808d9ae2e9e73%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637732337521412439%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=K4ecZcigYkh8%2BSiqDegRxm72gkL%2FnsbuDDp5nsM3ZqA%3D&reserved=0 >> >>> Please move the pattern next to the most relatest which I think is >> Fixed. >> >>> the return value of 1 is an unreliable way to fail, please instead call >>> __builtin_abort (); >> Fixed. >> >>> do we really need -O3 for this? I think that -O should be enough here? >> We don't specifically need that. But I realized that the optimization can >> happen in two different level at the compiler. It seems if you spread the >> computation over multiple statement like: >> int c = a/b; >> int y = b * (1 + c); >> return y - a; >> >> instead of : >> return b * (1 + a / b) - a; >> >> Then you have to have at least -O1 to have it optimized. Granted, I am not >> doing that in the testcase. In the new patch I am changing it to -O. Let me >> know if you have any suggestions. > -O is fine, generally at -O0 we shouldn't expect such transforms to > happen (but they still do, of course). > > The patch looks OK now. I've pushed this to the trunk for Navid. jeff