Thanks, Jeff, for this too.

Best wishes,
Navid.

________________________________________
From: Jeff Law <jeffreya...@gmail.com>
Sent: Monday, November 22, 2021 19:09
To: Richard Biener; Navid Rahimi
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd

On 11/10/2021 1:35 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrah...@microsoft.com> wrote:
>> Hi Richard,
>>
>> Thank you so much for your detailed feedback. I am attaching another version 
>> of the patch, which includes all the changes you mentioned.
>>
>> Below you can see my responses to your feedback:
>>
>>> the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as
>>> constants come last - you can then remove the 'c'
>> Fixed. I was not aware of the canonical order.
>>
>>> you can use INTEGRAL_TYPE_P (type).
>> Fixed. Didn't know about "type" either.
>>
>>> this test is not necessary
>> Fixed.
>>
>>> But should we also optimize x * (2 + y/x) - y -> 2*x - y % x?  So
>>> it looks like a conflict with the x * (1 + b) <-> x + x * b transform
>>> (fold_plusminus_mult)?  That said, the special case of one
>>> definitely makes the result cheaper (one less operation).
>> For this special case, it does indeed remove an operation. But I was not able 
>> to reproduce that for any other constant [1]. If it were possible with other 
>> constants, I would have changed the pattern to be more general, like 
>> "x * (C + y/x) - y -> C*x - y % x". Basically, anything other than 1 wasn't a 
>> win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to 
>> me that when there is a "- y" following "x * (1 + b)", there is an opportunity 
>> to optimize. Without that "- y" I was not able to make either form more 
>> performant; in either direction it looks like the same amount of computation.
>>
>> 1) https://compiler-explorer.com/z/dWsq7Tzf4
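
For reference, the special case of 1 follows from the truncating-division
identity a == (a / b) * b + a % b: b * (1 + a / b) - a simplifies to
b - a % b, dropping the multiplication entirely, while any other constant C
still needs the multiplication C * b. A minimal standalone check of the
identity (hypothetical sample values; assumes C's truncating division and
no overflow):

   #include <stdlib.h>

   int main (void)
   {
     /* Hypothetical sample values; b must be nonzero and nothing overflows.  */
     int samples[][2] = { { 17, 5 }, { -17, 5 }, { 17, -5 }, { 100, 7 } };
     for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
       {
         int a = samples[i][0], b = samples[i][1];
         /* b * (1 + a / b) - a  ==  b - a % b  under truncating division.  */
         if (b * (1 + a / b) - a != b - a % b)
           abort ();
       }
     return 0;
   }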
>>
>>> Please move the pattern next to the most related one, which I think is
>> Fixed.
>>
>>> the return value of 1 is an unreliable way to fail, please instead call
>>> __builtin_abort ();
>> Fixed.
>>
>>> do we really need -O3 for this?  I think that -O should be enough here?
>> We don't specifically need that. But I realized that the optimization can 
>> happen at two different levels in the compiler. It seems that if you spread 
>> the computation over multiple statements, like:
>>    int c = a / b;
>>    int y = b * (1 + c);
>>    return y - a;
>>
>> instead of:
>>    return b * (1 + a / b) - a;
>>
>> then you need at least -O1 for it to be optimized. Granted, I am not 
>> doing that in the testcase. In the new patch I am changing it to -O. Let me 
>> know if you have any suggestions.
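
For reference, a runtime testcase along the lines described above might look
roughly as follows (an illustrative sketch only, not the committed test; the
function name and directive placement are assumptions):

   /* { dg-do run } */
   /* { dg-options "-O" } */

   int __attribute__ ((noipa))
   foo (int a, int b)
   {
     int c = a / b;
     int y = b * (1 + c);
     return y - a;
   }

   int
   main (void)
   {
     if (foo (17, 5) != 5 - 17 % 5)
       __builtin_abort ();
     return 0;
   }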
> -O is fine; generally at -O0 we shouldn't expect such transforms to
> happen (but they still do, of course).
>
> The patch looks OK now.
I've pushed this to the trunk for Navid.

jeff
