[Bug tree-optimization/115287] Missed optimization: fold `div(v, a) * b + rem(v, a)` to `div(v, a) * (b - a) + v`

2024-05-29 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115287

--- Comment #2 from XChy  ---
(In reply to Andrew Pinski from comment #1)
> Dup.
> 
> *** This bug has been marked as a duplicate of bug 113105 ***

Sorry for reporting this again. I filed it under rtl-optimization because a
developer told me this optimization should be done during RTL expansion. Why is
the report tagged tree-optimization now?

[Bug rtl-optimization/115287] New: Missed optimization: fold `div(v, a) * b + rem(v, a)` to `div(v, a) * (b - a) + v`

2024-05-29 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115287

Bug ID: 115287
   Summary: Missed optimization: fold `div(v, a) * b + rem(v, a)`
to `div(v, a) * (b - a) + v`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/b5va37Tzx

For example:

unsigned char _bin2bcd(unsigned val)
{
    return ((val / 10) << 4) + val % 10;
}

can be folded to:

unsigned char new_bin2bcd(unsigned val)
{
    return val / 10 * 6 + val;
}

With -O3 on x86-64, GCC generates:

_bin2bcd:
        mov     edx, edi
        mov     eax, 3435973837
        imul    rdx, rax
        shr     rdx, 35
        mov     eax, edx
        lea     edx, [rdx+rdx*4]
        add     edx, edx
        sal     eax, 4
        sub     edi, edx
        add     eax, edi
        ret
new_bin2bcd:
        mov     eax, edi
        mov     edx, 3435973837
        imul    rax, rdx
        shr     rax, 35
        lea     eax, [rax+rax*2]
        lea     eax, [rdi+rax*2]
        ret

For this case, "new_bin2bcd" is cheaper.

This C snippet is extracted from
https://github.com/torvalds/linux/blob/master/lib/bcd.c
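
As a sanity check of the fold itself (not of the generated code), the two forms
can be compared directly. The sketch below is only illustrative: the names
bin2bcd_orig, bin2bcd_folded and count_mismatches are made up for this note.
Since val % 10 equals val - (val / 10) * 10, both forms reduce to
(val / 10) * 6 + val in wrapping unsigned arithmetic.

```c
/* Original form from the report. */
static unsigned char bin2bcd_orig(unsigned val)
{
    return ((val / 10) << 4) + val % 10;
}

/* Proposed folded form. */
static unsigned char bin2bcd_folded(unsigned val)
{
    return val / 10 * 6 + val;
}

/* Hypothetical helper: number of inputs in [0, limit] where the forms differ. */
static unsigned count_mismatches(unsigned limit)
{
    unsigned bad = 0;
    for (unsigned v = 0; ; v++) {
        if (bin2bcd_orig(v) != bin2bcd_folded(v))
            bad++;
        if (v == limit)
            break;
    }
    return bad;
}
```

Calling count_mismatches(1000000) is expected to return 0.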

[Bug tree-optimization/115034] Missed optimization: redundant store of identical value in the slot

2024-05-10 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115034

--- Comment #2 from XChy  ---
(In reply to Andrew Pinski from comment #1)
> Note there is some memory model requirements here that I always forget if
> this can happen or not.

Hmm. Could you please point me to documentation on GCC's memory model, or on
the specific constraints the C language imposes here? The semantics of the IR
in the LLVM issue look fine to me, since the store is non-volatile and
non-atomic. But I'm not sure whether that still holds after lifting it to C.

[Bug tree-optimization/115035] New: Missed optimization: fold min/max in phi

2024-05-10 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115035

Bug ID: 115035
   Summary: Missed optimization: fold min/max in phi
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/o489f6sj5
```
bool src(unsigned a, unsigned b, bool c) {
    unsigned phi;
    if (c) {
        dummy();
        phi = a < 6 ? a : 6;
    } else {
        phi = b;
    }
    return phi < 6;
}
```

can be folded to:

```
bool tgt(unsigned a, unsigned b, bool c) {
    unsigned phi;
    if (c) {
        dummy();
        phi = a;
    } else {
        phi = b;
    }
    return phi < 6;
}
```
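
The fold relies on min(a, 6) < 6 being equivalent to a < 6. A small brute-force
comparison of the two functions is sketched below; the dummy() stub and the
count_mismatches helper are assumptions added for this note, not part of the
original test case.

```c
#include <stdbool.h>

/* Stand-in for the opaque external call in the report. */
static void dummy(void) {}

static bool src(unsigned a, unsigned b, bool c)
{
    unsigned phi;
    if (c) {
        dummy();
        phi = a < 6 ? a : 6;   /* min(a, 6) */
    } else {
        phi = b;
    }
    return phi < 6;
}

static bool tgt(unsigned a, unsigned b, bool c)
{
    unsigned phi;
    if (c) {
        dummy();
        phi = a;               /* the min is dropped */
    } else {
        phi = b;
    }
    return phi < 6;
}

/* Hypothetical helper: compares src and tgt on a small grid of inputs. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 16; a++)
        for (unsigned b = 0; b < 16; b++)
            for (int c = 0; c <= 1; c++)
                if (src(a, b, c) != tgt(a, b, c))
                    bad++;
    return bad;
}
```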

[Bug tree-optimization/115034] New: Missed optimization: redundant store of identical value in the slot

2024-05-10 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115034

Bug ID: 115034
   Summary: Missed optimization: redundant store of identical
value in the slot
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/fdxKaxGoj

```
int src(int* outl, bool c1, bool c2) {
    int a;
    *outl = 0;
    if (c1)
        if (c2) {
            dummy();
            return 0;
        } else {
            a = 1;
        }
    else {
        // we don't need to assign a = 0
        a = 0;
    }
    *outl = a;
    return 0;
}
```

can be transformed into:

```
int tgt(int* outl, bool c1, bool c2) {
    int a;
    *outl = 0;
    if (c1) {
        if (c2) {
            dummy();
            return 0;
        } else {
            a = 1;
        }
    } else {
        return 0;
    }
    *outl = 1;
    return 0;
}
```

Because "*outl = 0" is already established at entry, the path
"a = 0 -> *outl = a" only re-stores the value that is already there and can be
cut off. That is, the final store becomes "*outl = 1" and is only reached on
the path where c1 is true.
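
To illustrate that the two versions leave the same value in the slot on every
path, a small single-threaded comparison is sketched below. The dummy() stub
and the count_mismatches helper are assumptions added for this note; the sketch
says nothing about the memory-model question raised in the comments.

```c
#include <stdbool.h>

/* Stand-in for the opaque external call in the report. */
static void dummy(void) {}

static int src(int *outl, bool c1, bool c2)
{
    int a;
    *outl = 0;
    if (c1) {
        if (c2) {
            dummy();
            return 0;
        } else {
            a = 1;
        }
    } else {
        a = 0;          /* stores the value *outl already holds */
    }
    *outl = a;
    return 0;
}

static int tgt(int *outl, bool c1, bool c2)
{
    *outl = 0;
    if (c1) {
        if (c2) {
            dummy();
            return 0;
        }
    } else {
        return 0;       /* redundant store removed */
    }
    *outl = 1;
    return 0;
}

/* Hypothetical helper: compares the final slot value for all four inputs. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int c1 = 0; c1 <= 1; c1++)
        for (int c2 = 0; c2 <= 1; c2++) {
            int s = 99, t = 99;
            int rs = src(&s, c1, c2);
            int rt = tgt(&t, c1, c2);
            if (s != t || rs != rt)
                bad++;
        }
    return bad;
}
```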

[Bug tree-optimization/114797] Missed optimization : fail to merge memset with unrelated clobber

2024-04-21 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114797

--- Comment #2 from XChy  ---
It looks like the memsets are merged when they overlap completely:
https://godbolt.org/z/4r7Eqr1Ee
With a clobber in between, that's not the case: https://godbolt.org/z/8jhaEbKqo

[Bug tree-optimization/114797] New: Missed optimization : fail to merge memset with unrelated clobber

2024-04-21 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114797

Bug ID: 114797
   Summary: Missed optimization : fail to merge memset with
unrelated clobber
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/E581jvaPs

Code like:
```
void src(char* config){
    char stack[208];
    memset(stack, 0, 184);
    *config = 0;
    memset(stack + 184, 0, 8);
    use(stack);
}
```

can be transformed to

```
void tgt(char* config){
    char stack[208];
    memset(stack, 0, 192);
    *config = 0;
    use(stack);
}
```

GCC doesn't merge them even when the two memsets overlap:
https://godbolt.org/z/ffc6b8zqs

[Bug tree-optimization/114737] New: Missed optimization : fail to optimize load with select clobber

2024-04-16 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114737

Bug ID: 114737
   Summary: Missed optimization : fail to optimize load with
select clobber
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/3e5sfvfKj

```
char src(void** p, char* p1, bool c) {
    char* tostore1 = p1 + 1;
    char* tostore2 = p1 + 4;
    *p = tostore1;
    *p1 = 0;

    char* tostore = (c ? tostore1 : tostore2) + 1;
    *tostore = 1;
    return *p1;
}
```

"return *p1" can be optimized into "return 0" here.

```
char tgt(void** p, char* p1, bool c) {
    char* tostore1 = p1 + 1;
    char* tostore2 = p1 + 4;
    *p = tostore1;
    *p1 = 0;

    char* tostore = (c ? tostore1 : tostore2) + 1;
    *tostore = 1;
    return 0;
}
```
```
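
The reason the load can be folded is that the selected store address is either
p1 + 2 or p1 + 5, never p1 itself. The sketch below makes this visible on a
small buffer; written_offset is a hypothetical probe added for this note.

```c
#include <stdbool.h>
#include <string.h>

static char src(void **p, char *p1, bool c)
{
    char *tostore1 = p1 + 1;
    char *tostore2 = p1 + 4;
    *p = tostore1;
    *p1 = 0;

    char *tostore = (c ? tostore1 : tostore2) + 1;
    *tostore = 1;
    return *p1;
}

/* Hypothetical probe: returns the buffer offset that src() set to 1,
   or -1 if src() returned non-zero or byte 0 was modified. */
static int written_offset(bool c)
{
    char buf[8];
    void *slot = 0;

    memset(buf, 0, sizeof buf);
    if (src(&slot, buf, c) != 0 || buf[0] != 0)
        return -1;
    for (int i = 1; i < 8; i++)
        if (buf[i] == 1)
            return i;
    return -1;
}
```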

[Bug tree-optimization/114725] New: Missed optimization: more precise range for and

2024-04-15 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114725

Bug ID: 114725
   Summary: Missed optimization: more precise range for and
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/vTb1Y5b39

```
bool src(int offset) {
    if (offset > 128) {
        return 0;
    } else {
        dummy();
        return (offset & -9) == 258;
    }
}

```
can be folded to:
```
bool tgt(int offset) {
    if (offset > 128) {
        return 0;
    } else {
        dummy();
        return 0;
    }
}
```
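
The range argument is: for 0 <= offset <= 128 the masked value is at most 128,
and for negative offsets it stays negative because -9 keeps the sign bit, so it
can never equal 258. A brute-force check over a sub-range is sketched below;
count_hits is a hypothetical helper added for this note.

```c
/* Hypothetical helper: number of x in [lo, hi] with (x & -9) == 258. */
static long count_hits(int lo, int hi)
{
    long hits = 0;
    /* A long counter avoids overflowing the loop variable when hi == INT_MAX. */
    for (long x = lo; x <= hi; x++)
        if ((((int)x) & -9) == 258)
            hits++;
    return hits;
}
```

Within the guarded range, count_hits(-1000000, 128) is expected to be 0. Above
128 the comparison can succeed (for 258 and 266), which is why the dominating
condition matters.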

[Bug tree-optimization/114712] New: Missed optimization: simplify if-else basic blocks that share common destinations

2024-04-13 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114712

Bug ID: 114712
   Summary: Missed optimization: simplify if-else basic blocks
that share common destinations
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/hY6Eaj8fj

```
void dummy();
void dummy1();

void src(bool c1, bool c2){
    if (c1) {
        if (c2)
            goto bb1;
        else
            goto bb2;
    } else {
        if (c2)
            goto bb2;
        else
            goto bb1;
    }

bb1:
    dummy();
    return;
bb2:
    dummy1();
    return;
}
```

can be folded to:

```
void tgt(bool c1, bool c2){
    if (c1 ^ c2) {
        dummy1();
    } else {
        dummy();
    }
}
```
This removes the unnecessary branches.
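
Since there are only four input combinations, the equivalence is easy to check
exhaustively. In the sketch below the stubs record which function was called;
the last_call variable and the count_mismatches helper are assumptions added
for this note.

```c
#include <stdbool.h>

/* Records which stub ran last: 0 for dummy(), 1 for dummy1(). */
static int last_call;
static void dummy(void)  { last_call = 0; }
static void dummy1(void) { last_call = 1; }

static void src(bool c1, bool c2)
{
    if (c1) {
        if (c2)
            goto bb1;
        else
            goto bb2;
    } else {
        if (c2)
            goto bb2;
        else
            goto bb1;
    }

bb1:
    dummy();
    return;
bb2:
    dummy1();
    return;
}

static void tgt(bool c1, bool c2)
{
    if (c1 ^ c2)
        dummy1();
    else
        dummy();
}

/* Hypothetical helper: compares the callee chosen by src and tgt. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int c1 = 0; c1 <= 1; c1++)
        for (int c2 = 0; c2 <= 1; c2++) {
            src(c1, c2);
            int from_src = last_call;
            tgt(c1, c2);
            if (from_src != last_call)
                bad++;
        }
    return bad;
}
```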

[Bug tree-optimization/114711] Missed optimization: fold load of global constant array if there is an obvious pattern

2024-04-13 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114711

--- Comment #3 from XChy  ---
(In reply to Andrew Pinski from comment #1)
> Note the openssl code issue was reported in pr 114682

Oh, thanks for transferring this LLVM issue! I have recently been porting some
of my LLVM issues to GCC and doing some statistical work on them, so please let
me know if you have already transferred any of them.

[Bug tree-optimization/114711] New: Missed optimization: fold load of global constant array if there is an obvious pattern

2024-04-13 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114711

Bug ID: 114711
   Summary: Missed optimization: fold load of global constant
array if there is an obvious pattern
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/MMxoGaj9T

```
static char* const names[3] = {"abc", "cdf", "dsadsa"};
char** const id2name[3] = {names, names + 1, names + 2};

void* src(size_t idx){
    return id2name[idx];
}

```

can be folded to:

```
static char* const names[3] = {"abc", "cdf", "dsadsa"};
void* tgt(size_t idx){
    return names + idx;
}
```

This is a real pattern from openssl.
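
The pattern is that id2name[i] is always names + i, so the table load can be
replaced by address arithmetic. A minimal check is sketched below. It uses a
const-correct variant of the declarations (an adjustment made for this note so
that it compiles without qualifier warnings), and tables_agree is a
hypothetical helper.

```c
#include <stddef.h>

static char *const names[3] = {"abc", "cdf", "dsadsa"};
/* Const-correct variant of the lookup table from the report. */
static char *const *const id2name[3] = {names, names + 1, names + 2};

/* Table lookup, as in the report. */
static const void *src(size_t idx)
{
    return id2name[idx];
}

/* Folded form: plain address arithmetic on names. */
static const void *tgt(size_t idx)
{
    return names + idx;
}

/* Hypothetical helper: returns 1 if both forms agree for every valid index. */
static int tables_agree(void)
{
    for (size_t i = 0; i < 3; i++)
        if (src(i) != tgt(i))
            return 0;
    return 1;
}
```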

[Bug tree-optimization/114704] Missed optimization : eliminate store if the value is known in all predecessors

2024-04-13 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114704

--- Comment #3 from XChy  ---
(In reply to Andrew Pinski from comment #1)
> Confirmed. A more general testcase:
> ```
> void dummy();
> 
> void src(int *p, int a){
> int t = *p;
> if(t == a)
> goto then;
> else {
> dummy();
> t = *p;
> if(t == a)
> goto then;
> else
> return;
> }
> 
> then:
> *p = t; // *p is already a, it's dead now
> }
> 
> ```

Do you mean "*p = a" at the end?

[Bug tree-optimization/114704] New: Missed optimization : eliminate store if the value is known in all predecessors

2024-04-12 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114704

Bug ID: 114704
   Summary: Missed optimization : eliminate store if the value is
known in all predecessors
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/KEeGTM49E

For code like:
```
void src(int *p){
    if (*p == 0)
        goto then;
    else {
        dummy();
        if (*p == 0)
            goto then;
        else
            return;
    }

then:
    *p = 0; // *p is already 0, it's dead now
}

```
In the `then` basic block, *p is known to be 0 on entry from every predecessor,
so the store "*p = 0" is redundant.

[Bug tree-optimization/114702] New: Missed optimization: fail to infer c - b != a if a + b != c

2024-04-12 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114702

Bug ID: 114702
   Summary: Missed optimization: fail to infer c - b != a if a + b
!= c
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/58rxGaabW

```
void src(int a, int b){
    if (b + a == 32) {
        return;
    } else {
        if (32 - b == a)
            dummy();
        else
            dummy1();
    }
}
```

In the else branch, "32 - b == a" is known to be false. We can get:

```
void tgt(int a, int b){
    if (b + a == 32) {
        return;
    } else {
        dummy1();
    }
}
```


This is real-world dead code that appears after inlining functions in QEMU.
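
A small experiment showing that the dummy() arm is never taken is sketched
below. The counting stubs and the count_dummy_calls helper are assumptions
added for this note, and the inputs are kept small so that no signed overflow
occurs.

```c
/* Counting stubs standing in for the external calls in the report. */
static int dummy_calls, dummy1_calls;
static void dummy(void)  { dummy_calls++; }
static void dummy1(void) { dummy1_calls++; }

static void src(int a, int b)
{
    if (b + a == 32) {
        return;
    } else {
        if (32 - b == a)   /* implied false by the failed check above */
            dummy();
        else
            dummy1();
    }
}

/* Hypothetical helper: runs src on a grid and reports how often dummy() ran. */
static int count_dummy_calls(int lo, int hi)
{
    dummy_calls = dummy1_calls = 0;
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            src(a, b);
    return dummy_calls;
}
```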

[Bug tree-optimization/114331] New: Missed optimization: indicate knownbits from dominating condition switch(trunc(a))

2024-03-13 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114331

Bug ID: 114331
   Summary: Missed optimization: indicate knownbits from
dominating condition switch(trunc(a))
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link: https://godbolt.org/z/dso53ndTo
For code like:

int src(int num) {
    switch ((short)num) {
    case 111:
        return num & 0xfffe;
    case 267:
    case 204:
    case 263:
        return 0;
    default:
        dummy();
        return 0;
    }
}

"num & 0xfffe" can be folded to "110". But both LLVM and GCC fail to fold it.

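The reasoning for the report above is that in the case-111 arm the low 16 bits
of num must be 111, so masking with 0xfffe leaves exactly 110 regardless of the
upper bits. A sketch that exercises this is shown below; the dummy() stub and
the count_unexpected helper are assumptions added for this note, and the code
assumes the usual modulo behaviour for the int-to-short conversion, as GCC and
Clang implement it.

```c
/* Stand-in for the opaque external call in the report. */
static void dummy(void) {}

static int src(int num)
{
    switch ((short)num) {
    case 111:
        return num & 0xfffe;
    case 267:
    case 204:
    case 263:
        return 0;
    default:
        dummy();
        return 0;
    }
}

/* Hypothetical helper: feeds src() values whose low 16 bits are 111 but whose
   upper bits vary, and counts results different from 110. */
static int count_unexpected(void)
{
    int bad = 0;
    for (int k = -1000; k <= 1000; k++) {
        int num = 111 + k * 65536;
        if (src(num) != 110)
            bad++;
    }
    return bad;
}
```
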
[Bug tree-optimization/113487] Missed optimization: simplify demanded bits on multi-use instructions like select

2024-01-18 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113487

--- Comment #2 from XChy  ---
I may be missing something here... Apart from this one, it seems that GCC
doesn't simplify **one-use** instructions based on demanded bits either:
https://godbolt.org/z/67bYxd8hY

LLVM, however, does handle the one-use case.

[Bug tree-optimization/113487] New: Missed optimization: simplify demanded bits on multi-use instructions like select

2024-01-18 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113487

Bug ID: 113487
   Summary: Missed optimization: simplify demanded bits on
multi-use instructions like select
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/EfKrTbK77

Based on common demanded bits from s&8 and s&16,
```
void src(bool c, unsigned a, unsigned b) {
    unsigned s = c ? a & 24 : b & 25;
    use(s & 8);
    use(s & 16);
}
```

can be folded to:

```
void tgt(bool c, unsigned a, unsigned b) {
    unsigned s = c ? a : b;
    use(s & 8);
    use(s & 16);
}
```

Both LLVM and GCC missed this optimization opportunity.
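
Only bits 3 and 4 of s are ever demanded, and both masks (24 and 25) keep those
bits, so the inner ANDs are dead. The sketch below compares the values passed
to use() in both versions; the recording use() stub and the count_mismatches
helper are assumptions added for this note.

```c
#include <stdbool.h>

/* Recording stand-in for the external use(): keeps the two values it saw. */
static unsigned seen[2];
static int n_seen;
static void use(unsigned v) { seen[n_seen++ % 2] = v; }

static void src(bool c, unsigned a, unsigned b)
{
    unsigned s = c ? a & 24 : b & 25;
    use(s & 8);
    use(s & 16);
}

static void tgt(bool c, unsigned a, unsigned b)
{
    unsigned s = c ? a : b;
    use(s & 8);
    use(s & 16);
}

/* Hypothetical helper: compares what src and tgt pass to use(). */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (int c = 0; c <= 1; c++)
        for (unsigned a = 0; a < 64; a++)
            for (unsigned b = 0; b < 64; b++) {
                n_seen = 0;
                src(c, a, b);
                unsigned s0 = seen[0], s1 = seen[1];
                n_seen = 0;
                tgt(c, a, b);
                if (s0 != seen[0] || s1 != seen[1])
                    bad++;
            }
    return bad;
}
```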

[Bug tree-optimization/113105] Missing optimization: fold `div(v, a) * b + rem(v, a)` to `div(v, a) * (b - a) + v`

2023-12-23 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #5 from XChy  ---
(In reply to Jakub Jelinek from comment #4)
> So, e.g. on x86_64,
> unsigned int
> f1 (unsigned val)
> {
>   return val / 10 * 16 + val % 10;
> }
> 
> unsigned int
> f2 (unsigned val)
> {
>   return val / 10 * 6 + val;
> }
> 
> unsigned int
> f3 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * b + val % a;
> }
> 
> unsigned int
> f4 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * (b - a) + val % a;
> }
> 
> unsigned int
> f5 (unsigned val)
> {
>   return val / 93 * 127 + val % 93;
> }
> 
> unsigned int
> f6 (unsigned val)
> {
>   return val / 93 * (127 - 93) + val;
> }
> 
> f2, f3 and f5 are shorter compared to f1, f4 and f6 at -O2.
> With -Os, f3 is shorter than f4, while f1/f2 and f5/f6 are the same size
> (and also same number of insns there, perhaps f1 better than f2 as it uses
> shift rather than imul).
> So, this is really something that needs to take into account the machine
> specific expansion etc., isn't a clear winner all the time.

Thanks for the explanation! It's a good fold for targets where "v % a" is
expensive, but not for targets where it is cheap. I'm not a GCC developer; do
you think I should report this under rtl-optimization?

And it seems that f6 has smaller size than f5 at -O2 in your example:
https://godbolt.org/z/PEWKfj1je

[Bug tree-optimization/113071] `((a == c) || (a == b)) ? a : b` is sometimes not optimized to `(a == c) ? c : b`

2023-12-23 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113071

XChy  changed:

   What|Removed |Added

 CC||xxs_chy at outlook dot com

--- Comment #1 from XChy  ---
Might the fold below be a more general one?

(a == b | other_cond) ? a : b 

can be

other_cond ? a : b

Actually a == c in this example is irrelevant and can be replaced by any other
condition.

[Bug tree-optimization/113105] Missing optimization: fold `div(v, a) * b + rem(v, a)` to `div(v, a) * (b - a) + v`

2023-12-21 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #2 from XChy  ---
(In reply to Jakub Jelinek from comment #1)
> When it is signed v / a * b + v % a, I think it can introduce UB which
> wasn't there originally.
> E.g. for v = 0, a = INT_MIN and b = 3.  So, if it isn't done just for
> unsigned types,
> parts of it need to be done in unsigned.

Yes, this fold is valid when there is no nooverflow/nowrap constraint. For
operations that carry such a constraint, it remains unclear to me when the fold
is legal.

For your reference, LLVM expands "v % a" to "v - (v / a) * a", and then
reassociates "(v / a) * b - (v / a) * a + v" to "(v / a) * (b - a) + v" to
solve this issue.
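
The reassociation described above can be spelled out in C. The sketch below is
illustrative only: the function names and sample values are made up for this
note, and everything is done in unsigned arithmetic, where wraparound is well
defined. In signed arithmetic `b - a` itself can overflow (for example with
a = INT_MIN and b = 3), which is the concern raised in comment #1.

```c
/* v / a * b + v % a, evaluated in wrapping unsigned arithmetic. */
static unsigned fold_before(unsigned v, unsigned a, unsigned b)
{
    return v / a * b + v % a;
}

/* Result of expanding v % a as v - (v / a) * a and reassociating. */
static unsigned fold_after(unsigned v, unsigned a, unsigned b)
{
    return v / a * (b - a) + v;
}

/* Hypothetical helper: compares both forms on a set of sample values,
   skipping a == 0 (division by zero). */
static unsigned count_fold_mismatches(void)
{
    static const unsigned samples[] = {
        0u, 1u, 2u, 3u, 9u, 10u, 16u, 93u, 127u,
        0x7fffffffu, 0x80000000u, 0xffffffffu
    };
    const unsigned n = sizeof samples / sizeof samples[0];
    unsigned bad = 0;

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (unsigned k = 0; k < n; k++) {
                unsigned v = samples[i], a = samples[j], b = samples[k];
                if (a == 0)
                    continue;
                if (fold_before(v, a, b) != fold_after(v, a, b))
                    bad++;
            }
    return bad;
}
```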

[Bug tree-optimization/113105] New: Missing optimization: fold `div(v, a) * b + rem(v, a)` to `div(v, a) * (b - a) + v`

2023-12-21 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

Bug ID: 113105
   Summary: Missing optimization: fold `div(v, a) * b + rem(v, a)`
to `div(v, a) * (b - a) + v`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/b5va37Tzx

For example:

unsigned char _bin2bcd(unsigned val)
{
    return ((val / 10) << 4) + val % 10;
}

can be folded to:

unsigned char new_bin2bcd(unsigned val)
{
    return val / 10 * 6 + val;
}

This C snippet is extracted from
https://github.com/torvalds/linux/blob/master/lib/bcd.c

Both GCC and LLVM missed it.

[Bug tree-optimization/112982] New: Missing optimization: fold max(b, a + 1) to b when a < b

2023-12-12 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112982

Bug ID: 112982
   Summary: Missing optimization: fold max(b, a + 1) to b when a <
b
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/nn8hT16Ka

long src(long a, long b) {
    if (a < b) {
        return max(b, a + 1);
    } else {
        return 0;
    }
}

can be folded to:

long tgt(long a, long b) {
    if (a < b) {
        return b;
    } else {
        return 0;
    }
}

Similarly, this pattern generalizes to `a > b` with `min(b, a + 1)`.
Both GCC and LLVM miss this optimization opportunity.
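
The fold holds because a < b implies a + 1 <= b, and a + 1 cannot overflow when
a is strictly less than some other long. A brute-force comparison is sketched
below; max_long stands in for the max() used in the report, and
count_mismatches is a hypothetical helper.

```c
/* Stand-in for the max() used in the report. */
static long max_long(long x, long y)
{
    return x > y ? x : y;
}

static long src(long a, long b)
{
    if (a < b)
        return max_long(b, a + 1);
    else
        return 0;
}

static long tgt(long a, long b)
{
    if (a < b)
        return b;
    else
        return 0;
}

/* Hypothetical helper: compares src and tgt on a small grid. */
static int count_mismatches(void)
{
    int bad = 0;
    for (long a = -20; a <= 20; a++)
        for (long b = -20; b <= 20; b++)
            if (src(a, b) != tgt(a, b))
                bad++;
    return bad;
}
```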

[Bug tree-optimization/112900] New: Missing optimization: canonicalize `select c, x - 1, x + 1` to `x + (select c, -1, 1)` (or reversely)

2023-12-07 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112900

Bug ID: 112900
   Summary: Missing optimization: canonicalize `select c, x - 1, x
+ 1` to `x + (select c, -1, 1)`  (or reversely)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

GCC does a much better job than LLVM of folding branches into selects for:
https://godbolt.org/z/jWzePjqTs

But GCC seems to generate different x86 assembly for the two functions below
(https://godbolt.org/z/Erq56Tbjq):

int src(int x, int y, bool cond) {
    return (x > y ? x - 1 : x + 1);
}

int tgt(int x, int y, bool cond) {
    return x + (x > y ? -1 : 1);
}

For `src`, we compute both `x-1` and `x+1` and apply `cmov` to return one of
them.
For `tgt`, we fold `(x > y ? -1 : 1)` to `(x <= y) * 2 - 1`.

LLVM prefers the latter, so I think it may be better to canonicalize `select c,
x - 1, x + 1` to `x + (select c, -1, 1)`. It is currently not canonicalized at
the tree-optimization stage; please let me know if there are backend factors,
such as ILP, that decide this.
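
For reference, the three forms discussed here can be compared directly. In the
sketch below, sel_branchless writes the select of -1 or 1 as a comparison
result scaled and shifted, with the sign worked out explicitly; that function
and count_mismatches are illustrations added for this note, not compiler
output. The unused `cond` parameter from the report is dropped.

```c
/* Both arms computed, then one selected. */
static int sel_src(int x, int y)
{
    return x > y ? x - 1 : x + 1;
}

/* Select hoisted onto the constant operand. */
static int sel_tgt(int x, int y)
{
    return x + (x > y ? -1 : 1);
}

/* Branchless arithmetic form: (x <= y) is 0 or 1, so the term is -1 or +1. */
static int sel_branchless(int x, int y)
{
    return x + ((x <= y) * 2 - 1);
}

/* Hypothetical helper: compares the three forms on a small grid. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int x = -16; x <= 16; x++)
        for (int y = -16; y <= 16; y++)
            if (sel_src(x, y) != sel_tgt(x, y) ||
                sel_src(x, y) != sel_branchless(x, y))
                bad++;
    return bad;
}
```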

[Bug tree-optimization/112884] New: Missing optimization: fold a%2==0 ? a/2*2 : 0 to a%2==0 ? a : 0

2023-12-06 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112884

Bug ID: 112884
   Summary: Missing optimization: fold a%2==0 ? a/2*2 : 0 to
a%2==0 ? a : 0
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/Ec1ax79r8

For the select arm "a / 2 * 2" in:

unsigned src(unsigned a) {
    return a % 2 == 0 ? (a / 2 * 2) : 0;
}

it's equivalent to "a", so the program could be folded to:

unsigned tgt(unsigned a) {
    return a % 2 == 0 ? a : 0;
}

Both GCC and LLVM missed such optimization in select arms.
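
The key point is that "a / 2 * 2" equals "a" only under the dominating
condition a % 2 == 0, which is exactly where the arm is evaluated. A quick
check is sketched below; count_mismatches is a hypothetical helper added for
this note.

```c
static unsigned src(unsigned a)
{
    return a % 2 == 0 ? (a / 2 * 2) : 0;
}

static unsigned tgt(unsigned a)
{
    return a % 2 == 0 ? a : 0;
}

/* Hypothetical helper: compares src and tgt for all a in [0, limit]. */
static unsigned count_mismatches(unsigned limit)
{
    unsigned bad = 0;
    for (unsigned a = 0; ; a++) {
        if (src(a) != tgt(a))
            bad++;
        if (a == limit)
            break;
    }
    return bad;
}
```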

[Bug tree-optimization/112857] Missing optimization: fold (b + ~a) > 0 to a - b < -1

2023-12-05 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112857

--- Comment #3 from XChy  ---
(In reply to Richard Biener from comment #2)
> careful about overflow. 

I'm not a GCC developer, but I can say that "(b + ~a) > 0 -> a - b < -1" is a
valid refinement for both signed and unsigned types, based on SMT verification
of the LLVM IR.

> Also note compares against zero might be cheaper (but that's eventually an 
> RTL 
> expansion thing).

Yes, that depends on the specific platform and may need to be handled in the
backends.

[Bug tree-optimization/112857] Missing optimization: fold (b + ~a) > 0 to a - b < -1

2023-12-05 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112857

--- Comment #4 from XChy  ---
(In reply to Andrew Pinski from comment #1)

> For the above, GCC is able to get the best code for g0, g10, g1, and g4
> (though g1 and g4 are still `(b-a) > 0` at the gimple level. While LLVM is
> able to get it for g0, g10 and g4 (g3 is close though with `(b-a) > 0`).

Thanks for generalizing this fold!

[Bug tree-optimization/112857] New: Missing optimization: fold (b + ~a) > 0 to a - b < -1

2023-12-04 Thread xxs_chy at outlook dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112857

Bug ID: 112857
   Summary: Missing optimization: fold (b + ~a) > 0 to a - b < -1
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt link example: https://godbolt.org/z/Pba4Y164f
C code like:

bool src(int a, int b){
    return (b + ~a) > 0;
}

can be folded to:

bool tgt(int a, int b){
    return a - b < -1;
}

But both GCC and LLVM missed it.
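
The derivation uses ~a == -a - 1, so (b + ~a) > 0 means b - a - 1 > 0, that is
b - a >= 2, which is the same as a - b < -1. A brute-force check on small
inputs (chosen so that no signed overflow occurs) is sketched below;
count_mismatches is a hypothetical helper added for this note.

```c
#include <stdbool.h>

static bool src(int a, int b)
{
    return (b + ~a) > 0;
}

static bool tgt(int a, int b)
{
    return a - b < -1;
}

/* Hypothetical helper: compares src and tgt on a small grid of inputs. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int a = -50; a <= 50; a++)
        for (int b = -50; b <= 50; b++)
            if (src(a, b) != tgt(a, b))
                bad++;
    return bad;
}
```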