[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #9 from GCC Commits --- The master branch has been updated by Sam James : https://gcc.gnu.org/g:2e662dedf84aa23fdff7bceca040432bf9f1ab72 commit r15-2412-g2e662dedf84aa23fdff7bceca040432bf9f1ab72 Author: Sam James Date: Tue Jul 30 12:20:47 2024 +0100 testsuite: fix whitespace in dg-do compile directives Nothing seems to change here in reality at least on x86_64-pc-linux-gnu, but important to fix nonetheless in case people copy it. PR rtl-optimization/48633 PR tree-optimization/83072 PR tree-optimization/83073 PR tree-optimization/96542 PR tree-optimization/96707 PR tree-optimization/97567 PR target/69225 PR target/89929 PR target/96562 * g++.dg/pr48633.C: Fix whitespace in dg directive. * g++.dg/pr96707.C: Likewise. * g++.target/i386/mv28.C: Likewise. * gcc.dg/Warray-bounds-flex-arrays-1.c: Likewise. * gcc.dg/pr83072-2.c: Likewise. * gcc.dg/pr83073.c: Likewise. * gcc.dg/pr96542.c: Likewise. * gcc.dg/pr97567-2.c: Likewise. * gcc.target/i386/avx512fp16-11a.c: Likewise. * gcc.target/i386/avx512fp16-13.c: Likewise. * gcc.target/i386/avx512fp16-14.c: Likewise. * gcc.target/i386/avx512fp16-conjugation-1.c: Likewise. * gcc.target/i386/avx512fp16-neg-1a.c: Likewise. * gcc.target/i386/avx512fp16-set1-pch-1a.c: Likewise. * gcc.target/i386/avx512fp16vl-conjugation-1.c: Likewise. * gcc.target/i386/avx512fp16vl-neg-1a.c: Likewise. * gcc.target/i386/avx512fp16vl-set1-pch-1a.c: Likewise. * gcc.target/i386/avx512vlfp16-11a.c: Likewise. * gcc.target/i386/pr69225-1.c: Likewise. * gcc.target/i386/pr69225-2.c: Likewise. * gcc.target/i386/pr69225-3.c: Likewise. * gcc.target/i386/pr69225-4.c: Likewise. * gcc.target/i386/pr69225-5.c: Likewise. * gcc.target/i386/pr69225-6.c: Likewise. * gcc.target/i386/pr69225-7.c: Likewise. * gcc.target/i386/pr96562-1.c: Likewise. * gcc.target/riscv/rv32e_stack.c: Likewise. * gfortran.dg/c-interop/removed-restrictions-3.f90: Likewise. * gnat.dg/renaming1.adb: Likewise.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |12.0
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 Andrew Macleod changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #8 from Andrew Macleod --- Range-ops uses wi_fold (implemented by each opcode) to individually fold subranges one at a time and then combines them. This patch first calls wi_fold_in_parts which checks if one of the subranges is small, and if so, further splits that subrange into constants. Currently, if a subrange is 4 or less values, then we call it individually for each of the 4 values. 4 was chosen as a reasonable tradeoff between excess work vs benefit. this gets all 3 cases now. fixed
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #7 from CVS Commits --- The master branch has been updated by Andrew Macleod : https://gcc.gnu.org/g:704e8a825c78b9a8424c291509413bbb48e602c7 commit r12-2381-g704e8a825c78b9a8424c291509413bbb48e602c7 Author: Andrew MacLeod Date: Fri Jul 16 11:42:14 2021 -0400 Add wi_fold_in_parts. range-ops uses wi_fold to individually fold subranges one at a time and then combined them. This patch first calls wi_fold_in_parts which checks if one of the subranges is small, and if so, further splits that subrange into constants. gcc/ PR tree-optimization/96542 * range-op.cc (range_operator::wi_fold_in_parts): New. (range_operator::fold_range): Call wi_fold_in_parts. (operator_lshift::wi_fold): Fix broken lshift by [0,0]. * range-op.h (wi_fold_in_parts): Add prototype. gcc/testsuite * gcc.dg/pr96542.c: New.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #6 from Andrew Macleod --- Likewise, for unsigned baz (unsigned int x) { if (x >= 4) return 32; return (-1U >> x) * 16; } === BB 2 x_3(D) unsigned int VARYING _4 UNDEFINED : if (x_3(D) > 3) goto ; [INV] else goto ; [INV] 2->3 (T) x_3(D) : unsigned int [4, +INF] 2->4 (F) x_3(D) : unsigned int [0, 3] === BB 3 _4 UNDEFINED : // predicted unlikely by early return (on trees) predictor. goto ; [INV] === BB 4 x_3(D) unsigned int [0, 3] : _1 = 4294967295 >> x_3(D); _4 = _1 * 16; _1 : unsigned int [536870911, +INF] 0x >> [0,3] is currently producing [0x2FFF, 0x] again making operator_rshift capable of generating multi-ranges, 0x >> [0,3] should produce [0x2FFF, 0x2FFF][0x4FFF, 0x4FFF][0x8FFF, 0x8FFF][0x, 0x] and then _4 = _1 *16 should automatically produce: [0xFFF0, 0xFFF0] I think multiply does that today, if its given the proper input. === BB 5 _4 unsigned int VARYING : # _2 = PHI <32(3), _4(4)> return _2; And then this PHI will have the constant propagated and become: # _2 = PHI <32(3), 0xFFF0(4)> return _2 ANd given that, presumably phi-opt or maybe even VRPs simplfication will turn that into the desired conditional once the constant is calculated.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #5 from Andrew Macleod --- I think this all goes away when multi-range is enabled. The original testcase produces: === BB 2 x_4(D) unsigned int VARYING : tmp_5 = x_4(D) != 0; _1 = (int) tmp_5; _2 = 255 >> _1; _3 = (unsigned char) _2; _6 = _3 * 2; return _6; _1 : int [0, 1] _2 : int [127, 255] _3 : unsigned char [127, +INF] A number of the range-ops routines are simply inherited from the single range codebase of value_range, and are not multi-range enabled yet. In particular, rshift here. with a tweak to operator_rshift, we can take 255 >> [0,1] and instead calculate _2 = [127,127][255,255] which would make the results: _1 : int [0, 1] _2 : int [127, 127][255, 255] _3 : unsigned char [127, 127] [255, 255] Then when _6 is calculated as [127, 127][255, 255] * 2 , range-ops will calculate the result to be [254, 254] The whole thing will just fold away to return 254 (or -2 if you want to sign it :-) All we need to do is get the multi-range code enabled it's coming.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #4 from Jakub Jelinek --- As for COND_EXPR, if we do it that way, it should be rather keyed on a range with only two possible values in the range.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 Jakub Jelinek changed: What|Removed |Added CC||aldyh at gcc dot gnu.org, ||amacleod at redhat dot com --- Comment #3 from Jakub Jelinek --- (In reply to Marc Glisse from comment #2) > > Or shall say VRP try harder if it sees [0, 1] ranges? > > If a range has only 2 (or some other small number) values, try propagating > each and see if some variables end up with the same value in both cases? Maybe. The question is if it should be done in forwprop, or say vrp or in the ranger code (does that one do forward propagation)? In any case, it should be limited to ranges with small number of values (perhaps decided with a param) and also bound on how many immediate use stmts it attempts to propagate it through.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #2 from Marc Glisse --- (In reply to Jakub Jelinek from comment #1) > In bar, this is optimized, because fold_binary_op_with_conditional_arg > optimizes > 255 >> (x ? 1 : 0) into x ? 127 : 255 and when multiplied by two in unsigned > char this results in x ? 254 : 254. > We don't have anything comparable in match.pd yet I believe (and should we?). We have something for VEC_COND_EXPR to fold a op (b?c:d), but not for COND_EXPR, which you would be unlikely to see in gimple (and the generator of phiopt transforms from match.pd patterns hasn't appeared yet). Also, we only have x!=0, and while fold_binary_op_with_conditional_arg tries to handle it like x!=0?1:0, we indeed don't do anything like that for gimple. And it seems possibly better suited to forward propagation than backward like match.pd. > Or shall say VRP try harder if it sees [0, 1] ranges? If a range has only 2 (or some other small number) values, try propagating each and see if some variables end up with the same value in both cases? Or if enough simplifications occur that it is worth introducing a conditional? I am not sure it would be worth the trouble. > Though, shouldn't we optimize e.g. > unsigned > baz (unsigned int x) > { > if (x >= 4) return 32; > return (-1U >> x) * 16; > } > too to return x >= 4 ? 32U : -16U; ? > Not sure where and how to generalize it though. > Value range of -1U >> [0, 3] is not really useful here, nonzero bits either. > And having a specialized (const1 >> x) * const2 optimizer based on x's value > range would work, but not sure if it has a real-world benefit. And here this is complicated by the fact that we do not narrow the operation, so it is less obvious that the constant is -1.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 Jakub Jelinek changed: What|Removed |Added Ever confirmed|0 |1 CC||jakub at gcc dot gnu.org Status|UNCONFIRMED |NEW Last reconfirmed||2020-08-10 --- Comment #1 from Jakub Jelinek --- unsigned char foo (unsigned int x) { _Bool y = x; return (((unsigned char) ~0) >> y) * 2; } unsigned char bar (unsigned int x) { return (((unsigned char) ~0) >> (_Bool) x) * 2; } In bar, this is optimized, because fold_binary_op_with_conditional_arg optimizes 255 >> (x ? 1 : 0) into x ? 127 : 255 and when multiplied by two in unsigned char this results in x ? 254 : 254. We don't have anything comparable in match.pd yet I believe (and should we?). Or shall say VRP try harder if it sees [0, 1] ranges? Though, shouldn't we optimize e.g. unsigned baz (unsigned int x) { if (x >= 4) return 32; return (-1U >> x) * 16; } too to return x >= 4 ? 32U : -16U; ? Not sure where and how to generalize it though. Value range of -1U >> [0, 3] is not really useful here, nonzero bits either. And having a specialized (const1 >> x) * const2 optimizer based on x's value range would work, but not sure if it has a real-world benefit.