[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2023-03-21 Thread adam.warner.nz at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #25 from Adam Warner  ---
Documenting a workaround I've found for the unnecessary sign extension. I'm
still perplexed at the improbability of this appearing to work!

workaround_bsr_sign_extension.c:

#include 

uint64_t bsr_u64(uint64_t a) {
  if (sizeof(unsigned long) == 8) return 63 - __builtin_clzl(a);
  if (sizeof(unsigned long long) == 8) return 63 - __builtin_clzll(a);
}

uint64_t bsr_u64_alt(uint64_t a) {
  if (sizeof(unsigned long) == 8) return UINT64_C(63) - __builtin_clzl(a);
  if (sizeof(unsigned long long) == 8) return UINT64_C(63) -
__builtin_clzll(a);
}

int main(void) {return 0;}

$ gcc -O3 workaround_bsr_sign_extension.c && objdump -d -m i386:x86-64:intel
a.out|less

Relevant output:

1140 :
1140:   48 0f bd c7 bsrrax,rdi
1144:   48 98   cdqe
1146:   c3  ret
1147:   66 0f 1f 84 00 00 00nopWORD PTR [rax+rax*1+0x0]
114e:   00 00 

1150 :
1150:   48 0f bd c7 bsrrax,rdi
1154:   c3  ret

In the alternative implementation the superfluous 32- to 64-bit sign extension
instruction CDQE is no longer generated.

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2021-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Andrew Pinski  changed:

   What|Removed |Added

 CC||gpiez at web dot de

--- Comment #24 from Andrew Pinski  ---
*** Bug 50168 has been marked as a duplicate of this bug. ***

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2019-12-31 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #23 from Alexander Monakov  ---
In libcpp, search_line_fast implementations suffer from this, and what is
worse, attempts to workaround it by explicitly requesting zero extension don't
work:

char *foo(char *p, int x)
{
  return p + (unsigned)__builtin_ctz(x);
}

The above code is deliberately asking for zero extension, and yet various
optimizations in GCC transform it back to costlier form with sign extension.

(FWIW, LLVM gets this right)

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2019-09-29 Thread steinar+gcc at gunderson dot no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #22 from Steinar H. Gunderson  ---
Yes; I had a project recently where I won 15% overall performance (!) by using
an asm statement with bsf instead of ffsll(). (This skipped both the sign
extension and the check for zero; my argument could never be zero, due to a
loop condition, but GCC didn't manage to remove the check nevertheless.)

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2019-09-29 Thread david.bolvansky at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Dávid Bolvanský  changed:

   What|Removed |Added

 CC||david.bolvansky at gmail dot 
com

--- Comment #21 from Dávid Bolvanský  ---
long long foo (int x) { return __builtin_ctz (x); }

still produces 

foo(int):
xor eax, eax
tzcnt   eax, edi
cdqe
ret


instead of

foo(int):# @foo(int)
tzcnt   eax, edi
ret

[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-06 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #20 from Jakub Jelinek jakub at gcc dot gnu.org ---
Author: jakub
Date: Sat Jul  6 09:34:17 2013
New Revision: 200731

URL: http://gcc.gnu.org/viewcvs?rev=200731root=gccview=rev
Log:
PR target/29776
* fold-const.c (tree_call_nonnegative_warnv_p): Return true
for BUILT_IN_C{LZ,LRSB}*.
* tree.h (CASE_INT_FN): Add FN##IMAX case.
* tree-vrp.c (extract_range_basic): Handle
BUILT_IN_{FFS,PARITY,POPCOUNT,C{LZ,TZ,LRSB}}*.  For
BUILT_IN_CONSTANT_P if argument isn't (D) of PARM_DECL,
fall thru to code calling set_value*.
* builtins.c (expand_builtin): Remove *IMAX cases.
(fold_builtin_bitop): For BUILT_IN_CLRSB* return NULL_TREE
if width is bigger than 2*HWI.

* libgcc2.c (__floattisf): Avoid undefined signed overflow.

* gcc.dg/tree-ssa/vrp89.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/vrp89.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/builtins.c
trunk/gcc/fold-const.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vrp.c
trunk/gcc/tree.h
trunk/libgcc/ChangeLog
trunk/libgcc/libgcc2.c


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-05 Thread neleai at seznam dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #15 from Ondrej Bilka neleai at seznam dot cz ---
On Thu, Jul 04, 2013 at 07:46:07PM +, glisse at gcc dot gnu.org wrote:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776
 
 --- Comment #14 from Marc Glisse glisse at gcc dot gnu.org ---
 (In reply to Jakub Jelinek from comment #12)
  Not all CPUs that have defined behavior for 0 define it to the precision
  though, and even on i?86 it is undefined even when using lzcnt/tzcnt on
  older CPUs.
  Even the libgcc routines provide various return values for 0 depending on
  target (look for COUNT_LEADING_ZEROS_0).
  So it is not an option to redefine the builtins, they are undefined behavior
  for 0 and that can't change.
 
 I didn't mean to document the value for 0 or have other optimizations create
 calls to this builtin relying on this property. I just meant that if we 
 already
 have a call to clz and the argument happens to be 0 (which is undefined), we
 could optimize based on the assumption that the result is the precision, that
 may break less code than your current patch. Otherwise we might as well assert
 (well, tell it to VRP) that the argument is non-zero.
 
 But you are probably right that trying to work around the undefined value for  0
 is wrong, another builtin should be used instead.

Wait, when user triggers undefined behaviour anything is possible. So we
can change return value and be still correct, rigth?


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-05 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org ---
Defining the value at this point would mean that we'd need to generate
slower/larger code on various targets (including i386/x86_64 when not
targetting the really most recent CPUs only) and possibly create new libgcc
entrypoints, at least for targets where the current library routine doesn't
return the value that would be newly defined.  That would slow down users that
know they aren't passing 0 to the builtins (all correct ones).
If you want builtins with defined behavior for 0, the only way is to add new
ones, but the question is what the return value should be for 0.
Or __builtin_ffs can be used, but that is 0 for 0 and otherwise 1 +
__builtin_ctz, so if you wanted ctz from that, you'd subtract one, which would
imply that for 0 this ctz variant would return -1.

BTW, the patch broke regtest on x86_64, because libgcc contained code like:
void foo (__int128_t u)
{
  if ((long long) u == u)
return something;
  long long hi = u  64;
  if (hi  0)
hi = -hi;
  int count = __builtin_clzll (hi);
  if (count == 0)
return somethingelse;
  ...
}

VRP figured out that count can't be 0 in a valid program (which is true), so I
had to fix it up to do hi = -(unsigned long long) hi; so that it doesn't
trigger undefined behavior for the minimum value of u.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-05 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #17 from Jakub Jelinek jakub at gcc dot gnu.org ---
Unfortunately looking at longlong.h, the header on some targets uses
__builtin_clz* or __builtin_ctz* even with defining COUNT_LEADING_ZEROS_0 to
some value (e.g. on alpha, arm, avr, coldfire, mips).
libgcc itself doesn't use COUNT_LEADING_ZEROS_0 macro, nor does glibc, but
wonder if other projects don't use it.
Code that happened to work (albeit, according to gcc documentation (but not
longlong.h) undefined) could break with my patch.
So, either we'd need to change longlong.h not to define COUNT_LEADING_ZEROS_0
when count_leading_zeros is implemented using the builtin, or we'd need to be
more conservative in the VRP patch.  The problem is what is more conservative.
The C?Z_DEFINED_VALUE_AT_ZERO macros aren't unfortunately immediately usable,
because they are well defined just for some integer modes (the ones for which
there is some hw insn or at least optab), while the builtin might be for some
other mode (larger or smaller).  If the builtin is for smaller mode and HW
supports only wider insns, generally clz is expanded as clzlarger +
(largerbitsize - thisbitsize), ctz just as doing larger ctz and not
adjusting in any way.  So perhaps VRP could check whether there is an optab
for the mode of the builtin and if yes, look at its C.Z_DEFINED_VALUE_AT_ZERO,
otherwise assume undefined.  That would let the longlong.h work, supposedly
it uses the builtins only for the modes which are in hw (otherwise the libgcc
implementation using those macros would be not working).
Because trying to handle all the cases would be a mess, looking at
CLZ_DEFINED_VALUE_AT_ZERO when it is defined it is usually number of bits in
the mode and even the various extra clz expansions (both the widening which is
expanded as wider clz + difference of the wider and narrower mode bitsize and
doubleword narrowing which is expanded as runtime non-zero check on the first
word and either clz of the first word or bitsize difference + clz of second
word) probably honor it, but for ctz some targets use bitsize, some targets use
-1, but e.g. when widening we'd happily return bitsize of the larger mode and
not adjust, or when ctz is expanded using clz it can be again different.

So, thoughts on this?


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-05 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #18 from Jakub Jelinek jakub at gcc dot gnu.org ---
Or yet another option is just to always use [0, prec] VR for CLZ (and lower it
for defined range, as the patch does right now), i.e. for CLZ
change maxi = prec - 1; to maxi = prec; and drop if (maxi == prec) maxi = prec
- 1;, looking for all targets where the value for 0 is defined for CLZ it is
mode size, and it ought to behave well even with the widening/narrowing
expansions.

And for CTZ just leave 0 as undefined, because there really are just too many
possibilities what you can get, and it isn't relevant to longlong.h either,
because there is no COUNT_TRAILING_ZEROS_0 macro.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-05 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #19 from Jakub Jelinek jakub at gcc dot gnu.org ---
Also, perhaps expand_builtin_unop for the int bitops builtin
(ffs/parity/popcount/clz/ctz/clrsb) optabs, if target returned by expand_unop
is wider than target_mode, perhaps we could set SUBREG_PROMOTED_VAR_P
and !SUBREG_PROMOTED_UNSIGNED_P.  When the builtins are expanded using libcall,
the libcalls return int and so does the builtin, thus the target_mode should
then be equal to what we have, and otherwise I'd think the result should be
already extended, as in target should be the return value, not some value with
target value only in the low bits.  But that would require verification for all
targets.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 30453
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30453action=edit
gcc49-pr29776.patch

Untested VRP patch.  There is SSA_NAME_RANGE_INFO support being reviewed
currently which will make VRP info persistent and some zero/sign-extension pass
that makes use of that info, perhaps that will take care of the rest.

I'm not 100% sure about CLZ/CTZ in the patch, because it could return any value
for argument of 0, but as we document it as undefined behavior, perhaps it is
fine.

With more code we could do even better than this and define from argument's VR
smaller VR_RANGE than this patch does, say for ffs if 0 isn't in the VR_RANGE
of the argument, the minimum value of the builtin won't be 0, but 1, and from
maximum value we could take floor_log of the maximum value plus 1 as the
maximum of the resulting range.  Ditto for popcount, for parity the current [0,
1] is sufficient, for clz we could use precision minus floor_log of the maximum
value
as the minimum of resulting vr, for ctz derive from it maximum.  Dunno if it is
worth it though.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #30453|0   |1
is obsolete||

--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 30454
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30454action=edit
gcc49-pr29776.patch

It wasn't actually that hard to improve the VRs in some cases.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread dean at arctic dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #10 from dean at arctic dot org ---
On Thu, 4 Jul 2013, jakub at gcc dot gnu.org wrote:

 I'm not 100% sure about CLZ/CTZ in the patch, because it could return any 
 value
 for argument of 0, but as we document it as undefined behavior, perhaps it is
 fine.

this is unfortunate really -- the newer LZCNT/TZCNT instructions on x86 
explicitly return 8*sizeof(type) for input of zero because the undefined 
behaviour of BSF/BSR posed a lot of problems in inner loops.  it might be 
useful to add explicit builtins for this behaviour.

thanks for taking a look at the bug :)

-dean


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #11 from Marc Glisse glisse at gcc dot gnu.org ---
(In reply to dean from comment #10)
 this is unfortunate really -- the newer LZCNT/TZCNT instructions on x86 
 explicitly return 8*sizeof(type) for input of zero

Same as the power cntlz* instructions. And in both cases we implement
intrinsics as a call to __builtin_clz*. So it would make sense to assume this
behavior in the optimizations (if it is undefined behavior, picking this
behavior won't break more things than assuming it never happens).


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org ---
Not all CPUs that have defined behavior for 0 define it to the precision
though, and even on i?86 it is undefined even when using lzcnt/tzcnt on older
CPUs.
Even the libgcc routines provide various return values for 0 depending on
target (look for COUNT_LEADING_ZEROS_0).
So it is not an option to redefine the builtins, they are undefined behavior
for 0 and that can't change.
There is __builtin_ffs that provides defined behavior for 0.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread joseph at codesourcery dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #13 from joseph at codesourcery dot com joseph at codesourcery dot 
com ---
For the RTL operations and optabs, we have CLZ_DEFINED_VALUE_AT_ZERO and 
CTZ_DEFINED_VALUE_AT_ZERO, but as noted in the documentation they do not 
refer to definedness for the built-in functions.  (That's similar to 
SHIFT_COUNT_TRUNCATED / TARGET_SHIFT_TRUNCATION_MASK, which again describe 
semantics at the RTL level but do not affect the undefinedness of 
out-of-range shifts in C/C++ source code or GIMPLE.)


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-04 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #14 from Marc Glisse glisse at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #12)
 Not all CPUs that have defined behavior for 0 define it to the precision
 though, and even on i?86 it is undefined even when using lzcnt/tzcnt on
 older CPUs.
 Even the libgcc routines provide various return values for 0 depending on
 target (look for COUNT_LEADING_ZEROS_0).
 So it is not an option to redefine the builtins, they are undefined behavior
 for 0 and that can't change.

I didn't mean to document the value for 0 or have other optimizations create
calls to this builtin relying on this property. I just meant that if we already
have a call to clz and the argument happens to be 0 (which is undefined), we
could optimize based on the assumption that the result is the precision, that
may break less code than your current patch. Otherwise we might as well assert
(well, tell it to VRP) that the argument is non-zero.

But you are probably right that trying to work around the undefined value for 0
is wrong, another builtin should be used instead.

 There is __builtin_ffs that provides defined behavior for 0.

So, should that be used for the x86 and power intrinsics? Or a new
__builtin_clz0?


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-03 Thread sgunderson at bigfoot dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

sgunderson at bigfoot dot com changed:

   What|Removed |Added

 CC||sgunderson at bigfoot dot com

--- Comment #6 from sgunderson at bigfoot dot com ---
Without knowing anything about the GCC internals here, I could perhaps also
point out that GCC should know that these have limited range. As a trivial
example:

int foo(int x)
{
int z = __builtin_ctz(x);
if (z  2000) {
return 1;
} else {
return 0;
}
}

There's no way this function can return anything but 0, and VRP should probably
be taught that. (I wonder if this would fix the unneccessary sign extension
too?)


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2013-07-03 Thread sgunderson at bigfoot dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #7 from sgunderson at bigfoot dot com ---
Wait, sorry, someone's already pointed that out. Ignore me, then...

I can at least confirm it still happens with GCC 4.8.1.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2012-07-30 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #4 from Uros Bizjak ubizjak at gmail dot com 2012-07-30 11:36:23 
UTC ---
 Perhaps REE can be taught about ctz giving a non-negative result.

Maybe we need some VRP pass to remove these extensions. Please note an example
from (duplicate) PR54115, where we generate:

#include stdint.h
uint64_t foo(long x){
 return __builtin_ctzl(x);
}

foo:
  bsfq  %rdi, %rax
  cltq

this is DImode - DImode operation, followed by sign-extend from SImode partial
reg.

Your examples in comment #3:

  bsfl %esi, %esi
  movslq  %esi, %rsi

can be fixed by slapping any_extend:DI to CTZ pattern, to consume either
ZERO_EXTEND or SIGN_EXTEND of the value (it doesn't matter for ranges [0..(some
small number)]).

The DImode example above can be fixed by adding SUBREG to all patterns. But,
I think that there is more optimal implementation than exploding the number of
bit manipulating patterns.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2012-07-30 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

--- Comment #5 from Uros Bizjak ubizjak at gmail dot com 2012-07-30 12:48:19 
UTC ---
Created attachment 27895
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27895
Target patch that handles ctz extensions

x86 target patch that teaches gcc about extensions for ctz.

Patched gcc generates no extension for:

long long foo (int x) { return __builtin_ctz (x); }

and

int foo (long long x) { return __builtin_ctzll (x); }

(and their unsigned variants) on x86.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2012-07-29 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Andrew Pinski pinskia at gcc dot gnu.org changed:

   What|Removed |Added

 CC||neleai at seznam dot cz

--- Comment #2 from Andrew Pinski pinskia at gcc dot gnu.org 2012-07-29 
08:15:14 UTC ---
*** Bug 54115 has been marked as a duplicate of this bug. ***


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2012-07-29 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed|2011-12-24 00:00:00 |2012-07-29 0:00
 CC||uros at gcc dot gnu.org

--- Comment #3 from Steven Bosscher steven at gcc dot gnu.org 2012-07-29 
13:23:30 UTC ---
Tested with trunk r189929 on x86-64 (generic).

#define MODE signed // or unsigned
unsigned mask[8];

unsigned foo(unsigned y, unsigned char x)
{
  MODE int c = (MODE) __builtin_ctz(x);
  unsigned int m = mask[c];
  return y  m;
}


With MODE==signed:
movzbl  %sil, %esi
movl%edi, %eax
rep; bsfl   %esi, %esi
movslq  %esi, %rsi
andlmask(,%rsi,4), %eax
ret

With MODE==unsigned:
movzbl  %sil, %esi
movl%edi, %eax
rep; bsfl   %esi, %esi
movl%esi, %esi
andlmask(,%rsi,4), %eax
ret

The movl %esi, %esi is a zeroextendsidi:
#(insn:TI 9 2 10 2 (parallel [
#(set (reg:SI 4 si [orig:60 D.1716 ] [60])
#(ctz:SI (reg:SI 4 si [orig:68 D.1715 ] [68])))
#(clobber (reg:CC 17 flags))
#]) t.c:6 666 {ctzsi2}
# (expr_list:REG_UNUSED (reg:CC 17 flags)
#(nil)))
rep; bsfl   %esi, %esi  # 9 ctzsi2  [length = 4]
#(insn:TI 10 9 12 2 (set (reg:DI 4 si [orig:70 D.1716 ] [70])
#(zero_extend:DI (reg:SI 4 si [orig:60 D.1716 ] [60]))) t.c:7 113
{*zero_extendsidi2_rex64}
# (nil))
movl%esi, %esi  # 10*zero_extendsidi2_rex64/1   [length
= 2]

Perhaps REE can be taught about ctz giving a non-negative result.


[Bug target/29776] result of ffs/clz/ctz/popcount/parity are already sign-extended

2011-12-23 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29776

Andrew Pinski pinskia at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2011-12-24
 Ever Confirmed|0   |1

--- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2011-12-24 
06:46:16 UTC ---
Trying 9 - 10:
Failed to match this instruction:
(set (reg:DI 69 [ D.1710 ])
(sign_extend:DI (ctz:SI (reg:SI 67 [ x ]

Should be changed to zero_extend and then that could be matched really.