[Bug target/101865] _ARCH_PWR8 is not defined when using -mcpu=power8

2021-08-31 Thread wschmidt at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101865

--- Comment #14 from wschmidt at linux dot ibm.com ---
On 8/31/21 11:09 AM, bergner at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101865
>
> --- Comment #13 from Peter Bergner  ---
> (In reply to Tulio Magno Quites Machado Filho from comment #12)
>> There is a chance that my previous comment is wrong with regard to the
>> generation of VSX instructions for Power8.
>>
>> I don't know what the second command means:
>>
>> $ gcc-11 -mcpu=power10 -dM -E - < /dev/null | grep -E 'VECTOR|VSX|ALTIVEC'
>> #define __VSX__ 1
>> #define __ALTIVEC__ 1
>> #define __POWER9_VECTOR__ 1
>> #define __APPLE_ALTIVEC__ 1
>> #define __POWER8_VECTOR__ 1
>> $ gcc-11 -mcpu=power10 -mno-power8-vector -dM -E - < /dev/null | grep -E 'VECTOR|VSX|ALTIVEC'
>> #define __VSX__ 1
>> #define __ALTIVEC__ 1
>> #define __APPLE_ALTIVEC__ 1
> __VSX__ doesn't mean all of VSX is enabled.  IIRC, __VSX__ is the macro you
> would use to see whether you have POWER7 VSX support.  For POWER8's VSX
> support, you'd use __POWER8_VECTOR__, etc.  So in your last compile, you
> disabled vector support from POWER8 onwards, but that leaves vector support
> from POWER7 and earlier, i.e., __VSX__ and __ALTIVEC__.  If you had used
> -mno-vsx, you'd still have __ALTIVEC__ and __APPLE_ALTIVEC__ defined.
> Finally, if you had used -mno-altivec, then you would have disabled all
> vector support.
>
I disagree with that.  You should use __VSX__ && _ARCH_PWR9 to check for 
P9 vector support, etc.  The __POWERn_VECTOR__ things really are not 
great and I wish they had never been added.
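
As a rough sketch of the check suggested above, the following C function
selects a code path from __VSX__ combined with the _ARCH_PWRn macros
instead of the __POWERn_VECTOR__ macros.  The function name vector_level
and the returned strings are made up for illustration, and extending the
pattern to _ARCH_PWR8 is an assumption based on the "etc." in the comment.

```c
/* Hypothetical helper: reports which vector level this translation unit
   was compiled for, using __VSX__ together with _ARCH_PWRn rather than
   the __POWERn_VECTOR__ macros.  On non-PowerPC targets none of these
   macros are defined, so the last branch is taken. */
const char *vector_level(void)
{
#if defined(__VSX__) && defined(_ARCH_PWR9)
    return "POWER9 VSX";
#elif defined(__VSX__) && defined(_ARCH_PWR8)
    return "POWER8 VSX";
#elif defined(__VSX__)
    return "POWER7 VSX";
#elif defined(__ALTIVEC__)
    return "AltiVec only";
#else
    return "no vector support";
#endif
}
```

Calling vector_level() from a small main() and building with different
-mcpu= and -mno-vsx settings is one way to see which branch a given
option set selects.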

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-21 Thread wschmidt at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #4 from wschmidt at linux dot ibm.com ---
On 6/19/20 12:43 PM, jens.seifert at de dot ibm.com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737
>
> Jens Seifert  changed:
>
>            What|Removed     |Added
> ----------------------------------------
>          Status|RESOLVED    |UNCONFIRMED
>      Resolution|DUPLICATE   |---
>
> --- Comment #3 from Jens Seifert  ---
> This is different, as the extsw also happens if the result gets used, e.g.
> followed by an andc, which is my case.  I obviously oversimplified the sample.
> It has nothing to do with the function result or ABI requirements.  gcc assumes
> that the result of -(a < b), implemented by subfc and subfe, is signed 32-bit.
> But the result is already 64-bit.
>
> unsigned long long branchlesconditional(unsigned long long a,
>                                         unsigned long long b,
>                                         unsigned long long c)
> {
>     unsigned long long mask = -(a < b);
>     return c & ~mask;
> }
>
> results in
>
> _Z20branchlesconditionalyyy:
> .LFB1:
>  .cfi_startproc
>  subfc 4,4,3
>  subfe 3,3,3
>  not 3,3
>  extsw 3,3
>  and 3,3,5
>  blr
>
> expected
> subfc
> subfe
> andc
>
Thanks for verifying, Jens!
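
For readers who want to experiment with the idiom from the report, here is
a minimal, portable sketch of the mask-and-clear pattern.  The function
name clear_if_less is hypothetical, and the explicit cast to uint64_t
before negation is a variation on the reported code, not something taken
from the bug.  No claim is made about whether this variation changes the
generated extsw.

```c
#include <stdint.h>

/* Returns 0 when a < b, otherwise returns c unchanged, without a branch.
   (a < b) evaluates to 0 or 1; negating it as a 64-bit unsigned value
   gives either 0 or an all-ones mask, so no sign extension is needed. */
uint64_t clear_if_less(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t mask = -(uint64_t)(a < b);
    return c & ~mask;
}
```

Compiling this with -O2 -S on a ppc64le toolchain and comparing it against
the listing above shows whether the extra instruction is still emitted.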

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2020-05-31 Thread wschmidt at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

--- Comment #35 from wschmidt at linux dot ibm.com ---
Hi Jeff,

Just a quick comment.  We should never discuss raw runtimes of SPEC 
benchmarks on Power hardware in public.  It's okay to talk about 
improvements (>12% in this case), but not wall clock time.  Not a big 
deal, but there are some legal reasons regarding SPEC that cause us to 
be a little careful.

Thanks!
Bill

On 5/21/20 12:29 AM, guojiufu at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
>
> --- Comment #26 from Jiu Fu Guo  ---
> Ran a test on SPEC2017 xz_r on ppc64le, changing the specified loop manually.
>
> original loop (this loop occurs three times in the code):
>     while (++len != len_limit)
>         if (pb[len] != cur[len])
>             break;
> changed to loop:
>     typedef long long __attribute__((may_alias)) TYPEE;
>
>     for (++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE)) {
>         long long a = *((TYPEE*)(cur+len));
>         long long b = *((TYPEE*)(pb+len));
>         if (a != b) {
>             break; // to optimize, len can be moved forward here.
>         }
>     }
>     for (; len != len_limit; ++len)
>         if (pb[len] != cur[len])
>             break;
>
> We can see that xz_r runtime improved from 433s to 382s (>12%).
> It would be very valuable to do this kind of widened reading/checking.
>
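
The widened comparison quoted above can be sketched as a self-contained
pair of C functions.  This is only an illustration: the function names are
hypothetical, and memcpy is used for the 8-byte loads in place of the
may_alias typedef from the quoted code, which is a substitution made here
for portability.  Both functions assume len < len_limit on entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: the original byte-at-a-time loop. */
size_t match_len_bytewise(const unsigned char *pb, const unsigned char *cur,
                          size_t len, size_t len_limit)
{
    while (++len != len_limit)
        if (pb[len] != cur[len])
            break;
    return len;
}

/* Widened version: compare 8 bytes per iteration while a full word fits,
   then fall back to bytes to locate the exact mismatch or finish the tail. */
size_t match_len_wide(const unsigned char *pb, const unsigned char *cur,
                      size_t len, size_t len_limit)
{
    uint64_t a, b;

    for (++len; len + sizeof a <= len_limit; len += sizeof a) {
        memcpy(&a, cur + len, sizeof a);   /* alignment-safe 8-byte load */
        memcpy(&b, pb + len, sizeof b);
        if (a != b)
            break;                         /* mismatch is inside this word */
    }
    for (; len != len_limit; ++len)
        if (pb[len] != cur[len])
            break;
    return len;
}
```

Both functions should return the same index for the same inputs; the wide
version only changes how many bytes are examined per iteration.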

[Bug fortran/95053] [11 regression] ICE in f951: gfc_divide()

2020-05-14 Thread wschmidt at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053

--- Comment #24 from wschmidt at linux dot ibm.com ---
On 5/14/20 12:08 PM, sgk at troutmask dot apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053
>
> --- Comment #23 from Steve Kargl  ---
> On Thu, May 14, 2020 at 02:57:37PM +, wschmidt at gcc dot gnu.org wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053
>>
>> Bill Schmidt  changed:
>>
>>            What|Removed     |Added
>> ----------------------------------------------------------
>>              CC|            |wschmidt at gcc dot gnu.org
>>
>> --- Comment #22 from Bill Schmidt  ---
>> Breaking legitimate code, even if "borderline," does not seem right to me.
>> Zero division is generally a runtime exception because of such cases.
>>
>> You write code for a general case, then later you discover "oh, well, we
>> could make this variable zero for our specific usage," and now the compiler
>> throws a fit?  Seems like this is warning-level stuff.
>>
> If Bill's reduction of the several-thousand-line file to 10ish
> lines is an accurate reduction (and I have no reason to doubt
> that it is), then no.  It is a programming error.  This is
> not the first time that gfortran has found a programming error
> in WRF.  Sure, in this case the 'if (cdleps > 0)' leads to dead
> code elimination, but DCE happens after gfortran has done some
> constant folding and common subexpression elimination in the
> front-end.
>
I'm afraid I disagree.  A divide-by-zero that cannot ever be executed is 
not an error.
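
A small C analogue of the pattern under discussion may help make the
point concrete.  The actual code in the bug is Fortran; only the name
cdleps and the guard 'cdleps > 0' come from the thread, and everything
else here (the function name, the integer types, the fallback value) is
an assumption for illustration.

```c
/* Tuning value written for the general case, later set to zero for one
   specific use.  Name taken from the thread; the type is assumed. */
static const int cdleps = 0;

/* The division is guarded, so with cdleps == 0 it can never execute.
   The guarded branch is dead code, not a runtime divide-by-zero. */
int scale_by_eps(int x)
{
    if (cdleps > 0)
        return x / cdleps;
    return x;
}
```

With cdleps set to zero the function simply returns its argument, which
is the behavior the guard was written to provide.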

[Bug target/90763] PowerPC vec_xl_len should take const

2020-02-06 Thread wschmidt at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90763

--- Comment #1 from wschmidt at linux dot ibm.com ---
Hi Martin,

Could you please CC me on all ppc bugs as well as Segher?  I do all of 
the "project management" activities for the IBM GCC team.

Thanks!
Bill

On 2/6/20 8:28 AM, marxin at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90763
>
> Martin Liška  changed:
>
>             What|Removed     |Added
> ----------------------------------------------------------
>           Status|UNCONFIRMED |NEW
> Last reconfirmed|            |2020-02-06
>               CC|            |marxin at gcc dot gnu.org,
>                 |            |segher at gcc dot gnu.org
>   Ever confirmed|0           |1
>

[Bug lto/91287] LTO disables linking with scalar MASS library (Fortran only)

2019-07-31 Thread wschmidt at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91287

--- Comment #10 from wschmidt at linux dot ibm.com ---
On 7/31/19 2:25 AM, rguenther at suse dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91287
>
> --- Comment #9 from rguenther at suse dot de  ---
> On Wed, 31 Jul 2019, luoxhu at cn dot ibm.com wrote:
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91287
>>
>> --- Comment #8 from Xiong Hu XS Luo  ---
>> (In reply to Thomas Koenig from comment #6)
>>> (In reply to Xiong Hu XS Luo from comment #4)
>>>
>>>> /tmp/cctrpu2h.ltrans0.ltrans.o: In function `MAIN__':
>>>> :(.text+0x114): undefined reference to `_gfortran_st_write'
>>>> :(.text+0x12c): undefined reference to
>>>> `_gfortran_transfer_character_write'
>>> You're not linking against libgfortran.
>>>
>>> Either use gfortran as command for compiling or linking, or
>>> add the appropriate libraries (-lgfortran -lquadmath) to
>>> the linking step.
>> Thanks Thomas and Richard.  Sorry that I am not familiar with fortran.  The
>> regression was fixed by Martin's new change.
>>
>> The c code included math.h actually.
>>
>> cat atan2bashzowie.c
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <math.h>
>>
>> double __attribute__((noinline)) zowie (double x, double y, double z)
>> {
>>   return atan2 (x * y, z);
>> }
>>
>> double __attribute__((noinline)) rand_finite_double (void)
>> {
>>   union {
>>     double d;
>>     unsigned char uc[sizeof(double)];
>>   } u;
>>   do {
>>     for (unsigned i = 0; i < sizeof u.uc; i++) {
>>       u.uc[i] = (unsigned char) rand();
>>     }
>>   } while (!isfinite(u.d));
>>   return u.d;
>> }
>>
>> int main ()
>> {
>>   double a = rand_finite_double ();
>>   printf ("%lf\n", zowie (a, 4.5, 2.2));
>>   return 0;
>> }
>> cat build.sh
>> ~/local/gcc_t/bin/gcc -O3 -mcpu=power9 atan2bashzowie.c -mveclibabi=mass \
>>   -L/opt/mass/8.1.3/Linux_LE/lib/ -lmass -lmass_simdp8 -lmassv -lmassvp8 \
>>   -o a.out
>> nm a.out | grep atan2
>> ~/local/gcc_t/bin/gcc -O3 -mcpu=power9 atan2bashzowie.c -mveclibabi=mass \
>>   -L/opt/mass/8.1.3/Linux_LE/lib/ -lmass -flto -lmass_simdp8 -lmassv \
>>   -lmassvp8 -o a.out
>> nm a.out | grep atan2
>> ./build.sh
>> 1700 T atan2
>> 1700 T _atan2
>> 17e0 T atan2
>> 17e0 T _atan2
> Err, but [_]atan2 are surely not vector variants.  Also is massv a static
> library here?  It looks more like you are not getting the code vectorized
> with -flto but without and both variants end up using massv (the -flto
> variant using the scalar atan2)?
>
> That said, you have to do more detailed analysis of what actually
> happens and what you _want_ to happen.  The bugreport summary
> doesn't really match what you show.
>
Agree that there's some unnecessary confusion here.  I think the
temporary ICE and the build issues obscured the original intent of the bug.

There are two libraries provided with the MASS project.  libmass
provides scalar replacements for corresponding libm scalar math
functions.  libmassv provides the vectorized versions of those
functions.  For this bug we are only concerned about libmass and scalar
math functions.

With the C version of the code, we correctly generate symbols atan2 and
_atan2 that will be satisfied by libmass.  With the Fortran version of
the code and without -flto, we again generate symbols atan2f and _atan2f
that will be satisfied by libmass.  When we add -flto to the Fortran
version of the code, we instead generate symbols for atan2f@@GLIBC_2.17,
which prevents libmass from satisfying them.

We see different behavior in the "visibility" LTO pass between the C and
Fortran versions, which seems to be a clue.