Undocumented parameters for __builtin_cpu_supports()?

2018-11-06 Thread Martin Reinecke
Compiling and running the following code on a CPU with BMI2 support
prints "BMI2 detected" as one would expect:

#include <stdio.h>

int main(void)
  {
  if (__builtin_cpu_supports("bmi2"))
    printf("BMI2 detected\n");
  return 0;
  }

However, "bmi2" is not documented in the list of arguments at
https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions

Is this list supposed to be complete? If yes, I can open a PR. If not,
is there any other way to obtain the full list of allowed parameters?
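
A slightly larger sketch of the same check, probing several features at once. It assumes GCC on an x86 target; the helper names report and probe_features are hypothetical. As far as I know the argument has to be a string literal, and a name the compiler does not recognise is rejected at compile time, so trying a name is one rough way to find out whether it is accepted.

```c
#include <stdio.h>

/* Print one line per feature and normalise the result to 0 or 1.
   __builtin_cpu_supports() returns a positive integer when the feature
   is present, not necessarily 1. */
static int report(const char *name, int supported)
{
  printf("%-6s %s\n", name, supported ? "yes" : "no");
  return supported != 0;
}

/* Hypothetical helper: returns how many of the probed features the
   running CPU has.  x86 GCC only. */
int probe_features(void)
{
  int found = 0;

  /* Only strictly needed when the checks run before constructors,
     but it does no harm to call it here. */
  __builtin_cpu_init();

  found += report("sse2", __builtin_cpu_supports("sse2"));
  found += report("avx2", __builtin_cpu_supports("avx2"));
  found += report("bmi2", __builtin_cpu_supports("bmi2"));
  return found;
}
```

Calling probe_features() from main prints one yes/no line per feature; the output depends on the machine it runs on.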

Cheers,
  Martin



Re: Undocumented parameters for __builtin_cpu_supports()?

2018-11-06 Thread Martin Liška
On 11/6/18 1:35 PM, Martin Reinecke wrote:
> Compiling and running the following code on a CPU with BMI2 support
> prints "BMI2 detected" as one would expect:
> 
> #include <stdio.h>
> 
> int main(void)
>   {
>   if (__builtin_cpu_supports("bmi2"))
>     printf("BMI2 detected\n");
>   return 0;
>   }
> 
> However, "bmi2" is not documented in the list of arguments at
> https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions

Hi.

Thanks for reporting that. Yes, it's missing.

> 
> Is this list supposed to be complete? If yes, I can open a PR. If not,
> is there any other way to obtain the full list of allowed parameters?

If I see correctly, not.

I filed a PR for it:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87903

for which I'll prepare a patch.

Martin

> 
> Cheers,
>   Martin
> 



GCC segfault while compiling SPEC 2017 fprate tests

2018-11-06 Thread Steve Ellcey
I was doing some benchmarking with SPEC 2017 fprate on aarch64
(Thunderx2) and I am getting some segfaults from GCC while compiling.

I am working with delta to try and cut down one of the test cases
but I was wondering if anyone else has seen this problem.  The
three tests that segfault while compiling are 510.parest_r, 511.povray_r,
and 521.wrf_r.

If I compile 510.parest_r with -Ofast -std=gnu++17 -fpermissive
I get this segfault:

during GIMPLE pass: vect
source/numerics/histogram.cc: In member function 'void 
dealii::Histogram::evaluate(const std::vector >&, const 
std::vector&, unsigned int, dealii::Histogram::IntervalSpacing) [with 
number = float]':
source/numerics/histogram.cc:54:6: internal compiler error: Segmentation fault
   54 | void Histogram::evaluate (const std::vector > &values,
  |  ^
0xdfd48f crash_signal
/home/sellcey/gcc-tot/src/gcc/gcc/toplev.c:325
0x108a6e4 contains_struct_check(tree_node*, tree_node_structure_enum, char 
const*, int, char const*)
/home/sellcey/gcc-tot/src/gcc/gcc/tree.h:3231
0x108a6e4 slpeel_duplicate_current_defs_from_edges
/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop-manip.c:984
0x108c87b slpeel_tree_duplicate_loop_to_edge_cfg(loop*, loop*, edge_def*)
/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop-manip.c:1074
0x1090ba3 vect_do_peeling(_loop_vec_info*, tree_node*, tree_node*, tree_node**, 
tree_node**, tree_node**, int, bool, bool)
/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop-manip.c:2580
0x108071b vect_transform_loop(_loop_vec_info*)
/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop.c:8243
0x10a311f try_vectorize_loop_1
/home/sellcey/gcc-tot/src/gcc/gcc/tree-vectorizer.c:965
0x10a3adb vectorize_loops()
/home/sellcey/gcc-tot/src/gcc/gcc/tree-vectorizer.c:1097
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Re: GCC segfault while compiling SPEC 2017 fprate tests

2018-11-06 Thread Richard Biener
On November 6, 2018 5:45:52 PM GMT+01:00, Steve Ellcey wrote:
>I was doing some benchmarking with SPEC 2017 fprate on aarch64
>(Thunderx2) and I am getting some segfaults from GCC while compiling.

This should be already fixed. 

>I am working with delta to try and cut down one of the test cases
>but I was wondering if anyone else has seen this problem.  The
>three tests that segfault while compiling are 510.parest_r,
>511.povray_r,
>and 521.wrf_r.
>
>If I compile 510.parest_r with -Ofast -std=gnu++17 -fpermissive
>I get this segfault:
>
>during GIMPLE pass: vect
>source/numerics/histogram.cc: In member function 'void
>dealii::Histogram::evaluate(const std::vector
>>&, const std::vector&, unsigned int,
>dealii::Histogram::IntervalSpacing) [with number = float]':
>source/numerics/histogram.cc:54:6: internal compiler error:
>Segmentation fault
>54 | void Histogram::evaluate (const std::vector >
>&values,
>  |  ^
>0xdfd48f crash_signal
>/home/sellcey/gcc-tot/src/gcc/gcc/toplev.c:325
>0x108a6e4 contains_struct_check(tree_node*, tree_node_structure_enum,
>char const*, int, char const*)
>/home/sellcey/gcc-tot/src/gcc/gcc/tree.h:3231
>0x108a6e4 slpeel_duplicate_current_defs_from_edges
>/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop-manip.c:984
>0x108c87b slpeel_tree_duplicate_loop_to_edge_cfg(loop*, loop*,
>edge_def*)
>/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop-manip.c:1074
>0x1090ba3 vect_do_peeling(_loop_vec_info*, tree_node*, tree_node*,
>tree_node**, tree_node**, tree_node**, int, bool, bool)
>/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop-manip.c:2580
>0x108071b vect_transform_loop(_loop_vec_info*)
>/home/sellcey/gcc-tot/src/gcc/gcc/tree-vect-loop.c:8243
>0x10a311f try_vectorize_loop_1
>/home/sellcey/gcc-tot/src/gcc/gcc/tree-vectorizer.c:965
>0x10a3adb vectorize_loops()
>/home/sellcey/gcc-tot/src/gcc/gcc/tree-vectorizer.c:1097
>Please submit a full bug report,
>with preprocessed source if appropriate.
>Please include the complete backtrace with any bug report.
>See  for instructions.



Re: GCC segfault while compiling SPEC 2017 fprate tests

2018-11-06 Thread Arseny Solokha
> I was doing some benchmarking with SPEC 2017 fprate on aarch64
> (Thunderx2) and I am getting some segfaults from GCC while compiling.
>
> I am working with delta to try and cut down one of the test cases
> but I was wondering if anyone else has seen this problem.  The
> three tests that segfault while compiling are 510.parest_r, 511.povray_r,
> and 521.wrf_r.

This is probably PR87889, already fixed on trunk.


Re: GCC segfault while compiling SPEC 2017 fprate tests

2018-11-06 Thread Steve Ellcey
On Wed, 2018-11-07 at 00:16 +0700, Arseny Solokha wrote:
> 
> This is probably PR87889, already fixed on trunk.

Yup, that was the problem.  I have updated my sources and things are
building now.  Thanks for the info.

Steve Ellcey


Fixing Bug Id 66074

2018-11-06 Thread nick
Greetings all,
I am wondering why this bug covers only the function reported:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66074
There seem to be many other functions in that file that could use
the exact same optimization. Would it be better to rewrite the file
to use the void object pointer for all functions that can use a
secondary version? That would touch, for example, the function
gcc_jit_result_get_global, which looks similar enough, i.e. it has
the same function arguments. Because of this I am assuming all
non-error functions in this file should be rewritten this way.

Thanks and I will send a patch if that's the case,
Nick


Extracting live registers

2018-11-06 Thread Paulo Matos
Hi,

I remember from a while ago that there's some option (or there was...)
that gets GCC to print some register allocation information together
with the assembler output.

I am interested in obtaining the live registers per basic block. I think
the option I had in mind did that but I can't remember the option name
anymore. Can someone point me to the option or a way to extract such
information?

Kind regards,
-- 
Paulo Matos


Re: Extracting live registers

2018-11-06 Thread Paulo Matos
Apologies, wrong mailing list. Should have sent this to gcc-help.

On 06/11/2018 21:35, Paulo Matos wrote:
> Hi,
> 
> I remember from awhile ago that there's some option (or there was...)
> that gets GCC to print some register allocation information together
> with the assembler output.
> 
> I am interested in obtaining the live registers per basic block. I think
> the option I had in mind did that but I can't remember the option name
> anymore. Can someone point me out to the option or a way to extract such
> information?
> 
> Kind regards,
> 

-- 
Paulo Matos


Re: Extracting live registers

2018-11-06 Thread Segher Boessenkool
Hi Paulo,

On Tue, Nov 06, 2018 at 09:35:35PM +0100, Paulo Matos wrote:
> I remember from awhile ago that there's some option (or there was...)
> that gets GCC to print some register allocation information together
> with the assembler output.
> 
> I am interested in obtaining the live registers per basic block. I think
> the option I had in mind did that but I can't remember the option name
> anymore. Can someone point me out to the option or a way to extract such
> information?

-fdump-rtl-alignments[-all] is the last dump with all that information I
think.  This one also has all this info without -all it seems.  With -all
it shows it interleaving the RTL dump as well, which may or may not be
handy for you.
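
A small test file for trying this out is sketched below. The file name live.c, the function name and the exact command line in the comment are assumptions for illustration; the numeric pass prefix in the dump file name varies between GCC versions.

```c
/* live.c: a tiny function with a loop and a branch, so that the
   resulting dump contains more than one basic block.

   Assumed invocation:
     gcc -O2 -S -fdump-rtl-alignments live.c
   or, with the RTL interleaved:
     gcc -O2 -S -fdump-rtl-alignments-all live.c

   The dump is written next to the output in a file whose name ends
   in ".alignments". */
int sum_positive(const int *v, int n)
{
  int total = 0;

  for (int i = 0; i < n; i++)
    if (v[i] > 0)
      total += v[i];
  return total;
}
```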


Segher


Cortex M0 Floating Point Library

2018-11-06 Thread Daniel Engel
Hi, 

Over the past couple of years, I have hand-assembled a new floating point 
library for the ARM Cortex M0 architecture.  I know the M0 is not generally 
regarded as a number-crunching machine, but I felt it deserved at least some of 
the attention that has previously been bestowed on the AVR architecture.  As 
this work has been incidental to my employer's line of business, they have 
tentatively agreed to assign the copyright and facilitate a release of this 
library as open source.  

I have efficient implementations of all of the integer and single-precision 
AEABI functions:

*  clzsi2, clzdi2, umulsidi3, mulsidi3, muldi3 (aeabi_lmul)
*  ashldi3 (aeabi_llsl), lshrdi3 (aeabi_llsr), ashrdi3 (aeabi_lasr)
*  aeabi_lcmp, aeabi_ulcmp
*  udivsi3 (aeabi_uidivmod), divsi3 (aeabi_idivmod), udivdi3 (aeabi_uldivmod), 
divdi3 (aeabi_ldivmod)
*  addsf3 (aeabi_fadd), subsf3 (aeabi_fsub, aeabi_frsub), mulsf3 (aeabi_fmul), 
divsf3 (aeabi_fdiv), fdimf
*  cmpsf2 (aeabi_fcmpun), eqsf2 (aeabi_fcmpeq), nesf2 (aeabi_fcmpne), gesf2 
(aeabi_fcmpge), gtsf2, unordsf2
*  floatundisf (aeabi_ul2f), floatunsisf (aeabi_ui2f), floatdisf 
(aeabi_l2f), floatsisf (aeabi_i2f)
*  fixsfdi (aeabi_f2lz), fixunssfdi (aeabi_f2ulz), fixsfsi (aeabi_f2iz), 
fixunssfsi (aeabi_f2uiz)
*  aeabi_f2d, aeabi_d2f, aeabi_h2f, aeabi_f2h
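
For readers less familiar with these names: on a target without floating point hardware, the compiler lowers ordinary C float arithmetic to calls to these helpers. A minimal sketch, assuming a soft-float ARM EABI toolchain (for example arm-none-eabi-gcc with -mcpu=cortex-m0 -mthumb); the function names scale_add, to_int and from_int are hypothetical.

```c
/* Each operation below becomes a library call on a soft-float ARM
   EABI target.  On a host with an FPU the same source simply
   compiles to hardware instructions. */

float scale_add(float a, float b, float k)
{
  return a * k + b;     /* __aeabi_fmul, then __aeabi_fadd */
}

int to_int(float x)
{
  return (int)x;        /* __aeabi_f2iz (truncates toward zero) */
}

float from_int(int n)
{
  return (float)n;      /* __aeabi_i2f */
}
```

Compiling this with -S for such a target and looking for "bl" instructions in the assembly is a quick way to see which helpers a given piece of code pulls in.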

I also have efficient implementations of several of the simpler libm functions:

*  frexpf, ldexpf, scalbnf
*  fmaxf, fminf
*  rintf, lrintf, ulrintf, llrintf, ullrintf, roundf, lroundf, ulroundf, 
llroundf, ullroundf
*  truncf, ceilf, floorf
*  fpclassifyf, isnormalf, isnanf, isinff, isfinitef, isposf, isnegf
*  ilogbf, logbf, modff
*  sqrtf, cbrtf
*  log2f, logf, log10f, log1p2f, log1pf, log1p10f, logXf, log1pXf
*  sinf, cosf, sincosf, sinpif, cospif, sincospif
*  tanf, cotf, tanpif, cotpif

Presently, the library comprises about 40 files with about 8000 lines of asm 
(unified syntax).  The test vectors weigh significantly more.  All of the 
floating point functions are IEEE754 compliant.  I can provide more complete 
performance statistics on request, but here are a few highlights: 

* Small: Less than 3kb for everything above.  Only 450 bytes for basic addsf3, 
subsf3, mulsf3, divsf3, and cmpsf2.
* Fast: addsf3 = 75 instruction cycles, subsf3 = 80, mulsf3 = 95, divsf3 = 260 
to 360, cmpsf2 = 35.
* Correct: Simultaneous calculation of sincosf() in less than 500 instruction 
cycles, accurate within +/- 1 ulp, including arbitrarily large values of 'x'.
* Bonus: round10iff(x, n) (a non-standard function) correctly rounds floating 
point values 'x' to an integer power of 10 'n'; this function simulates 
conversion to a decimal string, truncation, and conversion back to binary32 
without any string-handling overhead.
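
To illustrate the string round trip that round10iff() is said to avoid, here is a rough host-side reference. It is not the library's implementation: round10_ref is a hypothetical name, it only handles n <= 0 (units, tenths, hundredths, ...), and it relies on printf's own rounding rather than an explicit truncation step.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reference helper: round x to the decimal position
   10^n by formatting it as text and parsing the text back into a
   binary32 value.  Inputs outside the supported range are returned
   unchanged. */
float round10_ref(float x, int n)
{
  char buf[96];         /* sign + 39 digits + '.' + 40 digits fits */
  int len;

  if (n > 0 || n < -40)
    return x;           /* out of scope for this sketch */

  len = snprintf(buf, sizeof buf, "%.*f", -n, (double)x);
  if (len < 0 || (size_t)len >= sizeof buf)
    return x;           /* formatting failed or did not fit */

  return strtof(buf, NULL);
}
```

For example, round10_ref(3.14159f, -2) formats the value as "3.14" and parses that back to the nearest binary32 value.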

To date, I have only built this library as part of a user space embedded 
application.  I have not attempted to build or patch the GCC toolchain itself.  
If accepted, I suspect there will be at least a little work to restructure it 
for inclusion with libgcc.  But, before proceeding with that work, I need to 
have some idea of direction and goal.  

The first question, then, is what might the best home for this library be?  
Many of the lower level functions (e.g. clzsi2, addsf3) replace the generic 
implementations of libgcc.  However, the higher level functions (e.g. ldexpf, 
sincosf) traditionally link from libm, which I don't believe is typically 
distributed with gcc.  The compact nature of this library of course follows 
from a tight integration between higher and lower level functions.  I have 
considered a few strategies: 

* Add everything into the base libgcc, 
* Add everything into libm (newlib?) and rely on link order to supersede 
libgcc, 
* Split the implementation with some magic to ensure that libm functions only 
link in the presence of the correct libgcc,
* Establish an independent library specific to the Cortex M0 architecture, or
* Something else entirely...

If there is any interest in incorporating this work into GCC, please advise.  

Thanks,
Daniel Engel


Re: Cortex M0 Floating Point Library

2018-11-06 Thread Joel Sherrill
On Tue, Nov 6, 2018, 10:32 PM Daniel Engel wrote:

> Hi,
>
> Over the past couple of years, I have hand-assembled a new floating point
> library for the ARM Cortex M0 architecture.  I know the M0 is not generally
> regarded as a number-crunching machine, but I felt it deserved at least
> some of the attention that has previously been bestowed on the AVR
> architecture.  As this work has been incidental to my employer's line of
> business, they have tentatively agreed to assign the copyright and
> facilitate a release of this library as open source.
>
> I have efficient implementations of all of the integer and
> single-precision AEABI functions:
>
> *  clzsi2, clzdi2, umulsidi3, mulsidi3, muldi3 (aeabi_lmul)
> *  ashldi3 (aeabi_llsl), lshrdi3 (aeabi_llsr), ashrdi3 (aeabi_lasr)
> *  aeabi_lcmp, aeabi_ulcmp
> *  udivsi3 (aeabi_uidivmod), divsi3 (aeabi_idivmod), udivdi3
> (aeabi_uldivmod), divdi3 (aeabi_ldivmod)
> *  addsf3 (aeabi_fadd), subsf3 (aeabi_fsub, aeabi_frsub), mulsf3
> (aeabi_fmul), divsf3 (aeabi_fdiv), fdimf
> *  cmpsf2 (aeabi_fcmpun), eqsf2 (aeabi_fcmpeq), nesf2 (aeabi_fcmpne),
> gesf2 (aeabi_fcmpge), gtsf2, unordsf2
> *  floatundisf (aeabi_ul2f), floatunsisf (aeabi_ui2f), floatdisf
> (aeabi_l2f), floatsisf (aeabi_i2f)
> *  fixsfdi (aeabi_f2lz), fixunssfdi (aeabi_f2ulz), fixsfsi (aeabi_f2iz),
> fixunssfsi (aeabi_f2uiz)
> *  aeabi_f2d, aeabi_d2f, aeabi_h2f, aeabi_f2h
>
> I also have efficient implementations of several of the simpler libm
> functions:
>
> *  frexpf, ldexpf, scalbnf
> *  fmaxf, fminf
> *  rintf, lrintf, ulrintf, llrintf, ullrintf, roundf, lroundf, ulroundf,
> llroundf, ullroundf
> *  truncf, ceilf, floorf
> *  fpclassifyf, isnormalf, isnanf, isinff, isfinitef, isposf, isnegf
> *  ilogbf, logbf, modff
> *  sqrtf, cbrtf
> *  log2f, logf, log10f, log1p2f, log1pf, log1p10f, logXf, log1pXf
> *  sinf, cosf, sincosf, sinpif, cospif, sincospif
> *  tanf, cotf, tanpif, cotpif
>
> Presently, the library comprises about 40 files with about 8000 lines of
> asm (unified syntax).  The test vectors weigh significantly more.  All of
> the floating point functions are IEEE754 compliant.  I can provide more
> complete performance statistics on request, but here are a few highlights:
>
> * Small: Less than 3kb for everything above.  Only 450 bytes for basic
> addsf3, subsf3, mulsf3, divsf3, and cmpsf2.
> * Fast: addsf3 = 75 instruction cycles, subsf3 = 80, mulsf3 = 95, divsf3 =
> 260 to 360, cmpsf2 = 35.
> * Correct: Simultaneous calculation of sincosf() in less than 500
> instruction cycles, accurate within +/- 1 ulp, including arbitrarily large
> values of 'x'.
> * Bonus: round10iff(x, n) (a non-standard function) correctly rounds
> floating point values 'x' to an integer power of 10 'n'; this function
> simulates conversion to a decimal string, truncation, and conversion back
> to binary32 without any string-handling overhead.
>

This sounds like a nice body of work. Congratulations.

Does paranoia pass?

>
> To date, I have only built this library as part of a user space embedded
> application.  I have not attempted to build or patch the GCC toolchain
> itself.  If accepted, I suspect there will be at least a little work to
> restructure it for inclusion with libgcc.  But, before proceeding with that
> work, I need to have some idea of direction and goal.
>
> The first question, then, is what might the best home for this library
> be?  Many of the lower level functions (e.g. clzsi2, addsf3) replace the
> generic implementations of libgcc.  However, the higher level functions
> (e.g. ldexpf, sincosf) traditionally link from libm, which I don't believe
> is typically distributed with gcc.  The compact nature of this library of
> course follows from a tight integration between higher and lower level
> functions.  I have considered a few strategies:
>
> * Add everything into the base libgcc,
> * Add everything into libm (newlib?) and rely on link order to supersede
> libgcc,
>

This will almost certainly break at some point, for someone, and be hard to
even figure out it happened because the code will work but just be bigger
or slower.

> * Split the implementation with some magic to ensure that libm functions
> only link in the presence of the correct libgcc,
>

I think this is the proper solution. It just puts better implementations in
the place the infrastructure already supports having a target specific
option.

> * Establish an independent library specific to the Cortex M0 architecture,
> or
>

This is likely to get you the smallest number of users.  People have to
find it and then integrate it on their own. Don't make it hard for folks to
find and use your work.


> * Something else entirely...
>
> If there is any interest in incorporating this work into GCC, please
> advise.
>

I think so but I am just one voice from the RTEMS community. But I think
any M0 user would be pleased.

--joel

>
> Thanks,
> Daniel Engel
>


Re: Extracting live registers

2018-11-06 Thread Paulo Matos



On 07/11/2018 00:40, Segher Boessenkool wrote:
> Hi Paulo,
> 
> -fdump-rtl-alignments[-all] is the last dump with all that information I
> think.  This one also has all this info without -all it seems.  With -all
> it shows it interleaving the RTL dump as well, which may or may not be
> handy for you.
> 

Thanks, however it provides no correspondence to the set of asm
instructions in the basic block. After you mentioned
-fdump-rtl-alignments, I tried a few related flags and hit upon what I
thought would work: -dA and -dP, but unfortunately these don't output
live out information per basic block so it's not helpful for my
application. It would be great if -dA or -dP would show live out info as
well, but that doesn't seem to be the case at the moment.

-- 
Paulo Matos