[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #12 from Michael_S  ---
(In reply to Thomas Koenig from comment #10)
> What we would need for incorporation into gcc is to have several
> functions, which would then be called depending on which floating point
> options are in force at the time of invocation.
> 
> So, let's go through the gcc options, to see what would fit where. Walking
> down the options tree, depth first.
> 
> From the gcc docs:
> 
> '-ffast-math'
>  Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
>  '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
>  '-fcx-limited-range' and '-fexcess-precision=fast'.
> 
> -fno-math-errno is irrelevant in this context, no need to look at that.
> 
> '-funsafe-math-optimizations'
> 
>  Allow optimizations for floating-point arithmetic that (a) assume
>  that arguments and results are valid and (b) may violate IEEE or
>  ANSI standards.  When used at link time, it may include libraries
>  or startup files that change the default FPU control word or other
>  similar optimizations.
> 
>  This option is not turned on by any '-O' option since it can result
>  in incorrect output for programs that depend on an exact
>  implementation of IEEE or ISO rules/specifications for math
>  functions.  It may, however, yield faster code for programs that do
>  not require the guarantees of these specifications.  Enables
>  '-fno-signed-zeros', '-fno-trapping-math', '-fassociative-math' and
>  '-freciprocal-math'.
> 
> '-fno-signed-zeros'
>  Allow optimizations for floating-point arithmetic that ignore the
>  signedness of zero.  IEEE arithmetic specifies the behavior of
>  distinct +0.0 and -0.0 values, which then prohibits simplification
>  of expressions such as x+0.0 or 0.0*x (even with
>  '-ffinite-math-only').  This option implies that the sign of a zero
>  result isn't significant.
> 
>  The default is '-fsigned-zeros'.
> 
> I don't think this option is relevant.
> 
> '-fno-trapping-math'
>  Compile code assuming that floating-point operations cannot
>  generate user-visible traps.  These traps include division by zero,
>  overflow, underflow, inexact result and invalid operation.  This
>  option requires that '-fno-signaling-nans' be in effect.  Setting
>  this option may allow faster code if one relies on "non-stop" IEEE
>  arithmetic, for example.
> 
>  This option should never be turned on by any '-O' option since it
>  can result in incorrect output for programs that depend on an exact
>  implementation of IEEE or ISO rules/specifications for math
>  functions.
> 
>  The default is '-ftrapping-math'.
> 
> Relevant.
> 
> '-ffinite-math-only'
>  Allow optimizations for floating-point arithmetic that assume that
>  arguments and results are not NaNs or +-Infs.
> 
>  This option is not turned on by any '-O' option since it can result
>  in incorrect output for programs that depend on an exact
>  implementation of IEEE or ISO rules/specifications for math
>  functions.  It may, however, yield faster code for programs that do
>  not require the guarantees of these specifications.
> 
> This does not have further suboptions. Relevant.
> 
> '-fassociative-math'
> 
>  Allow re-association of operands in series of floating-point
>  operations.  This violates the ISO C and C++ language standard by
>  possibly changing computation result.  NOTE: re-ordering may change
>  the sign of zero as well as ignore NaNs and inhibit or create
>  underflow or overflow (and thus cannot be used on code that relies
>  on rounding behavior like '(x + 2**52) - 2**52'.  May also reorder
>  floating-point comparisons and thus may not be used when ordered
>  comparisons are required.  This option requires that both
>  '-fno-signed-zeros' and '-fno-trapping-math' be in effect.
>  Moreover, it doesn't make much sense with '-frounding-math'.  For
>  Fortran the option is automatically enabled when both
>  '-fno-signed-zeros' and '-fno-trapping-math' are in effect.
> 
>  The default is '-fno-associative-math'.
> 
> Not relevant, I think - this influences compiler optimizations.
> 
> '-freciprocal-math'
> 
>  Allow the reciprocal of a value to be used instead of dividing by
>  the value if this enables optimizations.  For example 'x / y' can
>  be replaced with 'x * (1/y)', which is useful if '(1/y)' is subject
>  to common subexpression elimination.  Note that this loses
>  precision and increases the number of flops operating on the value.
> 
>  The default is '-fno-reciprocal-math'.
> 
> Again, not relevant.
> 
> 
> '-frounding-math'
>  Disable transformations and optimizations that assume default
>  floating-point rounding behavior.  This is round-to-zero for all
>  floating point to integer 

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #11 from Michael_S  ---
(In reply to Thomas Koenig from comment #9)
> Created attachment 54273 [details]
> matmul_r16.i
> 
> Here is matmul_r16.i from a relatively recent trunk.

Thank you.
Unfortunately, I was not able to link it with a main in Fortran.
So I can still only guess why, even after replacing __multf3 and
__addtf3 with my implementations, it is still more than twice as slow (on Zen3)
as it should be.
Looking at the source and assuming that the inner loop starts at line 8944, this
loop looks very strange, but apart from bad programming style and a
misunderstanding of what optimal scheduling is, there is nothing criminal about
it. Maybe, because of the wrong scheduling, it is 10-15% slower than the best, but
certainly it should not be the 2.4 times slower that I am seeing.


That's what I got from linker:
/usr/bin/ld: matmul_r16.o: warning: relocation against
`_gfortrani_matmul_r16_avx128_fma3' in read-only section `.text'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_avx':
matmul_r16.c:(.text+0x48): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x342): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x10e4): undefined reference to
`_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x10f1): undefined reference to
`_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x12e5): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x13a9): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x13e7): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_avx2':
matmul_r16.c:(.text+0x24c8): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x27c2): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x3564): undefined reference to
`_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x3571): undefined reference to
`_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x3765): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x3829): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x3867): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_avx512f':
matmul_r16.c:(.text+0x4948): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x4c47): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x5a32): undefined reference to
`_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x5a3f): undefined reference to
`_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x5c35): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x5cfb): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x5d35): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_vanilla':
matmul_r16.c:(.text+0x6de8): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x70e2): undefined reference to
`_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x7e84): undefined reference to
`_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x7e91): undefined reference to
`_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x8085): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x8149): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x8187): undefined reference to
`_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `_gfortran_matmul_r16':
matmul_r16.c:(.text+0x92b7): undefined reference to
`_gfortrani_matmul_r16_avx128_fma3'
/usr/bin/ld: matmul_r16.c:(.text+0x92ea): undefined reference to
`_gfortrani_matmul_r16_avx128_fma4'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status

[Bug tree-optimization/99408] s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408

--- Comment #4 from Jan Hubicka  ---
On Zen4 it is 20s for gcc and 6.9s for aocc, so still a problem.

[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376

--- Comment #3 from Jan Hubicka  ---
If I make the arrays random then GCC code is indeed faster:
#include 
#include 

typedef float real_t;
#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
real_t aa[LEN_2D][LEN_2D];
real_t bb[LEN_2D][LEN_2D];
real_t cc[LEN_2D][LEN_2D];
real_t qq;
int
main(void)
{
//reductions
//if to max reduction

real_t x;
for (int i = 0; i < LEN_1D; i++)
{
   a[i]=(rand() %5) - 3;
   b[i]=(rand() %6) - 3;
}
for (int nl = 0; nl < iterations; nl++) {
for (int i = 0; i < LEN_1D; i++) {
if (a[i] < (real_t)0.) {
if (b[i] > a[i]) {
c[i] += d[i] * e[i];
}
}
}
//dummy(a, b, c, d, e, aa, bb, cc, 0.);
}

return x;
}

jh@alberti:~/tsvc/bin> ~/aocc-compiler-4.0.0/bin/clang -Ofast s1279.c
-march=native
s1279.c:23:14: warning: implicit declaration of function 'rand' is invalid in
C99 [-Wimplicit-function-declaration]
   a[i]=(rand() %5) - 3;
 ^
1 warning generated.
jh@alberti:~/tsvc/bin> time ./a.out

real    0m5.638s
user    0m5.636s
sys     0m0.000s
jh@alberti:~/tsvc/bin> ~/trunk-install/bin/gcc -Ofast s1279.c -march=native
s1279.c: In function 'main':
s1279.c:23:14: warning: implicit declaration of function 'rand'
[-Wimplicit-function-declaration]
   23 |a[i]=(rand() %5) - 3;
  |  ^~~~
jh@alberti:~/tsvc/bin> time ./a.out

real    0m2.791s
user    0m2.790s
sys     0m0.000s


sorry for wrong code, just for reference the loop compiles as:
.L4:
        xorl    %eax, %eax
        .p2align 4
        .p2align 3
.L3:
        vmovaps a(%rax), %ymm2
        vmovaps b(%rax), %ymm3
        vmovaps c(%rax), %ymm6
        addq    $32, %rax
        vmovaps c-32(%rax), %ymm0
        vmovaps e-32(%rax), %ymm4
        vcmpps  $1, %ymm1, %ymm2, %k1
        vcmpps  $14, %ymm2, %ymm3, %k1{%k1}
        vfmadd231ps d-32(%rax), %ymm4, %ymm0{%k1}
        vfmadd231ps d-32(%rax), %ymm4, %ymm0
        vblendmps   %ymm0, %ymm6, %ymm0{%k1}
        vmovaps %ymm0, c-32(%rax)
        cmpq    $128000, %rax
        jne     .L3
        decl    %edx
        jne     .L4
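
As a reading aid for the listing, here is a scalar sketch of what the masked vector loop computes. It assumes %ymm1 holds zero (it is set outside the excerpt), so that the first vcmpps (predicate 1, less-than) tests a[i] < 0 and the second (predicate 14, greater-than) tests b[i] > a[i] under that mask. The function name s1279_kernel and the use of std::vector are illustrative only and are not part of the benchmark.

```cpp
#include <cstddef>
#include <vector>

// Scalar form of the s1279 kernel: c[i] += d[i] * e[i] only where
// a[i] < 0 and b[i] > a[i]. A vectorized version evaluates both
// conditions into a mask and applies the multiply-add under that mask.
void s1279_kernel(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, const std::vector<float>& d,
                  const std::vector<float>& e)
{
    for (std::size_t i = 0; i < c.size(); ++i) {
        const bool mask = (a[i] < 0.0f) && (b[i] > a[i]);
        if (mask)
            c[i] += d[i] * e[i];
    }
}
```

With the random initialization used in the test program, the mask is data dependent, which is the situation where a masked vector update can pay off compared with branching per element.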

[Bug libstdc++/108409] std::chrono::current_zone() doesn't work on AIX

2023-01-14 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409

--- Comment #3 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #0)
> We should parse the TZ env var and see if it is already an IANA name, and
> handle a few other special cases. E.g. gcc119 in the cfarm has TZ=CUT0 which
> means a time zone named "CUT" (coordinated universal time) with a 0 offset
> from UTC. So map to UTC. More generally, "FOOn" is a time zone called "FOO"
> with a -n offset, so we could map any such string to "Etc/GMT-n"

It now works if TZ contains an IANA time zone name, or any string matching
"???0". If the systemwide TZ isn't one of those, users can define TZ for their
own programs' environment.

I don't know if that's good enough.
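
A minimal sketch of the "???0" rule described above, not the code committed to tzdb.cc. The function name zone_from_simple_tz is hypothetical, and returning "UTC" simply follows the "So map to UTC" suggestion from comment #0.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: treats any four-character TZ value whose last
// character is '0' (for example "CUT0" or "GMT0") as UTC, mirroring the
// glob pattern "???0". Anything else is left for other handling.
std::optional<std::string> zone_from_simple_tz(std::string_view tz)
{
    if (tz.size() == 4 && tz[3] == '0')
        return std::string("UTC");
    return std::nullopt;
}
```

A caller would try the IANA-name lookup first and fall back to a helper like this only when that lookup fails.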

[Bug libstdc++/108409] std::chrono::current_zone() doesn't work on AIX

2023-01-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:d80e5a7b30e5d045c808f5235123e366e4e9286c

commit r13-5170-gd80e5a7b30e5d045c808f5235123e366e4e9286c
Author: Jonathan Wakely 
Date:   Sat Jan 14 20:13:32 2023 +0000

libstdc++: Implement std::chrono::current_zone() for AIX [PR108409]

libstdc++-v3/ChangeLog:

PR libstdc++/108409
* src/c++20/tzdb.cc (current_zone()) [_AIX]: Use TZ environment
variable.

[Bug ipa/56139] [10/11/12/13 Regression] unmodified static data could go in .rodata, not .data

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56139

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #4 from Jan Hubicka  ---
I have some code that makes it possible to attach summaries to references (like
we do for calls and symbols) and then mark addresses that are never used for
comparison.  Similarly we could probably mark readonly addresses.
We could even squeeze out a bit in the reference representation itself.
Is there an easy way to tell if an address is never read from during IPA summary
generation time?

[Bug bootstrap/107950] partial LTO linking of libbackend.a: gcc/gcc-rich-location.cc:207: undefined reference to `range_label_for_type_mismatch::get_text(unsigned int) const'

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107950

--- Comment #7 from Jan Hubicka  ---
Thanks for looking into the incremental link of libbackend. I had it in my tree
for a while but never got around to implementing a correct way to enable it only
during bootstrap, since the host compiler may not support it. It would be nice to
have it in, since it should reduce WPA memory use and also test this code path.

I also think this is the case where partial linking makes the symbol get pulled
into the LTO binary at the initial link time.  It should be optimized away if
the linker was not complaining.

[Bug middle-end/108410] New: x264 averaging loop not optimized well for avx512

2023-01-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

Bug ID: 108410
   Summary: x264 averaging loop not optimized well for avx512
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

The x264 benchmark has a loop averaging two unsigned char arrays that is executed
with relatively low trip counts, which does not play well with our vectorized
code.  For AVX512 most time is spent in the unvectorized variant since the average
number of iterations is too small to reach the vector code.
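
To make the trip-count argument concrete, here is a deliberately simplified model of how a byte loop splits into full-width vector iterations and leftover scalar iterations. It assumes no peeling and no vectorized epilogue, both of which real GCC output may have, so it only illustrates why short loops never reach a wide vector body. The names LoopSplit and split_loop are made up for this sketch.

```cpp
struct LoopSplit {
    int vector_iters;  // iterations of the full-width vector body
    int scalar_iters;  // leftover elements handled one at a time
};

// A loop over `size` unsigned chars vectorized with `vector_bits`-wide
// registers handles vector_bits / 8 elements per vector iteration.
LoopSplit split_loop(int size, int vector_bits)
{
    const int vf = vector_bits / 8;  // elements per vector iteration
    return {size / vf, size % vf};
}
```

Under this model a 28-byte block gets one 128-bit vector iteration plus 12 scalar elements, while with 512-bit vectors all 28 elements fall through to the scalar code.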

This table shows runtimes of averaging given block size with scalar loop,
vectorized loop for individual vector sizes and aocc codegen:

size   scalar     128     256     512    aocc
    2    8.13    9.49    9.49    9.49    9.49
    4    5.79    6.10    6.10    7.45    6.78
    6    5.44    5.43    5.42    6.78    5.87
    8    5.19    2.71    5.31    6.44    5.42
   12    5.14    3.17    5.33    6.10    4.97
   16    4.85    1.19    1.53    5.93    1.36
   20    4.82    2.03    1.90    6.10    1.90
   24    4.60    0.96    2.58    6.10    2.26
   28    4.51    1.55    2.97    6.00    2.55
   32    4.52    0.68    0.60    0.60    0.77
   34    4.77    0.96    0.88    0.80    0.96
   38    4.42    1.36    1.37    1.17    1.29
   42    4.40    0.84    1.82    1.73    1.63

So for sizes 2-8 the scalar loop wins.
For sizes 12-16 128-bit vectorization wins; 20-28 behave funnily.
However, avx512 vectorization is a huge loss for all sizes up to 31 bytes.
aocc seems to win for 16 bytes.

Note that one problem is that for 256-bit vectors we peel the epilogue loop
(since the trip count fits in max-completely-peeled-insns and
max-completely-peel-times). Bumping both twice makes the avx512 prologue unrolled
too, but it does not seem to help the x264 benchmark itself.

bmk.c:
#include <stdlib.h>
unsigned char a[10000];
unsigned char b[10000];
unsigned char c[10000];

__attribute__ ((weak))
void
avg (unsigned char *a, unsigned char *b, unsigned char *c, int size)
{
  for (int i = 0; i < size; i++)
    {
      a[i] = (b[i] + c[i] + 1) >> 1;
    }
}
int
main(int argc, char**argv)
{
  int size = atoi (argv[1]);
  for (long i = 0 ; i < 10000000000/size; i++)
    {
      avg (a,b,c,size);
    }
  return 0;
}

bmk.sh:
gcc -Ofast -march=native bmk.c -fno-tree-vectorize -o bmk.scalar
gcc -Ofast -march=native bmk.c -mprefer-vector-width=128 -o bmk.128
gcc -Ofast -march=native bmk.c -mprefer-vector-width=256 -o bmk.256
gcc -Ofast -march=native bmk.c -mprefer-vector-width=512 -o bmk.512
~/aocc-compiler-4.0.0//bin/clang -Ofast -march=native bmk.c -o bmk.aocc

echo "size   scalar     128     256     512    aocc"
for size in 2 4 6 8 12 16 20 24 28 32 34 38 42
do
  scalar=`time -f "%e" ./bmk.scalar $size 2>&1`
  v128=`time -f "%e" ./bmk.128 $size 2>&1`
  v256=`time -f "%e" ./bmk.256 $size 2>&1`
  v512=`time -f "%e" ./bmk.512 $size 2>&1`
  aocc=`time -f "%e" ./bmk.aocc $size 2>&1`
  printf "%5i %7.2f %7.2f %7.2f %7.2f %7.2f\n" $size $scalar $v128 $v256 $v512 $aocc
done


aocc codegen:
# %bb.0:                                # %entry
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset %rbx, -16
        testl   %ecx, %ecx
        jle     .LBB0_15
# %bb.1:                                # %iter.check
        movl    %ecx, %r8d
        cmpl    $16, %ecx
        jae     .LBB0_3
# %bb.2:
        xorl    %eax, %eax
        jmp     .LBB0_14
.LBB0_3:                                # %vector.memcheck
        leaq    (%rsi,%r8), %r9
        leaq    (%rdi,%r8), %rax
        leaq    (%rdx,%r8), %r10
        cmpq    %rdi, %r9
        seta    %r11b
        cmpq    %rsi, %rax
        seta    %bl
        cmpq    %rdi, %r10
        seta    %r9b
        cmpq    %rdx, %rax
        seta    %r10b
        xorl    %eax, %eax
        testb   %bl, %r11b
        jne     .LBB0_14
# %bb.4:                                # %vector.memcheck
        andb    %r10b, %r9b
        jne     .LBB0_14
# %bb.5:                                # %vector.main.loop.iter.check
        cmpl    $128, %ecx
        jae     .LBB0_7
# %bb.6:
        xorl    %eax, %eax
        jmp     .LBB0_11
.LBB0_7:                                # %vector.ph
        movl    %r8d, %eax
        andl    $-128, %eax
        xorl    %ecx, %ecx
        .p2align 4, 0x90
.LBB0_8:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vmovdqu (%rdx,%rcx), %ymm0
        vmovdqu 32(%rdx,%rcx), %ymm1

[Bug c++/108407] SegFault with structured binding and OpenMP without optimization

2023-01-14 Thread mmoelle1 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407

--- Comment #4 from Matthias Möller  ---
Thank you, I have changed the code as suggested and it compiles and runs fine
in all optimization levels including '-O0'.

[Bug libstdc++/108409] std::chrono::current_zone() doesn't work on AIX

2023-01-14 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-01-14

--- Comment #1 from Jonathan Wakely  ---
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
describes the format of the TZ variable when it's not an IANA name.

When TZ does not name an IANA zone, we could potentially create a new
chrono::time_zone object, generated from the std and dst names, with the
appropriate offsets and DST transitions. Then current_zone() would return a
pointer to that custom zone.

[Bug libstdc++/108409] New: std::chrono::current_zone() doesn't work on AIX

2023-01-14 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409

Bug ID: 108409
   Summary: std::chrono::current_zone() doesn't work on AIX
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---
Target: *-*aix*

terminate called after throwing an instance of 'std::runtime_error'
  what():  tzdb: cannot determine current zone
FAIL: std/time/tzdb/1.cc execution test

terminate called after throwing an instance of 'std::runtime_error'
  what():  tzdb: cannot determine current zone
FAIL: std/time/zoned_time/custom.cc execution test


The std::chrono::current_zone() function is supposed to determine the machine's
time zone. As noted in libstdc++-v3/src/c++20/tzdb.cc:

// TODO AIX stores current zone in $TZ in /etc/environment but the value
// is typically a POSIX time zone name, not IANA zone.
// https://developer.ibm.com/articles/au-aix-posix/
// https://www.ibm.com/support/pages/managing-time-zone-variable-posix

__throw_runtime_error("tzdb: cannot determine current zone");

How should we solve this?

We should parse the TZ env var and see if it is already an IANA name, and
handle a few other special cases. E.g. gcc119 in the cfarm has TZ=CUT0 which
means a time zone named "CUT" (coordinated universal time) with a 0 offset from
UTC. So map to UTC. More generally, "FOOn" is a time zone called "FOO" with a
-n offset, so we could map any such string to "Etc/GMT-n"

We could add some AIX-specific extension point, so programs can tell the
library the IANA (aka Olson) name of the current time zone. Maybe read it from
another file, something configurable and controlled by the user/program. But if
we handle the TZ variable, users can just set that in their program's env and
another extension point probably isn't needed.
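
A rough sketch of the first step proposed above: splitting the simplest "FOOn" form of TZ into its name and hour offset. This is not the libstdc++ code; SimpleTz and parse_simple_tz are hypothetical names, and the sketch deliberately stops before choosing an IANA name, because the sign conventions of POSIX offsets and of the Etc/GMT zones should be checked against the specifications first. Minutes, DST names and transition rules are rejected rather than handled.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct SimpleTz {
    std::string name;  // e.g. "CUT"
    int offset_hours;  // the n in "FOOn", with the sign as written in TZ
};

// Accepts only: three or more letters, an optional sign, whole hours.
std::optional<SimpleTz> parse_simple_tz(std::string_view tz)
{
    std::size_t i = 0;
    while (i < tz.size() && std::isalpha(static_cast<unsigned char>(tz[i])))
        ++i;
    if (i < 3 || i == tz.size())
        return std::nullopt;  // name too short, or no offset at all
    std::string name(tz.substr(0, i));

    int sign = 1;
    if (tz[i] == '+' || tz[i] == '-') {
        sign = (tz[i] == '-') ? -1 : 1;
        ++i;
    }
    if (i == tz.size())
        return std::nullopt;  // sign without digits

    int hours = 0;
    for (; i < tz.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(tz[i])))
            return std::nullopt;  // minutes, DST part, etc.: not handled here
        hours = hours * 10 + (tz[i] - '0');
        if (hours > 24)
            return std::nullopt;  // outside the range POSIX allows for hours
    }
    return SimpleTz{name, sign * hours};
}
```

For TZ=CUT0 this yields the name "CUT" with offset 0, which is the case that can be mapped straight to UTC.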

[Bug tree-optimization/92342] [10/11/12/13 Regression] a small missed transformation into x?b:0

2023-01-14 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342

--- Comment #29 from Gabriel Ravier  ---
Looks like the patch fixes this bug, unless I'm missing something.

[Bug c++/108407] SegFault with structured binding and OpenMP without optimization

2023-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407

--- Comment #3 from Andrew Pinski  ---
If you do:
  return std::tuple(a,b);

You don't get the reference.
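
To illustrate the difference, here is a small sketch (the name create_by_value is made up for the example). std::tuple(a, b) deduces std::tuple<int, double> and copies the locals, whereas std::tie applied to the same lvalues has type std::tuple<int&, double&>, i.e. references to them.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Copies: the returned tuple owns an int and a double, so nothing
// refers to the locals once the function has returned.
auto create_by_value()
{
    int    a = 10;
    double b = 1.0;
    return std::tuple(a, b);
}

// The type std::tie would produce for an int lvalue and a double lvalue.
using TieType =
    decltype(std::tie(std::declval<int&>(), std::declval<double&>()));

static_assert(std::is_same_v<decltype(create_by_value()),
                             std::tuple<int, double>>);
static_assert(std::is_same_v<TieType, std::tuple<int&, double&>>);
```

With the by-value version, a structured binding such as auto [a, b] = create_by_value(); binds to members of the returned tuple object rather than to dead stack slots.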

[Bug c++/108407] SegFault with structured binding and OpenMP without optimization

2023-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407

--- Comment #2 from Andrew Pinski  ---
>  return std::tie(a,b);

That returns a reference to the two local variables. Both have now gone out of
scope.

[Bug c++/108407] SegFault with structured binding and OpenMP without optimization

2023-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski  ---
With -fsanitize=undefined,address we get:

=================================================================
==1==ERROR: AddressSanitizer: stack-use-after-return on address 0x7ff991800030
at pc 0x004016c4 bp 0x7ffd675ea150 sp 0x7ffd675ea148
READ of size 4 at 0x7ff991800030 thread T0
#0 0x4016c3 in main /app/example.cpp:19
#1 0x7ff993d50082 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId:
1878e6b475720c7c51969e69ab2d276fae6d1dee)
#2 0x40115d in _start (/app/output.s+0x40115d) (BuildId:
c6fef22ac59389c6ed0248b91200737f3dfa67d0)

Address 0x7ff991800030 is located in stack of thread T0 at offset 48 in frame
#0 0x401225 in create() /app/example.cpp:7

  This frame has 2 object(s):
[48, 52) 'a' (line 8) <== Memory access at offset 48 is inside this
variable
[64, 72) 'b' (line 9)
HINT: this may be a false positive if your program uses some custom stack
unwind mechanism, swapcontext or vfork
  (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-return /app/example.cpp:19 in main
Shadow bytes around the buggy address:
  0x7ff9917ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7ff991800000: f5 f5 f5 f5 f5 f5[f5]f5 f5 f5 f5 f5 00 00 00 00
  0x7ff991800080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==1==ABORTING

[Bug d/108408] New: libphobos: Support building on *-*-cygwin

2023-01-14 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108408

Bug ID: 108408
   Summary: libphobos: Support building on *-*-cygwin
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: d
  Assignee: ibuclaw at gdcproject dot org
  Reporter: nightstrike at gmail dot com
  Target Milestone: ---

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99794 for reference.

This PR is tracking the state of building libphobos on cygwin using 11.3, the
last compiler that can be bootstrapped natively.  Currently, it fails due to
missing definitions for FILE, snprintf, time_t, and clock_t.  I'm currently
trying to add them to libphobos/libdruntime/core/stdc/stdio.d, because the path
for CRuntime_Newlib is currently missing.  I seem to be falling down a rabbit
hole of needing to define struct after struct, so I'm trying to gather the work
here in hopes that some kind soul can help.

I should clarify that I'm unfamiliar with D, so what I'm putting here
definitely needs someone with experience to finish the work.  My hope is to
just get it far enough along that others can do so.  Whoever put in the gcc
warning that automatically converts function pointers for you, please accept my
thanks!

I started by lifting the structure definitions from cygwin's newlib, which
unfortunately have quite a few conditional components.  I don't know how
important this is if for instance a structure has:

struct S {
#ifdef A
int a;
#else
void * b;
#endif
};

or some variation of all manner of conditions that change the struct layout and
the total size.

Guidance as to whether this is the right approach or a total waste of time
would be appreciated :).  With what I have so far, I'm down to just the
following:

/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/sys/posix/stdc/time.d:52:15:
error: module core.sys.posix.sys.types import 'time_t' not found
   52 | public import core.sys.posix.sys.types : time_t, clock_t;
  |   ^
/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/sys/posix/stdc/time.d:52:15:
error: module core.sys.posix.sys.types import 'clock_t' not found
   52 | public import core.sys.posix.sys.types : time_t, clock_t;
  |   ^
/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/stdc/stdio.d:1514:9:
error: undefined identifier 'fpos_t', did you mean alias '_fpos_t'?
 1514 | int fgetpos(FILE* stream, scope fpos_t * pos);
  | ^
/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/stdc/stdio.d:1516:9:
error: undefined identifier 'fpos_t', did you mean alias '_fpos_t'?
 1516 | int fsetpos(FILE* stream, scope const fpos_t* pos);
  | ^
../../../../libphobos/libdruntime/core/demangle.d:2615:16: error: module
core.stdc.stdio import 'snprintf' not found, did you mean function
'core.stdc.stdio.sprintf'?
 2615 | import core.stdc.stdio : snprintf;
  |^


This is the diff so far:


diff --git a/libphobos/libdruntime/core/stdc/stdio.d
b/libphobos/libdruntime/core/stdc/stdio.d
index c76b922a3eb..52bcc9d7cdd 100644
--- a/libphobos/libdruntime/core/stdc/stdio.d
+++ b/libphobos/libdruntime/core/stdc/stdio.d
@@ -397,6 +397,196 @@ else version (CRuntime_Microsoft)
 ///
 alias shared(_iobuf) FILE;
 }
+else version (CRuntime_Newlib)
+{
+alias long  _off64_t;
+alias long  _fpos_t;
+alias long  _fpos64_t;
+alias int   _float_t;
+
+struct __sbuf {
+char*   _base;
+int _size;
+}
+
+struct _mbstate_t {
+int _count;
+union {
+dchar   _wch;
+char[4] _wchb;
+}
+}
+
+struct _rand48 {
+ushort[3] _seed;
+ushort[3] _mult;
+ushort _add;
+}
+
+struct __tm {
+int   __tm_sec;
+int   __tm_min;
+int   __tm_hour;
+int   __tm_mday;
+int   __tm_mon;
+int   __tm_year;
+int   __tm_wday;
+int   __tm_yday;
+int   __tm_isdst;
+}
+
+struct __lc_cats {
+const void*  ptr;
+char*buf;
+}
+
+struct lconv {
+char* decimal_point;
+char* thousands_sep;
+char* grouping;
+char* int_curr_symbol;
+char* currency_symbol;
+char* mon_decimal_point;
+char* mon_thousands_sep;
+char* mon_grouping;
+char* positive_sign;
+char* negative_sign;
+char int_frac_digits;
+char frac_digits;
+char p_cs_precedes;
+char p_sep_by_space;
+char n_cs_precedes;
+char n_sep_by_space;
+char p_sign_posn;
+char n_sign_posn;
+char int_n_cs_precedes;
+char int_n_sep_by_space;
+char int_n_sign_posn;
+char int_p_cs_precedes;
+char int_p_sep_by_space;
+char int_p_sign_posn;
+}
+
+struct __locale_t {
+char[7][31 + 1] categories; 

[Bug c++/108407] New: SegFault with structured binding and OpenMP without optimization

2023-01-14 Thread mmoelle1 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407

Bug ID: 108407
   Summary: SegFault with structured binding and OpenMP without
optimization
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mmoelle1 at gmail dot com
  Target Milestone: ---

The following code snippet compiles well but the binary stops with a
segmentation fault when compiled with 'g++ -O0  -std=c++17'. When compiled with
optimization (-O1 or better) turned on the binary works fine.

#include <tuple>
#ifdef OPENMMP_
#include <omp.h>
#endif

auto create()
{
  int    a = 10;
  double b = 1.0;
  return std::tie(a,b);
}

int main()
{
  auto [a, b] = create();
  double vector[100];

#pragma omp parallel for
  for (int i=0; i

[Bug middle-end/108300] `abort()` macro cause bootstrap failure on *-w64-mingw32

2023-01-14 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108300

nightstrike  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nightstrike at gmail dot com

--- Comment #15 from nightstrike  ---
Someone on irc (jakub?) suggested just changing all of the aborts to
gcc_unreachable. Is that a viable option?

[Bug libstdc++/107189] Inconsistent range insertion implementations in std::_Rb_tree in

2023-01-14 Thread fdumont at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107189

François Dumont  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |13.0

--- Comment #3 from François Dumont  ---
I am marking this bug resolved for the useless _Alloc_node instance. Regarding
the inconsistent implementation, feel free to open another issue with more
explanation.

Thanks

[Bug target/82028] Windows x86_64 should not pass float aggregates in xmm

2023-01-14 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82028

nightstrike  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nightstrike at gmail dot com

--- Comment #5 from nightstrike  ---
(In reply to jon_y from comment #4)
> I can't seem to change the bug status to confirmed.

"NEW" is confirmed

[Bug target/90256] Optimizer with interrupt routines

2023-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90256

--- Comment #4 from Andrew Pinski  ---
The reason it is target specific is that the interrupt attribute is
target specific and the ipa-icf code has no knowledge of it. Basically, when the
x86_64 backend sees the interrupt attribute, it should also add the no_icf attribute.

[Bug target/90256] Optimizer with interrupt routines

2023-01-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90256

--- Comment #3 from Andrew Pinski  ---
An easy workaround is to add the noipa attribute.

[Bug target/90256] Optimizer with interrupt routines

2023-01-14 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90256

nightstrike  changed:

   What|Removed |Added

 CC||nightstrike at gmail dot com

--- Comment #2 from nightstrike  ---
This is not target specific (or at least it also happens on x86_64-pc-linux).

[Bug tree-optimization/106103] ICE in binds_to_current_def_p when source object files are compiled with -flto -Os

2023-01-14 Thread ivanka2012 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106103

--- Comment #2 from Ivan  ---
Putting -fno-declone-ctor-dtor in the flags "fixes" the bug.

[Bug ipa/108383] g++ ICE with -O3 and -flto and -fdeclone-ctor-dtor on simple function

2023-01-14 Thread ivanka2012 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108383

Ivan  changed:

   What|Removed |Added

 CC||ivanka2012 at gmail dot com

--- Comment #5 from Ivan  ---
This is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106103

[Bug c++/80561] Missed optimization: std::array data is aligned if array is aligned

2023-01-14 Thread jzwinck at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80561

John Zwinck  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from John Zwinck  ---
This was fixed in GCC 8.  Thank you.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-01-14 Thread jzwinck at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 80561, which changed state.

Bug 80561 Summary: Missed optimization: std::array data is aligned if array is 
aligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80561

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug modula2/108405] modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)

2023-01-14 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405

--- Comment #3 from Andreas Schwab  ---
NPTL does not have the alignment restriction.

[Bug tree-optimization/108406] New: Missed integer optimization on x86-64 unless -fwrapv is used

2023-01-14 Thread jzwinck at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108406

Bug ID: 108406
   Summary: Missed integer optimization on x86-64 unless -fwrapv
is used
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jzwinck at gmail dot com
  Target Milestone: ---

Consider this C++ code:

#include <cstdint>

// returns a if less than b or if b is INT32_MIN
int32_t special_min(int32_t a, int32_t b)
{
return a < b || b == INT32_MIN ? a : b;
}

GCC with -fwrapv correctly realizes that subtracting 1 from b can eliminate the
special case, and it generates this code for x86-64:

lea edx, [rsi-1]
mov eax, edi
cmp edi, edx
cmovg   eax, esi
ret

But without -fwrapv it generates worse code:

mov eax, esi
cmp edi, esi
jl  .L4
cmp esi, -2147483648
je  .L4
ret
.L4:
mov eax, edi
ret

If I wrote "hand optimized" C++ code trying to implement that optimization, I
understand -fwrapv would be required, otherwise the compiler could decide the
signed overflow is UB. But here the compiler is in control, it knows the
behavior of integer overflow on x86-64, and so it should not matter whether
-fwrapv is used.

Demo: https://godbolt.org/z/o881Mdqoa

Stack Overflow discussion:
https://stackoverflow.com/questions/75110108/gcc-wont-use-its-own-optimization-trick-without-fwrapv

This is somewhat related to #102032 in the sense that it's an optimization
missed without -fwrapv, but the type of optimization is different.  It is
possible there's a single solution that would solve both problems (and others).
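
For reference, the transformation that the -fwrapv code performs can be written out by hand. This is only a sketch of the idea, not what GCC does internally: special_min_wrapped is a made-up name, and the wrapping b - 1 is computed in unsigned arithmetic so that the source itself has no signed overflow. The conversion back to int32_t is modular in GCC and guaranteed to be so from C++20 on.

```cpp
#include <cassert>
#include <cstdint>

// The function from the report.
int32_t special_min(int32_t a, int32_t b)
{
    return a < b || b == INT32_MIN ? a : b;
}

// Hypothetical hand-written form of the lea/cmp/cmovg sequence.
int32_t special_min_wrapped(int32_t a, int32_t b)
{
    // b - 1 with wraparound: INT32_MIN - 1 becomes INT32_MAX.
    int32_t b_minus_1 = static_cast<int32_t>(static_cast<uint32_t>(b) - 1u);
    // a > b - 1 is the same as a >= b, except when b is INT32_MIN,
    // where nothing is greater than INT32_MAX and a is always returned.
    return a > b_minus_1 ? b : a;
}

// Compare both versions on a few boundary values.
bool check_equivalence()
{
    const int32_t samples[] = {INT32_MIN, INT32_MIN + 1, -1, 0, 1,
                               INT32_MAX - 1, INT32_MAX};
    for (int32_t a : samples)
        for (int32_t b : samples)
            if (special_min(a, b) != special_min_wrapped(a, b))
                return false;
    return true;
}
```

The single comparison against b - 1 covers both the a < b test and the INT32_MIN special case, which is why the branch-free code needs only one cmp.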

[Bug modula2/108405] modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)

2023-01-14 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405

--- Comment #2 from Iain Sandoe  ---
(In reply to Iain Sandoe from comment #1)
> note that a default size of 8Mb is not enough for either Linux or Arm64
> Darwin (both have PTHREAD_STACK_MIN of 16384).

this is, of course, rubbish .. the default is 8Mb not 8k (which is fine for
both 4096 and 16384 page sizes).

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #10 from Thomas Koenig  ---
What we would need for incorporation into gcc is to have several
functions, which would then be called depending on which floating point
options are in force at the time of invocation.

So, let's go through the gcc options, to see what would fit where. Walking
down the options tree, depth first.

From the gcc docs:

'-ffast-math'
 Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
 '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
 '-fcx-limited-range' and '-fexcess-precision=fast'.

-fno-math-errno is irrelevant in this context, no need to look at that.

'-funsafe-math-optimizations'

 Allow optimizations for floating-point arithmetic that (a) assume
 that arguments and results are valid and (b) may violate IEEE or
 ANSI standards.  When used at link time, it may include libraries
 or startup files that change the default FPU control word or other
 similar optimizations.

 This option is not turned on by any '-O' option since it can result
 in incorrect output for programs that depend on an exact
 implementation of IEEE or ISO rules/specifications for math
 functions.  It may, however, yield faster code for programs that do
 not require the guarantees of these specifications.  Enables
 '-fno-signed-zeros', '-fno-trapping-math', '-fassociative-math' and
 '-freciprocal-math'.

'-fno-signed-zeros'
 Allow optimizations for floating-point arithmetic that ignore the
 signedness of zero.  IEEE arithmetic specifies the behavior of
 distinct +0.0 and -0.0 values, which then prohibits simplification
 of expressions such as x+0.0 or 0.0*x (even with
 '-ffinite-math-only').  This option implies that the sign of a zero
 result isn't significant.

 The default is '-fsigned-zeros'.

I don't think this option is relevant.
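
As a small illustration of the x+0.0 case from the documentation quoted above (add_zero is a made-up helper, and this assumes the default round-to-nearest mode and no -fno-signed-zeros): -0.0 + 0.0 yields +0.0, so folding x+0.0 to x would change the sign of the result for x = -0.0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper. The volatile keeps the compiler from folding
// the addition away at compile time.
double add_zero(double x)
{
    volatile double zero = 0.0;
    return x + zero;
}
```

std::signbit(add_zero(-0.0)) is false while std::signbit(-0.0) is true, even though the two values compare equal.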

'-fno-trapping-math'
 Compile code assuming that floating-point operations cannot
 generate user-visible traps.  These traps include division by zero,
 overflow, underflow, inexact result and invalid operation.  This
 option requires that '-fno-signaling-nans' be in effect.  Setting
 this option may allow faster code if one relies on "non-stop" IEEE
 arithmetic, for example.

 This option should never be turned on by any '-O' option since it
 can result in incorrect output for programs that depend on an exact
 implementation of IEEE or ISO rules/specifications for math
 functions.

 The default is '-ftrapping-math'.

Relevant.

'-ffinite-math-only'
 Allow optimizations for floating-point arithmetic that assume that
 arguments and results are not NaNs or +-Infs.

 This option is not turned on by any '-O' option since it can result
 in incorrect output for programs that depend on an exact
 implementation of IEEE or ISO rules/specifications for math
 functions.  It may, however, yield faster code for programs that do
 not require the guarantees of these specifications.

This does not have further suboptions. Relevant.

'-fassociative-math'

 Allow re-association of operands in series of floating-point
 operations.  This violates the ISO C and C++ language standard by
 possibly changing computation result.  NOTE: re-ordering may change
 the sign of zero as well as ignore NaNs and inhibit or create
 underflow or overflow (and thus cannot be used on code that relies
 on rounding behavior like '(x + 2**52) - 2**52'.  May also reorder
 floating-point comparisons and thus may not be used when ordered
 comparisons are required.  This option requires that both
 '-fno-signed-zeros' and '-fno-trapping-math' be in effect.
 Moreover, it doesn't make much sense with '-frounding-math'.  For
 Fortran the option is automatically enabled when both
 '-fno-signed-zeros' and '-fno-trapping-math' are in effect.

 The default is '-fno-associative-math'.

Not relevant, I think - this influences compiler optimizations.
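
For context, the '(x + 2**52) - 2**52' idiom mentioned in the documentation looks roughly like this for double. This is a sketch with a made-up function name; it assumes round-to-nearest, 0 <= x < 2**52, and double arithmetic without excess precision. Re-association would collapse the expression to x, which is why such code cannot be built with '-fassociative-math'.

```cpp
#include <cassert>

// Hypothetical helper: rounds a non-negative double below 2**52 to an
// integer value using the current rounding mode (ties-to-even by default).
double round_via_2p52(double x)
{
    const double two52 = 4503599627370496.0;  // 2**52
    // volatile forces the intermediate sum to be rounded to double
    // and prevents the compiler from simplifying (x + c) - c to x.
    volatile double t = x + two52;
    return t - two52;
}
```

With the default rounding mode, round_via_2p52(2.5) gives 2.0 and round_via_2p52(3.5) gives 4.0, since ties go to the even neighbour.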

'-freciprocal-math'

 Allow the reciprocal of a value to be used instead of dividing by
 the value if this enables optimizations.  For example 'x / y' can
 be replaced with 'x * (1/y)', which is useful if '(1/y)' is subject
 to common subexpression elimination.  Note that this loses
 precision and increases the number of flops operating on the value.

 The default is '-fno-reciprocal-math'.

Again, not relevant.


'-frounding-math'
 Disable transformations and optimizations that assume default
 floating-point rounding behavior.  This is round-to-zero for all
 floating point to integer conversions, and round-to-nearest for all
 other arithmetic truncations.  This option should be specified for
 programs that change the FP rounding mode dynamically, or that may
 be executed with a non-default rounding mode.  This option disables
 

[Bug modula2/108405] modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)

2023-01-14 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405

Iain Sandoe  changed:

   What|Removed |Added

 Target||x86_64-darwin
   Keywords||testsuite-fail

--- Comment #1 from Iain Sandoe  ---
note that a default size of 8Mb is not enough for either Linux or Arm64 Darwin
(both have PTHREAD_STACK_MIN of 16384).

[Bug modula2/108405] New: modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)

2023-01-14 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405

Bug ID: 108405
   Summary: modula-2: Testsuite fails: concurrentstore.mod,
contimer.mod, tinytimer.mod on Darwin (and likely
elsewhere)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: iains at gcc dot gnu.org
  Target Milestone: ---

The test cases in the subject all fail on Darwin for the same reason: there is
an attempt to set a stack size that violates the constraints of
pthread_attr_setstacksize.

On Darwin;
 pthread_attr_setstacksize() will fail if:
 [EINVAL]   stacksize is less than PTHREAD_STACK_MIN
 [EINVAL]   stacksize is not a multiple of the system page size.

On Linux:
   pthread_attr_setstacksize() can fail with the following error:

   EINVAL The stack size is less than PTHREAD_STACK_MIN (16384) bytes.
   On some systems, pthread_attr_setstacksize() can fail with the error
EINVAL if stacksize is not a multiple of the system page size.

--- So the failure reported on Darwin might well also occur on (at least some)
Linux systems.

The problem is in
 PROCEDURE initPreemptive (seconds, microsecs: CARDINAL) ;
which tries to call 
 Create (timer, 1000, MAX (Urgency), NIL, timerId) ;

Where 1000 violates the constraints on stack size (definitely on Darwin,
maybe on some Linux).

So .. the short-term solution is to fix initPreemptive to use a suitable value
(patch to be posted).

However:

1. We should have detected the bad user value earlier and thrown an exception?
2. It is not clear to me how these magic numbers (embedded in the library) have
been chosen (there is 8Mb as defaultSize and then here we add 10Mb)  perhaps
this is something that should be configured or at least set according to a
target query?
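
To make the constraints concrete, here is a rough sketch of how a requested size could be adjusted before it reaches pthread_attr_setstacksize. This is not the libgm2 code and not the patch to be posted; round_stack_size and set_stack_size are made-up names, and the sketch is in C++ rather than Modula-2.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <limits.h>
#include <pthread.h>
#include <unistd.h>

// Hypothetical helper: raise the request to at least min_size, then
// round it up to the next multiple of the page size.
std::size_t round_stack_size(std::size_t requested, std::size_t min_size,
                             std::size_t page)
{
    std::size_t size = requested < min_size ? min_size : requested;
    std::size_t rem = size % page;
    return rem == 0 ? size : size + (page - rem);
}

// Hypothetical wrapper: query the target for its limits instead of
// hard-coding them, then set the adjusted size on the attribute.
inline int set_stack_size(pthread_attr_t *attr, std::size_t requested)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;
    std::size_t size = round_stack_size(
        requested, static_cast<std::size_t>(PTHREAD_STACK_MIN),
        static_cast<std::size_t>(page));
    return pthread_attr_setstacksize(attr, size);
}
```

With a minimum of 16384 and 4096-byte pages, a request of 1000 becomes 16384 and a request of 20000 becomes 20480. Whether to adjust silently like this or to raise an exception on a bad value is exactly the open question in point 1.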

[Bug modula2/108404] New: M2RTS_Halt fails with a segv (it should emit a diagnostic and exit).

2023-01-14 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108404

Bug ID: 108404
   Summary: M2RTS_Halt fails with a segv (it should emit a
diagnostic and exit).
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: iains at gcc dot gnu.org
  Target Milestone: ---

On Darwin several tests fail because there is an invalid stack size set (that
is a separate bug).

The fault should have been reported by M2RTS_Halt (it is detected correctly in
RTco.cc).

Setting a break point on the entry to M2RTS_Halt :

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x000154f0 concurrentstore.x0`M2RTS_Halt at
M2RTS.mod:296:1
   293  
   294  PROCEDURE Halt (file: ARRAY OF CHAR; line: CARDINAL;
   295  function: ARRAY OF CHAR; description: ARRAY OF CHAR) ;
-> 296  BEGIN
   297 ErrorMessage (description, file, line, function) ;
   298 HALT
   299  END Halt ;

examining the registers:

(lldb) reg read
General Purpose Registers:
   rax = 0x0016
   rbx = 0x62c08118
   rcx = 0x000100014c00  "failed to set stack size attribute"
   rdx = 0x000100014bf2  "initThread"
   rdi = 0x000100014b00 
"/src-local/gcc-master/libgm2/libm2iso/RTco.cc"
   rsi = 0x0172

This is the correct ABI: RDI = file, RSI = line number, RDX = function, RCX =
message (four integer/pointer arguments).

 however if we continue from this point ...

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS
(code=2, address=0x3014e7178)
frame #0: 0x0001554f concurrentstore.x0`M2RTS_Halt at
M2RTS.mod:296:1
   293  
   294  PROCEDURE Halt (file: ARRAY OF CHAR; line: CARDINAL;
   295  function: ARRAY OF CHAR; description: ARRAY OF CHAR) ;
-> 296  BEGIN
   297 ErrorMessage (description, file, line, function) ;

I cannot (at present) debug this further since I do not have an installed
debugger that supports Modula-2 (but it might well repeat on Linux - the ABI
is basically the same).  In any case, it seems likely that the problem is in
the prologue or very early in the function since the break line is on BEGIN in
both cases.

[Bug c++/108365] [9/10/11/12 Regression] Wrong code with -O0

2023-01-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108365

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[9/10/11/12/13 Regression]  |[9/10/11/12 Regression]
   |Wrong code with -O0 |Wrong code with -O0

--- Comment #8 from Jakub Jelinek  ---
Fixed on the trunk so far.
Guess for backports we want instead a minimal change (i.e. just the
+&& INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0, 0)))
and
+&& (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0, 0)))
+< TYPE_PRECISION (type0)))
additions for C++ FE).

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-14 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #9 from Thomas Koenig  ---
Created attachment 54273
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54273&action=edit
matmul_r16.i

Here is matmul_r16.i from a relatively recent trunk.

[Bug c++/108365] [9/10/11/12/13 Regression] Wrong code with -O0

2023-01-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108365

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:5b3a88640f962d4ffca31ae651bed2d8672f1a8c

commit r13-5163-g5b3a88640f962d4ffca31ae651bed2d8672f1a8c
Author: Jakub Jelinek 
Date:   Sat Jan 14 10:17:14 2023 +0100

c++: Avoid incorrect shortening of divisions [PR108365]

The following testcase is miscompiled, because we shorten the division
in a case where it should not be shortened.
Divisions (and modulos) can be shortened if it is unsigned division/modulo,
or if it is signed division/modulo where we can prove the dividend will
not be the minimum signed value or divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was for the implicit integral promotions of {,signed,unsigned} {char,short}
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion is from unsigned type for the divisor
or if the dividend is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a positive int, so never the most negative.

Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
the same as orig_op0 or that promoted to int, I think that works fine,
if it isn't promoted, either the division/modulo common type will have the
same precision as op0 but then the division/modulo is unsigned and so
without UB, or it will be done in wider precision (e.g. because op1 has
wider precision), but then op0 can't be minimum signed value.  Or it has
been promoted to int, but in that case it was again from narrower type and
so never minimum signed int.

But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
even if it is a cast from unsigned integral type, we only know it can't be
minimum signed value if it is a widening cast, if it is same precision or
narrowing cast, we know nothing.

So, the following patch for the NOP_EXPR cases checks just in case that
it is from integral type and more importantly checks it is a widening
conversion, and then next to it also allows op0 to be just unsigned,
promoted or not, as that is what the C FE will do for those cases too
and I believe it must work - either the division/modulo common type
will be that unsigned type, then we can shorten and don't need to worry
about UB, or it will be some wider signed type but then it can't be most
negative value of the wider type.
And changes both the C and C++ FEs to do the same thing, using a helper
function in c-family.
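
A standalone sketch of the arithmetic behind this, outside of GCC (wide_div and narrow_div_is_defined are made-up names, not the may_shorten_divmod helper; the narrowing conversion assumes the modular behavior that GCC implements and C++20 guarantees):

```cpp
#include <cassert>
#include <cstdint>

// Divide in 64-bit precision and narrow afterwards. This is defined for
// every 32-bit dividend and every non-zero 32-bit divisor.
int32_t wide_div(int32_t a, int32_t b)
{
    int64_t q = int64_t{a} / int64_t{b};
    return static_cast<int32_t>(static_cast<uint32_t>(q));
}

// Hypothetical predicate: would doing the same division directly in
// 32 bits be free of UB? Only INT32_MIN / -1 (and division by zero) is not.
bool narrow_div_is_defined(int32_t a, int32_t b)
{
    return b != 0 && !(a == INT32_MIN && b == -1);
}

// A widening conversion from an unsigned type can never produce the most
// negative value of the wider type, but a same-precision one can.
int32_t widen_from_u16(uint16_t u) { return static_cast<int32_t>(u); }
int32_t same_width_from_u32(uint32_t u) { return static_cast<int32_t>(u); }
```

wide_div(INT32_MIN, -1) yields INT32_MIN without UB, while the 32-bit division for the same operands is undefined, so shortening is only safe when that operand pair can be ruled out. same_width_from_u32(0x80000000u) is INT32_MIN, which is why a same-precision cast from an unsigned type proves nothing about the dividend.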

2023-01-14  Jakub Jelinek  

PR c++/108365
* c-common.h (may_shorten_divmod): New static inline function.

* c-typeck.cc (build_binary_op): Use may_shorten_divmod for integral
division or modulo.

* typeck.cc (cp_build_binary_op): Use may_shorten_divmod for integral
division or modulo.

* c-c++-common/pr108365.c: New test.
* g++.dg/opt/pr108365.C: New test.
* g++.dg/warn/pr108365.C: New test.

[Bug debug/106746] [13 Regression] '-fcompare-debug' failure (length) with -O2 -fsched2-use-superblocks since r13-2041-g6624ad73064de241

2023-01-14 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106746

--- Comment #18 from Jakub Jelinek  ---
Thanks for looking into this.

[Bug debug/106746] [13 Regression] '-fcompare-debug' failure (length) with -O2 -fsched2-use-superblocks since r13-2041-g6624ad73064de241

2023-01-14 Thread aoliva at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106746

--- Comment #17 from Alexandre Oliva  ---
Created attachment 54272
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54272&action=edit
patch that fixes the problem for reasons not fully understood

It seems that looking up the MEM exprs in DEBUG_INSNs disturbs something in
cselib and causes pending MEMs to conflict that, in the non-debug case, don't.

There's no need for these lookups in debug insns, the results aren't used, and
I thought I'd just queue up this improvement but, to my surprise, it made the
problem go away.