[Bug libgcc/108279] Improved speed for float128 routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #12 from Michael_S --- (In reply to Thomas Koenig from comment #10)
> What we would need for incorporation into gcc is to have several
> functions, which would then be called depending on which floating point
> options are in force at the time of invocation.
>
> So, let's go through the gcc options, to see what would fit where. Walking
> down the options tree, depth first.
>
> From the gcc docs:
>
> '-ffast-math'
>      Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
>      '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
>      '-fcx-limited-range' and '-fexcess-precision=fast'.
>
> -fno-math-errno is irrelevant in this context, no need to look at that.
>
> '-funsafe-math-optimizations'
>
>      Allow optimizations for floating-point arithmetic that (a) assume
>      that arguments and results are valid and (b) may violate IEEE or
>      ANSI standards. When used at link time, it may include libraries
>      or startup files that change the default FPU control word or other
>      similar optimizations.
>
>      This option is not turned on by any '-O' option since it can result
>      in incorrect output for programs that depend on an exact
>      implementation of IEEE or ISO rules/specifications for math
>      functions. It may, however, yield faster code for programs that do
>      not require the guarantees of these specifications. Enables
>      '-fno-signed-zeros', '-fno-trapping-math', '-fassociative-math' and
>      '-freciprocal-math'.
>
> '-fno-signed-zeros'
>      Allow optimizations for floating-point arithmetic that ignore the
>      signedness of zero. IEEE arithmetic specifies the behavior of
>      distinct +0.0 and -0.0 values, which then prohibits simplification
>      of expressions such as x+0.0 or 0.0*x (even with
>      '-ffinite-math-only'). This option implies that the sign of a zero
>      result isn't significant.
>
>      The default is '-fsigned-zeros'.
>
> I don't think this option is relevant.
>
> '-fno-trapping-math'
>      Compile code assuming that floating-point operations cannot
>      generate user-visible traps. These traps include division by zero,
>      overflow, underflow, inexact result and invalid operation. This
>      option requires that '-fno-signaling-nans' be in effect. Setting
>      this option may allow faster code if one relies on "non-stop" IEEE
>      arithmetic, for example.
>
>      This option should never be turned on by any '-O' option since it
>      can result in incorrect output for programs that depend on an exact
>      implementation of IEEE or ISO rules/specifications for math
>      functions.
>
>      The default is '-ftrapping-math'.
>
> Relevant.
>
> '-ffinite-math-only'
>      Allow optimizations for floating-point arithmetic that assume that
>      arguments and results are not NaNs or +-Infs.
>
>      This option is not turned on by any '-O' option since it can result
>      in incorrect output for programs that depend on an exact
>      implementation of IEEE or ISO rules/specifications for math
>      functions. It may, however, yield faster code for programs that do
>      not require the guarantees of these specifications.
>
> This does not have further suboptions. Relevant.
>
> '-fassociative-math'
>
>      Allow re-association of operands in series of floating-point
>      operations. This violates the ISO C and C++ language standard by
>      possibly changing computation result. NOTE: re-ordering may change
>      the sign of zero as well as ignore NaNs and inhibit or create
>      underflow or overflow (and thus cannot be used on code that relies
>      on rounding behavior like '(x + 2**52) - 2**52'. May also reorder
>      floating-point comparisons and thus may not be used when ordered
>      comparisons are required. This option requires that both
>      '-fno-signed-zeros' and '-fno-trapping-math' be in effect.
>      Moreover, it doesn't make much sense with '-frounding-math'. For
>      Fortran the option is automatically enabled when both
>      '-fno-signed-zeros' and '-fno-trapping-math' are in effect.
>
>      The default is '-fno-associative-math'.
>
> Not relevant, I think - this influences compiler optimizations.
>
> '-freciprocal-math'
>
>      Allow the reciprocal of a value to be used instead of dividing by
>      the value if this enables optimizations. For example 'x / y' can
>      be replaced with 'x * (1/y)', which is useful if '(1/y)' is subject
>      to common subexpression elimination. Note that this loses
>      precision and increases the number of flops operating on the value.
>
>      The default is '-fno-reciprocal-math'.
>
> Again, not relevant.
>
>
> '-frounding-math'
>      Disable transformations and optimizations that assume default
>      floating-point rounding behavior. This is round-to-zero for all
>      floating point to integer
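Two of the behaviours the quoted option text relies on can be checked directly. The following is a minimal sketch, assuming IEEE 754 doubles, the default round-to-nearest mode, and a build without '-ffast-math'; the function names are hypothetical and are not part of libgcc or of any proposed float128 routine.

```cpp
#include <cmath>

// Adding +0.0 is not an identity under IEEE 754: (-0.0) + (+0.0) yields +0.0
// in the default rounding mode, so the sign of the zero is lost. This is why
// x+0.0 cannot be simplified to x unless -fno-signed-zeros is in effect.
inline bool add_zero_preserves_sign(double x) {
    volatile double sum = x + 0.0;  // volatile keeps the addition from being folded away
    return std::signbit(sum) == std::signbit(x);
}

// The '(x + 2**52) - 2**52' idiom named in the -fassociative-math text.
// For 0 <= x < 2**52 it rounds x to an integer in the current rounding mode;
// re-association would reduce it to plain x and break that.
inline double round_via_magic(double x) {
    const double two52 = 4503599627370496.0;  // 2**52
    volatile double shifted = x + two52;      // the rounding happens in this addition
    return shifted - two52;
}
```

With these definitions, add_zero_preserves_sign(-0.0) is false, and round_via_magic(2.5) gives 2.0 because ties round to even.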
[Bug libgcc/108279] Improved speed for float128 routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #11 from Michael_S --- (In reply to Thomas Koenig from comment #9)
> Created attachment 54273 [details]
> matmul_r16.i
>
> Here is matmul_r16.i from a relatively recent trunk.

Thank you.
Unfortunately, I was not able to link it with a main program in Fortran, so I can still only guess why, even after replacing __multf3 and __addtf3 with my implementations, it is still more than twice as slow (on Zen3) as it should be.
Looking at the source and assuming that the inner loop starts at line 8944, this loop looks very strange, but apart from bad programming style and a misunderstanding of what optimal scheduling is, there is nothing criminal about it. Maybe, because of the wrong scheduling, it is 10-15% slower than the best, but it certainly should not be the 2.4 times slower that I am seeing.

This is what I got from the linker:
/usr/bin/ld: matmul_r16.o: warning: relocation against `_gfortrani_matmul_r16_avx128_fma3' in read-only section `.text'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_avx':
matmul_r16.c:(.text+0x48): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x342): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x10e4): undefined reference to `_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x10f1): undefined reference to `_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x12e5): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x13a9): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x13e7): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_avx2':
matmul_r16.c:(.text+0x24c8): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x27c2): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x3564): undefined reference to `_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x3571): undefined reference to `_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x3765): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x3829): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x3867): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_avx512f':
matmul_r16.c:(.text+0x4948): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x4c47): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x5a32): undefined reference to `_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x5a3f): undefined reference to `_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x5c35): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x5cfb): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x5d35): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `matmul_r16_vanilla':
matmul_r16.c:(.text+0x6de8): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x70e2): undefined reference to `_gfortrani_compile_options'
/usr/bin/ld: matmul_r16.c:(.text+0x7e84): undefined reference to `_gfortrani_size0'
/usr/bin/ld: matmul_r16.c:(.text+0x7e91): undefined reference to `_gfortrani_xmallocarray'
/usr/bin/ld: matmul_r16.c:(.text+0x8085): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x8149): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.c:(.text+0x8187): undefined reference to `_gfortrani_runtime_error'
/usr/bin/ld: matmul_r16.o: in function `_gfortran_matmul_r16':
matmul_r16.c:(.text+0x92b7): undefined reference to `_gfortrani_matmul_r16_avx128_fma3'
/usr/bin/ld: matmul_r16.c:(.text+0x92ea): undefined reference to `_gfortrani_matmul_r16_avx128_fma4'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
collect2: error: ld returned 1 exit status
[Bug tree-optimization/99408] s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408 --- Comment #4 from Jan Hubicka --- On Zen4 it is 20s for gcc and 6.9s for aocc, so still a problem.
[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376 --- Comment #3 from Jan Hubicka ---
If I make the arrays random then GCC code is indeed faster:

#include
#include
typedef float real_t;

#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
real_t aa[LEN_2D][LEN_2D];
real_t bb[LEN_2D][LEN_2D];
real_t cc[LEN_2D][LEN_2D];
real_t qq;

int
main(void)
{
//reductions
//if to max reduction

    real_t x;
    for (int i = 0; i < LEN_1D; i++)
      {
        a[i]=(rand() %5) - 3;
        b[i]=(rand() %6) - 3;
      }

    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D; i++) {
            if (a[i] < (real_t)0.) {
                if (b[i] > a[i]) {
                    c[i] += d[i] * e[i];
                }
            }
        }
        //dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    return x;
}

jh@alberti:~/tsvc/bin> ~/aocc-compiler-4.0.0/bin/clang -Ofast s1279.c -march=native
s1279.c:23:14: warning: implicit declaration of function 'rand' is invalid in C99 [-Wimplicit-function-declaration]
        a[i]=(rand() %5) - 3;
             ^
1 warning generated.
jh@alberti:~/tsvc/bin> time ./a.out

real    0m5.638s
user    0m5.636s
sys     0m0.000s
jh@alberti:~/tsvc/bin> ~/trunk-install/bin/gcc -Ofast s1279.c -march=native
s1279.c: In function 'main':
s1279.c:23:14: warning: implicit declaration of function 'rand' [-Wimplicit-function-declaration]
   23 |         a[i]=(rand() %5) - 3;
      |              ^~~~
jh@alberti:~/tsvc/bin> time ./a.out

real    0m2.791s
user    0m2.790s
sys     0m0.000s

sorry for wrong code, just for reference the loop compiles as:

.L4:
        xorl    %eax, %eax
        .p2align 4
        .p2align 3
.L3:
        vmovaps a(%rax), %ymm2
        vmovaps b(%rax), %ymm3
        vmovaps c(%rax), %ymm6
        addq    $32, %rax
        vmovaps c-32(%rax), %ymm0
        vmovaps e-32(%rax), %ymm4
        vcmpps  $1, %ymm1, %ymm2, %k1
        vcmpps  $14, %ymm2, %ymm3, %k1{%k1}
        vfmadd231ps     d-32(%rax), %ymm4, %ymm0{%k1}
        vfmadd231ps     d-32(%rax), %ymm4, %ymm0
        vblendmps       %ymm0, %ymm6, %ymm0{%k1}
        vmovaps %ymm0, c-32(%rax)
        cmpq    $128000, %rax
        jne     .L3
        decl    %edx
        jne     .L4
[Bug libstdc++/108409] std::chrono::current_zone() doesn't work on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409 --- Comment #3 from Jonathan Wakely --- (In reply to Jonathan Wakely from comment #0)
> We should parse the TZ env var and see if it is already an IANA name, and
> handle a few other special cases. E.g. gcc119 in the cfarm has TZ=CUT0 which
> means a time zone named "CUT" (coordinated universal time) with a 0 offset
> from UTC. So map to UTC. More generally, "FOOn" is a time zone called "FOO"
> with a -n offset, so we could map any such string to "Etc/GMT-n"

It now works if TZ contains an IANA time zone name, or any string matching "???0".

If the systemwide TZ isn't one of those, users can define TZ for their own programs' environment. I don't know if that's good enough.
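The "???0" special case can be sketched as follows. This is only an illustration, not the code committed to libstdc++: the helper name is hypothetical, the restriction to three alphabetic characters is an assumption, and "Etc/UTC" is chosen here as the IANA name to map to.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps a POSIX-style TZ value of the form "???0"
// (three letters followed by a zero offset, e.g. "CUT0") to an IANA name.
// Returns std::nullopt for anything else, so the caller can fall back to
// treating the value as an IANA name or reporting an error.
inline std::optional<std::string> iana_name_for_zero_offset_tz(std::string_view tz) {
    if (tz.size() != 4 || tz[3] != '0')
        return std::nullopt;
    for (char c : tz.substr(0, 3)) {
        // Assumption: the zone abbreviation consists of letters only.
        if (!std::isalpha(static_cast<unsigned char>(c)))
            return std::nullopt;
    }
    return std::string("Etc/UTC");
}
```

A caller would typically pass the result of std::getenv("TZ") after checking it for null; values such as "EST5" or "Europe/London" are left for other handling.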
[Bug libstdc++/108409] std::chrono::current_zone() doesn't work on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409 --- Comment #2 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:d80e5a7b30e5d045c808f5235123e366e4e9286c commit r13-5170-gd80e5a7b30e5d045c808f5235123e366e4e9286c Author: Jonathan Wakely Date: Sat Jan 14 20:13:32 2023 + libstdc++: Implement std::chrono::current_zone() for AIX [PR108409] libstdc++-v3/ChangeLog: PR libstdc++/108409 * src/c++20/tzdb.cc (current_zone()) [_AIX]: Use TZ environment variable.
[Bug ipa/56139] [10/11/12/13 Regression] unmodified static data could go in .rodata, not .data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56139 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4 from Jan Hubicka --- I have some code that makes it possible to attach summaries to references (like we do for calls and symbols) and then mark addresses that are never used for comparison. Similarly we could probably mark readonly addresses. We could even squeeze out a bit in the reference representation itself. Is there an easy way to tell if an address is never read from during IPA summary generation time?
[Bug bootstrap/107950] partial LTO linking of libbackend.a: gcc/gcc-rich-location.cc:207: undefined reference to `range_label_for_type_mismatch::get_text(unsigned int) const'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107950 --- Comment #7 from Jan Hubicka --- Thanks for looking into the incremental link of libbackend. I had it in my tree for a while but never got around to implementing a correct way to enable it only during bootstrap, since the host compiler may not support it. It would be nice to have it in, since it should reduce WPA memory use and also test this code path. I also think this is a case where partial linking makes the symbol get pulled into the LTO binary at the initial link time. It should be optimized away if the linker was not complaining.
[Bug middle-end/108410] New: x264 averaging loop not optimized well for avx512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

Bug ID: 108410
Summary: x264 averaging loop not optimized well for avx512
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---

The x264 benchmark has a loop averaging two unsigned char arrays that is executed with relatively low trip counts, which does not play well with our vectorized code. For AVX512 most time is spent in the unvectorized variant since the average number of iterations is too small to reach the vector code.

This table shows the runtimes of averaging a given block size with the scalar loop, the vectorized loop for individual vector sizes, and aocc codegen:

 size  scalar     128     256     512    aocc
    2    8.13    9.49    9.49    9.49    9.49
    4    5.79    6.10    6.10    7.45    6.78
    6    5.44    5.43    5.42    6.78    5.87
    8    5.19    2.71    5.31    6.44    5.42
   12    5.14    3.17    5.33    6.10    4.97
   16    4.85    1.19    1.53    5.93    1.36
   20    4.82    2.03    1.90    6.10    1.90
   24    4.60    0.96    2.58    6.10    2.26
   28    4.51    1.55    2.97    6.00    2.55
   32    4.52    0.68    0.60    0.60    0.77
   34    4.77    0.96    0.88    0.80    0.96
   38    4.42    1.36    1.37    1.17    1.29
   42    4.40    0.84    1.82    1.73    1.63

So for sizes 2-8 scalar loop wins.
For sizes 12-16 128bit vectorization wins, 20-28 behaves funnily.
However avx512 vectorization is a huge loss for all sizes up to 31 bytes.
aocc seems to win for 16 bytes.

Note that one problem is that for 256bit vectors we peel the epilogue loop (since the trip count fits in max-completely-peeled-insns and max-completely-peel-times). Bumping both twice makes the avx512 prologue unrolled too, but it does not seem to help the x264 benchmark itself.

bmk.c:
#include
unsigned char a[1];
unsigned char b[1];
unsigned char c[1];

__attribute__ ((weak))
void
avg (unsigned char *a, unsigned char *b, unsigned char *c, int size)
{
  for (int i = 0; i > 1;
    }
}
int
main(int argc, char**argv)
{
  int size = atoi (argv[1]);
  for (long i = 0 ; i < 100/size; i++)
    {
      avg (a,b,c,size);
    }
  return 0;
}

bmk.sh:
gcc -Ofast -march=native bmk.c -fno-tree-vectorize -o bmk.scalar
gcc -Ofast -march=native bmk.c -mprefer-vector-width=128 -o bmk.128
gcc -Ofast -march=native bmk.c -mprefer-vector-width=256 -o bmk.256
gcc -Ofast -march=native bmk.c -mprefer-vector-width=512 -o bmk.512
~/aocc-compiler-4.0.0//bin/clang -Ofast -march=native bmk.c -o bmk.aocc

echo "size scalar 128 256 512aocc"
for size in 2 4 6 8 12 16 20 24 28 32 34 38 42
do
  scalar=`time -f "%e" ./bmk.scalar $size 2>&1`
  v128=`time -f "%e" ./bmk.128 $size 2>&1`
  v256=`time -f "%e" ./bmk.256 $size 2>&1`
  v512=`time -f "%e" ./bmk.512 $size 2>&1`
  aocc=`time -f "%e" ./bmk.aocc $size 2>&1`
  printf "%5i %7.2f %7.2f %7.2f %7.2f %7.2f\n" $size $scalar $v128 $v256 $v512 $aocc
done

aocc codegen:
# %bb.0:                                # %entry
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset %rbx, -16
        testl   %ecx, %ecx
        jle     .LBB0_15
# %bb.1:                                # %iter.check
        movl    %ecx, %r8d
        cmpl    $16, %ecx
        jae     .LBB0_3
# %bb.2:
        xorl    %eax, %eax
        jmp     .LBB0_14
.LBB0_3:                                # %vector.memcheck
        leaq    (%rsi,%r8), %r9
        leaq    (%rdi,%r8), %rax
        leaq    (%rdx,%r8), %r10
        cmpq    %rdi, %r9
        seta    %r11b
        cmpq    %rsi, %rax
        seta    %bl
        cmpq    %rdi, %r10
        seta    %r9b
        cmpq    %rdx, %rax
        seta    %r10b
        xorl    %eax, %eax
        testb   %bl, %r11b
        jne     .LBB0_14
# %bb.4:                                # %vector.memcheck
        andb    %r10b, %r9b
        jne     .LBB0_14
# %bb.5:                                # %vector.main.loop.iter.check
        cmpl    $128, %ecx
        jae     .LBB0_7
# %bb.6:
        xorl    %eax, %eax
        jmp     .LBB0_11
.LBB0_7:                                # %vector.ph
        movl    %r8d, %eax
        andl    $-128, %eax
        xorl    %ecx, %ecx
        .p2align 4, 0x90
.LBB0_8:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vmovdqu (%rdx,%rcx), %ymm0
        vmovdqu 32(%rdx,%rcx), %ymm1
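For readers who want to experiment with the kind of kernel described in this report, here is a minimal sketch of a byte-averaging loop. It assumes the usual x264-style rounding average, (x + y + 1) >> 1; the function name, parameter names and the choice of destination array are illustrative and are not taken from the report's bmk.c.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed form of the averaging kernel: rounding average of two byte arrays.
// The operands are promoted to int, so the +1 cannot overflow before the shift.
inline void avg_bytes(std::uint8_t* dst, const std::uint8_t* src1,
                      const std::uint8_t* src2, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = static_cast<std::uint8_t>((src1[i] + src2[i] + 1) >> 1);
}
```

With trip counts as small as those in the table (2 to 42 bytes), most of the cost of a vectorized version of such a loop is in the prologue, epilogue and alias checks rather than in the vector body, which is the effect the measurements above illustrate.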
[Bug c++/108407] SegFault with structured binding and OpenMP without optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407 --- Comment #4 from Matthias Möller --- Thank you, I have changed the code as suggested and it compiles and runs fine in all optimization levels including '-O0'.
[Bug libstdc++/108409] std::chrono::current_zone() doesn't work on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2023-01-14 --- Comment #1 from Jonathan Wakely --- https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03 describes the format of the TZ variable when it's not an IANA name. When TZ does not name an IANA zone, we could potentially create a new chrono::time_zone object, generated from the std and dst names, with the appropriate offsets and DST transitions. Then current_zone() would return a pointer to that custom zone.
[Bug libstdc++/108409] New: std::chrono::current_zone() doesn't work on AIX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108409

Bug ID: 108409
Summary: std::chrono::current_zone() doesn't work on AIX
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Target Milestone: ---
Target: *-*aix*

terminate called after throwing an instance of 'std::runtime_error'
  what():  tzdb: cannot determine current zone
FAIL: std/time/tzdb/1.cc execution test

terminate called after throwing an instance of 'std::runtime_error'
  what():  tzdb: cannot determine current zone
FAIL: std/time/zoned_time/custom.cc execution test

The std::chrono::current_zone() function is supposed to determine the machine's time zone.

As noted in libstdc++-v3/src/c++20/tzdb.cc:

    // TODO AIX stores current zone in $TZ in /etc/environment but the value
    // is typically a POSIX time zone name, not IANA zone.
    // https://developer.ibm.com/articles/au-aix-posix/
    // https://www.ibm.com/support/pages/managing-time-zone-variable-posix
    __throw_runtime_error("tzdb: cannot determine current zone");

How should we solve this?

We should parse the TZ env var and see if it is already an IANA name, and handle a few other special cases. E.g. gcc119 in the cfarm has TZ=CUT0 which means a time zone named "CUT" (coordinated universal time) with a 0 offset from UTC. So map to UTC. More generally, "FOOn" is a time zone called "FOO" with a -n offset, so we could map any such string to "Etc/GMT-n"

We could add some AIX-specific extension point, so programs can tell the library the IANA (aka Olson) name of the current time zone. Maybe read it from another file, something configurable and controlled by the user/program. But if we handle the TZ variable, users can just set that in their program's env and another extension point probably isn't needed.
[Bug tree-optimization/92342] [10/11/12/13 Regression] a small missed transformation into x?b:0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342 --- Comment #29 from Gabriel Ravier --- Looks like the patch fixes this bug, unless I'm missing something.
[Bug c++/108407] SegFault with structured binding and OpenMP without optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407 --- Comment #3 from Andrew Pinski --- If you do: return std::tuple(a,b); You don't get the reference.
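A minimal sketch of the by-value variant suggested here, assuming C++17 class template argument deduction. The function name create_by_value is hypothetical; the reporter's function is called create().

```cpp
#include <tuple>
#include <type_traits>

// Returns copies of the locals, so the caller owns the values and nothing
// refers to the stack frame of this function after it returns.
inline auto create_by_value() {
    int a = 10;
    double b = 1.0;
    return std::tuple(a, b);  // deduces std::tuple<int, double>
}

// std::tie(a, b) would instead produce std::tuple<int&, double&>, i.e.
// references to a and b, which dangle once the function has returned.
static_assert(std::is_same_v<decltype(create_by_value()), std::tuple<int, double>>,
              "the tuple holds values, not references");
```

With this version, `auto [a, b] = create_by_value();` binds to elements of a tuple that the caller owns, so it behaves the same at every optimization level.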
[Bug c++/108407] SegFault with structured binding and OpenMP without optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407 --- Comment #2 from Andrew Pinski ---
> return std::tie(a,b);

That returns a tuple of references to the two local variables. Both have now gone out of scope.
[Bug c++/108407] SegFault with structured binding and OpenMP without optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407 Andrew Pinski changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #1 from Andrew Pinski ---
With -fsanitize=undefined,address we get:
=================================================================
==1==ERROR: AddressSanitizer: stack-use-after-return on address 0x7ff991800030 at pc 0x004016c4 bp 0x7ffd675ea150 sp 0x7ffd675ea148
READ of size 4 at 0x7ff991800030 thread T0
    #0 0x4016c3 in main /app/example.cpp:19
    #1 0x7ff993d50082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #2 0x40115d in _start (/app/output.s+0x40115d) (BuildId: c6fef22ac59389c6ed0248b91200737f3dfa67d0)

Address 0x7ff991800030 is located in stack of thread T0 at offset 48 in frame
    #0 0x401225 in create() /app/example.cpp:7

  This frame has 2 object(s):
    [48, 52) 'a' (line 8) <== Memory access at offset 48 is inside this variable
    [64, 72) 'b' (line 9)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-return /app/example.cpp:19 in main
Shadow bytes around the buggy address:
  0x7ff9917ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917ffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917ffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff9917fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7ff991800000: f5 f5 f5 f5 f5 f5[f5]f5 f5 f5 f5 f5 00 00 00 00
  0x7ff991800080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff991800280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING
[Bug d/108408] New: libphobos: Support building on *-*-cygwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108408

Bug ID: 108408
Summary: libphobos: Support building on *-*-cygwin
Product: gcc
Version: 11.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: d
Assignee: ibuclaw at gdcproject dot org
Reporter: nightstrike at gmail dot com
Target Milestone: ---

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99794 for reference.

This PR is tracking the state of building libphobos on cygwin using 11.3, the last compiler that can be bootstrapped natively. Currently, it fails due to missing definitions for FILE, snprintf, time_t, and clock_t. I'm currently trying to add them to libphobos/libdruntime/core/stdc/stdio.d, because the path for CRuntime_Newlib is currently missing. I seem to be falling down a rabbit hole of needing to define struct after struct, so I'm trying to gather the work here in hopes that some kind soul can help.

I should clarify that I'm unfamiliar with D, so what I'm putting here definitely needs someone with experience to finish the work. My hope is to just get it far enough along that others can do so. Whoever put in the gcc warning that automatically converts function pointers for you, please accept my thanks!

I started by lifting the structure definitions from cygwin's newlib, which unfortunately have quite a few conditional components. I don't know how important this is if for instance a structure has:

struct S {
#ifdef A
    int a;
#else
    void * b;
#endif
};

or some variation of all manner of conditions that change the struct layout and the total size. Guidance as to whether this is the right approach or a total waste of time would be appreciated :).

With what I have so far, I'm down to just the following:

/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/sys/posix/stdc/time.d:52:15: error: module core.sys.posix.sys.types import 'time_t' not found
   52 | public import core.sys.posix.sys.types : time_t, clock_t;
      |               ^
/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/sys/posix/stdc/time.d:52:15: error: module core.sys.posix.sys.types import 'clock_t' not found
   52 | public import core.sys.posix.sys.types : time_t, clock_t;
      |               ^
/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/stdc/stdio.d:1514:9: error: undefined identifier 'fpos_t', did you mean alias '_fpos_t'?
 1514 |     int fgetpos(FILE* stream, scope fpos_t * pos);
      |         ^
/cygdrive/k/gcc/src/gcc-git/libphobos/libdruntime/core/stdc/stdio.d:1516:9: error: undefined identifier 'fpos_t', did you mean alias '_fpos_t'?
 1516 |     int fsetpos(FILE* stream, scope const fpos_t* pos);
      |         ^
../../../../libphobos/libdruntime/core/demangle.d:2615:16: error: module core.stdc.stdio import 'snprintf' not found, did you mean function 'core.stdc.stdio.sprintf'?
 2615 |         import core.stdc.stdio : snprintf;
      |                ^

This is the diff so far:

diff --git a/libphobos/libdruntime/core/stdc/stdio.d b/libphobos/libdruntime/core/stdc/stdio.d
index c76b922a3eb..52bcc9d7cdd 100644
--- a/libphobos/libdruntime/core/stdc/stdio.d
+++ b/libphobos/libdruntime/core/stdc/stdio.d
@@ -397,6 +397,196 @@ else version (CRuntime_Microsoft)
     ///
     alias shared(_iobuf) FILE;
 }
+else version (CRuntime_Newlib)
+{
+    alias long _off64_t;
+    alias long _fpos_t;
+    alias long _fpos64_t;
+    alias int _float_t;
+
+    struct __sbuf {
+        char* _base;
+        int _size;
+    }
+
+    struct _mbstate_t {
+        int _count;
+        union {
+            dchar _wch;
+            char[4] _wchb;
+        }
+    }
+
+    struct _rand48 {
+        ushort[3] _seed;
+        ushort[3] _mult;
+        ushort _add;
+    }
+
+    struct __tm {
+        int __tm_sec;
+        int __tm_min;
+        int __tm_hour;
+        int __tm_mday;
+        int __tm_mon;
+        int __tm_year;
+        int __tm_wday;
+        int __tm_yday;
+        int __tm_isdst;
+    }
+
+    struct __lc_cats {
+        const void* ptr;
+        char* buf;
+    }
+
+    struct lconv {
+        char* decimal_point;
+        char* thousands_sep;
+        char* grouping;
+        char* int_curr_symbol;
+        char* currency_symbol;
+        char* mon_decimal_point;
+        char* mon_thousands_sep;
+        char* mon_grouping;
+        char* positive_sign;
+        char* negative_sign;
+        char int_frac_digits;
+        char frac_digits;
+        char p_cs_precedes;
+        char p_sep_by_space;
+        char n_cs_precedes;
+        char n_sep_by_space;
+        char p_sign_posn;
+        char n_sign_posn;
+        char int_n_cs_precedes;
+        char int_n_sep_by_space;
+        char int_n_sign_posn;
+        char int_p_cs_precedes;
+        char int_p_sep_by_space;
+        char int_p_sign_posn;
+    }
+
+    struct __locale_t {
+        char[7][31 + 1] categories;
[Bug c++/108407] New: SegFault with structured binding and OpenMP without optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108407

Bug ID: 108407
Summary: SegFault with structured binding and OpenMP without optimization
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: mmoelle1 at gmail dot com
Target Milestone: ---

The following code snippet compiles well but the binary stops with a segmentation fault when compiled with 'g++ -O0 -std=c++17'. When compiled with optimization (-O1 or better) turned on the binary works fine.

#include
#ifdef OPENMMP_
#include
#endif

auto create()
{
  int a = 10;
  double b = 1.0;

  return std::tie(a,b);
}

int main()
{
  auto [a, b] = create();

  double vector[100];

#pragma omp parallel for
  for (int i=0; i
[Bug middle-end/108300] `abort()` macro cause bootstrap failure on *-w64-mingw32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108300 nightstrike changed: What|Removed |Added CC||nightstrike at gmail dot com --- Comment #15 from nightstrike --- Someone on irc (jakub?) suggested just changing all of the aborts to gcc_unreachable. Is that a viable option?
[Bug libstdc++/107189] Inconsistent range insertion implementations in std::_Rb_tree in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107189 François Dumont changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Target Milestone|--- |13.0 --- Comment #3 from François Dumont --- I am marking this bug as resolved for the useless _Alloc_node instance. Regarding the inconsistent implementation, feel free to open another issue with more explanations. Thanks
[Bug target/82028] Windows x86_64 should not pass float aggregates in xmm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82028 nightstrike changed: What|Removed |Added CC||nightstrike at gmail dot com --- Comment #5 from nightstrike --- (In reply to jon_y from comment #4) > I can't seem to change the bug status to confirmed. "NEW" is confirmed
[Bug target/90256] Optimizer with interrupt routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90256 --- Comment #4 from Andrew Pinski --- The reason why it is target specific is that the interrupt attribute is target specific and the ipa-icf code has no knowledge of it. Basically, when the x86_64 backend sees the interrupt attribute, it should also add the no_icf attribute.
[Bug target/90256] Optimizer with interrupt routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90256 --- Comment #3 from Andrew Pinski --- An easy workaround is to add noipa to the attribute.
[Bug target/90256] Optimizer with interrupt routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90256 nightstrike changed: What|Removed |Added CC||nightstrike at gmail dot com --- Comment #2 from nightstrike --- This is not target specific (or at least it also happens on x86_64-pc-linux).
[Bug tree-optimization/106103] ICE in binds_to_current_def_p when source object files are compiled with -flto -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106103 --- Comment #2 from Ivan --- Putting -fno-declone-ctor-dtor in the flags "fixes" the bug.
[Bug ipa/108383] g++ ICE with -O3 and -flto and -fdeclone-ctor-dtor on simple function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108383 Ivan changed: What|Removed |Added CC||ivanka2012 at gmail dot com --- Comment #5 from Ivan --- This is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106103
[Bug c++/80561] Missed optimization: std::array data is aligned if array is aligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80561 John Zwinck changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #7 from John Zwinck --- This was fixed in GCC 8. Thank you.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 80561, which changed state. Bug 80561 Summary: Missed optimization: std::array data is aligned if array is aligned https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80561 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug modula2/108405] modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405 --- Comment #3 from Andreas Schwab --- NPTL does not have the alignment restriction.
[Bug tree-optimization/108406] New: Missed integer optimization on x86-64 unless -fwrapv is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108406

Bug ID: 108406
Summary: Missed integer optimization on x86-64 unless -fwrapv is used
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jzwinck at gmail dot com
Target Milestone: ---

Consider this C++ code:

    #include <cstdint>

    // returns a if less than b or if b is INT32_MIN
    int32_t special_min(int32_t a, int32_t b)
    {
        return a < b || b == INT32_MIN ? a : b;
    }

GCC with -fwrapv correctly realizes that subtracting 1 from b can eliminate the special case, and it generates this code for x86-64:

    lea     edx, [rsi-1]
    mov     eax, edi
    cmp     edi, edx
    cmovg   eax, esi
    ret

But without -fwrapv it generates worse code:

    mov     eax, esi
    cmp     edi, esi
    jl      .L4
    cmp     esi, -2147483648
    je      .L4
    ret
    .L4:
    mov     eax, edi
    ret

If I wrote "hand optimized" C++ code trying to implement that optimization, I understand -fwrapv would be required, otherwise the compiler could decide the signed overflow is UB. But here the compiler is in control, it knows the behavior of integer overflow on x86-64, and so it should not matter whether -fwrapv is used.

Demo: https://godbolt.org/z/o881Mdqoa

Stack Overflow discussion: https://stackoverflow.com/questions/75110108/gcc-wont-use-its-own-optimization-trick-without-fwrapv

This is somewhat related to #102032 in the sense that it's an optimization missed without -fwrapv, but the type of optimization is different. It is possible there's a single solution that would solve both problems (and others).
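The transformation described in this report can be sketched by hand without relying on signed overflow. The following is only an illustration, not GCC's implementation: special_min_wrapped is a hypothetical name, and the sketch assumes that converting an out-of-range uint32_t to int32_t wraps modulo 2^32 (guaranteed since C++20 and documented by GCC for earlier standards).

```cpp
#include <cstdint>

// Original form from the report: returns a if less than b or if b is INT32_MIN.
int32_t special_min(int32_t a, int32_t b) {
    return a < b || b == INT32_MIN ? a : b;
}

// Hypothetical hand-written equivalent of the lea/cmp/cmovg sequence.
// b - 1 is computed in unsigned arithmetic, where wraparound is well
// defined, so INT32_MIN - 1 becomes INT32_MAX instead of being UB.
int32_t special_min_wrapped(int32_t a, int32_t b) {
    // Assumption: this conversion wraps modulo 2^32 (C++20, or GCC's
    // documented behavior for older standards).
    int32_t b_minus_1 = static_cast<int32_t>(static_cast<uint32_t>(b) - 1u);
    // a > b - 1 means a >= b, so b is the result; when b is INT32_MIN the
    // wrapped value is INT32_MAX, the test is never true, and a is returned.
    return a > b_minus_1 ? b : a;
}
```

Both functions should agree for every pair of inputs, including the cases where b is INT32_MIN or a equals b.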
[Bug modula2/108405] modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405 --- Comment #2 from Iain Sandoe --- (In reply to Iain Sandoe from comment #1) > note that a default size of 8Mb is not enough for either Linux or Arm64 > Darwin (both have PTHREAD_STACK_MIN of 16384). this is, of course, rubbish .. the default is 8Mb not 8k (which is fine for both 4096 and 16384 page sizes).
[Bug libgcc/108279] Improved speed for float128 routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #10 from Thomas Koenig --- What we would need for incorporation into gcc is to have several functions, which would then be called depending on which floating point options are in force at the time of invocation. So, let's go through the gcc options, to see what would fit where. Walking down the options tree, depth first. From the gcc docs: '-ffast-math' Sets the options '-fno-math-errno', '-funsafe-math-optimizations', '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans', '-fcx-limited-range' and '-fexcess-precision=fast'. -fno-math-errno is irrelevant in this context, no need to look at that. '-funsafe-math-optimizations' Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. This option is not turned on by any '-O' option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. Enables '-fno-signed-zeros', '-fno-trapping-math', '-fassociative-math' and '-freciprocal-math'. '-fno-signed-zeros' Allow optimizations for floating-point arithmetic that ignore the signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with '-ffinite-math-only'). This option implies that the sign of a zero result isn't significant. The default is '-fsigned-zeros'. I don't think this option is relevant. '-fno-trapping-math' Compile code assuming that floating-point operations cannot generate user-visible traps.
These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option requires that '-fno-signaling-nans' be in effect. Setting this option may allow faster code if one relies on "non-stop" IEEE arithmetic, for example. This option should never be turned on by any '-O' option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. The default is '-ftrapping-math'. Relevant. '-ffinite-math-only' Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. This option is not turned on by any '-O' option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. This does not have further suboptions. Relevant. '-fassociative-math' Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like '(x + 2**52) - 2**52'. May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both '-fno-signed-zeros' and '-fno-trapping-math' be in effect. Moreover, it doesn't make much sense with '-frounding-math'. For Fortran the option is automatically enabled when both '-fno-signed-zeros' and '-fno-trapping-math' are in effect. The default is '-fno-associative-math'. Not relevant, I think - this influences compiler optimizations. '-freciprocal-math' Allow the reciprocal of a value to be used instead of dividing by the value if this enables optimizations. 
For example 'x / y' can be replaced with 'x * (1/y)', which is useful if '(1/y)' is subject to common subexpression elimination. Note that this loses precision and increases the number of flops operating on the value. The default is '-fno-reciprocal-math'. Again, not relevant. '-frounding-math' Disable transformations and optimizations that assume default floating-point rounding behavior. This is round-to-zero for all floating point to integer conversions, and round-to-nearest for all other arithmetic truncations. This option should be specified for programs that change the FP rounding mode dynamically, or that may be executed with a non-default rounding mode. This option disables
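The '-fassociative-math' note quoted above mentions code that relies on rounding behavior like '(x + 2**52) - 2**52'. A minimal sketch of that idiom follows, assuming IEEE binary64 doubles, the default round-to-nearest-even mode, and non-negative inputs below 2**52. The function name is hypothetical, and the volatile temporaries are there only to keep the compiler from simplifying the expression.

```cpp
// Hypothetical helper illustrating the '(x + 2**52) - 2**52' idiom.
// Assumptions: IEEE binary64 double, round-to-nearest-even, 0 <= x < 2**52.
double round_to_integer_magic(double x) {
    const double magic = 4503599627370496.0;  // 2**52
    // At magnitude 2**52 the spacing between doubles is exactly 1, so the
    // addition rounds x to an integer in the current rounding mode.
    volatile double shifted = x + magic;
    // Subtracting the constant again recovers the rounded value exactly.
    volatile double result = shifted - magic;
    return result;
}
```

If the two operations were reassociated into x + (2**52 - 2**52), the function would simply return x unchanged, which is why such code cannot be combined with re-association.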
[Bug modula2/108405] modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405 Iain Sandoe changed: What|Removed |Added Target||x86_64-darwin Keywords||testsuite-fail --- Comment #1 from Iain Sandoe --- note that a default size of 8Mb is not enough for either Linux or Arm64 Darwin (both have PTHREAD_STACK_MIN of 16384).
[Bug modula2/108405] New: modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108405

Bug ID: 108405
Summary: modula-2: Testsuite fails: concurrentstore.mod, contimer.mod, tinytimer.mod on Darwin (and likely elsewhere)
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: modula2
Assignee: gaius at gcc dot gnu.org
Reporter: iains at gcc dot gnu.org
Target Milestone: ---

The test cases in the subject all fail on Darwin for the same reason: there is an attempt to set a stack size that violates the constraints of pthread_attr_setstacksize.

On Darwin, pthread_attr_setstacksize() will fail if:
[EINVAL] stacksize is less than PTHREAD_STACK_MIN
[EINVAL] stacksize is not a multiple of the system page size.

On Linux, pthread_attr_setstacksize() can fail with the following error:
EINVAL The stack size is less than PTHREAD_STACK_MIN (16384) bytes.
On some systems, pthread_attr_setstacksize() can fail with the error EINVAL if stacksize is not a multiple of the system page size.

---

So the failure reported on Darwin might well also occur on (at least some) Linux systems.

The problem is in

PROCEDURE initPreemptive (seconds, microsecs: CARDINAL) ;

which tries to call

Create (timer, 1000, MAX (Urgency), NIL, timerId) ;

where 1000 violates the constraints on stack size (definitely on Darwin, maybe on some Linux).

So .. the short-term solution is to fix initPreemptive to use a suitable value (patch to be posted).

However:
1. We should have detected the bad user value earlier and thrown an exception?
2. It is not clear to me how these magic numbers (embedded in the library) have been chosen (there is 8Mb as defaultSize and then here we add 10Mb); perhaps this is something that should be configured or at least set according to a target query?
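The two constraints quoted in this report (at least PTHREAD_STACK_MIN, and a multiple of the page size) can be checked before calling pthread_attr_setstacksize. The following is a rough sketch of such a target query, not the patch referred to above: valid_stack_size is a hypothetical helper name, and it assumes a POSIX system where sysconf(_SC_PAGESIZE) succeeds.

```cpp
#include <climits>
#include <cstddef>
#include <pthread.h>
#include <unistd.h>

// Hypothetical helper: adjust a requested stack size so that it satisfies
// both constraints listed in the report.
std::size_t valid_stack_size(std::size_t requested) {
    // Assumption: sysconf(_SC_PAGESIZE) returns a positive value.
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // PTHREAD_STACK_MIN may expand to a runtime call on newer glibc, so it
    // is treated as an ordinary runtime value here.
    const std::size_t minimum = static_cast<std::size_t>(PTHREAD_STACK_MIN);

    // Constraint 1: not less than PTHREAD_STACK_MIN.
    std::size_t size = requested < minimum ? minimum : requested;

    // Constraint 2: round up to a multiple of the system page size.
    const std::size_t remainder = size % page;
    if (remainder != 0) {
        size += page - remainder;
    }
    return size;
}
```

With a helper like this, a request such as the 1000 bytes passed by initPreemptive would be raised to a size the thread library accepts, rather than failing with EINVAL.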
[Bug modula2/108404] New: M2RTS_Halt fails with a segv (it should emit a diagnostic and exit).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108404

Bug ID: 108404
Summary: M2RTS_Halt fails with a segv (it should emit a diagnostic and exit).
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: modula2
Assignee: gaius at gcc dot gnu.org
Reporter: iains at gcc dot gnu.org
Target Milestone: ---

On Darwin several tests fail because there is an invalid stack size set (that is a separate bug).

The fault should have been reported by M2RTS_Halt (it is detected correctly in RTco.cc).

Setting a break point on the entry to M2RTS_Halt:

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  frame #0: 0x000154f0 concurrentstore.x0`M2RTS_Halt at M2RTS.mod:296:1
   293
   294 PROCEDURE Halt (file: ARRAY OF CHAR; line: CARDINAL;
   295 function: ARRAY OF CHAR; description: ARRAY OF CHAR) ;
-> 296 BEGIN
   297 ErrorMessage (description, file, line, function) ;
   298 HALT
   299 END Halt ;

examining the registers:

(lldb) reg read
General Purpose Registers:
rax = 0x0016
rbx = 0x62c08118
rcx = 0x000100014c00 "failed to set stack size attribute"
rdx = 0x000100014bf2 "initThread"
rdi = 0x000100014b00 "/src-local/gcc-master/libgm2/libm2iso/RTco.cc"
rsi = 0x0172

This is the correct ABI: RDI = file, RSI = line number, RDX = function, RCX = message (four integer/pointer arguments).

However, if we continue from this point ...

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x3014e7178)
  frame #0: 0x0001554f concurrentstore.x0`M2RTS_Halt at M2RTS.mod:296:1
   293
   294 PROCEDURE Halt (file: ARRAY OF CHAR; line: CARDINAL;
   295 function: ARRAY OF CHAR; description: ARRAY OF CHAR) ;
-> 296 BEGIN
   297 ErrorMessage (description, file, line, function) ;

I cannot (at present) debug this further since I do not have an installed debugger that supports Modula-2 (but it might well repeat on Linux - the ABI is basically the same).
In any case, it seems likely that the problem is in the prologue or very early in the function since the break line is on BEGIN in both cases.
[Bug c++/108365] [9/10/11/12 Regression] Wrong code with -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108365 Jakub Jelinek changed: What|Removed |Added Summary|[9/10/11/12/13 Regression] |[9/10/11/12 Regression] |Wrong code with -O0 |Wrong code with -O0 --- Comment #8 from Jakub Jelinek --- Fixed on the trunk so far. Guess for backports we want instead a minimal change (i.e. just the +&& INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op0, 0))) and +&& (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (op0, 0))) +< TYPE_PRECISION (type0))) additions for C++ FE).
[Bug libgcc/108279] Improved speed for float128 routines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279 --- Comment #9 from Thomas Koenig --- Created attachment 54273 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54273&action=edit matmul_r16.i Here is matmul_r16.i from a relatively recent trunk.
[Bug c++/108365] [9/10/11/12/13 Regression] Wrong code with -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108365 --- Comment #7 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:5b3a88640f962d4ffca31ae651bed2d8672f1a8c commit r13-5163-g5b3a88640f962d4ffca31ae651bed2d8672f1a8c Author: Jakub Jelinek Date: Sat Jan 14 10:17:14 2023 +0100 c++: Avoid incorrect shortening of divisions [PR108365] The following testcase is miscompiled, because we shorten the division in a case where it should not be shortened. Divisions (and modulos) can be shortened if it is unsigned division/modulo, or if it is signed division/modulo where we can prove the dividend will not be the minimum signed value or divisor will not be -1, because e.g. on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets (-2147483647 - 1) / -1 is UB but (int) (-2147483648LL / -1LL) is not, it is -2147483648. The primary aim of both the C and C++ FE division/modulo shortening I assume was for the implicit integral promotions of {,signed,unsigned} {char,short} and because at this point we have no VRP information etc., the shortening is done if the integral promotion is from unsigned type for the divisor or if the dividend is an integer constant other than -1. This works fine for char/short -> int promotions when char/short have smaller precision than int - unsigned char -> int or unsigned short -> int will always be a positive int, so never the most negative. Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either the same as orig_op0 or that promoted to int, I think that works fine, if it isn't promoted, either the division/modulo common type will have the same precision as op0 but then the division/modulo is unsigned and so without UB, or it will be done in wider precision (e.g. because op1 has wider precision), but then op0 can't be minimum signed value. Or it has been promoted to int, but in that case it was again from narrower type and so never minimum signed int.
But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED. First of all, not sure if the operand of NOP_EXPR couldn't be non-integral type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly, even if it is a cast from unsigned integral type, we only know it can't be minimum signed value if it is a widening cast, if it is same precision or narrowing cast, we know nothing. So, the following patch for the NOP_EXPR cases checks just in case that it is from integral type and more importantly checks it is a widening conversion, and then next to it also allows op0 to be just unsigned, promoted or not, as that is what the C FE will do for those cases too and I believe it must work - either the division/modulo common type will be that unsigned type, then we can shorten and don't need to worry about UB, or it will be some wider signed type but then it can't be most negative value of the wider type. And changes both the C and C++ FEs to do the same thing, using a helper function in c-family. 2023-01-14 Jakub Jelinek PR c++/108365 * c-common.h (may_shorten_divmod): New static inline function. * c-typeck.cc (build_binary_op): Use may_shorten_divmod for integral division or modulo. * typeck.cc (cp_build_binary_op): Use may_shorten_divmod for integral division or modulo. * c-c++-common/pr108365.c: New test. * g++.dg/opt/pr108365.C: New test. * g++.dg/warn/pr108365.C: New test.
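The reasoning in this commit message can be illustrated with a small sketch. It is not the may_shorten_divmod helper from the patch and not the PR's testcase; all three function names are hypothetical, and the sketch assumes a target where long long is wider than int.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical check: is dividend / divisor well defined when done in int?
// The overflow case is the minimum signed value divided by -1.
bool int_div_is_defined(int dividend, int divisor) {
    return divisor != 0 && !(dividend == INT_MIN && divisor == -1);
}

// The same quotient computed in a wider type. Assumption: long long is
// wider than int, so INT_MIN / -1 is representable and there is no UB.
// A divisor of zero is still undefined and is not handled here.
long long wide_div(int dividend, int divisor) {
    return static_cast<long long>(dividend) / static_cast<long long>(divisor);
}

// A dividend widened from an unsigned 16-bit value is never negative, so
// it can never be INT_MIN; doing this division in int is always safe for
// a nonzero divisor, which is the situation where shortening is valid.
int div_from_u16(std::uint16_t x, int divisor) {
    return static_cast<int>(x) / divisor;
}
```

The wide division is what the source program asks for when the operands have been converted to a wider type; replacing it with the narrow one is only valid when the problematic dividend can be ruled out, as in div_from_u16.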
[Bug debug/106746] [13 Regression] '-fcompare-debug' failure (length) with -O2 -fsched2-use-superblocks since r13-2041-g6624ad73064de241
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106746 --- Comment #18 from Jakub Jelinek --- Thanks for looking into this.
[Bug debug/106746] [13 Regression] '-fcompare-debug' failure (length) with -O2 -fsched2-use-superblocks since r13-2041-g6624ad73064de241
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106746 --- Comment #17 from Alexandre Oliva --- Created attachment 54272 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54272&action=edit patch that fixes the problem for reasons not fully understood It seems that looking up the MEM exprs in DEBUG_INSNs disturbs something in cselib and causes pending MEMs to conflict that, in the non-debug case, don't. There's no need for these lookups in debug insns, the results aren't used, and I thought I'd just queue up this improvement but, to my surprise, it made the problem go away.