[Bug target/115789] gcc miscompile itself with CFLAGS -O3 -march=rv64gcv_zvl256b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115789 --- Comment #5 from Craig Topper --- Isn’t -mstrict-align the default? It is in LLVM.
[Bug target/114963] New: RISCV -msave-restore -fno-omit-frame-pointer does not emit save/restore library calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114963 Bug ID: 114963 Summary: RISCV -msave-restore -fno-omit-frame-pointer does not emit save/restore library calls Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- It appears that combining -msave-restore and -fno-omit-frame-pointer results in no save/restore library calls being emitted. I also found this blog post that seems to imply it worked at some point, but broke unwinding. https://www.codethink.co.uk/articles/2023/riscv-stack-unwinding-bug/ I checked a few previous versions on compiler explorer and was not able to get the library calls.
[Bug target/113095] RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095 --- Comment #3 from Craig Topper --- Our FPGA data is showing this as a 5% regression. I'll try to check on an Unmatched board to confirm.
[Bug target/113095] RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095 --- Comment #2 from Craig Topper --- The branch+mv macrofusion should execute together. The visible latency to other instructions is 1 cycle. The hardware can predicate most ALU instructions, not just mv. So even better would be putting the xor after the branch instead of a mv.
[Bug target/113095] New: RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095 Bug ID: 113095 Summary: RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- In gcc 12, the crc functions in coremark used the "movcc" macrofusion pattern of a branch over a single mv. In gcc 13, a longer branchless sequence is used. https://godbolt.org/z/9xvcxo5Y9
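For readers unfamiliar with the two code shapes discussed in bug 113095, here is a minimal sketch in C. It is not the coremark source: the polynomial `CRC_POLY` and both function names are assumptions made for illustration. The first function has the "branch over a single ALU instruction" shape that the macrofusion comments describe; the second is a branchless equivalent built from a mask.

```c
#include <stdint.h>

/* Hypothetical polynomial chosen for illustration; not taken from coremark. */
#define CRC_POLY 0xA001u

/* Branchy shape: a conditional branch skips one ALU operation (the xor). */
uint16_t crc_step_branch(uint16_t crc)
{
    uint16_t lsb = crc & 1u;
    crc >>= 1;
    if (lsb)
        crc ^= CRC_POLY;
    return crc;
}

/* Branchless shape: turn the low bit into an all-ones or all-zeros mask. */
uint16_t crc_step_branchless(uint16_t crc)
{
    uint16_t mask = (uint16_t)(0u - (crc & 1u));
    return (uint16_t)((crc >> 1) ^ (CRC_POLY & mask));
}
```

Both functions compute the same value for every input; which one is faster depends on the target, which is the point of the report.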
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #7 from Craig Topper --- Here is my attempt at defining scalar crypto intrinsics: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #3 from Craig Topper --- I don't have a testsuite. I saw that gcc had crypto builtins and I happened to notice that the tests in gcc weren't passing constant arguments. We also have a divergence in names between clang and gcc for some crypto builtins. We really need to define a scalar crypto intrinsic header file.
[Bug target/110201] New: RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 Bug ID: 110201 Summary: RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- The __builtin_riscv_sm4ks and __builtin_riscv_sm4ed builtins don't enforce that the byte select should be an immediate. If an immediate is provided, they still use a register. int32_t foo1(int32_t rs1, int32_t rs2, int bs) { return __builtin_riscv_sm4ks(rs1,rs2,bs); } int32_t foo2(int32_t rs1, int32_t rs2, int bs) { return __builtin_riscv_sm4ed(rs1,rs2,bs); } int32_t foo3(int32_t rs1, int32_t rs2, int bs) { return __builtin_riscv_sm4ks(rs1,rs2,0); } int32_t foo4(int32_t rs1, int32_t rs2, int bs) { return __builtin_riscv_sm4ed(rs1,rs2,0); } https://godbolt.org/z/jadKva9M9
[Bug target/109972] New: RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for 32-bit division/remainder on RV64 without M extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109972 Bug ID: 109972 Summary: RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for 32-bit division/remainder on RV64 without M extension Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- There's an opportunity to improve code size for 32-bit division and remainder on RV64 without the M extension. Currently gcc calls the umoddi3/udivdi3/divdi3 functions by zero/sign extending the operands. In the case of the unsigned functions this requires two shifts to zero the upper bits. For signed, it's two sext.w if the operands are not already sign extended. All 3 functions are followed by a sext.w if the result needs to be sign extended. libgcc contains umodsi3/udivsi3/divsi3 functions that handle the zero extending of inputs and sign extending the result. Internally they call the di3 functions to do the computation. These functions could be used to reduce code size at the caller. There is no signed modsi3 function in libgcc, probably because umoddi3(sext(X), sext(Y)) is guaranteed to produce a result that is sign extended. gcc seems to not know this and still emits a sext.w after the call to umoddi3. godbolt https://godbolt.org/z/ax3Khc6cM unsigned divu(unsigned x, unsigned y) { return x / y; } unsigned remu(unsigned x, unsigned y) { return x % y; } int div(int x, int y) { return x / y; } int rem(int x, int y) { return x % y; }
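Two of the claims in bug 109972 can be checked in portable C. The sketch below is only a model: the function names are hypothetical, and it uses C's signed 64-bit `%` in place of the libgcc routine, which is an assumption about which routine the report has in mind. It shows the two-shift zero extension and that a remainder computed on sign-extended 32-bit operands already fits in 32 bits, so no further sign extension is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 32 bits with a shift pair, as the report describes. */
uint64_t zext32_by_shifts(uint64_t x)
{
    return (x << 32) >> 32;
}

/* 32-bit signed remainder computed on sign-extended 64-bit operands.
   The caller must pass y != 0. */
int64_t rem32_via_64(int32_t x, int32_t y)
{
    assert(y != 0);
    return (int64_t)x % (int64_t)y;
}

/* Returns 1 if v is already a sign-extended 32-bit value. */
int is_sext32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Because the magnitude of a remainder is smaller than the magnitude of the divisor, `is_sext32(rem32_via_64(x, y))` holds for every valid pair, including `INT32_MIN` and `-1`.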
[Bug target/95774] New: __builtin_cpu_is can't detect cooperlake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95774 Bug ID: 95774 Summary: __builtin_cpu_is can't detect cooperlake Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- Cooperlake appears to be defined in the enum in libgcc for __builtin_cpu_is, but there is no code to use that enum value when identifying the cpu in libgcc.
[Bug target/95660] New: get_intel_cpu in cpuinfo.c contains unnecessary check for brand_id
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95660 Bug ID: 95660 Summary: get_intel_cpu in cpuinfo.c contains unnecessary check for brand_id Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- Brand id or brand index was a feature that briefly existed in some Pentium III and Pentium 4 CPUs. The code will only look at family/model if that field is 0. I believe the CPUs that had brand_id would still have had valid family/model. The brand_id just gives a marketing name for the CPU. I don't think any of the CPUs that have a non-zero brand_id are supported by the switch that it's guarding. So this doesn't really matter; it's just extra code.
[Bug inline-asm/95121] Wrong code generated: low-byte registers are silently used in place of their corresponding high-byte registers (ah, bh, ch, dh)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95121 --- Comment #6 from Craig Topper --- Could gcc at least emit an error?
[Bug target/94977] New: Some X86 inline assembly modifiers are not documented in the web documentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94977 Bug ID: 94977 Summary: Some X86 inline assembly modifiers are not documented in the web documentation Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- gcc supports some modifiers for inline assembly on X86 that are not documented in the table at 6.47.2.8 x86 Operand Modifiers here https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html For example, gcc supports %x0 in inline assembly to indicate to always print a register as its xmm name. Similarly, there are also %t0 and %g0 for ymm and zmm respectively.
[Bug target/91704] New: [X86] Codegen for _mm256_cmpgt_epi8 is affected by -funsigned-char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91704 Bug ID: 91704 Summary: [X86] Codegen for _mm256_cmpgt_epi8 is affected by -funsigned-char Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- This intrinsic should always do a signed compare, but it uses __v32qi in its implementation, which uses "char" rather than "signed char" in its typedef. This causes it to follow the -funsigned-char/-fsigned-char setting. The 128-bit equivalent uses a separate __v16qs typedef to avoid this.
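To make the effect in bug 91704 concrete, here is a small scalar model of one byte lane of a greater-than compare. It does not use the real intrinsic or the gcc vector typedefs; the function names are hypothetical. The signed version is what the intrinsic is supposed to compute, and the unsigned version is what a plain `char` element type would give under -funsigned-char.

```c
#include <stdint.h>

/* One lane of a signed byte compare: 0xFF if a > b, else 0x00.
   The raw byte is reinterpreted as a two's complement value. */
uint8_t cmpgt_lane_signed(uint8_t a, uint8_t b)
{
    int sa = a >= 128 ? (int)a - 256 : (int)a;
    int sb = b >= 128 ? (int)b - 256 : (int)b;
    return sa > sb ? 0xFF : 0x00;
}

/* The same lane compared as unsigned bytes. */
uint8_t cmpgt_lane_unsigned(uint8_t a, uint8_t b)
{
    return a > b ? 0xFF : 0x00;
}
```

The two disagree whenever exactly one operand has its top bit set: for bytes 0x80 and 0x01, the signed compare sees -128 > 1 (false) while the unsigned compare sees 128 > 1 (true).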
[Bug target/91696] [X86] AVX512 intrinsics that only support SAE should allow (_MM_FOUND_NO_EXC|_MM_FROUND_CUR_DIRECTION) to match icc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91696 --- Comment #1 from Craig Topper --- I've also submitted a patch to clang to do the same. https://reviews.llvm.org/D67289
[Bug target/91696] New: [X86] AVX512 intrinsics that only support SAE should allow (_MM_FOUND_NO_EXC|_MM_FROUND_CUR_DIRECTION) to match icc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91696 Bug ID: 91696 Summary: [X86] AVX512 intrinsics that only support SAE should allow (_MM_FOUND_NO_EXC|_MM_FROUND_CUR_DIRECTION) to match icc Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- The intrinsics that only support SAE, like _mm512_cmp_round_ps_mask, currently only allow _MM_FROUND_NO_EXC or _MM_FROUND_CUR_DIRECTION to be specified in the sae immediate parameter. _MM_FROUND_NO_EXC isn't really the opposite of _MM_FROUND_CUR_DIRECTION based on the format of this type of immediate. _MM_FROUND_NO_EXC by itself is equivalent to (_MM_FROUND_NO_EXC|_MM_FROUND_TO_NEAREST_INT) since _MM_FROUND_TO_NEAREST_INT is 0. But these instructions don't perform any rounding, so the rounding bits don't matter. It's a nice convenience that the single constants _MM_FROUND_NO_EXC and _MM_FROUND_CUR_DIRECTION can be used. But since the rounding mode doesn't matter, (_MM_FROUND_CUR_DIRECTION|_MM_FROUND_NO_EXC) should also be allowed.
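The request in bug 91696 amounts to accepting one more immediate value. The sketch below models that with plain constants whose values mirror the _MM_FROUND_* macros; the enum names and both checker functions are hypothetical and are not gcc's actual validation code.

```c
/* Values mirror the _MM_FROUND_* macros; the names are local to this sketch. */
enum {
    FROUND_TO_NEAREST_INT = 0x00,
    FROUND_CUR_DIRECTION  = 0x04,
    FROUND_NO_EXC         = 0x08
};

/* Behavior described in the report: only the two single constants pass. */
int sae_imm_ok_strict(int imm)
{
    return imm == FROUND_CUR_DIRECTION || imm == FROUND_NO_EXC;
}

/* Requested behavior: the combination of both constants passes as well. */
int sae_imm_ok_relaxed(int imm)
{
    return sae_imm_ok_strict(imm) ||
           imm == (FROUND_CUR_DIRECTION | FROUND_NO_EXC);
}
```

With these definitions the value 0x0C is rejected by the strict check and accepted by the relaxed one, while an explicit rounding mode such as 0x01 is rejected by both.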
[Bug libgcc/91695] New: [X86] get_available_features only sets FEATURE_GFNI and FEATURE_VPCLMULQDQ when avx512_usable is true
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91695 Bug ID: 91695 Summary: [X86] get_available_features only sets FEATURE_GFNI and FEATURE_VPCLMULQDQ when avx512_usable is true Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- GFNI has instructions that have legacy SSE-like encodings. It also has VEX and EVEX encodings. VPCLMULQDQ has a VEX encoding for 256-bit and EVEX encodings for 256-bit and 512-bit. So these features are usable without AVX512. It probably makes sense to qualify VPCLMULQDQ with avx_usable since it requires at least ymm registers.
[Bug target/86466] New: [X86] gcc checks the range of the immediate to _mm_blend_ps, but not _mm_blend_epi32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86466 Bug ID: 86466 Summary: [X86] gcc checks the range of the immediate to _mm_blend_ps, but not _mm_blend_epi32 Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- These intrinsics are both blends of four 32-bit values. gcc seems to check the range for the floating point version, but not the integer version. Perhaps this was overlooked when _mm_blend_epi32 was added, since it was added with avx2 while the floating point version was added with sse4.1. Test case: https://godbolt.org/g/9QGm9j
[Bug target/86444] New: [X86] Implementation of SSE comi/ucomi intrinsics does not match recent versions of icc, clang, or MSVC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86444 Bug ID: 86444 Summary: [X86] Implementation of SSE comi/ucomi intrinsics does not match recent versions of icc, clang, or MSVC Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- It looks like gcc does not match the behavior of the most recent versions of icc, clang, and MSVC with respect to the behavior of NaNs in the COMI intrinsics. The other compilers are all returning 0 when the compare result is unordered, as can be seen here: https://godbolt.org/g/xxEKqg Clang changed to this behavior in version 3.9. According to this comment from https://bugs.llvm.org/show_bug.cgi?id=28510#c10, the original icc behavior was the same as gcc's current behavior, but it was changed at least 10 years ago.
[Bug target/85530] New: [X86] _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 not implemented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85530 Bug ID: 85530 Summary: [X86] _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 not implemented Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- icc has these intrinsics which emulate a v8di multiply using multiple pmuludqs when avx512f is enabled, but avx512dq is not enabled. If avx512dq is enabled it uses vpmullq. I just added support to clang in r330923. Would be good if gcc could implement it too.
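The emulation mentioned in bug 85530 builds the low 64 bits of a 64x64 product from 32x32 unsigned multiplies. Here is a scalar sketch of that decomposition for a single lane; the function names are hypothetical, and this shows only the arithmetic identity, not icc's or clang's actual instruction sequence.

```c
#include <stdint.h>

/* 32x32 -> 64 unsigned multiply of the low halves of a and b,
   the per-lane operation that pmuludq performs. */
uint64_t mul32x32(uint64_t a, uint64_t b)
{
    return (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
}

/* Low 64 bits of a * b built from three 32x32 multiplies.
   The hi(a) * hi(b) term only affects bits 64 and up, so it is dropped. */
uint64_t mullo64_emulated(uint64_t a, uint64_t b)
{
    uint64_t lo_lo = mul32x32(a, b);
    uint64_t hi_lo = mul32x32(a >> 32, b);
    uint64_t lo_hi = mul32x32(a, b >> 32);
    return lo_lo + ((hi_lo + lo_hi) << 32);
}
```

Unsigned wraparound makes the result match a plain `a * b` on `uint64_t` for all inputs, which is what a native 64-bit low multiply such as vpmullq would return per lane.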
[Bug target/85511] [X86] Using __builtin_ia32_writeeflags_u32 in 64-bit mode causes internal compiler error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85511 --- Comment #2 from Craig Topper --- Should this builtin even be allowed in 64-bit mode?
[Bug target/85511] New: [X86] Using __builtin_ia32_writeeflags_u32 in 64-bit mode causes internal compiler error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85511 Bug ID: 85511 Summary: [X86] Using __builtin_ia32_writeeflags_u32 in 64-bit mode causes internal compiler error Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- This code void foo(unsigned bar) { return __builtin_ia32_writeeflags_u32(bar); } Throws this error in 64-bit mode during RTL pass: expand : In function 'foo': :2:10: internal compiler error: in copy_to_mode_reg, at explow.c:630 return __builtin_ia32_writeeflags_u32(bar); ^~~ mmap: Invalid argument Please submit a full bug report, with preprocessed source if appropriate. See <https://gcc.gnu.org/bugs/> for instructions. Compiler returned: 1
[Bug target/83618] New: _rdpid_u32 doesn't work on 64-bit targets as gas expects the 64-bit register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83618 Bug ID: 83618 Summary: _rdpid_u32 doesn't work on 64-bit targets as gas expects the 64-bit register Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- Trying to compile the _rdpid_u32 intrinsic on x86-64 causes the assembler to print this: /tmp/ccbdTr5q.s: Assembler messages: /tmp/ccbdTr5q.s:13: Error: operand type mismatch for `rdpid' It appears that the assembler expects a 64-bit register in 64-bit mode. This seems to be due to an odd quirk of Intel's documentation, which says the instruction writes a 64-bit register in 64-bit mode and a 32-bit register in 32-bit mode. But in reality it's reading the TSC_AUX_MSR, which is only 32-bit, so I suspect it always zeros the upper bits of the register in 64-bit mode. That would be the expected behavior if it had been documented as always using a 32-bit register, so I don't know why the docs made this distinction. Not sure if this should be fixed in gcc or if gas should be taught to accept either a 32-bit or 64-bit register.
[Bug target/83546] New: -march=silvermont doesn't enable rdrnd by default despite what docs say
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546 Bug ID: 83546 Summary: -march=silvermont doesn't enable rdrnd by default despite what docs say Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- The documentation https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html says 'silvermont' enables rdrnd, but that doesn't appear to happen. I think it may have worked in r205275 when 'slm' listed all of its features separately. But I think it was then broken in r206178 when PTA_SILVERMONT was introduced and defined like this #define PTA_SILVERMONT \ (PTA_WESTMERE | PTA_MOVBE) PTA_WESTMERE doesn't and shouldn't include RDRND.
[Bug middle-end/80042] gcc thinks sin/cos don't set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80042 --- Comment #3 from Craig Topper --- No, -fmath-errno has no effect. It does have an effect on other functions such as cosh or acos.
[Bug middle-end/80042] New: gcc thinks sin/cos don't set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80042 Bug ID: 80042 Summary: gcc thinks sin/cos don't set errno Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: craig.topper at gmail dot com Target Milestone: --- Created attachment 40974 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40974&action=edit Test that uses sin and errno As of glibc version 2.10, sin and cos set errno when the input is infinity. gcc thinks sin/cos never write errno and will move them around relative to reads of errno. The attached test case will return a failing error code at O0 and a passing error code at O2 when the errno read after the sin call is optimized out. glibc commit https://sourceware.org/git/?p=glibc.git;a=commit;f=sysdeps/ieee754/dbl-64/s_sin.c;h=0c59a1963e948c546e0d3e34de974c7e71de1134 I believe sincos was also changed to update errno in 2015 https://sourceware.org/bugzilla/show_bug.cgi?id=15467
[Bug driver/50740] New: CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50740 Bug #: 50740 Summary: CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver AssignedTo: unassig...@gcc.gnu.org ReportedBy: craig.top...@gmail.com The cpuid code for detecting BMI, BMI2, and AVX2 support needs to be qualified with max_level >= 7. Additionally, it should use __cpuid_count instead of __cpuid because leaf 7 uses subleafs just like leaf 4. Relevant code from i386-driver.c __cpuid (0x7, eax, ebx, ecx, edx); has_bmi = ebx & bit_BMI; has_avx2 = ebx & bit_AVX2; has_bmi2 = ebx & bit_BMI2;
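The two fixes requested in bug 50740 can be sketched with the helpers from `<cpuid.h>`. This is an illustrative stand-alone function, not the i386-driver.c code: the struct and function names are hypothetical, and the non-x86 fallback that reports no features is an assumption added so that the sketch compiles everywhere.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical result type for this sketch. */
struct leaf7_features {
    int has_bmi;
    int has_avx2;
    int has_bmi2;
};

struct leaf7_features detect_leaf7(void)
{
    struct leaf7_features f = {0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Highest basic CPUID leaf the processor supports. */
    unsigned int max_level = __get_cpuid_max(0, 0);

    /* Only query leaf 7 when it exists, and select subleaf 0 explicitly. */
    if (max_level >= 7) {
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        (void)eax; (void)ecx; (void)edx;
        f.has_bmi  = (ebx & bit_BMI)  != 0;
        f.has_avx2 = (ebx & bit_AVX2) != 0;
        f.has_bmi2 = (ebx & bit_BMI2) != 0;
    }
#endif
    return f;
}
```

`__cpuid_count` sets ECX to the requested subleaf before executing CPUID, which plain `__cpuid` does not do; without the `max_level` guard, a processor that lacks leaf 7 could return unrelated data in EBX.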