[Bug target/115789] gcc miscompile itself with CFLAGS -O3 -march=rv64gcv_zvl256b

2024-07-04 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115789

--- Comment #5 from Craig Topper  ---
Isn’t -mstrict-align the default? It is in LLVM.

[Bug target/114963] New: RISCV -msave-restore -fno-omit-frame-pointer does not emit save/restore library calls

2024-05-06 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114963

Bug ID: 114963
   Summary: RISCV -msave-restore -fno-omit-frame-pointer does not
emit save/restore library calls
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

It appears that combining -msave-restore and -fno-omit-frame-pointer results in
no save/restore library calls being emitted.

I also found this blog post that seems to imply it worked at some point, but
broke unwinding.
https://www.codethink.co.uk/articles/2023/riscv-stack-unwinding-bug/ I checked
a few previous versions on compiler explorer and was not able to get the
library calls.

[Bug target/113095] RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series

2023-12-20 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095

--- Comment #3 from Craig Topper  ---
Our FPGA data is showing this as a 5% regression. I'll try to check on an
Unmatched board to confirm.

[Bug target/113095] RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series

2023-12-20 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095

--- Comment #2 from Craig Topper  ---
The branch+mv macrofusion should execute together. The visible latency to other
instructions is 1 cycle.

The hardware can predicate most ALU instructions, not just mv. So even better
would be putting the xor after the branch instead of a mv.

[Bug target/113095] New: RISC-V: movcc no longer used for coremark crc functions with -mtune=sifive-7-series

2023-12-20 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113095

Bug ID: 113095
   Summary: RISC-V: movcc no longer used for coremark crc
functions with -mtune=sifive-7-series
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

In gcc 12, the crc functions in coremark used the "movcc" macrofusion pattern
of a branch over a single mv. In gcc 13, a longer branchless sequence is used.

https://godbolt.org/z/9xvcxo5Y9
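
For readers unfamiliar with the two code shapes, here is a minimal sketch. It is not CoreMark's source: the function names and the reflected CRC-16 polynomial 0xA001 are assumptions chosen for illustration. The first form contains a conditional single-operation update, the kind of source that can map to a short forward branch over one ALU instruction; the second is a branchless equivalent.

```c
#include <stdint.h>

/* Branchy form: the conditional update is one ALU operation guarded by a
   test, which a compiler may lower to a short forward branch over it. */
uint16_t crc16_step_branchy(uint8_t data, uint16_t crc) {
  for (int i = 0; i < 8; i++) {
    unsigned bit = (unsigned)(data ^ crc) & 1u;
    data >>= 1;
    crc >>= 1;
    if (bit)
      crc ^= 0xA001u; /* assumed polynomial, illustration only */
  }
  return crc;
}

/* Branchless form: build an all-ones or all-zeros mask from the bit and
   apply the polynomial unconditionally. */
uint16_t crc16_step_branchless(uint8_t data, uint16_t crc) {
  for (int i = 0; i < 8; i++) {
    unsigned bit = (unsigned)(data ^ crc) & 1u;
    data >>= 1;
    crc >>= 1;
    crc ^= (uint16_t)((0u - bit) & 0xA001u);
  }
  return crc;
}
```

Both functions compute the same value for every input; only the shape of the generated code is expected to differ.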

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-07-05 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #7 from Craig Topper  ---
Here is my attempt at defining scalar crypto intrinsics:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #3 from Craig Topper  ---
I don't have a testsuite. I saw that gcc had crypto builtins and I happened to
notice that the tests in gcc weren't passing constant arguments.

We also have a divergence in names between clang and gcc for some crypto
builtins. We really need to define a scalar crypto intrinsic header file.

[Bug target/110201] New: RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-09 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

Bug ID: 110201
   Summary: RISC-V: __builtin_riscv_sm4ks and
__builtin_riscv_sm4ed produce invalid assembly
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

The __builtin_riscv_sm4ks and __builtin_riscv_sm4ed builtins don't enforce that
the byte select should be an immediate. Even if an immediate is provided, they
still use a register.

#include <stdint.h>

/* Byte select passed as a variable. */
int32_t foo1(int32_t rs1, int32_t rs2, int bs)
{
  return __builtin_riscv_sm4ks(rs1, rs2, bs);
}

int32_t foo2(int32_t rs1, int32_t rs2, int bs)
{
  return __builtin_riscv_sm4ed(rs1, rs2, bs);
}

/* Byte select passed as a constant. */
int32_t foo3(int32_t rs1, int32_t rs2, int bs)
{
  return __builtin_riscv_sm4ks(rs1, rs2, 0);
}

int32_t foo4(int32_t rs1, int32_t rs2, int bs)
{
  return __builtin_riscv_sm4ed(rs1, rs2, 0);
}

https://godbolt.org/z/jadKva9M9

[Bug target/109972] New: RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for 32-bit division/remainder on RV64 without M extension

2023-05-25 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109972

Bug ID: 109972
   Summary: RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for
32-bit division/remainder on RV64 without M extension
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

There's an opportunity to improve code size for 32-bit division and remainder
on RV64 without the M extension.

Currently gcc calls the umoddi3/udivdi3/divdi3 functions by zero/sign extending
the operands. In the case of the unsigned functions this requires two shifts to
zero the upper bits. For signed, it's two sext.w if the operands are not
already sign extended.

All 3 functions are followed by a sext.w if the result needs to be sign extended.

libgcc contains umodsi3/udivsi3/divsi3 functions that handle the zero extending
of inputs and sign extending the result. Internally they call the di3 functions
to do the computation. These functions could be used to reduce code size at the
caller.

There is no signed modsi3 function in libgcc. Probably because umoddi3(sext(X),
sext(Y)) is guaranteed to produce a result that is sign extended. gcc seems to
not know this and still emits a sext.w after the call to umoddi3.

godbolt https://godbolt.org/z/ax3Khc6cM

unsigned divu(unsigned x, unsigned y) {
  return x / y;
}

unsigned remu(unsigned x, unsigned y) {
  return x % y;
}

int div(int x, int y) {
  return x / y;
}

int rem(int x, int y) {
  return x % y;
}
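
The extension steps described in this report can be modeled in portable C. This is only a sketch under stated assumptions: the model_* names are hypothetical stand-ins, not the libgcc routines, and the signed case is modeled with a plain signed 64-bit remainder. It shows what a 32-bit wrapper would fold in (zero-extend inputs, sign-extend the result) and why a signed remainder of two sign-extended 32-bit values needs no further sign extension.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the 64-bit routines: plain 64-bit division and
   remainder. The divisor must be nonzero. */
static uint64_t model_udivdi3(uint64_t a, uint64_t b) { return a / b; }
static uint64_t model_umoddi3(uint64_t a, uint64_t b) { return a % b; }
static int64_t model_moddi3(int64_t a, int64_t b) { return a % b; }

/* 32-bit wrappers: zero-extend both inputs, run the 64-bit routine, then
   sign-extend the low 32 bits of the result, as a register holding a 32-bit
   value on RV64 would be. */
int64_t model_udivsi3(uint32_t a, uint32_t b) {
  uint64_t q = model_udivdi3((uint64_t)a, (uint64_t)b);
  return (int64_t)(int32_t)(uint32_t)q;
}

int64_t model_umodsi3(uint32_t a, uint32_t b) {
  uint64_t r = model_umoddi3((uint64_t)a, (uint64_t)b);
  return (int64_t)(int32_t)(uint32_t)r;
}

/* Signed remainder of sign-extended inputs: |a % b| < |b| <= 2^31 and the
   sign follows a, so the 64-bit result always fits in 32 bits. */
int64_t model_rem32(int32_t a, int32_t b) {
  return model_moddi3((int64_t)a, (int64_t)b);
}
```

In this model, model_rem32 never needs a trailing sign extension, which is the redundancy the report points at.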

[Bug target/95774] New: __builtin_cpu_is can't detect cooperlake

2020-06-19 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95774

Bug ID: 95774
   Summary: __builtin_cpu_is can't detect cooperlake
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

Cooperlake appears to be defined in the enum in libgcc for __builtin_cpu_is, but
there is no code to use that enum value when identifying the cpu in libgcc.

[Bug target/95660] New: get_intel_cpu in cpuinfo.c contains unnecessary check for brand_id

2020-06-12 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95660

Bug ID: 95660
   Summary: get_intel_cpu in cpuinfo.c contains unnecessary check
for brand_id
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

Brand id or brand index was a feature that briefly existed in some Pentium III
and Pentium 4 CPUs. The code will only look at family/model if that field is 0.
I believe the CPUs that had brand_id would still have had valid family/model.
The brand_id just gives a marketing name for the CPU. I don't think any of the
CPUs that have a non-zero brand_id are supported by the switch that it's
guarding. So this doesn't really matter; it's just extra code.
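
To make the distinction concrete, here is a minimal sketch of decoding CPUID leaf 1 into family/model and the brand index. The struct and function names are hypothetical, and this is not the libgcc code. It only illustrates that family and model come from EAX while the brand index is a separate field in the low byte of EBX, so one can be decoded regardless of the other.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct cpu_signature {
  unsigned family;
  unsigned model;
  unsigned stepping;
  unsigned brand_id;
};

/* Decode the CPUID leaf 1 register values passed in by the caller. */
struct cpu_signature decode_leaf1(uint32_t eax, uint32_t ebx) {
  struct cpu_signature s;
  unsigned base_family = (eax >> 8) & 0xF;
  unsigned base_model = (eax >> 4) & 0xF;
  unsigned ext_family = (eax >> 20) & 0xFF;
  unsigned ext_model = (eax >> 16) & 0xF;

  s.stepping = eax & 0xF;
  /* The extended family is added only when the base family is 0xF. */
  s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
  /* The extended model applies to base families 6 and 0xF. */
  s.model = (base_family == 0x6 || base_family == 0xF)
                ? ((ext_model << 4) | base_model)
                : base_model;
  /* Brand index: low 8 bits of EBX, independent of family/model. */
  s.brand_id = ebx & 0xFF;
  return s;
}
```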

[Bug inline-asm/95121] Wrong code generated: low-byte registers are silently used in place of their corresponding high-byte registers (ah, bh, ch, dh)

2020-05-14 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95121

Craig Topper  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |craig.topper at gmail dot com

--- Comment #6 from Craig Topper  ---
Could gcc at least emit an error?

[Bug target/94977] New: Some X86 inline assembly modifiers are not documented in the web documentation

2020-05-06 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94977

Bug ID: 94977
   Summary: Some X86 inline assembly modifiers are not documented
in the web documentation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

gcc supports some modifiers for inline assembly on X86 that are not documented
in the table at 6.47.2.8 x86 Operand Modifiers here
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

For example, gcc supports %x0 in inline assembly to always print a register by
its xmm name. Similarly, there are %t0 and %g0 for ymm and zmm
respectively.

[Bug target/91704] New: [X86] Codegen for _mm256_cmpgt_epi8 is affected by -funsigned-char

2019-09-08 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91704

Bug ID: 91704
   Summary: [X86] Codegen for _mm256_cmpgt_epi8 is affected by
-funsigned-char
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

This intrinsic should always do a signed compare, but it uses __v32qi in its
implementation which uses "char" rather than "signed char" in its typedef. This
causes it to follow the -funsigned-char/-fsigned-char setting. The 128-bit
equivalent uses a separate __v16qs typedef to avoid this.
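
The underlying C behavior can be shown without any vector types. The sketch below uses hypothetical function names and scalar values only: a comparison done through plain "char" follows the signedness setting, while one done through "signed char" does not.

```c
#include <limits.h>

/* Compares through plain char: the result depends on whether char is signed
   (-fsigned-char) or unsigned (-funsigned-char). */
int gt_plain_char(unsigned char a_bits, unsigned char b_bits) {
  char a = (char)a_bits;
  char b = (char)b_bits;
  return a > b;
}

/* Compares through signed char: always a signed comparison. */
int gt_signed_char(unsigned char a_bits, unsigned char b_bits) {
  signed char a = (signed char)a_bits;
  signed char b = (signed char)b_bits;
  return a > b;
}
```

With inputs 0x01 and 0xFF, gt_signed_char sees 1 > -1, while gt_plain_char sees either 1 > -1 or 1 > 255 depending on the setting.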

[Bug target/91696] [X86] AVX512 intrinsics that only support SAE should allow (_MM_FROUND_NO_EXC|_MM_FROUND_CUR_DIRECTION) to match icc

2019-09-06 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91696

--- Comment #1 from Craig Topper  ---
I've also submitted a patch to clang to do the same.
https://reviews.llvm.org/D67289

[Bug target/91696] New: [X86] AVX512 intrinsics that only support SAE should allow (_MM_FROUND_NO_EXC|_MM_FROUND_CUR_DIRECTION) to match icc

2019-09-06 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91696

Bug ID: 91696
   Summary: [X86] AVX512 intrinsics that only support SAE should
allow (_MM_FROUND_NO_EXC|_MM_FROUND_CUR_DIRECTION) to
match icc
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

The intrinsics that only support SAE, like _mm512_cmp_round_ps_mask, currently
only allow _MM_FROUND_NO_EXC or _MM_FROUND_CUR_DIRECTION to be specified in
the sae immediate parameter.

_MM_FROUND_NO_EXC isn't really the opposite of _MM_FROUND_CUR_DIRECTION based
on the format of this type of immediate. _MM_FROUND_NO_EXC by itself is
equivalent to (_MM_FROUND_NO_EXC|_MM_FROUND_TO_NEAREST_INT) since
_MM_FROUND_TO_NEAREST_INT is 0. But these instructions don't perform any
rounding so the rounding bits don't matter. It's a nice convenience that the
single constants _MM_FROUND_NO_EXC and _MM_FROUND_CUR_DIRECTION can be used.
But since the rounding mode doesn't matter,
(_MM_FROUND_CUR_DIRECTION|_MM_FROUND_NO_EXC) should also be allowed.
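
The difference between the current and the requested check can be sketched as two predicates. The SKETCH_* constants are local copies of the usual header values (0x00, 0x04, 0x08) and the function names are hypothetical; this is not the compiler's actual validation code.

```c
#include <stdbool.h>

/* Local copies of the immediate encodings, for illustration only. */
enum {
  SKETCH_FROUND_TO_NEAREST_INT = 0x00,
  SKETCH_FROUND_CUR_DIRECTION = 0x04,
  SKETCH_FROUND_NO_EXC = 0x08
};

/* Accepts only the two single constants. */
bool sae_imm_ok_strict(int imm) {
  return imm == SKETCH_FROUND_NO_EXC || imm == SKETCH_FROUND_CUR_DIRECTION;
}

/* Additionally accepts the combination, since the rounding bits are
   irrelevant for SAE-only instructions. */
bool sae_imm_ok_relaxed(int imm) {
  return sae_imm_ok_strict(imm) ||
         imm == (SKETCH_FROUND_CUR_DIRECTION | SKETCH_FROUND_NO_EXC);
}
```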

[Bug libgcc/91695] New: [X86] get_available_features only sets FEATURE_GFNI and FEATURE_VPCLMULQDQ when avx512_usable is true

2019-09-06 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91695

Bug ID: 91695
   Summary: [X86] get_available_features only sets FEATURE_GFNI
and FEATURE_VPCLMULQDQ when avx512_usable is true
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

GFNI has instructions that have legacy SSE-like encodings. It also has VEX and
EVEX encodings. VPCLMULQDQ has a VEX encoding for 256-bit and EVEX encodings for
256-bit and 512-bit. So these features are usable without AVX512.

It probably makes sense to qualify VPCLMULQDQ with avx_usable since it requires
at least ymm registers.
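
A minimal sketch of the two gating policies follows. Every name here (the struct, the SKETCH_* flags, both functions) is hypothetical and only models the logic described above; it is not the libgcc implementation.

```c
#include <stdbool.h>

/* Hypothetical feature flags for the sketch. */
enum { SKETCH_FEATURE_GFNI = 1, SKETCH_FEATURE_VPCLMULQDQ = 2 };

/* Hypothetical view of what CPUID and the OS state checks reported. */
struct cpu_view {
  bool gfni_bit;
  bool vpclmulqdq_bit;
  bool avx_usable;
  bool avx512_usable;
};

/* Behavior as reported: both features are gated on avx512_usable. */
unsigned features_as_reported(const struct cpu_view *c) {
  unsigned f = 0;
  if (c->avx512_usable) {
    if (c->gfni_bit) f |= SKETCH_FEATURE_GFNI;
    if (c->vpclmulqdq_bit) f |= SKETCH_FEATURE_VPCLMULQDQ;
  }
  return f;
}

/* Behavior as suggested: GFNI needs no vector-state gate, VPCLMULQDQ is
   gated on avx_usable because it needs at least ymm registers. */
unsigned features_as_suggested(const struct cpu_view *c) {
  unsigned f = 0;
  if (c->gfni_bit) f |= SKETCH_FEATURE_GFNI;
  if (c->avx_usable && c->vpclmulqdq_bit) f |= SKETCH_FEATURE_VPCLMULQDQ;
  return f;
}
```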

[Bug target/86466] New: [X86] gcc checks the range of the immediate to _mm_blend_ps, but not _mm_blend_epi32

2018-07-10 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86466

Bug ID: 86466
   Summary: [X86] gcc checks the range of the immediate to
_mm_blend_ps, but not _mm_blend_epi32
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

These intrinsics are both blends of four 32-bit values. gcc seems to check the
range for the floating point version, but not the integer version.

Perhaps this was overlooked when _mm_blend_epi32 was added, since it was added
with AVX2 while the floating-point version was added with SSE4.1.

Test case
https://godbolt.org/g/9QGm9j
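
As a scalar model of what such a range check enforces: with four 32-bit lanes, only the low four bits of the immediate select anything, so a checked blend would accept 0 through 15. The function name is hypothetical and this is a sketch of the semantics, not of the intrinsic headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of a four-lane 32-bit blend: bit i of imm selects b[i],
   otherwise a[i]. Returns false when imm has bits set beyond the four lanes,
   which is the case a compile-time range check would reject. */
bool blend4_checked(const uint32_t a[4], const uint32_t b[4], unsigned imm,
                    uint32_t out[4]) {
  if (imm > 0xFu)
    return false;
  for (int i = 0; i < 4; i++)
    out[i] = ((imm >> i) & 1u) ? b[i] : a[i];
  return true;
}
```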

[Bug target/86444] New: [X86] Implementation of SSE comi/ucomi intrinsics does not match recent versions of icc, clang, or MSVC

2018-07-09 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86444

Bug ID: 86444
   Summary: [X86] Implementation of SSE comi/ucomi intrinsics does
not match recent versions of icc, clang, or MSVC
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

It looks like gcc does not match the behavior of the most recent versions of
icc, clang, and MSVC with respect to the handling of NaNs in the COMI
intrinsics. The other compilers all return 0 when the compare result is
unordered, as can be seen here: https://godbolt.org/g/xxEKqg

Clang changed to this behavior in version 3.9. According to this comment,
https://bugs.llvm.org/show_bug.cgi?id=28510#c10, the original icc behavior was
the same as gcc's current behavior, but it was changed at least 10 years ago.
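
One way to see where such a difference can come from is to model the flags that a COMISS-style compare produces. The sketch below is an illustration with hypothetical names, not any compiler's implementation: an unordered compare sets ZF, PF and CF together, so an equality result derived from ZF alone reports 1 for NaN inputs, while one that also consults PF reports 0.

```c
#include <math.h>
#include <stdbool.h>

/* Flags produced by a COMISS-style scalar compare. */
struct comi_flags {
  bool zf, pf, cf;
};

struct comi_flags model_comiss(float a, float b) {
  struct comi_flags f = { false, false, false }; /* a > b: all clear */
  if (isnan(a) || isnan(b)) {
    f.zf = f.pf = f.cf = true;                   /* unordered */
  } else if (a < b) {
    f.cf = true;
  } else if (a == b) {
    f.zf = true;
  }
  return f;
}

/* Equality derived from ZF alone: returns 1 for unordered inputs. */
int comieq_flags_only(float a, float b) {
  return model_comiss(a, b).zf;
}

/* Equality that treats unordered as "not equal": returns 0 for NaN. */
int comieq_ordered(float a, float b) {
  struct comi_flags f = model_comiss(a, b);
  return f.zf && !f.pf;
}
```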

[Bug target/85530] New: [X86] _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 not implemented

2018-04-25 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85530

Bug ID: 85530
   Summary: [X86] _mm512_mullox_epi64 and _mm512_mask_mullox_epi64
not implemented
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

icc has these intrinsics which emulate a v8di multiply using multiple pmuludqs
when avx512f is enabled, but avx512dq is not enabled. If avx512dq is enabled it
uses vpmullq.

I just added support to clang in r330923. Would be good if gcc could implement
it too.
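
The emulation idea can be sketched for a single 64-bit lane. The function names are hypothetical and this is scalar C rather than the vector sequence any compiler emits; it only shows how the low 64 bits of a product can be assembled from 32x32->64 multiplies, the per-lane operation pmuludq provides.

```c
#include <stdint.h>

/* One 32x32 -> 64-bit unsigned multiply. */
static uint64_t mul32x32(uint32_t a, uint32_t b) {
  return (uint64_t)a * (uint64_t)b;
}

/* Low 64 bits of a * b built from three 32x32 multiplies:
   a*b mod 2^64 = lo(a)*lo(b) + ((lo(a)*hi(b) + hi(a)*lo(b)) << 32).
   The hi(a)*hi(b) term would be shifted by 64 and therefore vanishes. */
uint64_t mullo64_via_mul32(uint64_t a, uint64_t b) {
  uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
  uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
  uint64_t low = mul32x32(a_lo, b_lo);
  uint64_t cross = mul32x32(a_lo, b_hi) + mul32x32(a_hi, b_lo);
  return low + (cross << 32);
}
```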

[Bug target/85511] [X86] Using __builtin_ia32_writeeflags_u32 in 64-bit mode causes internal compiler error

2018-04-23 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85511

--- Comment #2 from Craig Topper  ---
Should this builtin even be allowed in 64-bit mode?

[Bug target/85511] New: [X86] Using __builtin_ia32_writeeflags_u32 in 64-bit mode causes internal compiler error

2018-04-23 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85511

Bug ID: 85511
   Summary: [X86] Using __builtin_ia32_writeeflags_u32 in 64-bit
mode causes internal compiler error
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

This code

void foo(unsigned bar) {
  return __builtin_ia32_writeeflags_u32(bar);
}

Throws this error in 64-bit mode

during RTL pass: expand
: In function 'foo':
:2:10: internal compiler error: in copy_to_mode_reg, at explow.c:630
   return __builtin_ia32_writeeflags_u32(bar);
  ^~~
mmap: Invalid argument
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1

[Bug target/83618] New: _rdpid_u32 doesn't work on 64-bit targets as gas expects the 64-bit register

2017-12-28 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83618

Bug ID: 83618
   Summary: _rdpid_u32 doesn't work on 64-bit targets as gas
expects the 64-bit register
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

Trying to compile the _rdpid_u32 intrinsic on x86-64 causes the assembler to
print this

/tmp/ccbdTr5q.s: Assembler messages:
/tmp/ccbdTr5q.s:13: Error: operand type mismatch for `rdpid'


It appears that the assembler expects a 64-bit register in 64-bit mode.

This seems to be due to an odd quirk of Intel's documentation that says the
instruction writes a 64-bit register in 64-bit mode and a 32-bit register in
32-bit mode. But in reality it's reading the TSC_AUX_MSR, which is only 32-bit, so
I suspect it always zeros the upper bits of the register in 64-bit mode. That
would be the expected behavior if it had been documented as always using a
32-bit register. So I don't know why the docs made this distinction.

Not sure if this should be fixed in gcc or if gas should be taught to accept
either a 32-bit or a 64-bit register.

[Bug target/83546] New: -march=silvermont doesn't enable rdrnd by default despite what docs say

2017-12-21 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546

Bug ID: 83546
   Summary: -march=silvermont doesn't enable rdrnd by default
despite what docs say
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

The documentation https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html says
'silvermont' enables rdrnd, but that doesn't appear to happen. 

I think it may have worked in r205275 when 'slm' listed all of its features
separately.

But I think it was then broken in r206178 when PTA_SILVERMONT was introduced
and defined like this

  #define PTA_SILVERMONT \
(PTA_WESTMERE | PTA_MOVBE)


PTA_WESTMERE doesn't and shouldn't include RDRND.
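
The effect of the definition can be shown with a toy flag set. The SKETCH_* bit assignments below are invented for illustration and do not correspond to the real PTA_* values; the second macro is just one possible way to make the flags agree with the documentation, not a statement of how it was or should be fixed.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_PTA_WESTMERE (UINT64_C(1) << 0) /* stands in for all Westmere features */
#define SKETCH_PTA_MOVBE    (UINT64_C(1) << 1)
#define SKETCH_PTA_RDRND    (UINT64_C(1) << 2)

/* Mirrors the shape of the definition quoted above. */
#define SKETCH_PTA_SILVERMONT_AS_REPORTED \
  (SKETCH_PTA_WESTMERE | SKETCH_PTA_MOVBE)

/* One possible variant that would include RDRND. */
#define SKETCH_PTA_SILVERMONT_WITH_RDRND \
  (SKETCH_PTA_WESTMERE | SKETCH_PTA_MOVBE | SKETCH_PTA_RDRND)

/* Returns nonzero when the flag set enables RDRND. */
int sketch_has_rdrnd(uint64_t flags) {
  return (flags & SKETCH_PTA_RDRND) != 0;
}
```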

[Bug middle-end/80042] gcc thinks sin/cos don't set errno

2017-03-15 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80042

--- Comment #3 from Craig Topper  ---
No, -fmath-errno has no effect. It does have an effect on other functions such as
cosh or acos.

[Bug middle-end/80042] New: gcc thinks sin/cos don't set errno

2017-03-14 Thread craig.topper at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80042

Bug ID: 80042
   Summary: gcc thinks sin/cos don't set errno
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: craig.topper at gmail dot com
  Target Milestone: ---

Created attachment 40974
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40974&action=edit
Test that uses sin and errno

As of glibc version 2.10, sin and cos set errno when the input is infinity. gcc
thinks sin/cos never write errno and will move them around relative to reads of
errno.

The attached test case will return a failing error code at O0 and a passing
error code at O2 when the errno read after the sin call is optimized out.


glibc commit
https://sourceware.org/git/?p=glibc.git;a=commit;f=sysdeps/ieee754/dbl-64/s_sin.c;h=0c59a1963e948c546e0d3e34de974c7e71de1134

I believe sincos was also changed to update errno in 2015
https://sourceware.org/bugzilla/show_bug.cgi?id=15467

[Bug driver/50740] New: CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not qualified with max_level and doesn't use subleaf

2011-10-15 Thread craig.topper at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50740

 Bug #: 50740
   Summary: CPUID leaf 7 for BMI/BMI2/AVX2 feature detection not
qualified with max_level and doesn't use subleaf
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: craig.top...@gmail.com


The cpuid code for detecting BMI, BMI2, and AVX2 support needs to be qualified
with max_level >= 7. Additionally, it should use __cpuid_count instead of
__cpuid because leaf 7 uses subleaves just like leaf 4.

Relevant code from i386-driver.c

  __cpuid (0x7, eax, ebx, ecx, edx);

  has_bmi = ebx & bit_BMI;
  has_avx2 = ebx & bit_AVX2;
  has_bmi2 = ebx & bit_BMI2;
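
A minimal sketch of the suggested shape follows; it is not the actual i386-driver.c change. The struct, the SKETCH_BIT_* macros and both function names are hypothetical. The decode step is a pure function so the max_level guard is easy to see, and the query step uses __get_cpuid_max and __cpuid_count from <cpuid.h> with subleaf 0, guarded so the file still compiles on non-x86 targets.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* EBX bit positions for CPUID leaf 7, subleaf 0 (local copies). */
#define SKETCH_BIT_BMI  (1u << 3)
#define SKETCH_BIT_AVX2 (1u << 5)
#define SKETCH_BIT_BMI2 (1u << 8)

struct leaf7_flags {
  bool has_bmi, has_avx2, has_bmi2;
};

/* Only trust EBX when the CPU reports that leaf 7 exists. */
struct leaf7_flags decode_leaf7(unsigned max_level, uint32_t ebx) {
  struct leaf7_flags f = { false, false, false };
  if (max_level >= 7) {
    f.has_bmi = (ebx & SKETCH_BIT_BMI) != 0;
    f.has_avx2 = (ebx & SKETCH_BIT_AVX2) != 0;
    f.has_bmi2 = (ebx & SKETCH_BIT_BMI2) != 0;
  }
  return f;
}

/* Query the running CPU; reports no features on non-x86 targets. */
struct leaf7_flags query_leaf7(void) {
  unsigned max_level = 0;
  uint32_t ebx = 0;
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax_raw, ebx_raw, ecx_raw, edx_raw;
  max_level = __get_cpuid_max(0, 0);
  if (max_level >= 7) {
    /* Leaf 7 takes a subleaf index in ECX, so use the _count variant. */
    __cpuid_count(7, 0, eax_raw, ebx_raw, ecx_raw, edx_raw);
    (void)eax_raw; (void)ecx_raw; (void)edx_raw;
    ebx = ebx_raw;
  }
#endif
  return decode_leaf7(max_level, ebx);
}
```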