[Bug target/97016] New: [i386] _MM_CMPINT_ENUM type is missing

2020-09-10 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97016

Bug ID: 97016
   Summary: [i386] _MM_CMPINT_ENUM type is missing
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

The _mm{,256,512}_cmp_epi{8,16,32,64}_mask functions take a _MM_CMPINT_ENUM
according to the Intel Intrinsics Guide (e.g.,
<https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_cmp_epi32_mask=697>),
but in GCC they take an int.

FWIW ICC and clang both define an enum.  I haven't checked MSVC.

Using an int is more consistent with other Intel SIMD APIs, but it seems Intel
has chosen to break tradition here and use an enum.
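
As an illustration only, here is a minimal sketch of what an enum-based definition could look like.  Every name prefixed with example_/EXAMPLE_ is hypothetical and not taken from any compiler header.  The numeric values are assumed to mirror the integer comparison predicate encoding behind the _MM_CMPINT_* constants (0 = EQ, 1 = LT, 2 = LE, 4 = NE, 5 = NLT, 6 = NLE).  The scalar helper only models what each predicate means for a single signed lane.

```c
#include <stdint.h>

/* Hypothetical stand-in for _MM_CMPINT_ENUM; not copied from any header. */
typedef enum {
  EXAMPLE_CMPINT_EQ  = 0, /* equal */
  EXAMPLE_CMPINT_LT  = 1, /* less than */
  EXAMPLE_CMPINT_LE  = 2, /* less than or equal */
  EXAMPLE_CMPINT_NE  = 4, /* not equal */
  EXAMPLE_CMPINT_NLT = 5, /* not less than (greater than or equal) */
  EXAMPLE_CMPINT_NLE = 6  /* not less than or equal (greater than) */
} example_cmpint_enum;

/* Scalar model of one lane of a signed compare: returns 1 if the
   predicate holds for (a, b) and 0 otherwise. */
static int example_cmp_lane(int32_t a, int32_t b, example_cmpint_enum op) {
  switch (op) {
    case EXAMPLE_CMPINT_EQ:  return a == b;
    case EXAMPLE_CMPINT_LT:  return a <  b;
    case EXAMPLE_CMPINT_LE:  return a <= b;
    case EXAMPLE_CMPINT_NE:  return a != b;
    case EXAMPLE_CMPINT_NLT: return a >= b;
    case EXAMPLE_CMPINT_NLE: return a >  b;
    default:                 return 0; /* values not modelled in this sketch */
  }
}
```

In C an int converts implicitly to an enum type, so callers that pass a plain int keep compiling either way; in C++ that conversion needs a cast, which is where an int parameter and an enum parameter behave differently.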

[Bug target/53784] Scalar vector binary operation - compilation fails with -std=c90/c99/c11 (-fexcess-precision=standard)

2020-09-03 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53784

Evan Nemerson  changed:

   What|Removed |Added

 CC||e...@coeus-group.com

--- Comment #4 from Evan Nemerson  ---
This also occurs on s390x.  Just like on i686, -std=c99 or
-fexcess-precision=standard triggers it.

[Bug target/96476] New: [Request] expose preferred vector width to preprocessor

2020-08-04 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96476

Bug ID: 96476
   Summary: [Request] expose preferred vector width to
preprocessor
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

It would be nice if the value passed to -mprefer-vector-width=* were exposed to
the preprocessor.  Something like `#define __PREFERRED_VECTOR_WIDTH 256` when
-mprefer-vector-width=256 is passed.

I'd be really happy if it was always present when vectorization is enabled
(i.e., 128 with SSE, 256 with AVX), but satisfied if it were only defined when
-mprefer-vector-width is specified.

I'm filing this under the "target" component since -mprefer-vector-width is
x86-specific according to the manual, but it seems like it could be useful
elsewhere, too.

My use case is a bit niche; I'd like to use it in SIMD Everywhere
(<https://github.com/simd-everywhere/simde>) to limit the vector size even when
you call "larger" functions.  For example, simde_mm512_add_ps would be
implemented with two calls to simde_mm256_add_ps with -mprefer-vector-width=256
even if AVX-512F support is enabled.

That said, I think this could be useful for any code which mixes
auto-vectorization with intrinsics.
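
A rough sketch of how code might consume such a macro, assuming it existed.  __PREFERRED_VECTOR_WIDTH is the macro proposed in this report and is not assumed to be defined by any released GCC; the EXAMPLE_* names and example_add_ps512 are hypothetical, and the inner loop is plain C standing in for one narrower vector operation.

```c
#include <stddef.h>

/* __PREFERRED_VECTOR_WIDTH is the proposed (hypothetical) macro.  The
   fallbacks use the existing __AVX512F__, __AVX__ and __SSE__ macros. */
#if defined(__PREFERRED_VECTOR_WIDTH)
#  define EXAMPLE_VECTOR_WIDTH __PREFERRED_VECTOR_WIDTH
#elif defined(__AVX512F__)
#  define EXAMPLE_VECTOR_WIDTH 512
#elif defined(__AVX__)
#  define EXAMPLE_VECTOR_WIDTH 256
#elif defined(__SSE__)
#  define EXAMPLE_VECTOR_WIDTH 128
#else
#  define EXAMPLE_VECTOR_WIDTH 128 /* arbitrary default for this sketch */
#endif

/* Number of 32-bit floats handled per chunk. */
#define EXAMPLE_LANES ((size_t) (EXAMPLE_VECTOR_WIDTH / 32))

/* Adds two arrays of 16 floats (the size of one 512-bit vector) in
   chunks no wider than the preferred width. */
static void example_add_ps512(float r[16], const float a[16],
                              const float b[16]) {
  for (size_t base = 0; base < 16; base += EXAMPLE_LANES) {
    /* Each pass of this inner loop stands in for one narrower vector add. */
    for (size_t i = 0; i < EXAMPLE_LANES; i++) {
      r[base + i] = a[base + i] + b[base + i];
    }
  }
}
```

With a preferred width of 256 the outer loop runs twice, which corresponds to the two simde_mm256_add_ps calls described above.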

[Bug target/96313] New: [AArch64] vqmovun* return types should be unsigned

2020-07-24 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96313

Bug ID: 96313
   Summary: [AArch64] vqmovun* return types should be unsigned
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

GCC has signed return values for the vqmovun* functions, but they should be
unsigned.  See
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vqmovun

Trivial test case:

  #include <arm_neon.h>

  uint16_t foo(int32_t v) {
    return vqmovuns_s32(v);
  }

[Bug target/96174] New: AVX-512 functions missing when compiled without optimization

2020-07-12 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96174

Bug ID: 96174
   Summary: AVX-512 functions missing when compiled without
optimization
   Product: gcc
   Version: 10.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

The avx512fintrin.h header sometimes uses different implementations depending
on whether __OPTIMIZE__ is defined, but many functions are missing if
__OPTIMIZE__ is not defined.

Here is a trivial test case:

  #include <immintrin.h>

  __mmask16 foo(__m512 a, __m512 b) {
    return _mm512_cmplt_ps_mask(a, b);
  }

On Compiler Explorer: https://godbolt.org/z/83jP63

I ran into this with _mm512_cmplt_ps_mask, but it looks like all the
_mm512_cmp*_{pd,ps}_mask functions have the same problem.

[Bug preprocessor/95782] New: [ppc64le] ICE in _cpp_pop_context

2020-06-20 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782

Bug ID: 95782
   Summary: [ppc64le] ICE in _cpp_pop_context
   Product: gcc
   Version: 10.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

I'm running into an ICE on ppc64le:

  internal compiler error: in _cpp_pop_context, at libcpp/macro.c:2644

Here is a reproducer:

  #define a
  #define b(d) d
  #if defined(a)  
  b(vector double)
  #endif

Just running `gcc -E test.c` when targeting ppc64le triggers the issue.  It
happens with at least GCC 9 and 10.

[Bug target/95471] vrndvq_f32 not supported on armv8

2020-06-02 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95471

--- Comment #2 from Evan Nemerson  ---
In that case shouldn't the header be adjusted to not define the vrndvq_f32
function unless it is enabled?

It is already guarded by a check for __ARM_ARCH >= 8, but obviously that isn't
sufficient.  If nothing else, adjusting the guard would help document the actual
requirements, which would be great since the flags and macros on ARM are a bit
of a mess.

[Bug target/95483] New: [i386] Missing SIMD functions

2020-06-02 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95483

Bug ID: 95483
   Summary: [i386] Missing SIMD functions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 48663
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48663&action=edit
Script to generate the list

I noticed the _mm_loadu_epi* functions were missing, so I threw together a
trivial script to look for missing functions based on the XML data backing the
Intel Intrinsics Guide
<https://software.intel.com/sites/landingpage/IntrinsicsGuide/>.  After
filtering out the SVML and Other functions it came up with 122 results.

I attached the script in case anyone wants to reproduce.  It's ugly and slow
(it was really meant to be a throw-away), but it seems to work.  Just run it in
gcc/config/i386, and pass it the location of the IIG XML.

Here is the list:

  AVX _mm256_cvtsi256_si32
  AVX2 _mm_broadcastsd_pd
  AVX2 _mm_broadcastsi128_si256
  AVX-512 _mm512_storeu_epi16
  AVX-512 _mm512_storeu_epi8
  AVX-512 _mm256_storeu_epi16
  AVX-512 _mm256_storeu_epi8
  AVX-512 _mm_storeu_epi16
  AVX-512 _mm_storeu_epi8
  AVX-512 _mm512_loadu_epi16
  AVX-512 _mm512_loadu_epi8
  AVX-512 _mm256_loadu_epi16
  AVX-512 _mm256_loadu_epi8
  AVX-512 _mm_loadu_epi16
  AVX-512 _mm_loadu_epi8
  AVX-512 _mm512_mask_reduce_round_pd
  AVX-512 _mm512_maskz_reduce_round_pd
  AVX-512 _mm512_reduce_round_pd
  AVX-512 _mm512_mask_reduce_round_ps
  AVX-512 _mm512_maskz_reduce_round_ps
  AVX-512 _mm512_reduce_round_ps
  AVX-512 _mm_mask_reduce_round_sd
  AVX-512 _mm_maskz_reduce_round_sd
  AVX-512 _mm_reduce_round_sd
  AVX-512 _mm_mask_reduce_round_ss
  AVX-512 _mm_maskz_reduce_round_ss
  AVX-512 _mm_reduce_round_ss
  AVX-512 _mm_mask_rcp28_round_sd
  AVX-512 _mm_mask_rcp28_sd
  AVX-512 _mm_maskz_rcp28_round_sd
  AVX-512 _mm_maskz_rcp28_sd
  AVX-512 _mm_mask_rcp28_round_ss
  AVX-512 _mm_mask_rcp28_ss
  AVX-512 _mm_maskz_rcp28_round_ss
  AVX-512 _mm_maskz_rcp28_ss
  AVX-512 _mm_mask_rsqrt28_round_sd
  AVX-512 _mm_mask_rsqrt28_sd
  AVX-512 _mm_maskz_rsqrt28_round_sd
  AVX-512 _mm_maskz_rsqrt28_sd
  AVX-512 _mm_mask_rsqrt28_round_ss
  AVX-512 _mm_mask_rsqrt28_ss
  AVX-512 _mm_maskz_rsqrt28_round_ss
  AVX-512 _mm_maskz_rsqrt28_ss
  AVX-512 _mm256_mask_cvt_roundps_ph
  AVX-512 _mm256_maskz_cvt_roundps_ph
  AVX-512 _mm_mask_cvt_roundps_ph
  AVX-512 _mm_maskz_cvt_roundps_ph
  AVX-512 _mm256_store_epi32
  AVX-512 _mm_store_epi32
  AVX-512 _mm256_loadu_epi64
  AVX-512 _mm256_loadu_epi32
  AVX-512 _mm_loadu_epi64
  AVX-512 _mm_loadu_epi32
  AVX-512 _mm256_load_epi64
  AVX-512 _mm256_load_epi32
  AVX-512 _mm_load_epi64
  AVX-512 _mm_load_epi32
  AVX-512 _mm_cvtsd_i32
  AVX-512 _mm_cvtsd_i64
  AVX-512 _mm_mask_cvt_roundsd_ss
  AVX-512 _mm_mask_cvtsd_ss
  AVX-512 _mm_maskz_cvt_roundsd_ss
  AVX-512 _mm_maskz_cvtsd_ss
  AVX-512 _mm_cvti32_sd
  AVX-512 _mm_cvti64_sd
  AVX-512 _mm_cvti32_ss
  AVX-512 _mm_cvti64_ss
  AVX-512 _mm_mask_cvt_roundss_sd
  AVX-512 _mm_mask_cvtss_sd
  AVX-512 _mm_maskz_cvt_roundss_sd
  AVX-512 _mm_maskz_cvtss_sd
  AVX-512 _mm_cvtss_i32
  AVX-512 _mm_cvtss_i64
  AVX-512 _mm_mask_scalef_sd
  AVX-512 _mm_maskz_scalef_sd
  AVX-512 _mm_mask_scalef_ss
  AVX-512 _mm_maskz_scalef_ss
  AVX-512 _mm_mask_sqrt_sd
  AVX-512 _mm_maskz_sqrt_sd
  AVX-512 _mm_mask_sqrt_ss
  AVX-512 _mm_maskz_sqrt_ss
  AVX-512 _mm512_cvtsi512_si32
  AVX-512/KNC _mm512_mask_permutevar_epi32
  AVX-512/KNC _mm512_permutevar_epi32
  AVX-512/KNC _mm512_cvtpslo_pd
  AVX-512/KNC _mm512_mask_cvtpslo_pd
  AVX-512/KNC _mm512_cvtepi32lo_pd
  AVX-512/KNC _mm512_mask_cvtepi32lo_pd
  AVX-512/KNC _mm512_cvtepu32lo_pd
  AVX-512/KNC _mm512_mask_cvtepu32lo_pd
  AVX-512/KNC _mm512_i32extgather_epi32
  AVX-512/KNC _mm512_mask_i32extgather_epi32
  AVX-512/KNC _mm512_i32loextgather_epi64
  AVX-512/KNC _mm512_mask_i32loextgather_epi64
  AVX-512/KNC _mm512_i32extgather_ps
  AVX-512/KNC _mm512_mask_i32extgather_ps
  AVX-512/KNC _mm512_i32loextgather_pd
  AVX-512/KNC _mm512_mask_i32loextgather_pd
  AVX-512/KNC _mm512_i32extscatter_ps
  AVX-512/KNC _mm512_mask_i32extscatter_ps
  AVX-512/KNC _mm512_i32loextscatter_pd
  AVX-512/KNC _mm512_mask_i32loextscatter_pd
  AVX-512/KNC _mm512_i32loextscatter_epi64
  AVX-512/KNC _mm512_mask_i32loextscatter_epi64
  AVX-512/KNC _mm512_cvtpd_pslo
  AVX-512/KNC _mm512_mask_cvtpd_pslo
  AVX-512/KNC _mm512_i32logather_epi64
  AVX-512/KNC _mm512_mask_i32logather_epi64
  AVX-512/KNC _mm512_i32logather_pd
  AVX-512/KNC _mm512_mask_i32logather_pd
  AVX-512/KNC _mm512_i32loscatter_pd
  AVX-512/KNC _mm512_mask_i32loscatter_pd
  AVX-512/KNC _mm512_i32extscatter_epi32
  AVX-512/KNC _mm512_mask_i32extscatter_epi32
  AVX-512/KNC _mm512_prefetch_i32extgather_ps
  AVX-512/KNC _mm512_mask_prefetch_i32extgather_ps
  AVX-512/KNC _mm512_prefetch_i32exts

[Bug target/95471] New: vrndvq_f32 not supported on armv8

2020-06-01 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95471

Bug ID: 95471
   Summary: vrndvq_f32 not supported on armv8
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

With -march=armv8-a -mfpu=neon, arm_neon.h exposes vrndnq_f32, but the
intrinsic used to implement it (__builtin_neon_vrintnv4sf) emits an "error:
this builtin is not supported for this target".

Here is a trivial test case:

  #include <arm_neon.h>

  #if defined(__ARM_NEON)
  float32x4_t foo(float32x4_t a) {
    return vrndnq_f32(a);
  }
  #endif

On Compiler Explorer: https://godbolt.org/z/ThfJQe

Relevant documentation from ARM:
<https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vrndnq_f32>.

[Bug target/95399] [ARM] 32/64-bit vcvtnq_* functions are missing

2020-05-29 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95399

Evan Nemerson  changed:

   What|Removed |Added

  Attachment #48635|0   |1
is obsolete||

--- Comment #4 from Evan Nemerson  ---
Created attachment 48637
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48637&action=edit
List of functions missing from 32-bit arm_{neon,fp16,bf16}.h

Sure, I just filed #95421 for AArch64.

Thanks for the note about arm_fp16.h and arm_bf16.h; I hadn't realized those
functions were in separate headers.  That brings the total down to 264
functions, of which 236 are present in the AArch64 version.  Here is the
updated list.

[Bug target/95421] [AArch64] Missing NEON functions documented on ARM's web site

2020-05-29 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95421

--- Comment #1 from Evan Nemerson  ---
> Several functions are actually present in arm but not aarch64, I'm guessing
> that will be an easy place to start.  Here is that list:

I pasted the wrong list here; that is actually the list of functions which are
missing from both arm and aarch64.  The attached list is accurate (AFAICT).  It
looks like the majority of functions missing from aarch64 are present in arm
(and vice versa), so hopefully this should be a bit easier to fix than I
thought.

[Bug target/95421] New: [AArch64] Missing NEON functions documented on ARM's web site

2020-05-29 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95421

Bug ID: 95421
   Summary: [AArch64] Missing NEON functions documented on ARM's
web site
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 48636
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48636&action=edit
Full list of missing functions

This is a companion to #95399 (which is for the arm headers instead of
aarch64).

Quite a few functions listed in ARM's documentation
(<https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics>)
don't seem to be included in GCC's AArch64 arm_{neon,bf16,fp16}.h.

The attached list of 253 functions was generated by just grepping
arm_{neon,bf16,fp16}.h for each function in ARM's documentation so it's
possible there are some false positives, but the ones I've checked manually
seem correct. I'm also not sure how accurate ARM's documentation is.

Several functions are actually present in arm but not aarch64, I'm guessing
that will be an easy place to start.  Here is that list:

  vadd_p16
  vadd_p64
  vadd_p8
  vaddq_p128
  vaddq_p16
  vaddq_p64
  vaddq_p8
  vceqq_p64
  vceqz_p64
  vceqzq_p64
  vcvt_high_bf16_f32
  vcvt_low_bf16_f32
  vld2_lane_bf16
  vld2q_lane_bf16
  vld3_lane_bf16
  vld3q_lane_bf16
  vld4_lane_bf16
  vld4q_lane_bf16
  vrndns_f32
  vst2_lane_bf16
  vst2q_lane_bf16
  vst3_lane_bf16
  vst3q_lane_bf16
  vst3q_lane_p8
  vst3q_lane_s8
  vst3q_lane_u8
  vst4_lane_bf16
  vst4q_lane_bf16

[Bug target/95399] [ARM, AArch64] 32/64-bit vcvtnq_* functions are missing

2020-05-29 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95399

--- Comment #2 from Evan Nemerson  ---
Created attachment 48635
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48635&action=edit
List of functions missing from 32-bit arm_neon.h

You're right, sorry.  I'm not sure why I was thinking that header was shared.

It looks like there are a *lot* more of these.  Based on a quick script to grep
arm_neon.h for each function listed on ARM's web site
as v7/A32 or v7/A32/A64 there are 47 functions missing (plus another 108 if you
include the *_x1/2/3/4 functions, and another 145 if you include the
poly/f16/bf16 stuff).

Here is a list of those first 47:

  vcvtaq_s32_f32
  vcvtaq_u32_f32
  vcvta_s32_f32
  vcvta_u32_f32
  vcvtmq_s32_f32
  vcvtmq_u32_f32
  vcvtm_s32_f32
  vcvtm_u32_f32
  vcvtnq_s32_f32
  vcvtnq_u32_f32
  vcvtn_s32_f32
  vcvtn_u32_f32
  vcvtpq_s32_f32
  vcvtpq_u32_f32
  vcvtp_s32_f32
  vcvtp_u32_f32
  vfma_n_f32
  vfmaq_n_f32
  vld2q_dup_f32
  vld2q_dup_s16
  vld2q_dup_s32
  vld2q_dup_s8
  vld2q_dup_u16
  vld2q_dup_u32
  vld2q_dup_u8
  vld3q_dup_f32
  vld3q_dup_s16
  vld3q_dup_s32
  vld3q_dup_s8
  vld3q_dup_u16
  vld3q_dup_u32
  vld3q_dup_u8
  vld4q_dup_f32
  vld4q_dup_s16
  vld4q_dup_s32
  vld4q_dup_s8
  vld4q_dup_u16
  vld4q_dup_u32
  vld4q_dup_u8
  vreinterpretq_f64_u64
  vrndi_f32
  vrndiq_f32
  vrndn_f64
  vrndnq_f64
  vrndns_f32
  vst3q_lane_s8
  vst3q_lane_u8

I'm not sure how reliable ARM's documentation is... I see that there are
several f64 functions in that list, and I always thought those were supposed to
be exclusive to AArch64.  Assuming ARM's documentation is accurate, though, all
the functions I've checked do seem to be legitimately missing (i.e., I haven't
seen any false positives from my script).

I'm attaching the full list (300 functions), not sure how you want me to handle
this.  Should I file separate bugs for each group (i.e., this one could be for
vcvt*, another one for vrnd*, another for vfma*, etc.)?  One for all of them? 
Or just use this bug for all of them?

[Bug target/95399] New: [ARM, AArch64] 32/64-bit vcvtnq_* functions are missing

2020-05-28 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95399

Bug ID: 95399
   Summary: [ARM, AArch64] 32/64-bit vcvtnq_* functions are
missing
   Product: gcc
   Version: 10.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Documentation:
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vcvtnq

Clang supports them: https://godbolt.org/z/xsMfSz

It looks like vcvtnq_s32_f32, vcvtnq_u32_f32, vcvtnq_s64_f64, and
vcvtnq_u64_f64 are all missing, though vcvtnq_s16_f16 and vcvtnq_u16_f16 are
present.
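
For reference, a scalar sketch of what one lane of vcvtnq_s32_f32 is expected to compute: conversion to a signed 32-bit integer with round-to-nearest, ties-to-even.  The function name is hypothetical, and the NaN and saturation handling are assumptions based on the usual AArch64 float-to-integer conversion behaviour rather than anything stated in this report.

```c
#include <stdint.h>

/* Scalar model of one lane: round to nearest, ties to even.
   Assumptions: NaN converts to 0 and out-of-range inputs saturate. */
static int32_t example_cvtn_s32_f32(float x) {
  if (x != x) return 0;                       /* NaN */
  if (x >= 2147483648.0f) return INT32_MAX;   /* too large: saturate */
  if (x <= -2147483648.0f) return INT32_MIN;  /* too small: saturate */

  int32_t t = (int32_t) x;     /* truncates toward zero */
  float frac = x - (float) t;  /* fractional part, same sign as x */

  /* Round away from t when the fraction is past one half, or exactly
     one half with t odd (so the result ends up even). */
  if (frac > 0.5f || (frac == 0.5f && t % 2 != 0)) return t + 1;
  if (frac < -0.5f || (frac == -0.5f && t % 2 != 0)) return t - 1;
  return t;
}
```

A portability layer could fall back to something along these lines per lane when the intrinsic is unavailable, at the cost of losing the single-instruction conversion.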

[Bug c/95239] New: Unable to ignore -Wattribute-warning in macro

2020-05-20 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95239

Bug ID: 95239
   Summary: Unable to ignore -Wattribute-warning in macro
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 48573
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48573&action=edit
Test case

I'm trying to create a macro which evaluates an expression while ignoring
warnings generated by the warning attribute.  Basically, a slightly simplified
version of what I want is:

  #define IGNORE_WARNING_ATTR(expr) (__extension__({ \
    _Pragma("GCC diagnostic push") \
    _Pragma("GCC diagnostic ignored \"-Wattribute-warning\"") \
    int tmp = expr; \
    _Pragma("GCC diagnostic pop") \
    tmp; \
  }))

However, when I use it I still see the warning.

If I don't use a macro, but instead just do

  int c = (__extension__({
    _Pragma("GCC diagnostic push")
    _Pragma("GCC diagnostic ignored \"-Wattribute-warning\"")
    int tmp = foo(argc);
    _Pragma("GCC diagnostic pop")
    tmp;
  }));

It works as intended; no warning.

If I use the macro version and preprocess the source file first, then compile
the preprocessed file separately, it works.

If I compile with g++ it works.

Using the attached test case, I get:

$ gcc -E -o warn-pp.c warn.c && gcc -o warn warn-pp.c
$ g++ -o warn warn.c
$ gcc -o warn warn.c
warn.c: In function ‘main’:
warn.c:23:31: warning: call to ‘foo’ declared with attribute warning: Calling
foo [-Wattribute-warning]
   23 |   int b = IGNORE_WARNING_ATTR(foo(argc));
  |   ^
warn.c:4:15: note: in definition of macro ‘IGNORE_WARNING_ATTR’
4 | int tmp = expr; \
  |   ^~~~

[Bug target/95227] New: vec_extract doesn't mark input as used in C++ mode

2020-05-19 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95227

Bug ID: 95227
   Summary: vec_extract doesn't mark input as used in C++ mode
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Pretty straightforward.  With -maltivec -Wunused-but-set-parameter, this works
in C but emits a diagnostic in C++:

  #include <altivec.h>
  int f(vector int b) {
    return vec_extract(b, 0);
  }

FWIW, the same problem happens if b is a local variable not a parameter, though
of course it will emit an unused-but-set-variable diagnostic instead.

[Bug target/95144] Many AVX-512 functions take an int instead of unsigned int

2020-05-14 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95144

--- Comment #1 from Evan Nemerson  ---
Godbolt link with corrected flags for MSVC: https://godbolt.org/z/M9sgxe

Sorry about that.

[Bug target/95144] New: Many AVX-512 functions take an int instead of unsigned int

2020-05-14 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95144

Bug ID: 95144
   Summary: Many AVX-512 functions take an int instead of unsigned
int
   Product: gcc
   Version: 10.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

There are a bunch of functions in AVX-512F which, according to Intel's
documentation (https://software.intel.com/sites/landingpage/IntrinsicsGuide/),
take an unsigned integer as an argument but GCC's header has the type as a
signed integer.

This causes problems with -Wsign-conversion.  Here is an example which will
generate a warning with -Wsign-conversion (or https://godbolt.org/z/kTBTAD if
you prefer):

  #include <immintrin.h>

  static __m256i foo (__m256i a, unsigned int imm8) {
    return _mm256_srai_epi64(a, imm8);
  }

  __m256i bar(__m256i a) {
    return foo(a, 7);
  }


AFAICT all the functions which take unsigned imm8 arguments have this problem
in GCC.  Here is a quick list:

 * _mm256_mask_slli_epi16
 * _mm256_mask_slli_epi32
 * _mm256_mask_slli_epi64
 * _mm256_mask_srai_epi16
 * _mm256_mask_srai_epi32
 * _mm256_mask_srai_epi64
 * _mm256_mask_srli_epi32
 * _mm256_mask_srli_epi64
 * _mm256_maskz_slli_epi16
 * _mm256_maskz_slli_epi32
 * _mm256_maskz_slli_epi64
 * _mm256_maskz_srai_epi16
 * _mm256_maskz_srai_epi32
 * _mm256_maskz_srai_epi64
 * _mm256_maskz_srli_epi32
 * _mm256_maskz_srli_epi64
 * _mm256_srai_epi64
 * _mm512_mask_slli_epi16
 * _mm512_mask_slli_epi32
 * _mm512_mask_slli_epi64
 * _mm512_mask_srai_epi16
 * _mm512_mask_srai_epi32
 * _mm512_mask_srai_epi64
 * _mm512_mask_srli_epi16
 * _mm512_mask_srli_epi32
 * _mm512_mask_srli_epi64
 * _mm512_maskz_slli_epi16
 * _mm512_maskz_slli_epi32
 * _mm512_maskz_slli_epi64
 * _mm512_maskz_srai_epi16
 * _mm512_maskz_srai_epi32
 * _mm512_maskz_srai_epi64
 * _mm512_maskz_srli_epi32
 * _mm512_maskz_srli_epi64
 * _mm512_slli_epi16
 * _mm512_slli_epi32
 * _mm512_slli_epi64
 * _mm512_srai_epi16
 * _mm512_srai_epi32
 * _mm512_srai_epi64
 * _mm512_srli_epi16
 * _mm512_srli_epi32
 * _mm512_srli_epi64
 * _mm_mask_slli_epi16
 * _mm_mask_slli_epi32
 * _mm_mask_slli_epi64
 * _mm_mask_srai_epi16
 * _mm_mask_srai_epi32
 * _mm_mask_srai_epi64
 * _mm_mask_srli_epi32
 * _mm_mask_srli_epi64
 * _mm_maskz_slli_epi16
 * _mm_maskz_slli_epi32
 * _mm_maskz_slli_epi64
 * _mm_maskz_srai_epi16
 * _mm_maskz_srai_epi32
 * _mm_maskz_srai_epi64
 * _mm_maskz_srli_epi32
 * _mm_maskz_srli_epi64

It looks like clang has the same problem, though ICC and MSVC do not.  Here is
an almost identical bug report I filed against LLVM:
https://bugs.llvm.org/show_bug.cgi?id=45931
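
Until the declarations match the documentation, a caller can avoid the warning with an explicit cast at the call site.  The sketch below uses a hypothetical plain-C stand-in (example_srai) in place of a real intrinsic so that it builds without AVX-512 support; only the parameter types matter here.

```c
/* Hypothetical stand-in for an intrinsic whose header declares the
   shift count as a signed int. */
static int example_srai(int value, int imm8) {
  return value >> imm8;
}

/* Caller that follows the documented unsigned type for the count. */
static int example_caller(int value, unsigned int imm8) {
  /* Without the cast, -Wsign-conversion reports the implicit
     unsigned int -> int conversion of imm8. */
  return example_srai(value, (int) imm8);
}
```

The cast is harmless for valid shift counts, but it has to be repeated at every call site, which is why fixing the parameter type in the header is the better outcome.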

[Bug tree-optimization/94482] [8/9 Regression] Inserting into vector with optimization enabled on x86 generates incorrect result

2020-04-09 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94482

--- Comment #25 from Evan Nemerson  ---
Created attachment 48253
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48253&action=edit
Similar test which fails on armv7

I'm also getting an error on armv7-a for the same original code
when compiling with -O1 or above and -fstack-protector-strong.  I'm not sure if
it's the same issue or not; Jakub's test case from comment #12 doesn't abort
with the same target and flags.

I'm attaching a test case which does trigger the issue on armv7.  If it
would be better to open a new bug just let me know, and if it has already been
fixed sorry for the noise :(

Here is the output from GCC with -v:

Using built-in specs.
COLLECT_GCC=arm-linux-gnueabihf-g++-10
COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/arm-linux-gnueabihf/10/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 10-20200324-1'
--with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10 --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--enable-default-pie --with-system-zlib --without-target-system-zlib
--enable-multiarch --disable-sjlj-exceptions --with-arch=armv7-a
--with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=arm-linux-gnueabihf --program-prefix=arm-linux-gnueabihf-
--includedir=/usr/arm-linux-gnueabihf/include
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.0.1 20200324 (experimental) [master revision
596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (Debian
10-20200324-1) 
COLLECT_GCC_OPTIONS='-v' '-Wall' '-Werror' '-O1' '-fstack-protector-strong' 
'-o' 'insert-pp' '-shared-libgcc' '-mfloat-abi=hard' '-mfpu=vfpv3-d16'
'-mthumb' '-mtls-dialect=gnu' '-march=armv7-a+fp'
 /usr/lib/gcc-cross/arm-linux-gnueabihf/10/cc1plus -quiet -v -imultilib .
-imultiarch arm-linux-gnueabihf -D_GNU_SOURCE insert-pp.c -quiet -dumpbase
insert-pp.c -mfloat-abi=hard -mfpu=vfpv3-d16 -mthumb -mtls-dialect=gnu
-march=armv7-a+fp -auxbase insert-pp -O1 -Wall -Werror -version
-fstack-protector-strong -o /tmp/ccwvIVRJ.s
GNU C++14 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536]
(arm-linux-gnueabihf)
compiled by GNU C version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP
version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version
isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/arm-linux-gnueabihf"
ignoring nonexistent directory
"/usr/lib/gcc-cross/arm-linux-gnueabihf/10/include-fixed"
#include "..." search starts here:
#include <...> search starts here:

/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/include/c++/10

/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/include/c++/10/arm-linux-gnueabihf/.

/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/include/c++/10/backward
 /usr/lib/gcc-cross/arm-linux-gnueabihf/10/include

/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/include
 /usr/include/arm-linux-gnueabihf
 /usr/include
End of search list.
GNU C++14 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536]
(arm-linux-gnueabihf)
compiled by GNU C version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP
version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version
isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: f8090281bdf780936f7dd6668f41be1f
COLLECT_GCC_OPTIONS='-v' '-Wall' '-Werror' '-O1' '-fstack-protector-strong' 
'-o' 'insert-pp' '-shared-libgcc' '-mfloat-abi=hard' '-mfpu=vfpv3-d16'
'-mthumb' '-mtls-dialect=gnu' '-march=armv7-a+fp'

/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/bin/as
-v -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -meabi=5 -o /tmp/cck1klAL.o
/tmp/ccwvIVRJ.s
GNU assembler version 2.34 (arm-linux-gnueabihf) using BFD version (GNU
Binutils for Debian) 2.34

[Bug tree-optimization/94488] [AArch64] ICE on right shift of V2DImode by DImode shift

2020-04-05 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94488

--- Comment #3 from Evan Nemerson  ---
Thanks for looking into this.

Left shift instead of right also seems to be a problem.  The backtrace is a bit
different, but I figure it's probably the same issue; if not I can open up a
new report.

I actually have something similar in my code with a note that it failed on GCC
≤ 7.  My guess is that GCC 7 fails all the time but GCC 8+ requires
optimization, but I don't have convenient access to GCC 7 on AArch64 so I'm not
certain.

Here is the output from left shift:

during RTL pass: expand
foo.c: In function ‘foo’:
foo.c:4:12: internal compiler error: in copy_to_mode_reg, at explow.c:632
4 |   return x << y;
  |  ~~^~~~
0x613b07 copy_to_mode_reg(machine_mode, rtx_def*)
../../src/gcc/explow.c:632
0xe19ea3 aarch64_expand_vector_init(rtx_def*, rtx_def*)
../../src/gcc/config/aarch64/aarch64.c:17670
0x10ed6fc ???
../../src/gcc/config/aarch64/aarch64-simd.md:6140
0xa62722 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
../../src/gcc/recog.h:317
0xa62722 expand_vector_broadcast(machine_mode, rtx_def*)
../../src/gcc/optabs.c:438
0xa641b0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*,
int, optab_methods)
../../src/gcc/optabs.c:1300
0x83d69f expand_shift_1
../../src/gcc/expmed.c:2624
0x83dce5 expand_variable_shift(tree_code, machine_mode, rtx_def*, tree_node*,
rtx_def*, int)
../../src/gcc/expmed.c:2695
0x85053b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../src/gcc/expr.c:9477
0x85725d expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../src/gcc/expr.c:10049
0x75cd2a expand_expr
../../src/gcc/expr.h:282
0x75cd2a expand_return
../../src/gcc/cfgexpand.c:3611
0x75cd2a expand_gimple_stmt_1
../../src/gcc/cfgexpand.c:3720
0x75cd2a expand_gimple_stmt
../../src/gcc/cfgexpand.c:3847
0x7627ea expand_gimple_basic_block
../../src/gcc/cfgexpand.c:5887
0x7627ea execute
../../src/gcc/cfgexpand.c:6542
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/94482] Inserting into vector with optimization enabled on x86 generates incorrect result

2020-04-05 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94482

Evan Nemerson  changed:

   What|Removed |Added

  Attachment #48193|0   |1
is obsolete||

--- Comment #8 from Evan Nemerson  ---
Created attachment 48204
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48204&action=edit
Reduced test case, ASan/UBSan clean

Here is the reduced test case which works with -fsanitize=address,undefined
-Wno-psabi -Wall -Werror.

This one is self-contained, and instead of using assert the return value is 0
on success and 1 on failure.

[Bug target/94482] Inserting into vector with optimization enabled on x86 generates incorrect result

2020-04-05 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94482

--- Comment #7 from Evan Nemerson  ---
Created attachment 48203
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48203&action=edit
Non-reduced test case

Thanks for looking into this.

ASan didn't have any issues with the original, non-reduced test.  Here is a
compressed copy.

I'm generating a new reduced version now, checking ASan and UBSan along the way
(as well as using -Wall -Werror to make sure the result compiles cleanly), I'll
upload it as soon as it's ready.

[Bug rtl-optimization/94488] New: [AArch64] ICE from OpenMP SIMD right shift of uint64_t

2020-04-04 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94488

Bug ID: 94488
   Summary: [AArch64] ICE from OpenMP SIMD right shift of uint64_t
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 48199
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48199&action=edit
Test case

On AArch64 with optimizations enabled (-O1 is enough), attempting to
right-shift an unsigned 64-bit value in an OpenMP SIMD loop generates an
internal compiler error.

This happens on at least GCC 9 and 10, and I've tried it cross-compiling to
AArch64 and natively (on a Raspberry Pi running Fedora 31 with gcc 9.3.1).

I'm attaching a test case.  Here is the full output from attempting to compile
it with `aarch64-linux-gnu-gcc-10 -v -fopenmp-simd -O2 -c -o test.o srl.c`:

Using built-in specs.
COLLECT_GCC=aarch64-linux-gnu-gcc-10
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10-20200324-1'
--with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10 --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--enable-default-pie --with-system-zlib --without-target-system-zlib
--enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu-
--includedir=/usr/aarch64-linux-gnu/include
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.0.1 20200324 (experimental) [master revision
596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536] (Debian
10-20200324-1) 
COLLECT_GCC_OPTIONS='-v' '-fopenmp-simd' '-O2' '-c' '-o' 'test.o'
'-mlittle-endian' '-mabi=lp64'
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/cc1 -quiet -v -imultiarch
aarch64-linux-gnu srl.c -quiet -dumpbase srl.c -mlittle-endian -mabi=lp64
-auxbase-strip test.o -O2 -version -fopenmp-simd -fasynchronous-unwind-tables
-o /tmp/ccGROOBh.s
GNU C17 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536]
(aarch64-linux-gnu)
compiled by GNU C version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP
version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version
isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc-cross/aarch64-linux-gnu/10/include-fixed"
ignoring nonexistent directory "/usr/include/aarch64-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/include
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include
 /usr/include
End of search list.
GNU C17 (Debian 10-20200324-1) version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536]
(aarch64-linux-gnu)
compiled by GNU C version 10.0.1 20200324 (experimental) [master
revision 596c90d3559:023579257f5:906b3eb9df6c577d3f6e9c3ea5c9d7e4d1e90536], GMP
version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version
isl-0.22.1-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: b59507ef9cd435e859f115f5f55f1a57
during RTL pass: expand
srl.c: In function ‘l’:
srl.c:14:15: internal compiler error: in expand_shift_1, at expmed.c:2654
   14 |   aj.e[i] = ak.e[i] >> k;
  |   ^~
0x613d01 expand_shift_1
../../src/gcc/expmed.c:2654
0x83dce5 expand_variable_shift(tree_code, machine_mode, rtx_def*, tree_node*,
rtx_def*, int)
../../src/gcc/expmed.c:2695
0x85053b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
../../src/gcc/expr.c:9477
0x85725d expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../src/gcc/expr.c:10049
0x864dc1 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../src/gcc/expr.c:8353
0x864dc1 expand_normal
../../src/gcc/expr.h:288
0x864dc1 store_field
../../src/gcc/expr.c:7097
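
For readers without access to the attachment, the kind of loop described above can be sketched as follows. This is not the attached srl.c and is not known to trigger the ICE; the type name u64x2_sketch and the function name shift_right_sketch are hypothetical, chosen only to mirror the `aj.e[i] = ak.e[i] >> k` line shown in the diagnostic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-lane vector of unsigned 64-bit values. */
typedef struct { uint64_t e[2]; } u64x2_sketch;

/* Right-shift every lane by a run-time count inside an OpenMP SIMD loop.
 * The pragma takes effect with -fopenmp-simd (or -fopenmp); otherwise the
 * compiler ignores it and the loop runs as plain scalar code. */
static u64x2_sketch shift_right_sketch(u64x2_sketch in, int count) {
  assert(count >= 0 && count < 64); /* larger shifts are undefined in C */
  u64x2_sketch out;
#pragma omp simd
  for (size_t i = 0; i < 2; i++) {
    out.e[i] = in.e[i] >> count;
  }
  return out;
}
```

Compiling something of this shape with -fopenmp-simd and -O1 or higher for AArch64 is the configuration the report describes.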

[Bug target/94482] Inserting into vector with optimization enabled on x86 generates incorrect result

2020-04-04 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94482

--- Comment #2 from Evan Nemerson  ---
Created attachment 48195
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48195&action=edit
Output from cc -v

Sure.  It's just -O2, and of course if you're on x86_64 you'll need to pass
-m32.  For example:

  cc -m32 -O2 -o 94482 94482.c

I've attached the output when adding -v.

If you drop either -m32 or -O2 from the flags, the program runs successfully. 
Otherwise, you'll get an assertion failure:

  94482: 94482.c:46: main: Assertion `r_.i64[0] == 1729' failed.
  Aborted (core dumped)

[Bug target/94482] New: Inserting into vector with optimization enabled on x86 generates incorrect result

2020-04-03 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94482

Bug ID: 94482
   Summary: Inserting into vector with optimization enabled on x86
generates incorrect result
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 48193
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48193&action=edit
Test case

I'm trying to implement _mm_insert_epi64 without relying on intrinsics.  The
GCC-generated executable fails on x86 (but not x86_64) at -O2 and above. 
AFAICT it works on every other architecture and optimization level I've tried.
It happens on every version of GCC I've tested (7 - 9.3.0), in both C and C++
modes.

I've attached a test case (generated with C-Reduce, slightly modified to remove
some unnecessary macros) which reproduces the issue.  Line 4 is interesting;
the j field isn't used anywhere but if you remove it the code works
(unfortunately not an option in my project).
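
To make the goal concrete, an intrinsic-free 64-bit lane insert can be sketched roughly as below. This is not the attached C-Reduce output and does not contain the unused j field mentioned above; the names vec128_sketch and insert_epi64_sketch are hypothetical, and the real code uses its own types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value viewed as two 64-bit or four 32-bit lanes. */
typedef union {
  int64_t i64[2];
  int32_t i32[4];
} vec128_sketch;

/* Return a copy of `a` with 64-bit lane `index` replaced by `value`,
 * which is the behaviour _mm_insert_epi64 provides on x86_64. */
static vec128_sketch insert_epi64_sketch(vec128_sketch a, int64_t value,
                                         int index) {
  assert(index == 0 || index == 1); /* only two 64-bit lanes exist */
  a.i64[index] = value;
  return a;
}
```

The expected behaviour is simply that the selected lane holds the new value afterwards while the other lane is unchanged.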

Please let me know if you need any additional information.

[Bug c++/94385] New: Internal compiler error for __builtin_convertvector + statement expr

2020-03-28 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94385

Bug ID: 94385
   Summary: Internal compiler error for __builtin_convertvector +
statement expr
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

One of my projects is hitting an internal compiler error on GCC 10.  It works
for g++ 9, and it works in C mode.

Here is a test case:

  #include <stdint.h>

  typedef int32_t vec32 __attribute__((__vector_size__(16)));
  typedef float vecf __attribute__((__vector_size__(16)));

  vec32 foo(vecf bar) {
    return (__extension__({
      __builtin_convertvector(bar, vec32);
    }));
  }

Compiler Explorer link: https://godbolt.org/z/FMbXgs

[Bug c++/93557] New: __builtin_convertvector doesn't mark input as used

2020-02-03 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93557

Bug ID: 93557
   Summary: __builtin_convertvector doesn't mark input as used
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 47771
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47771&action=edit
test case (pass -Wextra to enable the warnings)

I'm getting some false positives with -Wunused-but-set-parameter and
-Wunused-but-set-variable which I've narrowed down to __builtin_convertvector.
There is a quick test case attached, or on godbolt at
https://godbolt.org/z/RSFLGB

It works as expected with gcc, but with g++ I get:

cv.cc: In function ‘vecf conv(veci)’:
cv.cc:4:16: warning: parameter ‘a’ set but not used
[-Wunused-but-set-parameter]
4 | vecf conv(veci a) {
  |   ~^
cv.cc: In function ‘vecf conv2(veci)’:
cv.cc:11:8: warning: variable ‘tmp’ set but not used
[-Wunused-but-set-variable]
   11 |   veci tmp = a;
  |^~~

In case anyone else finds this, as a (hopefully temporary) workaround I'm
planning to modify my macro wrapper to use a statement expr to create a
temporary variable which I mark as used using the ((void) foo) trick:

#define SIMDE__CONVERT_VECTOR(to, from) ((to) = (__extension__({ \
 __typeof__(from) from_ = (from); \
 ((void) from_); \
 __builtin_convertvector(from_, __typeof__(to)); \
   })))

[Bug target/92502] New: AVX missing _mm256_storeu2_* functions

2019-11-13 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92502

Bug ID: 92502
   Summary: AVX missing _mm256_storeu2_* functions
   Product: gcc
   Version: 9.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

GCC doesn't implement _mm256_storeu2_m128, _mm256_storeu2_m128d, or
_mm256_storeu2_m128i.

It's not hard to work around their absence, but it would be nice to have them
just to match other compilers which support them (clang, ICC, MSVC, PGI…).

FWIW, I'm just doing

  _mm_storeu_ps(lo_addr, _mm256_castps256_ps128(a));
  _mm_storeu_ps(hi_addr, _mm256_extractf128_ps(a, 1));

for the ps version. For pd and si just replace ps with the appropriate
characters.

[Bug c/80502] Provide macro to indicate OpenMP SIMD support

2018-02-20 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80502

--- Comment #3 from Evan Nemerson <e...@coeus-group.com> ---
(In reply to Jakub Jelinek from comment #2)
> _OPENMP_SIMD is a bad idea, that namespace is reserved for OpenMP, so unless
> it shows up in the OpenMP standard, it shouldn't be added.

Fair enough, I'll propose it to the OpenMP people:
http://forum.openmp.org/forum/viewtopic.php?f=23&t=2031

> Why do you need a macro?  Just use #pragma omp simd etc. unconditionally,
> compilers that don't have support for such pragmas will just ignore those.

Not necessarily; often they'll emit warnings (for GCC, -Wall even includes
-Wunknown-pragmas). I'd much rather use the preprocessor in my code than teach
people to disable warnings.

I need to support alternatives in my code. For example, for SIMDe
(<https://github.com/nemequ/simde>), I try to support OpenMP SIMD and Cilk
Plus, as well as compiler-specific pragmas for GCC (GCC ivdep), ICC (simd), and
clang (clang loop ...), and I'd be happy to add more as necessary. I'd rather
not end up with something like

  #pragma omp simd
  #pragma simd
  #pragma GCC ivdep
  #pragma clang loop vectorize(enable)
  for (...) { ... }

I'd much rather just have a few macros which will expand to the right pragma
based on preprocessor macros. Right now I'm stuck using the much less
expressive ivdep syntax for GCC unless *full* OpenMP support is enabled (or
someone defines a macro manually to indicate OpenMP SIMD support).
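
A minimal sketch of the kind of wrapper macro described above, assuming C99 `_Pragma`. The names SKETCH_VECTORIZE and add_arrays_sketch are hypothetical and this is not SIMDe's actual macro; it also shows the limitation mentioned, since GCC only gets `omp simd` when _OPENMP is defined (full OpenMP) and otherwise falls back to ivdep.

```c
#include <assert.h>
#include <stddef.h>

/* Pick one vectorization hint per compiler.  Clang and ICC are tested
 * before __GNUC__ because both also define __GNUC__. */
#if defined(_OPENMP) && (_OPENMP >= 201307L)
#  define SKETCH_VECTORIZE _Pragma("omp simd")
#elif defined(__clang__)
#  define SKETCH_VECTORIZE _Pragma("clang loop vectorize(enable)")
#elif defined(__INTEL_COMPILER)
#  define SKETCH_VECTORIZE _Pragma("simd")
#elif defined(__GNUC__)
#  define SKETCH_VECTORIZE _Pragma("GCC ivdep")
#else
#  define SKETCH_VECTORIZE
#endif

/* Element-wise sum; dst must not overlap a or b. */
static void add_arrays_sketch(float *dst, const float *a, const float *b,
                              size_t n) {
  assert(dst != NULL && a != NULL && b != NULL);
  SKETCH_VECTORIZE
  for (size_t i = 0; i < n; i++) {
    dst[i] = a[i] + b[i];
  }
}
```

With a macro indicating OpenMP SIMD support, the first branch could also be taken under -fopenmp-simd alone, which is the point of the request.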

[Bug c/80502] New: Provide macro to indicate OpenMP SIMD support

2017-04-24 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80502

Bug ID: 80502
   Summary: Provide macro to indicate OpenMP SIMD support
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 41250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41250&action=edit
define _OPENMP_SIMD when -fopenmp-simd or -fopenmp is passed

_OPENMP is (correctly) only defined for full OpenMP; when only -fopenmp-simd is
used there doesn't seem to be any way to detect in the preprocessor that the
compiler supports OpenMP SIMD pragmas.

It would be nice if there were a macro to determine whether they are supported
so we could do something like

  #if (defined(_OPENMP) && (_OPENMP >= 201307L)) || \
(defined(_OPENMP_SIMD) && (_OPENMP_SIMD >= 201307L))
  #pragma omp simd
  #endif

AFAIK ICC is the only compiler with a feature like -fopenmp-simd (they call it
-openmp-simd), and they don't define anything either.  I've reported the issue
to them, but in the meantime it seems GCC can choose whatever it wants without
creating any compatibility issues.  Defining _OPENMP_SIMD to the version of
OpenMP supported (just like _OPENMP does) seems logical to me.  I've attached
a fairly trivial patch.

[Bug c/79518] __builtin_assume_aligned should mark argument as aligned

2017-02-15 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79518

--- Comment #4 from Evan Nemerson <e...@coeus-group.com> ---
I agree that GCC's implementation makes more sense, but unfortunately it makes
it hard to write portable code.

I'm not suggesting the current behavior be abandoned, only that
__builtin_assume_aligned be enhanced so the argument is marked as aligned when
possible.

Think of it as an optimization opportunity; given something like

  void* foo = __builtin_assume_aligned(bar, 16);
  /* Use bar for something instead of foo */

GCC could see that bar is 16-byte aligned and generate better code accordingly,
even though the programmer did something dumb.
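
For contrast, here is a small sketch of how the builtin has to be used with GCC's current semantics, where only the returned pointer carries the alignment assumption. The function name sum_aligned16_sketch is hypothetical, and the non-GCC fallback simply drops the hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n floats from a buffer the caller promises is 16-byte aligned. */
static float sum_aligned16_sketch(const float *data, size_t n) {
  assert(((uintptr_t) data % 16) == 0); /* the promise must actually hold */
#if defined(__GNUC__)
  /* The hint is attached to p, the return value, not to data itself. */
  const float *p = (const float *) __builtin_assume_aligned(data, 16);
#else
  const float *p = data; /* no hint available in this sketch */
#endif
  float total = 0.0f;
  for (size_t i = 0; i < n; i++) {
    total += p[i];
  }
  return total;
}
```

Code written for the ICC or MSVC style would keep reading through data after the hint, which is the case the enhancement request is about.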

I won't reopen this, but I hope you will…  AFAICT there aren't any
disadvantages, and it would be very helpful for people with code already
optimized using MSVC, ICC, etc.

[Bug c/79518] __builtin_assume_aligned should mark argument as aligned

2017-02-14 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79518

--- Comment #2 from Evan Nemerson <e...@coeus-group.com> ---
(In reply to Andrew Pinski from comment #1)
> Why can't you use:
> #define __assume_aligned(arg, align)  arg = __builtin_assume_aligned
> (arg, align)
> 
> ?

arg may be read-only.  void* const arg = ...; __assume_aligned(arg, 16)

arg is also evaluated twice, though I'm not sure how much of a problem that
would be in this situation, or how helpful __assume_aligned would be for other
compilers where it would matter…

[Bug c/79518] New: __builtin_assume_aligned should mark argument as aligned

2017-02-14 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79518

Bug ID: 79518
   Summary: __builtin_assume_aligned should mark argument as
aligned
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

TL;DR: It would be very helpful if __builtin_assume_aligned() would mark its
first argument as aligned (assuming it represents a variable).

I'm trying to implement a macro to abstract away compiler differences, but
__builtin_assume_aligned works very differently than other compilers'
alternatives, and it's making it impossible to support GCC.

ICC's __assume_aligned marks the argument as aligned:

__assume_aligned(arg, 16)
/* Compiler knows arg is 16-byte aligned */

MSVC wants you to use __assume, which is ugly, but can be made to work the same
way (people often wrap it up in a macro that looks like the Intel version):

__assume((((char*) arg) - ((char*) 0)) % (16) == 0)
/* Compiler knows arg is 16-byte aligned */

clang can also (I believe) be made to work similarly.  They have an internal
macro
(http://llvm.org/docs/doxygen/html/Compiler_8h.html#a2fd576fb00a760ba803c8a171bff051a)
called LLVM_ASSUME_ALIGNED which is defined as

# define LLVM_ASSUME_ALIGNED(p, a) \
   (((uintptr_t(p) % (a)) == 0) ? (p) : (LLVM_BUILTIN_UNREACHABLE, (p)))

This doesn't seem to have an effect on GCC (using __builtin_unreachable and a C
style cast), though.  FWIW, I'd consider making that work to be an acceptable
solution, too, though I think improving __builtin_assume_aligned would be much
better as it would be more discoverable.

Unfortunately, it's not easy to smooth out this particular difference.  There
are some details at <https://github.com/nemequ/hedley/issues/1>, but basically
I can't find a good solution which works with both styles, mostly due to the
standard multiple-evaluation of macro arguments SNAFU.

[Bug c/67914] New: Unrecognized command line argument warning not shown unless there is another warning for -Wno-*

2015-10-09 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67914

Bug ID: 67914
   Summary: Unrecognized command line argument warning not shown
unless there is another warning for -Wno-*
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

If you pass an unrecognized -Wno-* option to gcc (or g++) with -Wall -Wextra
also set, you should get a warning, but that seems to only be true if something
*else* emits a warning first.  This makes it difficult to check in a build
system whether or not a flag is supported.

nemequ@hoplite:~/t$ cat t.c
int main(void)
{
#if defined(TRIGGER_A_DIFFERENT_WARNING)
  int unused;
#endif
  return 0;
}
nemequ@hoplite:~/t$ gcc -Wall -Wextra -Werror -What-the-fuck -o t -c t.c
gcc: error: unrecognized command line option ‘-What-the-fuck’
nemequ@hoplite:~/t$ gcc -Wall -Wextra -Werror -Wno-hat-the-fuck -o t -c t.c
nemequ@hoplite:~/t$ gcc -DTRIGGER_A_DIFFERENT_WARNING -Wall -Wextra -Werror
-Wno-hat-the-fuck -o t -c t.c
t.c: In function ‘main’:
t.c:4:7: error: unused variable ‘unused’ [-Werror=unused-variable]
   int unused;
   ^
t.c: At top level:
cc1: error: unrecognized command line option ‘-Wno-hat-the-fuck’ [-Werror]
cc1: all warnings being treated as errors
nemequ@hoplite:~/t$ gcc --version
gcc (GCC) 5.1.1 20150618 (Red Hat 5.1.1-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug tree-optimization/65709] [5 Regression] Bad code for LZ4 decompression with -O3 on x86_64

2015-04-09 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709

--- Comment #1 from Evan Nemerson <e...@coeus-group.com> ---
Created attachment 35267
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35267&action=edit
preprocessed test case


[Bug tree-optimization/65709] New: [5 Regression] Bad code for LZ4 decompression with -O3 on x86_64

2015-04-09 Thread e...@coeus-group.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709

Bug ID: 65709
   Summary: [5 Regression] Bad code for LZ4 decompression with -O3
on x86_64
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com

Created attachment 35266
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35266&action=edit
Test data

With GCC 5 at -O3 on x86_64 some inputs to LZ4 will generate a segfault when
decompressing.  This doesn't happen with GCC 4.9, or -O2.  It does happen with
-O2 -ftree-loop-vectorize -fvect-cost-model=dynamic.

I have tested gcc-5.0.0-0.21.fc22.x86_64 from Fedora 22, as well as SVN
revision 221940.

To trigger the segfault pass the attached input (sum.lz4) as the only command
line argument to the preprocessed test case.