https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65427

            Bug ID: 65427
           Summary: ICE in emit_move_insn with wide vector types
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: solar-gcc at openwall dot com

Created attachment 35037
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35037&action=edit
testcase

GCC 4.7.0 through at least 4.9.2, as well as the 5.0 20150215 snapshot (I
haven't tested newer ones), fail with an ICE when compiling the attached
md5slice.c testcase on, and for, Linux x86_64:

$ gcc md5slice.c -o md5slice -O2 -DVECTOR -Wno-attributes -ftree-loop-vectorize
md5slice.c: In function 'GG':
md5slice.c:302:27: internal compiler error: in emit_move_insn, at expr.c:3609
 static MAYBE_INLINE3 void GG(a, b, c, d, x, s, ac)
                           ^
0x6974d2 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc/expr.c:3608
0x5e5294 expand_gimple_stmt_1
        ../../gcc/cfgexpand.c:3288
0x5e5294 expand_gimple_stmt
        ../../gcc/cfgexpand.c:3322
0x5e589b expand_gimple_basic_block
        ../../gcc/cfgexpand.c:5162
0x5e7b56 gimple_expand_cfg
        ../../gcc/cfgexpand.c:5741
0x5e7b56 execute
        ../../gcc/cfgexpand.c:5961

Without -ftree-loop-vectorize, compilation succeeds.  With -O3, it fails
slightly differently:

$ gcc md5slice.c -o md5slice -O3 -DVECTOR -Wno-attributes 
md5slice.c: In function 'II.constprop':
md5slice.c:328:27: internal compiler error: in emit_move_insn, at expr.c:3609
 static MAYBE_INLINE3 void II(a, b, c, d, x, s, ac)
                           ^
0x6974d2 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc/expr.c:3608
0x5e5294 expand_gimple_stmt_1
        ../../gcc/cfgexpand.c:3288
0x5e5294 expand_gimple_stmt
        ../../gcc/cfgexpand.c:3322
0x5e589b expand_gimple_basic_block
        ../../gcc/cfgexpand.c:5162
0x5e7b56 gimple_expand_cfg
        ../../gcc/cfgexpand.c:5741
0x5e7b56 execute
        ../../gcc/cfgexpand.c:5961

With -mavx or -mavx2, it succeeds (despite -O3).

GCC 4.7.0 does not have the -ftree-loop-vectorize option, but a similar problem
is seen with -O3:

$ gcc md5slice.c -o md5slice -O3 -DVECTOR -Wno-attributes
md5slice.c: In function 'GG':
md5slice.c:302:27: internal compiler error: in emit_move_insn, at expr.c:3435

So far, all of this is with:

typedef element vector __attribute__ ((vector_size (32)));

on line 41.  Reducing the vector width to 16 makes the plain SSE2 compilation
succeed with any optimizations.  Conversely, increasing the vector width to 64
makes compilation fail even with AVX/AVX2 enabled.
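
For readers who have not used GCC's generic vector extension, here is a minimal
sketch of the kind of code involved.  It is not the attached testcase, and I
have not checked whether it triggers the ICE.  The VECTOR_BYTES knob, the
32-bit element type and the helper names md5_g and rotl are assumptions made
for illustration only.

```c
#include <stdint.h>

/* Assumed knob, not taken from the testcase: total vector width in bytes. */
#ifndef VECTOR_BYTES
#define VECTOR_BYTES 32
#endif

typedef uint32_t element;   /* assumption: 32-bit lanes */
typedef element vector __attribute__ ((vector_size (VECTOR_BYTES)));

#define LANES (VECTOR_BYTES / sizeof(element))

/* Lane-wise MD5 G function: (x & z) | (y & ~z).
   Pointers are used so that no wide vector is passed by value. */
void md5_g(vector *out, const vector *x, const vector *y, const vector *z)
{
    *out = (*x & *z) | (*y & ~*z);
}

/* Lane-wise rotate left by s bits; the caller must ensure 0 < s < 32. */
void rotl(vector *out, const vector *v, unsigned s)
{
    vector left = { 0 }, right = { 0 };
    unsigned i;

    /* Broadcast the shift counts into every lane. */
    for (i = 0; i < LANES; i++) {
        left[i] = s;
        right[i] = 32 - s;
    }
    *out = (*v << left) | (*v >> right);
}
```

Building with -DVECTOR_BYTES=16 or -DVECTOR_BYTES=64 changes only the typedef,
which is the same single-line change described above.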

Ideally, when the vector type width is in excess of the current target
architecture's native SIMD vector width, GCC should transparently split it into
multiple sub-vectors of the natively supported width.  This is useful not only
for being able to build/use wider-vector source code for/on older CPUs, but
also to hide instruction latencies by having the compiler interleave operations
on the sub-vectors due to the extra parallelism the excessive vector width
provides.  For example, once this is supported, a width of 32 could actually
run faster than 16 on SSE2, and 64 faster than 32 on AVX2, for some
applications (as long as the register pressure does not become too high).
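
To illustrate the splitting I have in mind, here is a hand-written sketch of
what it could look like at the source level: a 32-byte logical vector held as
two 16-byte halves, with the operations on the halves written back to back so
that they are independent of each other.  The wide32 type and the wide_add and
wide_g helpers are hypothetical names, not part of the testcase, and this is
not a description of how GCC would implement the lowering internally.

```c
#include <stdint.h>

typedef uint32_t element;   /* assumption: 32-bit lanes */
typedef element vec16 __attribute__ ((vector_size (16)));

/* Hypothetical wide type: one 32-byte logical vector as two native halves. */
typedef struct {
    vec16 lo;   /* lanes 0..3 */
    vec16 hi;   /* lanes 4..7 */
} wide32;

/* Lane-wise addition; the two statements do not depend on each other,
   so the compiler or the CPU is free to overlap them. */
void wide_add(wide32 *r, const wide32 *a, const wide32 *b)
{
    vec16 lo = a->lo + b->lo;
    vec16 hi = a->hi + b->hi;
    r->lo = lo;
    r->hi = hi;
}

/* Lane-wise MD5 G function, (x & z) | (y & ~z), applied to both halves. */
void wide_g(wide32 *r, const wide32 *x, const wide32 *y, const wide32 *z)
{
    vec16 lo = (x->lo & z->lo) | (y->lo & ~z->lo);
    vec16 hi = (x->hi & z->hi) | (y->hi & ~z->hi);
    r->lo = lo;
    r->hi = hi;
}
```

Writing this by hand for every operation is what I would like to avoid by
having the compiler do the equivalent for vector_size (32) on SSE2.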

Failing that, the compiler should at least report that this is unsupported,
rather than fail with an ICE.
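
Until then, a source-level guard can turn the failure into a readable
diagnostic.  The sketch below is a workaround idea, not something the testcase
does.  NATIVE_VECTOR_BYTES and VECTOR_BYTES are hypothetical macro names, and
the mapping from the predefined __AVX2__ macro to a native width is my own
assumption (it is deliberately conservative: it treats plain -mavx as 16 bytes
because 256-bit integer operations need AVX2).

```c
#include <stdint.h>

/* Assumption: infer the native SIMD width from predefined target macros. */
#if defined(__AVX2__)
#define NATIVE_VECTOR_BYTES 32
#else
#define NATIVE_VECTOR_BYTES 16   /* SSE2 baseline on x86_64; assumed elsewhere */
#endif

/* Hypothetical knob: the width the source asks for (default: native). */
#ifndef VECTOR_BYTES
#define VECTOR_BYTES NATIVE_VECTOR_BYTES
#endif

/* Fail with a clear message instead of relying on the compiler to cope. */
#if VECTOR_BYTES > NATIVE_VECTOR_BYTES
#error "VECTOR_BYTES exceeds the native SIMD width for this target"
#endif

typedef uint32_t element;   /* assumption: 32-bit lanes */
typedef element vector __attribute__ ((vector_size (VECTOR_BYTES)));

/* Number of lanes in the selected vector type. */
unsigned vector_lanes(void)
{
    return (unsigned)(sizeof(vector) / sizeof(element));
}
```

With this in place, building with -DVECTOR_BYTES=32 and without -mavx2 stops at
the #error line rather than reaching code generation.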

With GCC 4.6.2 and older, the ICE does not occur, for the rather unfortunate
reason that (at least for me) these versions generate scalar code (so ~10x
slower) when the type's vector width exceeds what's supported natively.
