[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-11-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener  ---
Fixed.

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-11-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

--- Comment #6 from Richard Biener  ---
Author: rguenth
Date: Wed Nov  9 08:19:05 2016
New Revision: 241992

URL: https://gcc.gnu.org/viewcvs?rev=241992&root=gcc&view=rev
Log:
2016-11-09  Richard Biener  

PR tree-optimization/78007
* tree-vect-stmts.c (vectorizable_bswap): New function.
(vectorizable_call): Call vectorizable_bswap for
BUILT_IN_BSWAP{16,32,64} if arguments are not promoted.

* gcc.dg/vect/vect-bswap32.c: Adjust.
* gcc.dg/vect/vect-bswap64.c: Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
trunk/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
trunk/gcc/tree-vect-stmts.c

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-11-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #39827|0                           |1
        is obsolete|                            |

--- Comment #5 from Richard Biener  ---
Created attachment 39990
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39990&action=edit
patch I am testing

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-11-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

--- Comment #4 from Richard Biener  ---
The handling should probably be moved after the
targetm.vectorize.builtin_vectorized_function handling, to allow ARM's
builtin-bswap vectorization via vrev to apply (not sure whether its
permutation handling selects vrev for a bswap permutation).

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

--- Comment #3 from Richard Biener  ---
Created attachment 39827
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39827&action=edit
untested patch

Mostly untested prototype.  For -mavx2 we get, for the testcase's innermost loop:

.L6:
        vmovdqa (%r9,%rdx), %ymm0
        addl    $1, %r8d
        vperm2i128      $0, %ymm0, %ymm0, %ymm0
        vpshufb %ymm1, %ymm0, %ymm0
        vmovdqa %ymm0, (%r9,%rdx)
        addq    $32, %rdx
        cmpl    %r11d, %r8d
        jb      .L6

with -msse4:

.L6:
        movdqa  (%rax,%rdx), %xmm0
        addl    $1, %r8d
        pshufb  %xmm1, %xmm0
        movaps  %xmm0, (%rax,%rdx)
        addq    $16, %rdx
        cmpl    %r10d, %r8d
        jb      .L6

Not sure if I got the bswap permutation vector constant correct either ;)
(quick hack)

  vect_load_dst_8.13_63 = MEM[(u32 *)vectp_b.11_61];
  load_dst_8 = *_3;
  _64 = VIEW_CONVERT_EXPR(vect_load_dst_8.13_63);
  _65 = VEC_PERM_EXPR <_64, _64, { 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0 }>;
  _66 = VIEW_CONVERT_EXPR(_65);
  _13 = __builtin_bswap32 (load_dst_8);
  MEM[(u32 *)vectp_b.14_69] = _66;
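
Since the selector constant in the dump above is flagged as uncertain, here is a minimal sketch of one way to sanity-check such a constant in plain C. It assumes that selector element i names the source byte for output byte i, counted across the whole 16-byte vector rather than within each lane. The helper names (make_bswap_selector, permute16, selector_is_bswap32) are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill sel[0..nbytes-1] so that every word of word_bytes bytes is reversed
   in place: output byte j of word w comes from byte (word_bytes - 1 - j)
   of the same word.  Hypothetical helper, not GCC code. */
static void make_bswap_selector(unsigned char *sel, size_t nbytes,
                                size_t word_bytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        size_t word = i / word_bytes;
        size_t j = i % word_bytes;
        sel[i] = (unsigned char)(word * word_bytes + (word_bytes - 1 - j));
    }
}

/* Scalar model of a single-input byte permute on a 16-byte block.
   Assumption: indices count bytes across the whole block. */
static void permute16(unsigned char *dst, const unsigned char *src,
                      const unsigned char *sel)
{
    for (size_t i = 0; i < 16; i++)
        dst[i] = src[sel[i] % 16];
}

/* Return 1 if sel byte-swaps each 32-bit lane of a 16-byte block, else 0. */
static int selector_is_bswap32(const unsigned char *sel)
{
    unsigned char src[16], dst[16];

    for (size_t i = 0; i < 16; i++)
        src[i] = (unsigned char)(0xA0 + i);   /* distinct marker bytes */
    permute16(dst, src, sel);
    for (size_t i = 0; i < 16; i++) {
        size_t expect = (i / 4) * 4 + (3 - i % 4);
        if (dst[i] != src[expect])
            return 0;
    }
    return 1;
}
```

Under that assumption, a selector that repeats { 3, 2, 1, 0 } in every lane would copy the first lane into all four, whereas offsetting each group by its lane start ({ 3, 2, 1, 0, 7, 6, 5, 4, ... }) swaps each lane independently. Whether the prototype's constant is interpreted this way is not something this sketch settles.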

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-10-17
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener  ---
Should be relatively easy to handle with a VIEW_CONVERT, VEC_PERM_EXPR,
VIEW_CONVERT sequence.
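
To illustrate the idea, here is a scalar C model of that three-step sequence for four 32-bit lanes. It is only a sketch of the technique, not the vectorizer's implementation: memcpy stands in for the two view conversions, a byte loop stands in for the permute, and the function name bswap32_x4_via_perm is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Byte-swap four 32-bit lanes by viewing them as 16 bytes, permuting the
   bytes, and viewing the result as four 32-bit lanes again. */
static void bswap32_x4_via_perm(uint32_t lanes[4])
{
    /* Reverse the byte order within each 4-byte lane. */
    static const unsigned char sel[16] = {
        3, 2, 1, 0,  7, 6, 5, 4,  11, 10, 9, 8,  15, 14, 13, 12
    };
    unsigned char in[16], out[16];

    memcpy(in, lanes, sizeof in);      /* view: 4 x u32 -> 16 x u8 */
    for (int i = 0; i < 16; i++)       /* permute bytes by selector */
        out[i] = in[sel[i]];
    memcpy(lanes, out, sizeof out);    /* view: 16 x u8 -> 4 x u32 */
}
```

Reversing the bytes of each lane in memory yields the byte-swapped value on both little- and big-endian hosts, so the result should match __builtin_bswap32 applied to each element.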


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-17 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

--- Comment #1 from Yuri Rumyantsev  ---
Created attachment 39821
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39821&action=edit
test-case to reproduce

It is sufficient to compile it with the -Ofast option on an x86 platform.