Fwd: [PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.

2014-08-26 Thread Uros Bizjak
On Tue, Aug 26, 2014 at 9:55 PM, Kirill Yukhin  wrote:
> Hello Uroš,
> On 23 Aug 09:44, Uros Bizjak wrote:
>> On Fri, Aug 22, 2014 at 1:51 PM, Kirill Yukhin  
>> wrote:
>>
>> > This patch extends the unaligned load and store patterns.
>> At this stage, I'd still prefer simple constraints (the solution
>> proposed above), even at the price of additional patterns. Looking at
>> the patterns, it is quite hard to calculate the final condition for a
>> particular mode/target combo, even without the enable attribute and
>> conditional operand constraints/predicates. With the solution above,
>> the complexity is conveniently pushed to the mask define_subst attribute.
> Below is the patch, which splits the unaligned ld/st patterns.
> Bootstrapped and avx512-regtested on the simulator.
>
> gcc/
> * config/i386/sse.md
> (define_mode_iterator VI48_AVX512VL): New.
> (define_mode_iterator VI_UNALIGNED_LOADSTORE): Delete.
> (define_mode_iterator VI_ULOADSTORE_AVX512BW): New.
> (define_mode_iterator VI_ULOADSTORE_AVX512F): Ditto.
> (define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
> with VI1): Change mode iterator.
> (define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
> with VI_ULOADSTORE_AVX512BW): New.
> (define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
> with VI_ULOADSTORE_AVX512F): Ditto.
> (define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
> with VI1): Change mode iterator.
> (define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
> with VI_ULOADSTORE_AVX512BW): New.
> (define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
> with VI_ULOADSTORE_AVX512F): Ditto.
> (define_insn "<sse2_avx_avx512f>_storedqu<mode>
> with VI1): Change mode iterator.
> (define_insn "<sse2_avx_avx512f>_storedqu<mode>
> with VI_ULOADSTORE_AVX512BW): New.
> (define_insn "<sse2_avx_avx512f>_storedqu<mode>
> with VI_ULOADSTORE_AVX512F): Ditto.
> (define_insn "avx512f_storedqu<mode>_mask"): Delete.
> (define_insn "<avx512>_storedqu<mode>_mask" with
> VI48_AVX512VL): New.
> (define_insn "<avx512>_storedqu<mode>_mask" with
> VI12_AVX512VL): Ditto.
>
> Is it ok for trunk?

OK with the mode iterators renamed as suggested below.

Thanks,
Uros.


> -(define_mode_iterator VI_UNALIGNED_LOADSTORE
> -  [(V32QI "TARGET_AVX") V16QI
> -   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")])
> +(define_mode_iterator VI_ULOADSTORE_AVX512BW
> +  [V64QI
> +   V32HI (V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL")])
> +
> +(define_mode_iterator VI_ULOADSTORE_AVX512F
> +  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
> +   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])

Please name these two VI_ULOADSTORE_BW_AVX512VL and
VI_ULOADSTORE_F_AVX512VL, to be consistent with the other iterator
names, and put them near the load/store patterns.
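
For reference, a sketch of the two iterators under the requested names;
the mode lists are copied from the patch above, only the names change
(final placement next to the load/store patterns is left to the
committed version):

;; Unaligned load/store modes whose baseline ISA is AVX-512BW.
(define_mode_iterator VI_ULOADSTORE_BW_AVX512VL
  [V64QI
   V32HI (V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL")])

;; Unaligned load/store modes whose baseline ISA is AVX-512F.
(define_mode_iterator VI_ULOADSTORE_F_AVX512VL
  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])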


Re: [PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.

2014-08-26 Thread Kirill Yukhin
Hello Uroš,
On 23 Aug 09:44, Uros Bizjak wrote:
> On Fri, Aug 22, 2014 at 1:51 PM, Kirill Yukhin  
> wrote:
> 
> > This patch extends the unaligned load and store patterns.
> At this stage, I'd still prefer simple constraints (the solution
> proposed above), even at the price of additional patterns. Looking at
> the patterns, it is quite hard to calculate the final condition for a
> particular mode/target combo, even without the enable attribute and
> conditional operand constraints/predicates. With the solution above,
> the complexity is conveniently pushed to the mask define_subst attribute.
Below is the patch, which splits the unaligned ld/st patterns.
Bootstrapped and avx512-regtested on the simulator.

gcc/
* config/i386/sse.md
(define_mode_iterator VI48_AVX512VL): New.
(define_mode_iterator VI_UNALIGNED_LOADSTORE): Delete.
(define_mode_iterator VI_ULOADSTORE_AVX512BW): New.
(define_mode_iterator VI_ULOADSTORE_AVX512F): Ditto.
(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
with VI1): Change mode iterator.
(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
with VI_ULOADSTORE_AVX512BW): New.
(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
with VI_ULOADSTORE_AVX512F): Ditto.
(define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
with VI1): Change mode iterator.
(define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
with VI_ULOADSTORE_AVX512BW): New.
(define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
with VI_ULOADSTORE_AVX512F): Ditto.
(define_insn "<sse2_avx_avx512f>_storedqu<mode>
with VI1): Change mode iterator.
(define_insn "<sse2_avx_avx512f>_storedqu<mode>
with VI_ULOADSTORE_AVX512BW): New.
(define_insn "<sse2_avx_avx512f>_storedqu<mode>
with VI_ULOADSTORE_AVX512F): Ditto.
(define_insn "avx512f_storedqu<mode>_mask"): Delete.
(define_insn "<avx512>_storedqu<mode>_mask" with
VI48_AVX512VL): New.
(define_insn "<avx512>_storedqu<mode>_mask" with
VI12_AVX512VL): Ditto.

Is it ok for trunk?

--
Thanks, K


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0624582..0245ec4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -235,6 +235,10 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VI48_AVX512VL
+  [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
+   V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
@@ -259,9 +263,13 @@
 (define_mode_iterator VI1
   [(V32QI "TARGET_AVX") V16QI])
 
-(define_mode_iterator VI_UNALIGNED_LOADSTORE
-  [(V32QI "TARGET_AVX") V16QI
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")])
+(define_mode_iterator VI_ULOADSTORE_AVX512BW
+  [V64QI
+   V32HI (V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL")])
+
+(define_mode_iterator VI_ULOADSTORE_AVX512F
+  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 ;; All DImode vector integer modes
 (define_mode_iterator VI8
@@ -1172,18 +1180,18 @@
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+/* For AVX, normal *mov<mode>_internal pattern will handle unaligned loads
+   just fine if misaligned_operand is true, and without the UNSPEC it can
+   be combined with arithmetic instructions.  If misaligned_operand is
+   false, still emit UNSPEC_LOADU insn to honor user's request for
+   misaligned load.  */
(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
-  [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand")
-   (unspec:VI_UNALIGNED_LOADSTORE
- [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand")]
+  [(set (match_operand:VI1 0 "register_operand")
+   (unspec:VI1
+ [(match_operand:VI1 1 "nonimmediate_operand")]
  UNSPEC_LOADU))]
-  "TARGET_SSE2 && "
+  "TARGET_SSE2 &&  && "
 {
-  /* For AVX, normal *mov<mode>_internal pattern will handle unaligned loads
- just fine if misaligned_operand is true, and without the UNSPEC it can
- be combined with arithmetic instructions.  If misaligned_operand is
- false, still emit UNSPEC_LOADU insn to honor user's request for
- misaligned load.  */
   if (TARGET_AVX
      && misaligned_operand (operands[1], <MODE>mode))
 {
@@ -1197,25 +1205,61 @@
 }
 })
 
+(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
+  [(set (match_operand:VI_ULOADSTORE_AVX512BW 0 "register_operand")
+   (unspec:VI_ULOADSTORE_AVX512BW
+ [(match_operand:VI_ULOADSTORE_AVX512BW 1 "nonimmediate_operand")]
+ UNSPEC_LOADU))]
+  "TARGET_AVX512BW"
+{
+  if (misaligned_operand (operands[1], <MODE>mode))
+    {
+      rtx src = operands[1];
+      if (<mask_applied>)
+        src = gen_rtx_VEC_MERGE (<MODE>mode, operands[1],
+                                 operands[2 * <mask_applied>],
+                                 operands[3 * <mask_applied>]);
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], src));
+  DONE;
+}
+})
+
+(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"
+  [(set (match_operand:VI_ULOADSTORE_AVX512F 0 "register_operand")
+   (unspec:VI_ULOADSTORE_AVX512F
+ [(match_operand:VI_ULOADSTORE_AVX512F 1 "nonimmediate_operand")]
+ UNSPEC_LO

Re: [PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.

2014-08-23 Thread Uros Bizjak
On Fri, Aug 22, 2014 at 1:51 PM, Kirill Yukhin  wrote:

> This patch extends the unaligned load and store patterns.
>
> I've refactored the original patch (stored on the SVN branch)
> to reduce the complexity of the conditions in
>    define_insn "<avx512>_storedqu<mode>_mask"
>
> It seems like such a trick won't work for:
>    <sse2_avx_avx512f>_loaddqu<mode><mask_name>
> The problem is the V[32|16]QI modes, which are enabled for SSE/AVX
> w/o masking, and for AVX-512BW & AVX-512VL when masking is
> on.
>
> Of course, I can split the define_insn & define_expand
> into 3 patterns w/ mode iterators of:
>   1. V16QI, V32QI - baseline is SSE2, masks enabled for AVX-512BW&VL
>   2. V64QI, V8HI, V16HI, V32HI - baseline is AVX-512BW, masks enabled
>  for AVX-512VL
>   3. V8DI, V4DI, V2DI, V16SI, V8SI, V4SI - baseline is AVX-512F, masks
>  enabled for AVX-512VL.
>
> But such an approach will lead to 6 patterns instead of 2 (with non-trivial
> asm emit). I have doubts whether it is useful...

At this stage, I'd still prefer simple constraints (the solution
proposed above), even at the price of additional patterns. Looking at
the patterns, it is quite hard to calculate the final condition for a
particular mode/target combo, even without the enable attribute and
conditional operand constraints/predicates. With the solution above,
the complexity is conveniently pushed to the mask define_subst attribute.
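
To make the trade-off concrete, here is roughly what the insn condition
looks like under each approach, based on the v1 and v2 patches in this
thread (a sketch only; the exact subst-attribute spellings in the v1
form are approximate):

;; v1: one pattern over the combined iterator; the insn condition itself
;; has to spell out which mode/mask combinations are legal.
  "TARGET_SSE2
   && (!<mask_applied>
       || (TARGET_AVX512BW && TARGET_AVX512VL)
       || (<MODE>mode != V32QImode && <MODE>mode != V16QImode))"

;; v2: split patterns; each condition is a plain target test, e.g. the
;; VI_ULOADSTORE_AVX512BW load expand is simply:
  "TARGET_AVX512BW"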

Uros.


[PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.

2014-08-22 Thread Kirill Yukhin
Hello,
This patch extends the unaligned load and store patterns.

I've refactored the original patch (stored on the SVN branch)
to reduce the complexity of the conditions in
   define_insn "<avx512>_storedqu<mode>_mask"

It seems like such a trick won't work for:
   <sse2_avx_avx512f>_loaddqu<mode><mask_name>
The problem is the V[32|16]QI modes, which are enabled for SSE/AVX
w/o masking, and for AVX-512BW & AVX-512VL when masking is
on.

Of course, I can split the define_insn & define_expand
into 3 patterns w/ mode iterators of:
  1. V16QI, V32QI - baseline is SSE2, masks enabled for AVX-512BW&VL
  2. V64QI, V8HI, V16HI, V32HI - baseline is AVX-512BW, masks enabled
 for AVX-512VL
  3. V8DI, V4DI, V2DI, V16SI, V8SI, V4SI - baseline is AVX-512F, masks
 enabled for AVX-512VL.

But such an approach will lead to 6 patterns instead of 2 (with non-trivial
asm emit). I have doubts whether it is useful...
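
As a rough sketch, the three-way split amounts to mode iterators along
these lines (this is essentially what the follow-up patch in this thread
implements; names as used there):

;; 1. Baseline SSE2/AVX; masking only with AVX-512BW & AVX-512VL.
(define_mode_iterator VI1
  [(V32QI "TARGET_AVX") V16QI])

;; 2. Baseline AVX-512BW; 128/256-bit modes additionally need AVX-512VL.
(define_mode_iterator VI_ULOADSTORE_AVX512BW
  [V64QI
   V32HI (V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL")])

;; 3. Baseline AVX-512F; 128/256-bit modes additionally need AVX-512VL.
(define_mode_iterator VI_ULOADSTORE_AVX512F
  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])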


The current patch passes bootstrap and shows no regressions under the
simulator.

What do you think?

gcc/
* config/i386/sse.md
(define_mode_iterator VI48_AVX512VL): New.
(define_mode_iterator VI_UNALIGNED_LOADSTORE): Add V64QI, V32HI, V16HI,
V8HI, V4SI, V4DI, V2DI modes.
(define_expand "<sse2_avx_avx512f>_loaddqu<mode><mask_name>"): Update
condition.
(define_insn "*<sse2_avx_avx512f>_loaddqu<mode><mask_name>"): Update
condition, handle new modes.
(define_insn "<sse2_avx_avx512f>_storedqu<mode>"): Handle new modes.
(define_insn "avx512f_storedqu<mode>_mask"): Delete.
(define_insn "<avx512>_storedqu<mode>_mask" with
VI48_AVX512VL): New.
(define_insn "<avx512>_storedqu<mode>_mask" with
VI12_AVX512VL): Ditto.

--
Thanks, K


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index cd0c08e..51cfada 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -235,6 +235,10 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VI48_AVX512VL
+  [V16SI (V8SI  "TARGET_AVX512VL") (V4SI  "TARGET_AVX512VL")
+   V8DI  (V4DI  "TARGET_AVX512VL") (V2DI  "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
@@ -260,8 +264,12 @@
   [(V32QI "TARGET_AVX") V16QI])
 
 (define_mode_iterator VI_UNALIGNED_LOADSTORE
-  [(V32QI "TARGET_AVX") V16QI
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")])
+  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW")
+   (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
+   (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
+   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 ;; All DImode vector integer modes
 (define_mode_iterator VI8
@@ -1172,7 +1180,10 @@
(unspec:VI_UNALIGNED_LOADSTORE
  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand")]
  UNSPEC_LOADU))]
-  "TARGET_SSE2 && "
+  "TARGET_SSE2 
+   && (!
+   || (TARGET_AVX512BW && TARGET_AVX512VL)
+   || (mode != V32QImode && (mode != V16QImode)))"
 {
  /* For AVX, normal *mov<mode>_internal pattern will handle unaligned loads
  just fine if misaligned_operand is true, and without the UNSPEC it can
@@ -1197,20 +1208,27 @@
(unspec:VI_UNALIGNED_LOADSTORE
  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")]
  UNSPEC_LOADU))]
-  "TARGET_SSE2 && "
+  "TARGET_SSE2
+   && (!
+   || (TARGET_AVX512BW && TARGET_AVX512VL)
+   || (mode != V32QImode && (mode != V16QImode)))"
 {
   switch (get_attr_mode (insn))
 {
+case MODE_V16SF:
 case MODE_V8SF:
 case MODE_V4SF:
   return "%vmovups\t{%1, %0|%0, %1}";
-case MODE_XI:
-  if (<MODE>mode == V8DImode)
-   return "vmovdqu64\t{%1, %0|%0, %1}";
-  else
-   return "vmovdqu32\t{%1, %0|%0, %1}";
 default:
-  return "%vmovdqu\t{%1, %0|%0, %1}";
+  switch (<MODE>mode)
+  {
+  case V32QImode:
+  case V16QImode:
+   if (!(TARGET_AVX512VL && TARGET_AVX512BW))
+ return "%vmovdqu\t{%1, %0|%0, %1}";
+  default:
+   return "vmovdqu\t{%1, 
%0|%0, %1}";
+  }
 }
 }
   [(set_attr "type" "ssemov")
@@ -1246,13 +1264,16 @@
 case MODE_V8SF:
 case MODE_V4SF:
   return "%vmovups\t{%1, %0|%0, %1}";
-case MODE_XI:
-  if (<MODE>mode == V8DImode)
-   return "vmovdqu64\t{%1, %0|%0, %1}";
-  else
-   return "vmovdqu32\t{%1, %0|%0, %1}";
 default:
-  return "%vmovdqu\t{%1, %0|%0, %1}";
+  switch (<MODE>mode)
+  {
+  case V32QImode:
+  case V16QImode:
+   if (!(TARGET_AVX512VL && TARGET_AVX512BW))
+ return "%vmovdqu\t{%1, %0|%0, %1}";
+  default:
+ return "vmovdqu\t{%1, %0|%0, %1}";
+  }
 }
 }
   [(set_attr "type" "ssemov")
@@ -1276,21 +1297,32 @@
  ]
  (const_string "")))])
 
-(define_insn "avx512f_storedqu_mask"
-  [(set (match_operand:VI48_512 0 "memory_operand" "=m")
-   (vec_merge:VI48_512
- (unspec:VI48_512
-   [(match_operand:VI48_512 1 "register_operand" "v")]
+(define_insn "_storedqu_mask"
+  [(set (match_