Fwd: [PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.
On Tue, Aug 26, 2014 at 9:55 PM, Kirill Yukhin wrote:
> Hello Uroš,
> On 23 Aug 09:44, Uros Bizjak wrote:
>> On Fri, Aug 22, 2014 at 1:51 PM, Kirill Yukhin wrote:
>>
>> > This patch extends unaligned loads and stores patterns.
>>
>> At this stage, I'd still prefer simple constraints (the solution
>> proposed above), even for the price of additional patterns.  Looking
>> at the patterns, it is quite hard to calculate the final condition
>> for a particular mode/target combo, even without the enable attribute
>> and conditional operand constraints/predicates.  With the solution
>> above, the complexity is conveniently pushed to the mask define_subst
>> attribute.
>
> The patch below splits the unaligned ld/st patterns accordingly.
> Bootstrapped and avx512-regtested on simulator.
>
> gcc/
> 	* config/i386/sse.md
> 	(define_mode_iterator VI48_AVX512VL): New.
> 	(define_mode_iterator VI_UNALIGNED_LOADSTORE): Delete.
> 	(define_mode_iterator VI_ULOADSTORE_AVX512BW): New.
> 	(define_mode_iterator VI_ULOADSTORE_AVX512F): Ditto.
> 	(define_expand "_loaddqu" with VI1): Change mode iterator.
> 	(define_expand "_loaddqu" with VI_ULOADSTORE_AVX512BW): New.
> 	(define_expand "_loaddqu" with VI_ULOADSTORE_AVX512F): Ditto.
> 	(define_insn "*_loaddqu" with VI1): Change mode iterator.
> 	(define_insn "*_loaddqu" with VI_ULOADSTORE_AVX512BW): New.
> 	(define_insn "*_loaddqu" with VI_ULOADSTORE_AVX512F): Ditto.
> 	(define_insn "_storedqu" with VI1): Change mode iterator.
> 	(define_insn "_storedqu" with VI_ULOADSTORE_AVX512BW): New.
> 	(define_insn "_storedqu" with VI_ULOADSTORE_AVX512F): Ditto.
> 	(define_insn "avx512f_storedqu_mask"): Delete.
> 	(define_insn "_storedqu_mask" with VI48_AVX512VL): New.
> 	(define_insn "_storedqu_mask" with VI12_AVX512VL): Ditto.
>
> Is it ok for trunk?

OK with a renamed mode iterator as suggested below.

Thanks,
Uros.
> -(define_mode_iterator VI_UNALIGNED_LOADSTORE
> -  [(V32QI "TARGET_AVX") V16QI
> -   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")])
> +(define_mode_iterator VI_ULOADSTORE_AVX512BW
> +  [V64QI
> +   V32HI (V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL")])
> +
> +(define_mode_iterator VI_ULOADSTORE_AVX512F
> +  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
> +   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])

Please name these two VI_ULOADSTORE_BW_AVX512VL and
VI_ULOADSTORE_F_AVX512VL, to be consistent with the other names, and
put them near the loadstore patterns.
Re: [PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.
Hello Uroš,
On 23 Aug 09:44, Uros Bizjak wrote:
> On Fri, Aug 22, 2014 at 1:51 PM, Kirill Yukhin wrote:
>
> > This patch extends unaligned loads and stores patterns.
>
> At this stage, I'd still prefer simple constraints (the solution
> proposed above), even for the price of additional patterns.  Looking
> at the patterns, it is quite hard to calculate the final condition
> for a particular mode/target combo, even without the enable attribute
> and conditional operand constraints/predicates.  With the solution
> above, the complexity is conveniently pushed to the mask define_subst
> attribute.

The patch below splits the unaligned ld/st patterns accordingly.
Bootstrapped and avx512-regtested on simulator.

gcc/
	* config/i386/sse.md
	(define_mode_iterator VI48_AVX512VL): New.
	(define_mode_iterator VI_UNALIGNED_LOADSTORE): Delete.
	(define_mode_iterator VI_ULOADSTORE_AVX512BW): New.
	(define_mode_iterator VI_ULOADSTORE_AVX512F): Ditto.
	(define_expand "_loaddqu" with VI1): Change mode iterator.
	(define_expand "_loaddqu" with VI_ULOADSTORE_AVX512BW): New.
	(define_expand "_loaddqu" with VI_ULOADSTORE_AVX512F): Ditto.
	(define_insn "*_loaddqu" with VI1): Change mode iterator.
	(define_insn "*_loaddqu" with VI_ULOADSTORE_AVX512BW): New.
	(define_insn "*_loaddqu" with VI_ULOADSTORE_AVX512F): Ditto.
	(define_insn "_storedqu" with VI1): Change mode iterator.
	(define_insn "_storedqu" with VI_ULOADSTORE_AVX512BW): New.
	(define_insn "_storedqu" with VI_ULOADSTORE_AVX512F): Ditto.
	(define_insn "avx512f_storedqu_mask"): Delete.
	(define_insn "_storedqu_mask" with VI48_AVX512VL): New.
	(define_insn "_storedqu_mask" with VI12_AVX512VL): Ditto.

Is it ok for trunk?
--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0624582..0245ec4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -235,6 +235,10 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VI48_AVX512VL
+  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
@@ -259,9 +263,13 @@
 (define_mode_iterator VI1
   [(V32QI "TARGET_AVX") V16QI])
 
-(define_mode_iterator VI_UNALIGNED_LOADSTORE
-  [(V32QI "TARGET_AVX") V16QI
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")])
+(define_mode_iterator VI_ULOADSTORE_AVX512BW
+  [V64QI
+   V32HI (V8HI "TARGET_AVX512VL") (V16HI "TARGET_AVX512VL")])
+
+(define_mode_iterator VI_ULOADSTORE_AVX512F
+  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 ;; All DImode vector integer modes
 (define_mode_iterator VI8
@@ -1172,18 +1180,18 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])
 
+/* For AVX, normal *mov_internal pattern will handle unaligned loads
+   just fine if misaligned_operand is true, and without the UNSPEC it can
+   be combined with arithmetic instructions.  If misaligned_operand is
+   false, still emit UNSPEC_LOADU insn to honor user's request for
+   misaligned load.  */
 (define_expand "_loaddqu"
-  [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand")
-	(unspec:VI_UNALIGNED_LOADSTORE
-	  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand")]
+  [(set (match_operand:VI1 0 "register_operand")
+	(unspec:VI1
+	  [(match_operand:VI1 1 "nonimmediate_operand")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE2 && "
+  "TARGET_SSE2 &&  && "
 {
-  /* For AVX, normal *mov_internal pattern will handle unaligned loads
-     just fine if misaligned_operand is true, and without the UNSPEC it can
-     be combined with arithmetic instructions.  If misaligned_operand is
-     false, still emit UNSPEC_LOADU insn to honor user's request for
-     misaligned load.  */
   if (TARGET_AVX
       && misaligned_operand (operands[1], mode))
     {
@@ -1197,25 +1205,61 @@
     }
 })
 
+(define_expand "_loaddqu"
+  [(set (match_operand:VI_ULOADSTORE_AVX512BW 0 "register_operand")
+	(unspec:VI_ULOADSTORE_AVX512BW
+	  [(match_operand:VI_ULOADSTORE_AVX512BW 1 "nonimmediate_operand")]
+	  UNSPEC_LOADU))]
+  "TARGET_AVX512BW"
+{
+  if (misaligned_operand (operands[1], mode))
+    {
+      rtx src = operands[1];
+      if ()
+	src = gen_rtx_VEC_MERGE (mode, operands[1],
+				 operands[2 * ],
+				 operands[3 * ]);
+      emit_insn (gen_rtx_SET (VOIDmode, operands[0], src));
+      DONE;
+    }
+})
+
+(define_expand "_loaddqu"
+  [(set (match_operand:VI_ULOADSTORE_AVX512F 0 "register_operand")
+	(unspec:VI_ULOADSTORE_AVX512F
+	  [(match_operand:VI_ULOADSTORE_AVX512F 1 "nonimmediate_operand")]
+	  UNSPEC_LO
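As a reading aid only (this is not part of the patch, and the enum and
function names here are invented), the expansion strategy shared by the
define_expand bodies above can be sketched as a small decision function:
when the source operand is already known to be misaligned, a plain move
is emitted — wrapped in a vec_merge when a mask is applied — so later
passes can combine it with arithmetic; otherwise the UNSPEC_LOADU form
is kept to honor the user's explicit request for an unaligned load.

```c
#include <assert.h>

/* Hypothetical sketch of the loaddqu expander's decision logic.  */
enum load_kind { PLAIN_MOVE, MASKED_MOVE, UNSPEC_LOAD };

static enum load_kind
expand_unaligned_load (int operand_misaligned, int mask_applied)
{
  if (operand_misaligned)
    /* Emit a combinable SET; add vec_merge when masking is requested.  */
    return mask_applied ? MASKED_MOVE : PLAIN_MOVE;
  /* Operand not provably misaligned: keep the UNSPEC_LOADU insn.  */
  return UNSPEC_LOAD;
}
```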
Re: [PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.
On Fri, Aug 22, 2014 at 1:51 PM, Kirill Yukhin wrote:

> This patch extends unaligned loads and stores patterns.
>
> I've refactored the original patch (stored on SVN's branch)
> toward reducing the complexity of the conditions in
>    define_insn "_storedqu_mask"
>
> It seems like such a trick won't work for:
>    _loaddqu
> The problem is the V[32|16]QI modes, which are enabled for SSE/AVX
> w/o masking and for AVX-512BW & AVX-512VL when masking is on.
>
> Of course, I can split the define_insn & define_expand
> into 3 patterns w/ mode iterators of:
>  1. V16QI, V32QI - baseline is SSE2, masks enabled for AVX-512BW&VL
>  2. V64QI, V8HI, V16HI, V32HI - baseline is AVX-512BW, masks enabled
>     for AVX-512VL
>  3. V8DI, V4DI, V2DI, V16SI, V8SI, V4SI - baseline is AVX-512F, masks
>     enabled for AVX-512VL.
>
> But such an approach will lead to 6 patterns instead of 2 (with
> non-trivial asm emit).  I have doubts whether it is useful...

At this stage, I'd still prefer simple constraints (the solution
proposed above), even for the price of additional patterns.  Looking at
the patterns, it is quite hard to calculate the final condition for a
particular mode/target combo, even without the enable attribute and
conditional operand constraints/predicates.  With the solution above,
the complexity is conveniently pushed to the mask define_subst
attribute.

Uros.
[PATCH i386 AVX512] [22/n] Extend unaligned loads & stores.
Hello,

This patch extends unaligned loads and stores patterns.

I've refactored the original patch (stored on SVN's branch)
toward reducing the complexity of the conditions in
   define_insn "_storedqu_mask"

It seems like such a trick won't work for:
   _loaddqu
The problem is the V[32|16]QI modes, which are enabled for SSE/AVX
w/o masking and for AVX-512BW & AVX-512VL when masking is on.

Of course, I can split the define_insn & define_expand
into 3 patterns w/ mode iterators of:
 1. V16QI, V32QI - baseline is SSE2, masks enabled for AVX-512BW&VL
 2. V64QI, V8HI, V16HI, V32HI - baseline is AVX-512BW, masks enabled
    for AVX-512VL
 3. V8DI, V4DI, V2DI, V16SI, V8SI, V4SI - baseline is AVX-512F, masks
    enabled for AVX-512VL.

But such an approach will lead to 6 patterns instead of 2 (with
non-trivial asm emit).  I have doubts whether it is useful...

The current patch passes bootstrap and shows no regressions under the
simulator.

What do you think?

gcc/
	* config/i386/sse.md
	(define_mode_iterator VI48_AVX512VL): New.
	(define_mode_iterator VI_UNALIGNED_LOADSTORE): Add V64QI, V32HI,
	V16HI, V8HI, V8SI, V4SI, V4DI, V2DI modes.
	(define_expand "_loaddqu"): Update condition.
	(define_insn "*_loaddqu"): Update condition, handle new modes.
	(define_insn "_storedqu"): Handle new modes.
	(define_insn "avx512f_storedqu_mask"): Delete.
	(define_insn "_storedqu_mask" with VI48_AVX512VL): New.
	(define_insn "_storedqu_mask" with VI12_AVX512VL): Ditto.
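The three-way split enumerated above can be summarized as a lookup table
(a sketch for illustration only, not part of the patch; the struct and
function names are invented) mapping each vector mode to the baseline
ISA of its unaligned load/store pattern and the ISA that additionally
enables masking:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of the three pattern groups described above.  */
struct uloadstore_group {
  const char *mode;     /* vector mode name, e.g. "V16QI" */
  const char *baseline; /* ISA required for the plain pattern */
  const char *mask_isa; /* ISA required for the masked variant */
};

static const struct uloadstore_group groups[] = {
  /* 1. baseline SSE2 (V32QI also needs AVX in the iterator),
        masks with AVX-512BW & AVX-512VL.  */
  { "V16QI", "SSE2", "AVX-512BW+VL" },
  { "V32QI", "SSE2", "AVX-512BW+VL" },
  /* 2. baseline AVX-512BW, masks with AVX-512VL.  */
  { "V64QI", "AVX-512BW", "AVX-512VL" },
  { "V8HI",  "AVX-512BW", "AVX-512VL" },
  { "V16HI", "AVX-512BW", "AVX-512VL" },
  { "V32HI", "AVX-512BW", "AVX-512VL" },
  /* 3. baseline AVX-512F, masks with AVX-512VL.  */
  { "V8DI",  "AVX-512F", "AVX-512VL" },
  { "V4DI",  "AVX-512F", "AVX-512VL" },
  { "V2DI",  "AVX-512F", "AVX-512VL" },
  { "V16SI", "AVX-512F", "AVX-512VL" },
  { "V8SI",  "AVX-512F", "AVX-512VL" },
  { "V4SI",  "AVX-512F", "AVX-512VL" },
};

static const char *
baseline_isa (const char *mode)
{
  for (unsigned i = 0; i < sizeof groups / sizeof groups[0]; i++)
    if (strcmp (groups[i].mode, mode) == 0)
      return groups[i].baseline;
  return "unknown";
}
```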
--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index cd0c08e..51cfada 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -235,6 +235,10 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VI48_AVX512VL
+  [V16SI (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   V8DI (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
+
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
@@ -260,8 +264,12 @@
   [(V32QI "TARGET_AVX") V16QI])
 
 (define_mode_iterator VI_UNALIGNED_LOADSTORE
-  [(V32QI "TARGET_AVX") V16QI
-   (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")])
+  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512BW")
+   (V16HI "TARGET_AVX512BW && TARGET_AVX512VL")
+   (V8HI "TARGET_AVX512BW && TARGET_AVX512VL")
+   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")
+   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL")])
 
 ;; All DImode vector integer modes
 (define_mode_iterator VI8
@@ -1172,7 +1180,10 @@
 	(unspec:VI_UNALIGNED_LOADSTORE
 	  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE2 && "
+  "TARGET_SSE2
+   && (!
+       || (TARGET_AVX512BW && TARGET_AVX512VL)
+       || (mode != V32QImode && (mode != V16QImode)))"
 {
   /* For AVX, normal *mov_internal pattern will handle unaligned loads
      just fine if misaligned_operand is true, and without the UNSPEC it can
@@ -1197,20 +1208,27 @@
 	(unspec:VI_UNALIGNED_LOADSTORE
 	  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")]
 	  UNSPEC_LOADU))]
-  "TARGET_SSE2 && "
+  "TARGET_SSE2
+   && (!
+       || (TARGET_AVX512BW && TARGET_AVX512VL)
+       || (mode != V32QImode && (mode != V16QImode)))"
 {
   switch (get_attr_mode (insn))
     {
+    case MODE_V16SF:
     case MODE_V8SF:
     case MODE_V4SF:
       return "%vmovups\t{%1, %0|%0, %1}";
-    case MODE_XI:
-      if (mode == V8DImode)
-	return "vmovdqu64\t{%1, %0|%0, %1}";
-      else
-	return "vmovdqu32\t{%1, %0|%0, %1}";
     default:
-      return "%vmovdqu\t{%1, %0|%0, %1}";
+      switch (mode)
+	{
+	case V32QImode:
+	case V16QImode:
+	  if (!(TARGET_AVX512VL && TARGET_AVX512BW))
+	    return "%vmovdqu\t{%1, %0|%0, %1}";
+	default:
+	  return "vmovdqu\t{%1, %0|%0, %1}";
+	}
     }
 }
   [(set_attr "type" "ssemov")
@@ -1246,13 +1264,16 @@
     case MODE_V8SF:
     case MODE_V4SF:
       return "%vmovups\t{%1, %0|%0, %1}";
-    case MODE_XI:
-      if (mode == V8DImode)
-	return "vmovdqu64\t{%1, %0|%0, %1}";
-      else
-	return "vmovdqu32\t{%1, %0|%0, %1}";
     default:
-      return "%vmovdqu\t{%1, %0|%0, %1}";
+      switch (mode)
+	{
+	case V32QImode:
+	case V16QImode:
+	  if (!(TARGET_AVX512VL && TARGET_AVX512BW))
+	    return "%vmovdqu\t{%1, %0|%0, %1}";
+	default:
+	  return "vmovdqu\t{%1, %0|%0, %1}";
+	}
     }
 }
   [(set_attr "type" "ssemov")
@@ -1276,21 +1297,32 @@
   ]
   (const_string "")))])
 
-(define_insn "avx512f_storedqu_mask"
-  [(set (match_operand:VI48_512 0 "memory_operand" "=m")
-	(vec_merge:VI48_512
-	  (unspec:VI48_512
-	    [(match_operand:VI48_512 1 "register_operand" "v")]
+(define_insn "_storedqu_mask"
+  [(set (match_
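The mnemonic selection that the reworked loaddqu/storedqu templates in
the diff above implement can be condensed into a small decision function
(a sketch only — parameter and function names are invented, and the
element-size suffix on the EVEX form is omitted here just as it appears
stripped in the posted patch text): float-typed attribute modes keep
%vmovups, the legacy V16QI/V32QI byte modes keep the SSE/AVX-encoded
%vmovdqu unless both AVX-512VL and AVX-512BW are available, and every
other case uses the EVEX-encoded vmovdqu form.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical condensation of the insn templates' asm selection.  */
static const char *
unaligned_move_mnemonic (int float_attr_mode, int legacy_qi_mode,
                         int avx512vl, int avx512bw)
{
  if (float_attr_mode)
    return "%vmovups";                  /* MODE_V16SF/V8SF/V4SF cases */
  if (legacy_qi_mode && !(avx512vl && avx512bw))
    return "%vmovdqu";                  /* legacy SSE/AVX encoding */
  return "vmovdqu";                     /* EVEX encoding, suffix elided */
}
```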