On Wed, Jul 24, 2013 at 8:27 PM, Kirill Yukhin <[email protected]> wrote:
> Hello,
> With this patch I am starting a series of patches toward enabling the
> Intel(R) AVX-512 and SHA extensions (see [1]) in GCC.
> I've already submitted the corresponding patches to binutils (see [2], [3]).
>
> This patch adds command-line options for avx512* use and detection of the
> relevant cpuid bits. Vector registers are now 512 bits wide, so support
> for new modes (e.g. V16SF) is added. AVX512F introduces 16 new registers,
> zmm16-zmm31. Some instructions now have an EVEX encoding and can use the
> new registers, while old instructions can't, so we introduce a new
> register class for them. We also add a new constraint "v" which allows
> zmm0-zmm31. We can't extend the "x" constraint because it is exposed in
> inline asm, so we might break existing inline asm if we assigned e.g.
> xmm21 to a non-EVEX-encodable instruction. The idea is to replace all
> uses of "x" with "v" for EVEX-encodable instructions, and to allow only
> scalar and 512-bit modes for registers 16+ in ix86_hard_regno_mode_ok.
> We update the move instructions to use the EVEX-encodable variants for
> AVX512F to allow usage of the new registers. The main problem is with the
> vector mov<mode>_internal pattern in sse.md. In AVX512F some instructions
> read or write e.g. ymm16+ (for example vinsert64x4/vextract64x4), but
> there is no ymm mov instruction with an EVEX encoding, so we have to use
> insert/extract instead.
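(For reference, the new constraint would presumably be declared along
these lines in config/i386/constraints.md -- a sketch only, and the
register class name ALL_SSE_REGS is my assumption about what the patch
introduces:)

;; Sketch: "v" accepts any EVEX-encodable SSE register when SSE is enabled.
(define_register_constraint "v" "TARGET_SSE ? ALL_SSE_REGS : NO_REGS"
 "Any EVEX encodable SSE register (@code{xmm0}-@code{xmm31}).")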
Some comments while going through the mega-patch:
+(define_insn "*movxi_internal_avx512f"
+ [(set (match_operand:XI 0 "nonimmediate_operand" "=x,x ,m")
+ (match_operand:XI 1 "vector_move_operand" "C ,xm,x"))]
+ "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+{
+ switch (which_alternative)
+ {
+ case 0:
+ return standard_sse_constant_opcode (insn, operands[1]);
+ case 1:
+ case 2:
+ return "vmovdqu32\t{%1, %0|%0, %1}";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set_attr "type" "sselog1,ssemov,ssemov")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "XI")])
Even though the cost of an unaligned move is equal to that of an aligned
one, I would rather see these moves split into the aligned and unaligned
cases, in the same way as movoi_internal_avx does.
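I.e., something along these lines in the output code (a rough sketch
modeled on movoi_internal_avx, not the final form):

    case 1:
    case 2:
      /* Use the unaligned form only when an operand is a misaligned MEM;
         reg-reg moves fall through to the aligned form.  */
      if (misaligned_operand (operands[0], XImode)
          || misaligned_operand (operands[1], XImode))
        return "vmovdqu32\t{%1, %0|%0, %1}";
      return "vmovdqa32\t{%1, %0|%0, %1}";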
(define_insn "*mov<mode>_internal"
- [(set (match_operand:V16 0 "nonimmediate_operand" "=x,x ,m")
- (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,xm,x"))]
+ [(set (match_operand:V16 0 "nonimmediate_operand" "=v,v ,m")
+ (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,vm,v"))]
...
+ /* Reg -> reg move is always aligned. Just use wider move. */
+ switch (mode)
+ {
+ case MODE_V8SF:
+ case MODE_V4SF:
+ return "vmovaps\t{%g1, %g0|%g0, %g1}";
+ case MODE_V4DF:
+ case MODE_V2DF:
+ return "%vmovapd\t{%g1, %g0|%g0, %g1}";
No need for the % prefix; this is an AVX-only insn.
+ case MODE_OI:
+ case MODE_TI:
+ return "vmovdqu64\t{%g1, %g0|%g0, %g1}";
This should be vmovdqa64; we are operating on registers only.
+ case MODE_XI:
+ if (<MODE>mode == V8DImode)
+ return "vmovdqu64\t{%1, %0|%0, %1}";
+ else
+ return "vmovdqu32\t{%1, %0|%0, %1}";
Please put the mode calculation into the mode attribute. It looks like
sseinsnmode is not working correctly here and needs to be updated to
correctly select the OI and XI modes.
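Something like this, perhaps (a sketch showing only the relevant rows of
sseinsnmode; the 512-bit integer vector modes all map to XI):

(define_mode_attr sseinsnmode
  [(V64QI "XI") (V32HI "XI") (V16SI "XI") (V8DI "XI")
   (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI") (V2TI "OI")
   (V16QI "TI") (V8HI "TI") (V4SI "TI") (V2DI "TI") (V1TI "TI")
   ...])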
Uros.