[PATCH] D105268: [X86] AVX512FP16 instructions enabling 5/6

Craig Topper via Phabricator via cfe-commits Thu, 19 Aug 2021 08:23:16 -0700

craig.topper added inline comments.


================
Comment at: clang/include/clang/Basic/BuiltinsX86.def:2014
+
+TARGET_BUILTIN(__builtin_ia32_vfmaddsh3_mask, "V8xV8xV8xV8xUcIi", "ncV:128:", 
"avx512fp16")
+TARGET_BUILTIN(__builtin_ia32_vfmaddsh3_maskz, "V8xV8xV8xV8xUcIi", "ncV:128:", 
"avx512fp16")
----------------
LuoYuanke wrote:
> What does "3" stand for?
The 3 is there because AMD's 4 operand fma used vfmaddss/vfmaddsd. So Intel's 3 
operand used vfmaddss3/vfmaddsd3. That naming is being carried forward here.


================
Comment at: clang/lib/Headers/avx512vlfp16intrin.h:1385
+                                                                  __m128h __C) 
{
+  return (__m128h)__builtin_ia32_selectph_128(
+      (__mmask8)__U,
----------------
LuoYuanke wrote:
> Sorry, I'm confused sometimes we use mask builtin, sometimes we use select 
> builtin. Any guideline on it?
Ideally FP should never use select because it doesn't convey that exceptions 
should be masked for strictfp. But the mistake was already made for 
add/sub/mul/div/fma/etc years ago before strictfp support existed in llvm. fp16 
is intentionally following float/double for consistency.


================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5709
+
+  def int_x86_avx512fp16_vfmadd_ph_512
+      : Intrinsic<[ llvm_v32f16_ty ],
----------------
LuoYuanke wrote:
> I notice there is no builtin bound to this intrinsic. What is it used for?
It is manually selected in CGBuiltin.cpp


================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5727
+                  [ IntrNoMem, ImmArg<ArgIndex<3>> ]>;
+  def int_x86_avx512fp16_vfmadd_f16
+      : Intrinsic<[ llvm_half_ty ],
----------------
LuoYuanke wrote:
> ph?
It's scalar so it shouldn't be ph. This matches int_x86_avx512_vfmadd_f32 and 
int_x86_avx512_vfmadd_f64. They don't use ss/sd because the ss/sd names are 
usually used for intrinsics that 128-bit operands and only modify the lower 
element. int_x86_avx512_vfmadd_f32 and int_x86_avx512_vfmadd_f64 have 
float/double inputs and produce float/double results.


================
Comment at: llvm/test/CodeGen/X86/avx512fp16-fma-commute.ll:9
+
+define half @fma_123_f16(half %x, half %y, half %z) {
+; CHECK-LABEL: fma_123_f16:
----------------
LuoYuanke wrote:
> The name 123 is not the same with the generated instruction (213sh). Is it 
> expected?
123 represents how the 3 arguments to the fucntion are mapped to the 3 
intrinsic arguments that it calls. There are 6 possible permutations which are 
all tested here, but only 3 instruction mnemonics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105268/new/

https://reviews.llvm.org/D105268

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D105268: [X86] AVX512FP16 instructions enabling 5/6

Reply via email to