[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

David Spickett via Phabricator via cfe-commits Fri, 16 Apr 2021 04:06:01 -0700

DavidSpickett added a comment.

Ok, it's behind the aarch64 guard because there's a giant #ifdef around it that 
I didn't see. If you move it in `arm_neon.td` down to outside that #ifdef you 
get the definitions. Next problem is that the `poly128_t` type isn't 
implemented for AArch32. Not sure why but for example in the bf16 tests:


  clang/test/CodeGen/arm-bf16-reinterpret-intrinsics.c:
  // TODO: poly128_t not implemented on aarch32
  // CHCK-LABEL: @test_vreinterpretq_p128_bf16

If you split the def in two, one for the non Q, one for the Q parts that can 
work and you get the non poly128_t definitions for Arm and then I can compile 
my test program without modifications to the header.

  --- a/clang/include/clang/Basic/arm_neon.td
  +++ b/clang/include/clang/Basic/arm_neon.td
  @@ -1160,7 +1160,8 @@ def SM4E : SInst<"vsm4e", "...", "QUi">;
   def SM4EKEY : SInst<"vsm4ekey", "...", "QUi">;
   }
  
  -def VADDP   : WInst<"vadd", "...", "PcPsPlQPcQPsQPlQPk">;
  +// TODO: poly128_t isn't implemented for AArch32
  +def VADDP   : WInst<"vadd", "...", "QPcQPsQPlQPk">;
  
   
////////////////////////////////////////////////////////////////////////////////
   // Float -> Int conversions with explicit rounding mode
  @@ -1630,6 +1631,9 @@ def SCALAR_VDUP_LANEQ : IInst<"vdup_laneq", "1QI", 
"ScSsSiSlSfSdSUcSUsSUiSUlSPcS
   }
   }
  
  +// Everything that doesn't use poly128_t
  +def VADDP_Q   : WInst<"vadd", "...", "PcPsPl">;
  +
   // ARMv8.2-A FP16 vector intrinsics for A32/A64.
   let ArchGuard = "defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)" in {

However I ran into issues trying to get the relevant bits of aarc64-poly-add.c 
to compile for Arm and don't have time time to pursue it myself. If you can get 
those to run (or make an arm specific test file, some others do already) then 
all you'd need to do is remove the `vaddq_p128`  in `CGBuiltin.cpp` to prevent 
anyone thinking it is implemented for AArch32. (no one would be able to hit it 
without a modified header anyway)

My conclusion being that this group of intrinsics should be for A32 and A64 as 
the ACLE says. However we can't do all of them on A32 without poly128_t.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100499/new/

https://reviews.llvm.org/D100499

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D100499: [AArch64] Neon Polynomial vadd Intrinsic Fix

Reply via email to