[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-15 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv created this revision.
ilinpv added reviewers: samparker, dmgreen, SjoerdMeijer.
Herald added subscribers: cfe-commits, danielkiss, hiraditya, kristof.beyls.
Herald added a project: clang.

FMLA/FMLS 8H duplane indexed patterns added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
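
For readers unfamiliar with the by-element form, here is a rough scalar model in C of what the test changes in this patch check. It is only a sketch: `float` stands in for `half`, rounding is not modelled (the real instructions are fused), and all function names are made up for illustration.

```c
#include <stddef.h>

enum { LANES = 8 }; /* one .8h vector; float stands in for half (assumption) */

/* Old codegen: dup vT.8h, vM.h[idx] ; fmla vD.8h, vN.8h, vT.8h */
static void fmla_via_dup(float d[LANES], const float n[LANES],
                         const float m[LANES], size_t idx) {
    float t[LANES];
    for (size_t i = 0; i < LANES; ++i)
        t[i] = m[idx];           /* dup: broadcast lane idx */
    for (size_t i = 0; i < LANES; ++i)
        d[i] += n[i] * t[i];     /* fmla, vector form */
}

/* New codegen: fmla vD.8h, vN.8h, vM.h[idx] */
static void fmla_by_element(float d[LANES], const float n[LANES],
                            const float m[LANES], size_t idx) {
    for (size_t i = 0; i < LANES; ++i)
        d[i] += n[i] * m[idx];
}

/* New codegen: fmls vD.8h, vN.8h, vM.h[idx] (was fneg + dup + fmla) */
static void fmls_by_element(float d[LANES], const float n[LANES],
                            const float m[LANES], size_t idx) {
    for (size_t i = 0; i < LANES; ++i)
        d[i] -= n[i] * m[idx];
}

/* Returns 1 if the old and new sequences give identical lanes on sample data. */
static int sequences_agree(void) {
    const float n[LANES] = {1, 2, 3, 4, 5, 6, 7, 8};
    const float m[LANES] = {2, 3, 5, 7, 11, 13, 17, 19};
    const size_t idx = 3;
    float neg_n[LANES];
    float a[LANES], b[LANES], c[LANES], e[LANES];
    for (size_t i = 0; i < LANES; ++i) {
        neg_n[i] = -n[i];                  /* fneg */
        a[i] = b[i] = c[i] = e[i] = 10.0f; /* same starting accumulator */
    }
    fmla_via_dup(a, n, m, idx);            /* dup + fmla */
    fmla_by_element(b, n, m, idx);         /* single indexed fmla */
    fmla_via_dup(c, neg_n, m, idx);        /* fneg + dup + fmla */
    fmls_by_element(e, n, m, idx);         /* single indexed fmls */
    for (size_t i = 0; i < LANES; ++i)
        if (a[i] != b[i] || c[i] != e[i])
            return 0;
    return 1;
}
```

With small integer inputs every intermediate is exact, so `sequences_agree()` only shows that the two instruction sequences compute the same lanes; it says nothing about cycle counts.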


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D78252

Files:
  clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll


Index: llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
===
--- llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
+++ llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
@@ -29,8 +29,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer
@@ -57,8 +56,7 @@
 ; CHECK:   .Lt_vfmaq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v1.8h, v2.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer
@@ -148,9 +146,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.8h, v1.8h
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -179,8 +175,7 @@
 ; CHECK:   .Lt_vfmsq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmls v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
Index: llvm/lib/Target/AArch64/AArch64InstrFormats.td
===
--- llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -8052,6 +8052,15 @@
 }
 
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit
+  def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
+                           (v8f16 (AArch64duplane16 (v8f16 V128:$Rm),
+                                                    VectorIndexS:$idx)))),
+            (!cast<Instruction>(INST # "v8i16_indexed")
+                V128:$Rd, V128:$Rn, V128:$Rm, VectorIndexS:$idx)>;
+  } // Predicates = [HasNEON, HasFullFP16]
+
   // 2 variants for the .2s version: DUPLANE from 128-bit and DUP scalar.
   def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),
(AArch64duplane32 (v4f32 V128:$Rm),
Index: clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
===
--- clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
+++ clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
@@ -105,7 +105,7 @@
 // COMMONIR:  [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
 // UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])
 // CONSTRAINED:   [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")
-// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h
+// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
 // COMMONIR:  ret <8 x half> [[FMLA]]
 float16x8_t test_vfmaq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
   return vfmaq_lane_f16(a, b, c, 3);
@@ -213,7 +213,6 @@
 
 // COMMON-LABEL: test_vfmsq_lane_f16
 // COMMONIR:  [[SUB:%.*]]  = fneg <8 x half> %b
-// CHECK-ASM: fneg v{{[0-9]+}}.8h, v{{[0-9]+}}.8h
 // COMMONIR:  [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
 // COMMONIR:  [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>
 // COMMONIR:  [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
@@ -223,7 +222,7 @@
 // COMMONIR:  [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
 // UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])
 // CONSTRAINED:   [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]], metadata !"round.tonearest", metadata !"f

[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-15 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8055
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit

Should we also have f16 patterns equivalent to the f32 ones below? That is, 
matching DUP, a D vector (4xf16), and possibly a vector_extract too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-17 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv updated this revision to Diff 258337.
ilinpv marked an inline comment as done.
ilinpv edited the summary of this revision.
ilinpv added a comment.

More patterns added.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252

Files:
  clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll

Index: llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
===
--- llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
+++ llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
@@ -14,8 +14,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
@@ -29,8 +28,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer
@@ -43,8 +41,7 @@
 ; CHECK:   .Lt_vfma_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v1.4h, v2.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer
@@ -57,8 +54,7 @@
 ; CHECK:   .Lt_vfmaq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v1.8h, v2.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer
@@ -72,8 +68,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %vecinit = insertelement <4 x half> undef, half %c, i32 0
@@ -88,8 +83,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %vecinit = insertelement <8 x half> undef, half %c, i32 0
@@ -104,7 +98,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fmadd h0, h1, h2, h0
+; CHECK-NEXT:fmla h0, h1, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %extract = extractelement <4 x half> %c, i32 0
@@ -117,7 +111,7 @@
 ; CHECK:   .Lt_vfmah_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:fmadd h0, h1, h2, h0
+; CHECK-NEXT:fmla h0, h1, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %extract = extractelement <8 x half> %c, i32 0
@@ -131,9 +125,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.4h, v1.4h
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmls v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <4 x half> , %b
@@ -148,9 +140,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.8h, v1.8h
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -164,8 +154,7 @@
 ; CHECK:   .Lt_vfms_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmls v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmls v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <4 x half> , %b
@@ -179,8 +168,7 @@
 ; CHECK:   .Lt_vfmsq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmls v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -195,9 +183,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// 

[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-17 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8055
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit

dmgreen wrote:
> Should we also have f16 patterns equivalent to the f32 ones below? That is, 
> matching DUP, a D vector (4xf16), and possibly a vector_extract too.
I'm worried about the performance impact of changing fmadd/fmsub to fmla/fmls 
in the last pattern case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-18 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8055
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit

ilinpv wrote:
> dmgreen wrote:
> > Should we also have f16 patterns equivalent to the f32 ones below? That is, 
> > matching DUP, a D vector (4xf16), and possibly a vector_extract too.
> I'm worried about the performance impact of changing fmadd/fmsub to fmla/fmls 
> in the last pattern case.
What performance impact are you worried about?



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8077
+
+  def : Pat<(f16 (OpNode (f16 FPR16:$Rd), (f16 FPR16:$Rn),
+ (vector_extract (v8f16 V128:$Rm), VectorIndexS:$idx))),

Do you mean the v4f16 variant of this pattern?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-18 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv marked an inline comment as not done.
ilinpv added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8055
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit

dmgreen wrote:
> ilinpv wrote:
> > dmgreen wrote:
> > > Should we also have f16 patterns equivalent to the f32 ones below? That 
> > > is, matching DUP, a D vector (4xf16), and possibly a vector_extract too.
> > I'm worried about the performance impact of changing fmadd/fmsub to 
> > fmla/fmls in the last pattern case.
> What performance impact are you worried about?
I mean, can fmla/fmls take more cycles than fmadd/fmsub? Is there any 
performance improvement from such a replacement?



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8077
+
+  def : Pat<(f16 (OpNode (f16 FPR16:$Rd), (f16 FPR16:$Rn),
+ (vector_extract (v8f16 V128:$Rm), VectorIndexS:$idx))),

dmgreen wrote:
> Do you mean the v4f16 variant of this pattern?
This is exactly the pattern that replaces fmadd/fmsub with fmla/fmls, so it is 
questionable whether this pattern is useful.
The v4f16 vector_extract variant has no test cases at all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-20 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv updated this revision to Diff 258865.
ilinpv added a comment.

Patterns corrected, vector_extract tests added.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252

Files:
  clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll

Index: llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
===
--- llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
+++ llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
@@ -14,8 +14,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
@@ -29,8 +28,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer
@@ -43,8 +41,7 @@
 ; CHECK:   .Lt_vfma_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v1.4h, v2.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer
@@ -57,8 +54,7 @@
 ; CHECK:   .Lt_vfmaq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v1.8h, v2.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer
@@ -72,8 +68,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %vecinit = insertelement <4 x half> undef, half %c, i32 0
@@ -88,8 +83,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %vecinit = insertelement <8 x half> undef, half %c, i32 0
@@ -104,7 +98,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fmadd h0, h1, h2, h0
+; CHECK-NEXT:fmla h0, h1, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %extract = extractelement <4 x half> %c, i32 0
@@ -117,7 +111,7 @@
 ; CHECK:   .Lt_vfmah_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:fmadd h0, h1, h2, h0
+; CHECK-NEXT:fmla h0, h1, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %extract = extractelement <8 x half> %c, i32 0
@@ -131,9 +125,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.4h, v1.4h
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmls v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <4 x half> , %b
@@ -148,9 +140,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.8h, v1.8h
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -164,8 +154,7 @@
 ; CHECK:   .Lt_vfms_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmls v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmls v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <4 x half> , %b
@@ -179,8 +168,7 @@
 ; CHECK:   .Lt_vfmsq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmls v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -195,9 +183,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:fneg v1

[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-21 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv updated this revision to Diff 259008.
ilinpv edited the summary of this revision.
ilinpv added a comment.

v2f32 pattern removed, test added.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252

Files:
  clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c
  llvm/lib/Target/AArch64/AArch64InstrFormats.td
  llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll

Index: llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
===
--- llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
+++ llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
@@ -14,8 +14,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
@@ -29,8 +28,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer
@@ -43,8 +41,7 @@
 ; CHECK:   .Lt_vfma_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v1.4h, v2.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer
@@ -57,8 +54,7 @@
 ; CHECK:   .Lt_vfmaq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v1.8h, v2.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer
@@ -72,8 +68,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmla v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %vecinit = insertelement <4 x half> undef, half %c, i32 0
@@ -88,8 +83,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 def $q2
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmla v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %vecinit = insertelement <8 x half> undef, half %c, i32 0
@@ -104,7 +98,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fmadd h0, h1, h2, h0
+; CHECK-NEXT:fmla h0, h1, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %extract = extractelement <4 x half> %c, i32 0
@@ -117,7 +111,7 @@
 ; CHECK:   .Lt_vfmah_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:fmadd h0, h1, h2, h0
+; CHECK-NEXT:fmla h0, h1, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %extract = extractelement <8 x half> %c, i32 0
@@ -131,9 +125,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.4h, v1.4h
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmla v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmls v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <4 x half> , %b
@@ -148,9 +140,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT:fneg v1.8h, v1.8h
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmla v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -164,8 +154,7 @@
 ; CHECK:   .Lt_vfms_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.4h, v2.h[0]
-; CHECK-NEXT:fmls v0.4h, v2.4h, v1.4h
+; CHECK-NEXT:fmls v0.4h, v1.4h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <4 x half> , %b
@@ -179,8 +168,7 @@
 ; CHECK:   .Lt_vfmsq_laneq_f16$local:
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
-; CHECK-NEXT:dup v2.8h, v2.h[0]
-; CHECK-NEXT:fmls v0.8h, v2.8h, v1.8h
+; CHECK-NEXT:fmls v0.8h, v1.8h, v2.h[0]
 ; CHECK-NEXT:ret
 entry:
   %sub = fsub <8 x half> , %b
@@ -195,9 +183,7 @@
 ; CHECK-NEXT:.cfi_startproc
 ; CHECK-NEXT:  // %bb.0: // %entry
 ; CHECK-NEXT:// kill: def $h2 killed $h2 de

[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-21 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks




Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8094
 V128:$Rm, VectorIndexS:$idx)>;
-  def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),
- (vector_extract (v2f32 V64:$Rm), VectorIndexS:$idx))),

I was a little surprised when you said we could remove these, but it looks like 
the vector_extract (v2f32) is always converted to a vector_extract (v4f32 
insert_subvector (v2f32)). So I agree, seems Ok to remove. (And if we do run 
into a problem, we can always add it back in).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-21 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv closed this revision.
ilinpv added a comment.

Committed be881e2831735d6879ee43710f5a4d1c8d50c615 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-21 Thread Ahmed Bougacha via Phabricator via cfe-commits
ab added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8058
+  def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
+   (AArch64duplane16 (v8f16 V128:$Rm),
+   VectorIndexH:$idx))),

Should this be V128_lo?  I don't think this is encodable for Rm in V16-V31  
(same in the other indexed f16 variants I think)
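
To illustrate the concern, here is a small C sketch of the register and lane limits of the by-element encoding as I understand them from the A64 instruction set: for half-precision elements the lane index takes three bits (H:L:M), which leaves only a 4-bit Rm field, while the single- and double-precision forms keep a 5-bit Rm. The helper name is hypothetical and is not an LLVM API.

```c
#include <stdbool.h>

/* Hypothetical helper: can "fmla/fmls Vd, Vn, Vm.<T>[idx]" encode this
   register number and lane index for the given element size in bits? */
static bool fmla_indexed_encodable(unsigned elem_bits, unsigned rm,
                                   unsigned idx) {
    switch (elem_bits) {
    case 16: /* index uses H:L:M, so Rm has 4 bits: V0-V15 only */
        return rm < 16 && idx < 8;
    case 32: /* index uses H:L, Rm is M:Rm: V0-V31 */
        return rm < 32 && idx < 4;
    case 64: /* index uses H, Rm is M:Rm: V0-V31 */
        return rm < 32 && idx < 2;
    default:
        return false;
    }
}
```

Under this model `fmla_indexed_encodable(16, 16, 0)` is false, which is why the f16 patterns would need a register class limited to the low sixteen registers, such as V128_lo.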


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-22 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv marked 2 inline comments as done.
ilinpv added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8058
+  def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
+   (AArch64duplane16 (v8f16 V128:$Rm),
+   VectorIndexH:$idx))),

ab wrote:
> Should this be V128_lo?  I don't think this is encodable for Rm in V16-V31  
> (same in the other indexed f16 variants I think)
Yep, I double-checked the encoding, and you are right. Thank you very much for 
this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-22 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv added a comment.

Patterns corrected to comply with the encoding in 
4eca1c06a4a9183fcf7bb230d894617caf3cf3be 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-22 Thread Ahmed Bougacha via Phabricator via cfe-commits
ab added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8058
+  def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
+   (AArch64duplane16 (v8f16 V128:$Rm),
+   VectorIndexH:$idx))),

ilinpv wrote:
> ab wrote:
> > Should this be V128_lo?  I don't think this is encodable for Rm in V16-V31  
> > (same in the other indexed f16 variants I think)
> Yep, I double-checked the encoding, and you are right. Thank you very much 
> for this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be
Thanks Pavel!  I think this applies to the `AArch64dup` variants too, which 
does entail adding `FPR16Op_lo` and `FPR16_lo` I imagine, and maybe a couple 
more


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252





[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-23 Thread Pavel Iliin via Phabricator via cfe-commits
ilinpv marked an inline comment as done.
ilinpv added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8058
+  def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
+   (AArch64duplane16 (v8f16 V128:$Rm),
+   VectorIndexH:$idx))),

ab wrote:
> ilinpv wrote:
> > ab wrote:
> > > Should this be V128_lo?  I don't think this is encodable for Rm in 
> > > V16-V31  (same in the other indexed f16 variants I think)
> > Yep, I double-checked the encoding, and you are right. Thank you very much 
> > for this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be
> Thanks Pavel!  I think this applies to the `AArch64dup` variants too, which 
> does entail adding `FPR16Op_lo` and `FPR16_lo` I imagine, and maybe a couple 
> more
Oops. Thanks again, the fix landed in cc457672e628846c20e92c6e0a82896f0d6db031


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252


