[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2021-01-27 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/include/llvm-c/Core.h:163
   LLVMX86_MMXTypeKind,   /**< X86 MMX */
+  LLVMX86_AMXTypeKind,   /**< X86 AMX */
   LLVMTokenTypeKind, /**< Tokens */

MaskRay wrote:
> cuviper wrote:
> > This is a breaking change to the C ABI -- can we move it to the end of the 
> > enum?
> > https://bugs.llvm.org/show_bug.cgi?id=48905
> Done in 6612c2bb68becda5504099b48082c844503c6d4c
@MaskRay, thank you!
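
For readers skimming the archive, a minimal sketch of why moving the enumerator matters (the ordering below reflects the quoted follow-up commit as I understand it; the abbreviated neighbours and their values are assumptions, not a copy of llvm-c/Core.h):

```
// Inserting LLVMX86_AMXTypeKind in the middle of LLVMTypeKind renumbers every
// later enumerator (LLVMTokenTypeKind, ...), which breaks binaries built
// against the old libLLVM-C values. Appending it at the end keeps all
// existing enumerator values stable.
typedef enum {
  /* ... earlier kinds keep their historical values ... */
  LLVMX86_MMXTypeKind,  /**< X86 MMX */
  LLVMTokenTypeKind,    /**< Tokens */
  /* ... */
  LLVMX86_AMXTypeKind   /**< X86 AMX -- appended last to preserve the C ABI */
} LLVMTypeKind;
```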


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2021-01-27 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added inline comments.



Comment at: llvm/include/llvm-c/Core.h:163
   LLVMX86_MMXTypeKind,   /**< X86 MMX */
+  LLVMX86_AMXTypeKind,   /**< X86 AMX */
   LLVMTokenTypeKind, /**< Tokens */

cuviper wrote:
> This is a breaking change to the C ABI -- can we move it to the end of the 
> enum?
> https://bugs.llvm.org/show_bug.cgi?id=48905
Done in 6612c2bb68becda5504099b48082c844503c6d4c


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2021-01-27 Thread Josh Stone via Phabricator via cfe-commits
cuviper added inline comments.



Comment at: llvm/include/llvm-c/Core.h:163
   LLVMX86_MMXTypeKind,   /**< X86 MMX */
+  LLVMX86_AMXTypeKind,   /**< X86 AMX */
   LLVMTokenTypeKind, /**< Tokens */

This is a breaking change to the C ABI -- can we move it to the end of the enum?
https://bugs.llvm.org/show_bug.cgi?id=48905


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-30 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added a comment.

Thanks to @pengfei and @MaskRay.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-30 Thread Fangrui Song via Phabricator via cfe-commits
MaskRay added a comment.

D93944 fixed an llvm-c-test issue. Note that adding new enum members usually
requires running `check-all` (at least `check-llvm`, but Clang may use these
enums as well).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-30 Thread Mikael Holmén via Phabricator via cfe-commits
uabelho added inline comments.



Comment at: llvm/include/llvm/IR/Type.h:68
 X86_MMXTyID,   ///< MMX vectors (64 bits, X86 specific)
+X86_AMXTyID,   ///< AMX vectors (8192 bits, X86 specific)
 TokenTyID, ///< Tokens

pengfei wrote:
> uabelho wrote:
> > This addition causes a compilation warning in HexagonTargetObjectFile.cpp:
> > 
> > ```
> > ../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: 
> > enumeration value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
> >   switch (Ty->getTypeID()) {
> >   ^
> > 1 error generated.
> > ```
> > Seen in build bots, e.g. here:
> > http://lab.llvm.org:8011/#/builders/57/builds/2889/steps/6/logs/stdio
> Thanks, Mikael, for pointing it out. I think we just need to add the type to
> the switch statement. I've posted a patch to fix it: rG16c2067cf212.
Yep, thanks!
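
For context, a minimal sketch of the kind of -Wswitch fix referenced above (illustrative only: the helper name, grouping, and return values are assumptions rather than the actual rG16c2067cf212 change to HexagonTargetObjectFile.cpp):

```
#include "llvm/IR/Type.h"
using namespace llvm;

// The real switch enumerates every Type::TypeID without a default, so
// -Wswitch fires as soon as X86_AMXTyID is added to the enum; listing the
// new ID explicitly silences it. The default below only keeps the sketch short.
static unsigned addressableBitsFor(const Type *Ty) { // hypothetical helper
  switch (Ty->getTypeID()) {
  case Type::X86_MMXTyID:
  case Type::X86_AMXTyID: // newly handled enumerator
  case Type::LabelTyID:
  case Type::MetadataTyID:
  case Type::TokenTyID:
    return 0;             // not addressable as ordinary data
  default:
    return 8;             // placeholder for the remaining type IDs
  }
}
```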


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-30 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/include/llvm/IR/Type.h:68
 X86_MMXTyID,   ///< MMX vectors (64 bits, X86 specific)
+X86_AMXTyID,   ///< AMX vectors (8192 bits, X86 specific)
 TokenTyID, ///< Tokens

uabelho wrote:
> This addition causes a compilation warning in HexagonTargetObjectFile.cpp:
> 
> ```
> ../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: enumeration 
> value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
>   switch (Ty->getTypeID()) {
>   ^
> 1 error generated.
> ```
> Seen in build bots, e.g. here:
> http://lab.llvm.org:8011/#/builders/57/builds/2889/steps/6/logs/stdio
Thanks, Mikael, for pointing it out. I think we just need to add the type to
the switch statement. I've posted a patch to fix it: rG16c2067cf212.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-30 Thread Mikael Holmén via Phabricator via cfe-commits
uabelho added inline comments.



Comment at: llvm/include/llvm/IR/Type.h:68
 X86_MMXTyID,   ///< MMX vectors (64 bits, X86 specific)
+X86_AMXTyID,   ///< AMX vectors (8192 bits, X86 specific)
 TokenTyID, ///< Tokens

This addition causes a compilation warning in HexagonTargetObjectFile.cpp:

```
../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: enumeration 
value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
  switch (Ty->getTypeID()) {
  ^
1 error generated.
```
Seen in build bots, e.g. here:
http://lab.llvm.org:8011/#/builders/57/builds/2889/steps/6/logs/stdio


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-29 Thread LuoYuanke via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG981a0bd85811: [X86] Add x86_amx type for intel AMX. 
(authored by LuoYuanke).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,104 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %t2 = bitcast x86_amx %t1 to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local <256 x i32> @test_amx_load_bitcast(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = load <256 x i32>, <256 x i32>* [[IN:%.*]], align 64
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast <256 x i32>* [[IN]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret <256 x i32> [[T1]]
+;
+entry:
+  %t1 = load <256 x i32>, <256 x i32>* %in, align 64
+  %t2 = bitcast <256 x i32> %t1 to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t2) 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added a comment.

In D91927#2471140 , @pengfei wrote:

> LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if 
> others have objections.

Thanks, Pengfei, for the review. Sure, I'll wait a few days.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-24 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if 
others have objections.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313669.
LuoYuanke added a comment.

Refine comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,104 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %t2 = bitcast x86_amx %t1 to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local <256 x i32> @test_amx_load_bitcast(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = load <256 x i32>, <256 x i32>* [[IN:%.*]], align 64
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast <256 x i32>* [[IN]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret <256 x i32> [[T1]]
+;
+entry:
+  %t1 = load <256 x i32>, <256 x i32>* %in, align 64
+  %t2 = bitcast <256 x i32> %t1 to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t2) #3
+  ret <256 x i32> %t1
+}
+
+define dso_local <256 x i32> @test_amx_bitcast_store(<256 x i32>* %out, i16 %m, i16 %n, i8 *%buf, i64 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313663.
LuoYuanke added a comment.

Address Pengfei's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,104 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %t2 = bitcast x86_amx %t1 to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local <256 x i32> @test_amx_load_bitcast(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = load <256 x i32>, <256 x i32>* [[IN:%.*]], align 64
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast <256 x i32>* [[IN]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret <256 x i32> [[T1]]
+;
+entry:
+  %t1 = load <256 x i32>, <256 x i32>* %in, align 64
+  %t2 = bitcast <256 x i32> %t1 to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t2) #3
+  ret <256 x i32> %t1
+}
+
+define dso_local <256 x i32> @test_amx_bitcast_store(<256 x i32>* %out, i16 %m, i16 %n, i8 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:243
+}
+// If load has multi-user, duplicate an amx load.
+// %src = load <256 x i32>, <256 x i32>* %addr, align 64

vector



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:312
+  SmallSet DeletedInst;
+  auto DeleteInst = [&](Instruction *Inst) {
+SmallVector DeadIs;

Why do we need to recursively delete them? I think deleting the nodes in
DeadInsts is enough.
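
A minimal sketch of the simpler, non-recursive cleanup being suggested (the function and container names are assumptions, and the element types are guesses because the mail archive stripped the template arguments from the quoted code):

```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Erase the dead bitcasts (the users) before the instructions they still
// reference; if both lists are populated during the rewrite walk, nothing
// has to be deleted recursively.
static void eraseCollectedDeadInsts(SmallVectorImpl<Instruction *> &DeadBitcasts,
                                    SmallVectorImpl<Instruction *> &DeadInsts) {
  for (Instruction *I : DeadBitcasts) // users first
    I->eraseFromParent();
  for (Instruction *I : DeadInsts)    // then their now-unused operands
    I->eraseFromParent();
}
```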



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:124
+  auto *II = cast<IntrinsicInst>(Tile);
+  // Tile is output from AMX intrinsic. The first operand of the
+  // intrinsic is row, the second operand of the intrinsic is column.

LuoYuanke wrote:
> pengfei wrote:
> > How about the `Tile` comes from tdpbssd?
> We have a convention: when AMX intrinsics define an x86_amx tile, the first
> two operands are the shape of the defined tile. For tdpbssd, the intrinsic's
> operands are (m, n, k, ...), and (m, n) is the shape of the produced tile.
Oh, yes. I missed that. Thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:124
+  auto *II = cast<IntrinsicInst>(Tile);
+  // Tile is output from AMX intrinsic. The first operand of the
+  // intrinsic is row, the second operand of the intrinsic is column.

pengfei wrote:
> How about the `Tile` comes from tdpbssd?
We have a convention: when AMX intrinsics define an x86_amx tile, the first two
operands are the shape of the defined tile. For tdpbssd, the intrinsic's
operands are (m, n, k, ...), and (m, n) is the shape of the produced tile.
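
A small illustrative sketch of that convention, mirroring the operand accesses quoted elsewhere in this review (the helper name and return shape are assumptions, not code from the patch):

```
#include <utility>
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// For any AMX intrinsic that defines an x86_amx tile -- e.g. tileloadd64 with
// operands (row, col, ptr, stride) or tdpbssd with operands (m, n, k, ...) --
// operand 0 is the row count and operand 1 is the column count of the
// produced tile, so the shape can be recovered uniformly.
static std::pair<Value *, Value *> getDefinedTileShape(IntrinsicInst *II) {
  Value *Row = II->getOperand(0); // rows of the defined tile (m)
  Value *Col = II->getOperand(1); // columns of the defined tile (n)
  return {Row, Col};
}
```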


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added a comment.

In D91927#2470818 , @LuoYuanke wrote:

> In D91927#2469977 , @pengfei wrote:
>
>>> In my test case, it is transformed after Combine redundant instructions.
>>
>> Can we disable it for AMX type? The pointer to AMX type is meaningless and
>> may result in bad performance.
>
> Ok, I'll disable the transform for AMX type.

Good job.




Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:123
 
-  LoadMap[Inst] = std::make_pair(Lo, Hi);
+  auto *Tile = Bitcast->getOperand(0);
+  auto *II = cast<IntrinsicInst>(Tile);

Value



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:196
+// %2 = load <256 x i32>, <256 x i32>* %addr, align 1024
+auto *II = dyn_cast<IntrinsicInst>(Src);
+if (!II)

How about the `Tile` comes from tdpbssd?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:47
+  Type *V256I32Ty = VectorType::get(Builder.getInt32Ty(), 256, false);
+  auto AllocaAlignment = DL.getPrefTypeAlign(V256I32Ty);
+  unsigned AllocaAS = DL.getAllocaAddrSpace();

Currently, we don't have a HW type for v256i32. I think 64 bytes (512 bits)
should be enough here.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:124
+  auto *II = cast<IntrinsicInst>(Tile);
+  // Tile is output from AMX intrinsic. The first operand of the
+  // intrinsic is row, the second operand of the intrinsic is column.

How about the `Tile` comes from tdpbssd?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:79
+// -->
+// %addr = alloca <256 x i32>, align 1024
+// store <256 x i32> %src, <256 x i32>* %addr, align 1024

LuoYuanke wrote:
> pengfei wrote:
> > Why is the alignment not 64?
> 1024 is conservative, because vectors require the alignment to be the vector
> size. Here we generate <256 x i32> vector loads/stores.
We don't need to align to 1024; 64 should be enough. The same applies to the
comments below.
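
A minimal sketch of that suggestion (the helper name and surrounding plumbing are assumptions): cap the spill slot's alignment at 64 bytes instead of requesting the 1024-byte preferred alignment of `<256 x i32>`, since the tile load/store intrinsics only need the 64-byte stride granularity.

```
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Create the <256 x i32> stack slot used to shuttle data between vector code
// and the AMX intrinsics, explicitly aligned to 64 bytes rather than
// DL.getPrefTypeAlign(V256I32Ty), which would ask for 1024 bytes.
static AllocaInst *createTileSpillSlot(IRBuilder<> &Builder, const DataLayout &DL) {
  Type *V256I32Ty = VectorType::get(Builder.getInt32Ty(), 256, false);
  AllocaInst *Slot = Builder.CreateAlloca(V256I32Ty, DL.getAllocaAddrSpace());
  Slot->setAlignment(Align(64)); // a 64-byte stride is all the tile ops require
  return Slot;
}
```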


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313639.
LuoYuanke added a comment.

Improve the comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,103 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %t2 = bitcast x86_amx %t1 to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local <256 x i32> @test_amx_load_bitcast(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = load <256 x i32>, <256 x i32>* [[IN:%.*]], align 64
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast <256 x i32>* [[IN]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret <256 x i32> [[T1]]
+;
+entry:
+  %t1 = load <256 x i32>, <256 x i32>* %in, align 64
+  %t2 = bitcast <256 x i32> %t1 to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t2) #3
+  ret <256 x i32> %t1
+}
+
+define dso_local <256 x i32> @test_amx_bitcast_store(<256 x i32>* %out, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_bitcast_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313635.
LuoYuanke added a comment.

Address Pengfei's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,103 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %t2 = bitcast x86_amx %t1 to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local <256 x i32> @test_amx_load_bitcast(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = load <256 x i32>, <256 x i32>* [[IN:%.*]], align 64
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast <256 x i32>* [[IN]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret <256 x i32> [[T1]]
+;
+entry:
+  %t1 = load <256 x i32>, <256 x i32>* %in, align 64
+  %t2 = bitcast <256 x i32> %t1 to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t2) #3
+  ret <256 x i32> %t1
+}
+
+define dso_local <256 x i32> @test_amx_bitcast_store(<256 x i32>* %out, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_bitcast_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added a comment.

In D91927#2469977 , @pengfei wrote:

>> In my test case, it is transformed after Combine redundant instructions.
>
> Can we disable it for AMX type? The pointer to AMX type is meaningless and
> may result in bad performance.

Ok, I'll disable the transform for AMX type.
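
A hedged sketch of the guard being agreed on here (the real change lives in InstCombineLoadStoreAlloca.cpp; the helper below is illustrative only, not the actual patch code):

```
#include "llvm/IR/Type.h"
using namespace llvm;

// Bail out of the "fold a bitcast into the load/store" combines whenever the
// AMX tile type is involved, so InstCombine never rewrites vector memory
// accesses into loads/stores through an x86_amx* pointer.
static bool shouldSkipAMXLoadStoreCombine(Type *SrcTy, Type *DstTy) {
  return SrcTy->isX86_AMXTy() || DstTy->isX86_AMXTy();
}
```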


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-23 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added a comment.

> In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may
result in bad performance.




Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:300
+// If the dst type is <256 x i32>*, it is a valid instruction.
+// %0 = bitcast x86_amx* %tile to <256 x i32>*
+// %1 = load <256 x i32>, <256 x i32>* %0, align 64

I don't see any chance this happens. But we still need to handle the x86_amx*
here if possible, right?
Maybe better to add an assertion for now.
```
cast(Src->getType())->isX86_AMXTy()
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-22 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:265
+// If the dst type is <256 x i32>*, it is a valid instruction.
+// %0 = bitcast x86_amx* %tile to <256 x i32>*
+// %1 = load <256 x i32>, <256 x i32>* %0, align 64

pengfei wrote:
> Where's `x86_amx* %tile` from? Shouldn't it have been transferred to
> `x86_amx` before this bitcast if it exists?
In my test case, it is transformed after 'Combine redundant instructions'
(InstCombine).
```
*** IR Dump After Simplify the CFG ***
define internal fastcc void 
@_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture 
%dst) unnamed_addr #4 {
entry:
  %row = getelementptr inbounds %struct.__tile1024i_str, 
%struct.__tile1024i_str* %dst, i64 0, i32 0
  %0 = load i16, i16* %row, align 64, !tbaa !2
  %col = getelementptr inbounds %struct.__tile1024i_str, 
%struct.__tile1024i_str* %dst, i64 0, i32 1
  %1 = load i16, i16* %col, align 2, !tbaa !7
  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* 
getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) 
#6
  %3 = bitcast x86_amx %2 to <256 x i32>
  %tile = getelementptr inbounds %struct.__tile1024i_str, 
%struct.__tile1024i_str* %dst, i64 0, i32 3
  store <256 x i32> %3, <256 x i32>* %tile, align 64, !tbaa !8
  ret void
}
```

To

```
*** IR Dump After Combine redundant instructions ***
; Function Attrs: alwaysinline nounwind uwtable mustprogress
define internal fastcc void 
@_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture 
%dst) unnamed_addr #4 {
entry:
  %row = getelementptr inbounds %struct.__tile1024i_str, 
%struct.__tile1024i_str* %dst, i64 0, i32 0
  %0 = load i16, i16* %row, align 64, !tbaa !2
  %col = getelementptr inbounds %struct.__tile1024i_str, 
%struct.__tile1024i_str* %dst, i64 0, i32 1
  %1 = load i16, i16* %col, align 2, !tbaa !7
  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* 
getelementptr inbounds ([1024 x i8], [1024
x i8]* @buf, i64 0, i64 0), i64 64) #6
  %tile = getelementptr inbounds %struct.__tile1024i_str, 
%struct.__tile1024i_str* %dst, i64 0, i32 3
  %3 = bitcast <256 x i32>* %tile to x86_amx*
  store x86_amx %2, x86_amx* %3, align 64, !tbaa !8
  ret void
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-22 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313458.
LuoYuanke added a comment.

Address Pengfei's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,135 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+define dso_local void @test_amx_store(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:[[ADDR:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[ADDR]] to i8*
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[TMP0]], i64 64, x86_amx [[T0]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %addr = bitcast <256 x i32>* %in to x86_amx*
+  store x86_amx %t0, x86_amx* %addr, align 64
+  ret void
+}
+
+define dso_local void @test_amx_load(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[T0]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = bitcast <256 x i32>* %in to x86_amx*
+  %t1 = load x86_amx, x86_amx* %t0, align 64
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t1) #3
+  ret void
+}
+
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT: 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-22 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:264
+  SmallVector DeadInsts;
+  SmallVector DeadBitcasts;
+

pengfei wrote:
> Maybe better to use BitCastInst?
There may be dead load or store instructions.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:280
+  Type *Ty = Bitcast->getType();
+  auto CanonicalizeBitcast = [&]() {
+bool Change = false;

pengfei wrote:
> Can we leave the canonicalize-bitcast cases to a separate patch? It's a bit
> complex here and I don't think it's a common case.
Ok, I'll create another patch for it.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:193
+// %2 = load <256 x i32>, <256 x i32>* %addr, align 1024
+auto *II = cast<IntrinsicInst>(Src);
+Value *Row = II->getOperand(0);

pengfei wrote:
> Is it possible the x86_amx operand isn't from an AMX intrinsic, e.g.
> ```
> %src = bitcast <256 x i32> %xxx to x86_amx
> %2 = bitcast x86_amx %src to <256 x i32>
> ```
Good catch. I'll add support for this pattern.
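
A minimal sketch of one way to handle that pattern (illustrative only; the helper name and its placement in X86LowerAMXType.cpp are assumptions):

```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// If the x86_amx operand of an amx->vector bitcast is itself a vector->amx
// bitcast, the two casts cancel and the original <256 x i32> value can be
// forwarded directly -- no tile load/store is needed.
static bool foldRoundTripBitcast(BitCastInst *Bitcast /* x86_amx -> <256 x i32> */) {
  auto *SrcCast = dyn_cast<BitCastInst>(Bitcast->getOperand(0));
  if (!SrcCast || !SrcCast->getSrcTy()->isVectorTy())
    return false;
  Bitcast->replaceAllUsesWith(SrcCast->getOperand(0)); // forward %xxx
  return true;
}
```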



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:238
 for (Instruction &Inst : BB) {
-  LoadInst *LD = dyn_cast<LoadInst>(&Inst);
-  // Check load instruction.
-  // %3 = load <256 x i32>, <256 x i32>* %1, align 64
-  if (LD) {
-FixedVectorType *VTy = dyn_cast<FixedVectorType>(Inst.getType());
-if (!IsAMXType(VTy))
-  continue;
-LDSet.insert();
+  if (!dyn_cast<BitCastInst>(&Inst))
 continue;

pengfei wrote:
> Better to reuse the cast result, e.g.
> ```
> BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst);
> if (!BInst)
> ```
> You can save several `cast()` below.
That's good. Thanks.



Comment at: llvm/test/CodeGen/X86/AMX/amx-across-func.ll:89-91
 attributes #2 = { nounwind uwtable 
"correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" 
"frame-pointer"="none" "less-precise-fpmad"="false" 
"min-legal-vector-width"="8192" "no-infs-fp-math"="false" 
"no-jump-tables"="false" "no-nans-fp-math"="false" 
"no-signed-zeros-fp-math"="false" "no-trapping-math"="true" 
"stack-protector-buffer-size"="8" "target-cpu"="x86-64" 
"target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" 
"tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" 
"disable-tail-calls"="false" "frame-pointer"="none" 
"less-precise-fpmad"="false" "no-infs-fp-math"="false" 
"no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" 
"no-trapping-math"="true" "stack-protector-buffer-size"="8" 
"target-cpu"="x86-64" 
"target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" 
"tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #4 = { nounwind }

pengfei wrote:
> Better to remove these unused attributes. The same for the other tests.
I'll create a separate patch to clean up the attributes.



Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:67
+
+define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, 
i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add(

pengfei wrote:
> For this and the next test, we have a chance to optimize to a memcpy if we
> can make sure %s is the constant 64.
If the stride is 64, we can transform the code to a memcpy. How about doing it
in another patch?



Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:103
+
 define dso_local void @test_load(i8* %in, i8* %out) local_unnamed_addr #2 {
 ; CHECK-LABEL: @test_load(

pengfei wrote:
> We don't need to check this case now, right?
It checks that the load and store instructions are not transformed when they
do not participate in an AMX operation. I prefer to keep the case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-22 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:264
+  SmallVector DeadInsts;
+  SmallVector DeadBitcasts;
+

Maybe better to use BitCastInst?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:274
+  if (Bitcast->user_empty()) {
+DeadInsts.push_back(Bitcast);
 continue;

Why not put it in DeadBitcasts?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:280
+  Type *Ty = Bitcast->getType();
+  auto CanonicalizeBitcast = [&]() {
+bool Change = false;

Can we leave the canonicalize-bitcast cases to a separate patch? It's a bit
complex here and I don't think it's a common case.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:420
   }
+  // Delete user first.
+  for (auto *Inst : DeadBitcasts)

Is this comment for the code above? Better to move it up.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-22 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313326.
LuoYuanke added a comment.

Rebase and fix lit test case failure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,146 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+define dso_local <256 x i32> @test_amx_bitcast(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_amx_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret <256 x i32> [[IN:%.*]]
+;
+entry:
+  %amx = bitcast <256 x i32> %in to x86_amx
+  %vec = bitcast x86_amx %amx to <256 x i32>
+  ret <256 x i32> %vec
+}
+
+define dso_local void @test_amx_store(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:[[ADDR:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[ADDR]] to i8*
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[TMP0]], i64 64, x86_amx [[T0]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %addr = bitcast <256 x i32>* %in to x86_amx*
+  store x86_amx %t0, x86_amx* %addr, align 64
+  ret void
+}
+
+define dso_local void @test_amx_load(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[T0]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = bitcast <256 x i32>* %in to x86_amx*
+  %t1 = load x86_amx, x86_amx* 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-22 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 313315.
LuoYuanke added a comment.

Address Pengfei's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,146 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+define dso_local <256 x i32> @test_amx_bitcast(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_amx_bitcast(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret <256 x i32> [[IN:%.*]]
+;
+entry:
+  %amx = bitcast <256 x i32> %in to x86_amx
+  %vec = bitcast x86_amx %amx to <256 x i32>
+  ret <256 x i32> %vec
+}
+
+define dso_local void @test_amx_store(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:[[ADDR:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[ADDR]] to i8*
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[TMP0]], i64 64, x86_amx [[T0]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %addr = bitcast <256 x i32>* %in to x86_amx*
+  store x86_amx %t0, x86_amx* %addr, align 64
+  ret void
+}
+
+define dso_local void @test_amx_load(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[T0]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = bitcast <256 x i32>* %in to x86_amx*
+  %t1 = load x86_amx, x86_amx* %t0, align 64
+  call void @llvm.x86.tilestored64.internal(i16 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-21 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/IR/ConstantFold.cpp:539
+  if (V->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy()
+  && opc != Instruction::AddrSpaceCast)
 return Constant::getNullValue(DestTy);

The operator should be at the end of the line.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:55
+  switch (II->getIntrinsicID()) {
+  default: {
+Row = II->getArgOperand(0);

I think we'd better check for the unexpected cases explicitly, e.g.
```
default:
  llvm_unreachable("");
case Intrinsic::x86_tileloadd64_internal:
case Intrinsic::x86_tdpbssd_internal:
case Intrinsic::x86_tilestored64_internal:
  Row = II->getArgOperand(0);
  Col = II->getArgOperand(1);
  break;
```



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:114
+  Value *Row = nullptr, *Col = nullptr;
+  Use &U = *(Bitcast->use_begin());
+  unsigned OpNo = U.getOperandNo();

Why don't we check for empty like line 157?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:193
+// %2 = load <256 x i32>, <256 x i32>* %addr, align 1024
+auto *II = cast<IntrinsicInst>(Src);
+Value *Row = II->getOperand(0);

Is it possible the x86_amx operand isn't from AMX intrinsic, e.g.
```
%src = bitcast <256 x i32> %xxx to x86_amx
%2 = bitcast x86_amx %src to <256 x i32>
```



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:233
 bool X86LowerAMXType::visit() {
   bool C;
+  SmallVector DeadInsts;

Better move it to line 310.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:238
 for (Instruction  : BB) {
-  LoadInst *LD = dyn_cast();
-  // Check load instruction.
-  // %3 = load <256 x i32>, <256 x i32>* %1, align 64
-  if (LD) {
-FixedVectorType *VTy = dyn_cast(Inst.getType());
-if (!IsAMXType(VTy))
-  continue;
-LDSet.insert();
+  if (!dyn_cast())
 continue;

Better to reuse the cast result, e.g.
```
BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst);
if (!BInst)
```
You can save several `cast()` below.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:265
+// If the dst type is <256 x i32>*, it is valid intruction.
+// %0 = bitcast x86_amx* %tile to <256 x i32>*
+// %1 = load <256 x i32>, <256 x i32>* %0, align 64

Where's `x86_amx* %tile` from? Shouldn't it have been transferred to `x86_amx` before 
this bitcast, if it exists?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:271
+LoadInst *LD = dyn_cast<LoadInst>(Src);
+if (!LD || !LD->hasOneUser()) {
+  transformBitcast(cast());

Maybe better to keep a duplicated `load` rather than calling `transformBitcast`. The 
same for line 285.
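
A rough IR sketch of that alternative (value names are placeholders; the %row/%col 
shape comes from the consuming AMX intrinsic and the stride 64 matches what the 
patch already uses):

```
; %v has other users besides the bitcast, so instead of spilling through an
; alloca we keep %v and rematerialize the tile with a second load from the
; original pointer.
%v  = load <256 x i32>, <256 x i32>* %ptr, align 64
; ... other users of %v stay unchanged ...
%p8 = bitcast <256 x i32>* %ptr to i8*
%t  = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %p8, i64 64)
```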



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:286
+if (!Inst.hasOneUser()) {
+  transformBitcast(cast());
+  DeadInsts.push_back();

Why do we need to consider the case where the <256 x i32> has more than one use?



Comment at: llvm/test/CodeGen/X86/AMX/amx-across-func.ll:89-91
 attributes #2 = { nounwind uwtable 
"correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" 
"frame-pointer"="none" "less-precise-fpmad"="false" 
"min-legal-vector-width"="8192" "no-infs-fp-math"="false" 
"no-jump-tables"="false" "no-nans-fp-math"="false" 
"no-signed-zeros-fp-math"="false" "no-trapping-math"="true" 
"stack-protector-buffer-size"="8" "target-cpu"="x86-64" 
"target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" 
"tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" 
"disable-tail-calls"="false" "frame-pointer"="none" 
"less-precise-fpmad"="false" "no-infs-fp-math"="false" 
"no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" 
"no-trapping-math"="true" "stack-protector-buffer-size"="8" 
"target-cpu"="x86-64" 
"target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" 
"tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #4 = { nounwind }

Better to remove these unused attributes. The same for the other tests.



Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:67
+
+define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, 
i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add(

For this and the next test, we have a chance to optimize to a memcpy if we can 
make sure %s is the constant 64.
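
To illustrate that suggestion only (not what the patch currently emits), and 
assuming the shape covers the full tile, so the 1024-byte size and 64-byte 
alignment below are assumptions:

```
; With %s known to be 64, the stored rows are contiguous, so the alloca
; round trip could in principle collapse into a plain copy:
%src = bitcast <256 x i32>* %addr to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 64 %buf, i8* align 64 %src, i64 1024, i1 false)
```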



Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:103
+
 define dso_local void @test_load(i8* %in, i8* %out) local_unnamed_addr #2 {
 ; CHECK-LABEL: @test_load(

We don't need to check this case now, right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/


[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-10 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 310800.
LuoYuanke added a comment.

Rebase.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,104 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+define dso_local void @test_amx_store(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:[[ADDR:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[ADDR]] to i8*
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[TMP0]], i64 64, x86_amx [[T0]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %addr = bitcast <256 x i32>* %in to x86_amx*
+  store x86_amx %t0, x86_amx* %addr, align 64
+  ret void
+}
+
+define dso_local void @test_amx_load(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[T0]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = bitcast <256 x i32>* %in to x86_amx*
+  %t1 = load x86_amx, x86_amx* %t0, align 64
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t1) #3
+  ret void
+}
+
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-12-05 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 309761.
LuoYuanke added a comment.

Avoid generating a constant for x86_amx.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/Analysis/ConstantFolding.cpp
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,104 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+define dso_local void @test_amx_store(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:[[ADDR:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[ADDR]] to i8*
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[TMP0]], i64 64, x86_amx [[T0]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %addr = bitcast <256 x i32>* %in to x86_amx*
+  store x86_amx %t0, x86_amx* %addr, align 64
+  ret void
+}
+
+define dso_local void @test_amx_load(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[T0]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = bitcast <256 x i32>* %in to x86_amx*
+  %t1 = load x86_amx, x86_amx* %t0, align 64
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t1) #3
+  ret void
+}
+
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-28 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 308146.
LuoYuanke added a comment.

Add the handler of "bitcast <256 x i32>* to x86_amx*".
Refactor the code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/ConstantFold.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,104 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+define dso_local void @test_amx_store(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_store(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
+; CHECK-NEXT:[[ADDR:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[ADDR]] to i8*
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[TMP0]], i64 64, x86_amx [[T0]])
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
+  %addr = bitcast <256 x i32>* %in to x86_amx*
+  store x86_amx %t0, x86_amx* %addr, align 64
+  ret void
+}
+
+define dso_local void @test_amx_load(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_amx_load(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T0:%.*]] = bitcast <256 x i32>* [[IN:%.*]] to x86_amx*
+; CHECK-NEXT:[[TMP0:%.*]] = bitcast x86_amx* [[T0]] to i8*
+; CHECK-NEXT:[[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %t0 = bitcast <256 x i32>* %in to x86_amx*
+  %t1 = load x86_amx, x86_amx* %t0, align 64
+  call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t1) #3
+  ret void
+}
+
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 307514.
LuoYuanke marked an inline comment as done.
LuoYuanke added a comment.

Address Craig and Pengfei's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,70 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(x86_amx %in) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast x86_amx %in to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = alloca <256 x i32>, align 1024
+; CHECK-NEXT:[[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
+; CHECK-NEXT:[[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
+; CHECK-NEXT:store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
+; CHECK-NEXT:[[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %add = add <256 x i32> %y, %x
+  %t = bitcast <256 x i32> %add to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t) #3
+  ret void
+}
+
+define dso_local void @test_src_add2(<256 x i32> %x, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = alloca <256 x i32>, align 1024
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3]]
+; CHECK-NEXT:

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke marked an inline comment as done.
LuoYuanke added inline comments.



Comment at: llvm/lib/IR/DataLayout.cpp:819
+  case Type::X86_AMXTyID:
+return Align(64);
   default:

pengfei wrote:
> Should be 512 bits?
Yes. It is 512. Thanks.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:72
   LLVMContext  = Builder.getContext();
-  Type *Ty = LD->getType();
-  EVT VT = EVT::getEVT(Ty);
-  EVT HalfVT = VT.getHalfNumVectorElementsVT(Ctx);
-  Type *HalfTy = HalfVT.getTypeForEVT(Ctx);
-
-  Value *Ptr = LD->getPointerOperand();
-  PointerType *HalfPtrTy = HalfTy->getPointerTo(LD->getPointerAddressSpace());
-  Value *HalfPtr = Builder.CreateBitCast(Ptr, HalfPtrTy);
-  // The HW require the alignment for AMX tile is 64, but front-end generate
-  // code for the vector alignment which is the vector size.
-  uint64_t HalfTySize = HalfTy->getPrimitiveSizeInBits().getFixedSize() / 8;
-  Align Alignment = std::min(LD->getAlign(), Align(HalfTySize));
-  auto *Lo =
-  Builder.CreateAlignedLoad(HalfTy, HalfPtr, Alignment, LD->isVolatile());
-
-  HalfPtr = Builder.CreateGEP(HalfTy, HalfPtr, Builder.getInt32(1));
-  auto *Hi =
-  Builder.CreateAlignedLoad(HalfTy, HalfPtr, Alignment, LD->isVolatile());
-
-  LoadMap[Inst] = std::make_pair(Lo, Hi);
-}
-
-bool X86LowerAMXType::visitLD() {
-  if (LDSet.empty())
-return false;
-  for (auto  : LDSet) {
-int Count = 0;
-Value *NewInst = nullptr;
-// The user should be all AMX intrinsics or all LLVM instruction.
-// Don't support it is used by both AMX intrinsics and LLVM instructions.
-for (auto I = Inst->use_begin(), E = Inst->use_end(); I != E;) {
-  Use  = *I++;
-  const IntrinsicInst *II = dyn_cast(U.getUser());
-  if (!II) {
-Count++;
-continue;
-  }
-  if (NewInst)
-continue;
-  Value *Row, *Col;
-  switch (II->getIntrinsicID()) {
-  default:
-report_fatal_error("Non-AMX intrinsic use tile type.");
-break;
-  case Intrinsic::x86_tdpbssd_internal: {
-unsigned OpNo = U.getOperandNo();
-switch (OpNo) {
-case 3:
-  Row = II->getArgOperand(0);
-  Col = II->getArgOperand(1);
-  break;
-case 4:
-  Row = II->getArgOperand(0);
-  Col = II->getArgOperand(2);
-  break;
-case 5:
-  Row = II->getArgOperand(2);
-  Col = II->getArgOperand(1);
-  break;
-}
-break;
-  }
-  case Intrinsic::x86_tilestored64_internal: {
-Row = II->getArgOperand(0);
-Col = II->getArgOperand(1);
-break;
-  }
-  }
-  assert(Count == 0 && "Can NOT mix amx intrinsic and LLVM instruction");
-  // FIXME: The shape def should be ahead of load.
-  IRBuilder<> Builder(Inst);
-  LLVMContext  = Builder.getContext();
-  // Use the maximun column as stride.
-  Value *Stride = Builder.getInt64(64);
-  Value *I8Ptr =
-  Builder.CreateBitCast(Inst->getOperand(0), Type::getInt8PtrTy(Ctx));
-  std::array Args = {Row, Col, I8Ptr, Stride};
-
-  NewInst = Builder.CreateIntrinsic(Intrinsic::x86_tileloadd64_internal,
-None, Args);
-
-  Inst->replaceAllUsesWith(NewInst);
-}
-if (!NewInst)
-  splitLD(Inst);
+  AllocaInst *AllocaAddr = CreateAllocaInst(Builder, Bitcast->getParent());
+  Value *I8Ptr =

craig.topper wrote:
> Shouldn't this be in the function's entry block?
Yes, it is in the function's entry block. That is done at line 48 of 
CreateAllocaInst(). CreateAllocaInst() is actually copied from your code. :)
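
For readers of the archive, a minimal C++ sketch of that placement; the helper 
name and the 1024-byte alignment follow this review's discussion and are not the 
literal patch code:

```
static AllocaInst *createTileAlloca(Function &F) {
  // Create the alloca at the start of the entry block so it is a static alloca.
  BasicBlock &Entry = F.getEntryBlock();
  IRBuilder<> Builder(&Entry, Entry.begin());
  AllocaInst *AI = Builder.CreateAlloca(
      FixedVectorType::get(Builder.getInt32Ty(), 256)); // <256 x i32>
  AI->setAlignment(Align(1024));
  return AI;
}
```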



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:79
+// -->
+// %addr = alloca <256 x i32>, align 1024
+// store <256 x i32> %src, <256 x i32>* %addr, align 1024

pengfei wrote:
> Why wouldn't the alignment be 64?
1024 is conservative, because vectors require the alignment to be the vector 
size. Here we generate <256 x i32> vector load/store.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:72
   LLVMContext  = Builder.getContext();
-  Type *Ty = LD->getType();
-  EVT VT = EVT::getEVT(Ty);
-  EVT HalfVT = VT.getHalfNumVectorElementsVT(Ctx);
-  Type *HalfTy = HalfVT.getTypeForEVT(Ctx);
-
-  Value *Ptr = LD->getPointerOperand();
-  PointerType *HalfPtrTy = HalfTy->getPointerTo(LD->getPointerAddressSpace());
-  Value *HalfPtr = Builder.CreateBitCast(Ptr, HalfPtrTy);
-  // The HW require the alignment for AMX tile is 64, but front-end generate
-  // code for the vector alignment which is the vector size.
-  uint64_t HalfTySize = HalfTy->getPrimitiveSizeInBits().getFixedSize() / 8;
-  Align Alignment = std::min(LD->getAlign(), Align(HalfTySize));
-  auto *Lo =
-  Builder.CreateAlignedLoad(HalfTy, HalfPtr, Alignment, LD->isVolatile());
-
-  HalfPtr = Builder.CreateGEP(HalfTy, HalfPtr, Builder.getInt32(1));
-  auto *Hi =
-  Builder.CreateAlignedLoad(HalfTy, HalfPtr, Alignment, LD->isVolatile());
-
-  LoadMap[Inst] = std::make_pair(Lo, Hi);
-}
-
-bool X86LowerAMXType::visitLD() {
-  if (LDSet.empty())
-return false;
-  for (auto  : LDSet) {
-int Count = 0;
-Value *NewInst = nullptr;
-// The user should be all AMX intrinsics or all LLVM instruction.
-// Don't support it is used by both AMX intrinsics and LLVM instructions.
-for (auto I = Inst->use_begin(), E = Inst->use_end(); I != E;) {
-  Use  = *I++;
-  const IntrinsicInst *II = dyn_cast(U.getUser());
-  if (!II) {
-Count++;
-continue;
-  }
-  if (NewInst)
-continue;
-  Value *Row, *Col;
-  switch (II->getIntrinsicID()) {
-  default:
-report_fatal_error("Non-AMX intrinsic use tile type.");
-break;
-  case Intrinsic::x86_tdpbssd_internal: {
-unsigned OpNo = U.getOperandNo();
-switch (OpNo) {
-case 3:
-  Row = II->getArgOperand(0);
-  Col = II->getArgOperand(1);
-  break;
-case 4:
-  Row = II->getArgOperand(0);
-  Col = II->getArgOperand(2);
-  break;
-case 5:
-  Row = II->getArgOperand(2);
-  Col = II->getArgOperand(1);
-  break;
-}
-break;
-  }
-  case Intrinsic::x86_tilestored64_internal: {
-Row = II->getArgOperand(0);
-Col = II->getArgOperand(1);
-break;
-  }
-  }
-  assert(Count == 0 && "Can NOT mix amx intrinsic and LLVM instruction");
-  // FIXME: The shape def should be ahead of load.
-  IRBuilder<> Builder(Inst);
-  LLVMContext  = Builder.getContext();
-  // Use the maximun column as stride.
-  Value *Stride = Builder.getInt64(64);
-  Value *I8Ptr =
-  Builder.CreateBitCast(Inst->getOperand(0), Type::getInt8PtrTy(Ctx));
-  std::array Args = {Row, Col, I8Ptr, Stride};
-
-  NewInst = Builder.CreateIntrinsic(Intrinsic::x86_tileloadd64_internal,
-None, Args);
-
-  Inst->replaceAllUsesWith(NewInst);
-}
-if (!NewInst)
-  splitLD(Inst);
+  AllocaInst *AllocaAddr = CreateAllocaInst(Builder, Bitcast->getParent());
+  Value *I8Ptr =

Shouldn't this be in the function's entry block?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:89
+// TODO we can pick an constant operand for the shape.
+auto *Row = AMXIntrinsic->getOperand(0);
+auto *Col = AMXIntrinsic->getOperand(1);

Just use Value. auto doesn't add any value other than shortening by 1 character.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:178
+LLVMContext  = Builder.getContext();
+// Use the maximun column as stride. It must be the same with load
+// stride.

maximun->maximum



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:182
+Value *I8Ptr =
+Builder.CreateBitCast(ST->getOperand(1), Type::getInt8PtrTy(Ctx));
+std::array Args = {Row, Col, I8Ptr, Stride, Src};

Use Builder.getInt8PtrTy then you don't need Ctx
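
I.e., a small sketch of the simplification, using the same operands as the quoted 
code:

```
// Before: the LLVMContext is only needed to spell the i8* type.
Value *I8Ptr =
    Builder.CreateBitCast(ST->getOperand(1), Type::getInt8PtrTy(Ctx));
// After: the builder can produce i8* directly, so Ctx is not needed.
Value *I8Ptr2 =
    Builder.CreateBitCast(ST->getOperand(1), Builder.getInt8PtrTy());
```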


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:164
+}
+// %src = call x86_amx @llvm.x86.tileloadd64.internal(%row, %col, 
%addr,
+// %stride);

`%src` is not used here.



Comment at: llvm/utils/TableGen/IntrinsicEmitter.cpp:252
+  IIT_V256 = 50,
+  IIT_AMX  = 51,
 };

Remove `,`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread Pengfei Wang via Phabricator via cfe-commits
pengfei added inline comments.



Comment at: llvm/lib/IR/DataLayout.cpp:819
+  case Type::X86_AMXTyID:
+return Align(64);
   default:

Should be 512 bits?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:79
+// -->
+// %addr = alloca <256 x i32>, align 1024
+// store <256 x i32> %src, <256 x i32>* %addr, align 1024

Why wouldn't the alignment be 64?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 307300.
LuoYuanke added a comment.

Address Craig's comments.
Change dyn_cast to cast.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51,
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,70 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(x86_amx %in) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast x86_amx %in to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = alloca <256 x i32>, align 1024
+; CHECK-NEXT:[[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
+; CHECK-NEXT:[[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
+; CHECK-NEXT:store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
+; CHECK-NEXT:[[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %add = add <256 x i32> %y, %x
+  %t = bitcast <256 x i32> %add to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t) #3
+  ret void
+}
+
+define dso_local void @test_src_add2(<256 x i32> %x, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = alloca <256 x i32>, align 1024
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3]]
+; CHECK-NEXT:[[TMP1:%.*]] = bitcast <256 x i32>* 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-24 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke updated this revision to Diff 307298.
LuoYuanke retitled this revision from "[X86] Add x86_amx type for intel AMX. " 
to "[X86] Add x86_amx type for intel AMX.".
LuoYuanke added a comment.

Address Craig's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51,
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -8,18 +8,70 @@
 @buf = dso_local global [1024 x i8] zeroinitializer, align 16
 @buf2 = dso_local global [1024 x i8] zeroinitializer, align 16
 
+; test bitcast x86_amx to <256 x i32>
+define dso_local void @test_user_empty(x86_amx %in) #2 {
+; CHECK-LABEL: @test_user_empty(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast x86_amx %in to <256 x i32>
+  ret void
+}
+
+; test bitcast <256 x i32> to x86_amx
+define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
+; CHECK-LABEL: @test_user_empty2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:ret void
+;
+entry:
+  %t = bitcast <256 x i32> %in to x86_amx
+  ret void
+}
+
+define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = alloca <256 x i32>, align 1024
+; CHECK-NEXT:[[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
+; CHECK-NEXT:[[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
+; CHECK-NEXT:store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
+; CHECK-NEXT:[[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
+; CHECK-NEXT:call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]]) [[ATTR3:#.*]]
+; CHECK-NEXT:ret void
+;
+entry:
+  %add = add <256 x i32> %y, %x
+  %t = bitcast <256 x i32> %add to x86_amx
+  call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t) #3
+  ret void
+}
+
+define dso_local void @test_src_add2(<256 x i32> %x, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
+; CHECK-LABEL: @test_src_add2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[TMP0:%.*]] = alloca <256 x i32>, align 1024
+; CHECK-NEXT:[[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 

[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

In D91927#2412604 , @LuoYuanke wrote:

> In D91927#2412557 , @craig.topper 
> wrote:
>
>> I only took a quick pass through this so far.  What happens if a bitcast 
>> between x86amx and v256i32(or any other 1024-bit vector type) exists in the 
>> IR but isn't next to a load/store?
>
> @craig.topper , thank you for reviewing my patch.
> I think if users just use our external API, such IR won't be generated.
> However, if there is such IR, we can transform the bitcast to a load/store
> pair, so that the type can be translated through memory. One of the pair is
> an AMX intrinsic store/load, so it won't be optimized away. Is that reasonable?

It's fine if it's not optimized, just make sure it doesn't crash.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-23 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke added a comment.

In D91927#2412557 , @craig.topper 
wrote:

> I only took a quick pass through this so far.  What happens if a bitcast 
> between x86amx and v256i32(or any other 1024-bit vector type) exists in the 
> IR but isn't next to a load/store?

@craig.topper , thank you for reviewing my patch.
I think if users just use our external API, such IR won't be generated. However, 
if there is such IR, we can transform the bitcast to a load/store pair, so that 
the type can be translated through memory. One of the pair is an AMX intrinsic 
store/load, so it won't be optimized away. Is that reasonable?
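
For illustration, a minimal IR sketch of that round trip through memory, mirroring 
the test_src_add pattern in amx-type.ll (all names are placeholders; the %row/%col 
shape is taken from the AMX intrinsic that consumes the tile):

```
; %t = bitcast <256 x i32> %v to x86_amx   becomes:
%slot = alloca <256 x i32>, align 1024
store <256 x i32> %v, <256 x i32>* %slot, align 1024
%p8 = bitcast <256 x i32>* %slot to i8*
%t = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %p8, i64 64)
```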


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX.

2020-11-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

I only took a quick pass through this so far.  What happens if a bitcast 
between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR 
but isn't next to a load/store?




Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:5334
   // to x86 amx tile on amx intrinsics.
-  if (MemVT == MVT::v256i32)
-return false;
+  // if (MemVT == MVT::v256i32)
+  //   return false;

Should this just be deleted?



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:63
+LoadInst *LD = dyn_cast<LoadInst>(Src);
+assert(LD && "Expected bitcast to x86amx from load");
+assert(LD->hasOneUser() &&

Don't use an assert to check the result of a dyn_cast. If it shouldn't fail 
just use cast which will assert internally.
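
A short sketch of the two idioms for reference (Src as in the quoted code):

```
// dyn_cast<> returns null on failure, so it is for casts that may legitimately fail:
if (auto *LD = dyn_cast<LoadInst>(Src)) {
  // ... use LD ...
}
// cast<> already asserts (in builds with assertions) that the cast succeeds,
// so no separate assert is needed:
LoadInst *LD = cast<LoadInst>(Src);
```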



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:71
+auto *AMXIntrinsic =
+dyn_cast<IntrinsicInst>(Inst.use_begin()->getUser());
+auto *Row = AMXIntrinsic->getOperand(0);

Unchecked dyn_cast



Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:94
+for (auto UI = Inst.use_begin(), UE = Inst.use_end(); UI != UE;) {
+  StoreInst *ST = dyn_cast<StoreInst>((UI++)->getUser());
+  assert(ST && "Expected bitcast to x86amx for store");

Use cast.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91927/new/

https://reviews.llvm.org/D91927



[PATCH] D91927: [X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32>

2020-11-21 Thread LuoYuanke via Phabricator via cfe-commits
LuoYuanke created this revision.
Herald added subscribers: llvm-commits, cfe-commits, dexonsmith, pengfei, 
hiraditya, qcolombet.
Herald added a reviewer: deadalnix.
Herald added projects: clang, LLVM.
LuoYuanke requested review of this revision.
Herald added a subscriber: jdoerfert.

...only operate on type x86_amx. It can separate AMX intrinsics from LLVM IR 
instructions (+-*/). Thanks to Craig for the idea. This patch depends on 
https://reviews.llvm.org/D87981.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D91927

Files:
  clang/test/CodeGen/X86/amx_api.c
  llvm/include/llvm-c/Core.h
  llvm/include/llvm/Bitcode/LLVMBitCodes.h
  llvm/include/llvm/CodeGen/ValueTypes.td
  llvm/include/llvm/IR/DataLayout.h
  llvm/include/llvm/IR/Intrinsics.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsX86.td
  llvm/include/llvm/IR/Type.h
  llvm/include/llvm/Support/MachineValueType.h
  llvm/lib/AsmParser/LLLexer.cpp
  llvm/lib/Bitcode/Reader/BitcodeReader.cpp
  llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
  llvm/lib/CodeGen/ValueTypes.cpp
  llvm/lib/IR/AsmWriter.cpp
  llvm/lib/IR/Constants.cpp
  llvm/lib/IR/Core.cpp
  llvm/lib/IR/DataLayout.cpp
  llvm/lib/IR/Function.cpp
  llvm/lib/IR/LLVMContextImpl.cpp
  llvm/lib/IR/LLVMContextImpl.h
  llvm/lib/IR/Type.cpp
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86LowerAMXType.cpp
  llvm/lib/Target/X86/X86RegisterInfo.td
  llvm/test/CodeGen/X86/AMX/amx-across-func.ll
  llvm/test/CodeGen/X86/AMX/amx-config.ll
  llvm/test/CodeGen/X86/AMX/amx-spill.ll
  llvm/test/CodeGen/X86/AMX/amx-type.ll
  llvm/utils/TableGen/CodeGenTarget.cpp
  llvm/utils/TableGen/IntrinsicEmitter.cpp

Index: llvm/utils/TableGen/IntrinsicEmitter.cpp
===
--- llvm/utils/TableGen/IntrinsicEmitter.cpp
+++ llvm/utils/TableGen/IntrinsicEmitter.cpp
@@ -248,7 +248,8 @@
   IIT_V128 = 47,
   IIT_BF16 = 48,
   IIT_STRUCT9 = 49,
-  IIT_V256 = 50
+  IIT_V256 = 50,
+  IIT_AMX  = 51,
 };
 
 static void EncodeFixedValueType(MVT::SimpleValueType VT,
@@ -276,6 +277,7 @@
   case MVT::token: return Sig.push_back(IIT_TOKEN);
   case MVT::Metadata: return Sig.push_back(IIT_METADATA);
   case MVT::x86mmx: return Sig.push_back(IIT_MMX);
+  case MVT::x86amx: return Sig.push_back(IIT_AMX);
   // MVT::OtherVT is used to mean the empty struct type here.
   case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
   // MVT::isVoid is used to represent varargs here.
Index: llvm/utils/TableGen/CodeGenTarget.cpp
===
--- llvm/utils/TableGen/CodeGenTarget.cpp
+++ llvm/utils/TableGen/CodeGenTarget.cpp
@@ -76,6 +76,7 @@
   case MVT::f128: return "MVT::f128";
   case MVT::ppcf128:  return "MVT::ppcf128";
   case MVT::x86mmx:   return "MVT::x86mmx";
+  case MVT::x86amx:   return "MVT::x86amx";
   case MVT::Glue: return "MVT::Glue";
   case MVT::isVoid:   return "MVT::isVoid";
   case MVT::v1i1: return "MVT::v1i1";
Index: llvm/test/CodeGen/X86/AMX/amx-type.ll
===
--- llvm/test/CodeGen/X86/AMX/amx-type.ll
+++ llvm/test/CodeGen/X86/AMX/amx-type.ll
@@ -12,14 +12,8 @@
 ; CHECK-LABEL: @test_load(
 ; CHECK-NEXT:[[TMP1:%.*]] = bitcast i8* [[IN:%.*]] to <256 x i32>*
 ; CHECK-NEXT:[[TMP2:%.*]] = bitcast i8* [[OUT:%.*]] to <256 x i32>*
-; CHECK-NEXT:[[TMP3:%.*]] = bitcast <256 x i32>* [[TMP1]] to <128 x i32>*
-; CHECK-NEXT:[[TMP4:%.*]] = load <128 x i32>, <128 x i32>* [[TMP3]], align 64
-; CHECK-NEXT:[[TMP5:%.*]] = getelementptr <128 x i32>, <128 x i32>* [[TMP3]], i32 1
-; CHECK-NEXT:[[TMP6:%.*]] = load <128 x i32>, <128 x i32>* [[TMP5]], align 64
-; CHECK-NEXT:[[TMP7:%.*]] = bitcast <256 x i32>* [[TMP2]] to <128 x i32>*
-; CHECK-NEXT:store <128 x i32> [[TMP4]], <128 x i32>* [[TMP7]], align 64
-; CHECK-NEXT:[[TMP8:%.*]] = getelementptr <128 x i32>, <128 x i32>* [[TMP7]], i32 1
-; CHECK-NEXT:store <128 x i32> [[TMP6]], <128 x i32>* [[TMP8]], align 64
+; CHECK-NEXT:[[TMP3:%.*]] = load <256 x i32>, <256 x i32>* [[TMP1]], align 64, [[TBAA2:!tbaa !.*]]
+; CHECK-NEXT:store <256 x i32> [[TMP3]], <256 x i32>* [[TMP2]], align 64, [[TBAA2]]
 ; CHECK-NEXT:ret void
 ;
   %1 = bitcast i8* %in to <256 x i32>*
@@ -29,18 +23,33 @@
   ret void
 }
 
+define dso_local <256 x i32> @foo(<256 x i32>* nocapture readonly byval(<256 x i32>) align 1024 %0, <256 x i32>* nocapture readonly byval(<256 x i32>) align 1024 %1) local_unnamed_addr #0 {
+; CHECK-LABEL: @foo(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[X:%.*]] = load <256 x i32>, <256 x i32>* [[TMP0:%.*]], align 1024, [[TBAA5:!tbaa !.*]]
+; CHECK-NEXT:[[Y:%.*]] = load <256 x i32>, <256 x i32>* [[TMP1:%.*]], align 1024, [[TBAA5]]
+; CHECK-NEXT:[[ADD:%.*]] = add <256 x i32> [[Y]], [[X]]
+; CHECK-NEXT:ret <256 x i32> [[ADD]]
+;
+entry:
+  %x = load <256 x i32>, <256 x i32>*