[PATCH] D148796: [AMDGPU][GFX908] Add builtin support for global add atomic f16/f32

2023-04-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

We used to support it that way and decided against it. It is very hard 
to explain why a supported atomic results in an error. Someone who really needs it 
can use the intrinsic.
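For anyone who does need the operation on gfx908, a direct intrinsic call from IR sidesteps the builtin entirely. A rough sketch follows; the exact mangled suffix of the overloaded intrinsic name is an assumption, so check the intrinsic declarations for your LLVM version:

```llvm
; Sketch: no-return global fadd via the raw intrinsic, bypassing the clang
; builtin. The overload-suffix mangling below is assumed, not verified.
define void @fadd_no_rtn(ptr addrspace(1) %p, float %v) {
  %ignored = call float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %p, float %v)
  ret void
}
declare float @llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1), float)
```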


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148796/new/

https://reviews.llvm.org/D148796

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D147732#4267553, @foad wrote:

> Changing the existing intrinsics to use type mangling could break clients 
> like LLPC and Mesa. I've put up a patch for LLPC to protect it against this 
> change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

It can be fixed with an IR autoupgrade, I suppose.
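An IR autoupgrade here would amount to rewriting calls that use the old fixed-type intrinsic name into the new type-mangled one when old bitcode is loaded; schematically (the names below are illustrative, not the final mangling):

```llvm
; Old bitcode emitted by pre-change frontends (fixed i32 signature):
;   %r = call i32 @llvm.amdgcn.permlane16(i32 %old, i32 %src0, i32 %s1, i32 %s2, i1 0, i1 0)
; After autoupgrade, rewritten to the type-mangled variant:
;   %r = call i32 @llvm.amdgcn.permlane16.i32(i32 %old, i32 %src0, i32 %s1, i32 %s2, i1 0, i1 0)
```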


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147732/new/

https://reviews.llvm.org/D147732



[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D147732#4249584, @jrbyrnes wrote:

> In D147732#4249567, @rampitec wrote:
>
>> Isn't it simpler to lower it to an existing int intrinsic and casts in clang?
>
> Thanks for your comment Stas!
>
> I think it would be ideal if clang inserted pure bitcasts for floats instead 
> of fptoui when passed as operands to these builtins. My concern is -- Do you 
> think we need to preserve the implicit casting behavior for compatibility?

You can manually lower the builtin to a proper cast and intrinsic in 
CGBuiltin.cpp.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147732/new/

https://reviews.llvm.org/D147732



[PATCH] D147732: [AMDGPU] Add f32 permlane{16, x16} builtin variants

2023-04-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147732/new/

https://reviews.llvm.org/D147732



[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM. Please wait for @b-sumner.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146840/new/

https://reviews.llvm.org/D146840



[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-28 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Can you please also add gfx90a and gfx940 tests?

Otherwise LGTM *if* @b-sumner has no objections.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146840/new/

https://reviews.llvm.org/D146840



[PATCH] D146840: [AMDGPU] Replace target feature for global fadd32

2023-03-27 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

You cannot just enable it on gfx908, which does not have a return version of it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146840/new/

https://reviews.llvm.org/D146840



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4127505, @b-sumner wrote:

>> My current understanding is the c-p will go into the already forked clang-16, 
>> but not into rocm 5.4. So the rocm device-libs will be accompanied by the 
>> older clang-16 w/o this and stay compatible. Someone building from scratch 
>> will use the latest clang-16 and staging device-libs with this change. Do you 
>> think this will work?
>
> I wouldn't recommend it.  I would patch whatever device libs are being built 
> in association with clang-16, not staging.  Staging device libs is only 
> appropriate for the staging compiler.  A hash of device libs from around the 
> time that clang-16 stable released would probably be safe.

In general the idea is that the compiler and device-libs should match. I guess the 
correct answer, then, is that users of clang-16 shall use the rocm-5.4.x branch of 
the device libs?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4127421, @b-sumner wrote:

> I have no objection to backporting this, but it may need to be accompanied 
> with a device-libs patch, and I don't know where that patch would be checked 
> in.  The ROCm-Device-Libs in github certainly doesn't have a "clang-16" 
> branch.

My current understanding is the c-p will go into the already forked clang-16, but 
not into rocm 5.4. So the rocm device-libs will be accompanied by the older 
clang-16 w/o this and stay compatible. Someone building from scratch will use the 
latest clang-16 and staging device-libs with this change. Do you think this will work?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4127374, @aaronmondal wrote:

> I think that unless conflicts arise, creating an issue similar to 
> https://github.com/llvm/llvm-project/issues/60600 with the `cherry-pick` line 
> set to this commit should be enough. (See also 
> https://llvm.org/docs/GitHub.html).

I believe it will need D142407 to be cherry-picked as well to apply cleanly. 
Otherwise I do not expect conflicts. So the c-p needs to go into release/16.x, 
right?
Let's wait for @b-sumner first anyway; he maintains the device-libs.
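For reference, the issue-based backport flow referenced above boils down to filing an issue against the release milestone and adding a bot comment of the following form (the two commit hashes are the landed commits mentioned elsewhere in this digest; the exact bot syntax is documented at llvm.org/docs/GitHub.html):

```
/cherry-pick 870b92977e89 df0488369d32
```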


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4127275, @aaronmondal wrote:

>> I cannot say there was much choice. The only real choice was to postpone the 
>> split and magnify the problem in the future. As for the ifdefs, this might 
>> be possible in the device-libs, but I do not see how to do it in 
>> Builtins.def.
>
> Hmm maybe ifdefs in the device libs would also just delay the issue. Maybe it 
> really is best to pull this change into Clang 16 and accept the fact that 
> it's an unfortunate situation, but at least give users with very recent 
> hardware the option to use a regular Clang to build ROCm. Realistically, 
> those actually upgrading to Clang 16 early will also be those upgrading to 
> ROCm5.5 early and likely also be those most likely to have 7900 GPUs.
>
> Somehow, telling users "if you have a new GPU you need new Clang + ROCm" and 
> "if you want new ROCm for your old GPU you need to also upgrade Clang" sounds 
> better to me than telling them "if you have a new GPU you are SOL unless you 
> use binary releases or build the amd-llvm-fork" 

In fact, pulling it into clang-16 does not automatically mean it should be the 
same in the rocm clang build... So this may be a way to go. @b-sumner, do you 
have any objections to backporting this into clang-16?

@aaronmondal, what exactly will the backport look like?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4127167, @aaronmondal wrote:

> Well, I can already feel the pain of distro maintainers having to build the 
> next ROCm releases.
>
> I wonder what the better course of action is here:
>
> 1. Port this patch to Clang 16 so that users with new hardware will be able 
> to build ROCm 5.5, but make it impossible to build ROCm 5.4 and older with 
> clang 16.
> 2. Don't port this patch and have a ~6-month gap during which users with the 
> 7900 GPUs won't be able to build ROCm with a stable Clang version, requiring 
> distro maintainers to use several toolchains and source-based distro users to 
> use different compatibility patches for different ROCm releases. So 
> basically, when 8900 GPUs are announced, Clang would support ROCm for 7900 
> GPUs.
>
> Would there be a way to retain at least *some* backwards compatibility or 
> version interoperability? For instance, via an `#ifdef CLANG_VERSION_MAJOR` 
> in the device libs and an `#ifdef INCOMPATIBLE_AMDGPU_INSTS` in Clang?
>
> This would obviously be very ugly, but it still seems better to me than locking 
> out users (and more likely, ROCm contributors) from using 7900 GPUs if they 
> are unable to build Clang themselves. Users already complain about how hard 
> it is to build ROCm, and they also complain about the frequent breaking 
> changes in Clang. I'm very much in favor of moving fast, but I'm worried that 
> complete disregard for backwards compatibility like this, with no clear 
> upgrade path or fallback mechanism, could cause a lot of frustration for users 
> and distro maintainers.
>
> Maybe there is some other, prettier way to solve this?

I cannot say there was much choice. The only real choice was to postpone the 
split and magnify the problem in the future. As for the ifdefs, this might be 
possible in the device-libs, but I do not see how to do it in Builtins.def.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4126864, @aaronmondal wrote:

>> It shall be complemented by the device-lib change in the corresponding 
>> release, so it is not that simple.
>
> @rampitec I'm not sure I understand. Does this mean that this is breaking in 
> a way that Clang 17 won't be able to build ROCm 5.4?
>
> I thought it was like "we need D142507 to build device-libs after 8dc779e" 
> and for older device libs we just fall back to some older behavior.

Since the feature is actually used by the device-libs, it had to be updated in 
lockstep with the compiler change, not after or before. That's what was done 
downstream.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-02-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D142507#4125940, @aaronmondal wrote:

> Would it be possible to backport this to Clang 16?
>
> If 
> https://github.com/RadeonOpenCompute/ROCm-Device-Libs/commit/8dc779e19cbf2ccfd3307b60f7db57cf4203a5be
>  makes it into ROCm 5.5 no distro would be able to build it with "vanilla" 
> Clang 16, potentially causing pain for users that try to build ROCm 5.5 with 
> a Clang from a package manager (a realistic scenario, considering that one 
> may want to invest 5 min to build ROCm but not 40 min to build Clang). ROCm 
> 5.5 will be the first release to officially support the 7900XT and 7900XTX, 
> so not having this potentially causes issues for users with recent AMD 
> hardware. (See https://github.com/RadeonOpenCompute/ROCm/issues/1880 for 
> extensive, related discussion).
>
> @jhuber6 This wouldn't exactly "solve" 
> https://github.com/llvm/llvm-project/issues/60660, but I think this could 
> also be a workaround (with potentially better user experience), as allowing 
> users build ROCm with regular Clang 16 prevents that deadlock where we can't 
> build ROCm anymore. This is entirely based on speculation that ROCm 5.5 won't 
> introduce other breakages before its release though, so I'd totally 
> understand if this is not a satisfactory solution.

It shall be complemented by the device-lib change in the corresponding release, 
so it is not that simple.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507



[PATCH] D142507: [AMDGPU] Split dot7 feature

2023-01-26 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
rampitec marked an inline comment as done.
Closed by commit rGdf0488369d32: [AMDGPU] Split dot7 feature (authored by 
rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142507/new/

https://reviews.llvm.org/D142507

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td

Index: llvm/lib/Target/AMDGPU/VOP3PInstructions.td
===
--- llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+++ llvm/lib/Target/AMDGPU/VOP3PInstructions.td
@@ -337,11 +337,12 @@
 
 } // End SubtargetPredicate = HasDot2Insts
 
-let SubtargetPredicate = HasDot7Insts in {
-
+let SubtargetPredicate = HasDot10Insts in
 defm V_DOT2_F32_F16 : VOP3PInst<"v_dot2_f32_f16",
   VOP3P_Profile,
   AMDGPUfdot2, 1/*ExplicitClamp*/>;
+
+let SubtargetPredicate = HasDot7Insts in {
 defm V_DOT4_U32_U8  : VOP3PInst<"v_dot4_u32_u8",
   VOP3P_Profile, int_amdgcn_udot4, 1>;
 defm V_DOT8_U32_U4  : VOP3PInst<"v_dot8_u32_u4",
Index: llvm/lib/Target/AMDGPU/GCNSubtarget.h
===
--- llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -146,6 +146,7 @@
   bool HasDot7Insts = false;
   bool HasDot8Insts = false;
   bool HasDot9Insts = false;
+  bool HasDot10Insts = false;
   bool HasMAIInsts = false;
   bool HasFP8Insts = false;
   bool HasPkFmacF16Inst = false;
@@ -738,6 +739,10 @@
 return HasDot9Insts;
   }
 
+  bool hasDot10Insts() const {
+return HasDot10Insts;
+  }
+
   bool hasMAIInsts() const {
 return HasMAIInsts;
   }
Index: llvm/lib/Target/AMDGPU/AMDGPU.td
===
--- llvm/lib/Target/AMDGPU/AMDGPU.td
+++ llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -581,7 +581,7 @@
 def FeatureDot7Insts : SubtargetFeature<"dot7-insts",
   "HasDot7Insts",
   "true",
-  "Has v_dot2_f32_f16, v_dot4_u32_u8, v_dot8_u32_u4 instructions"
+  "Has v_dot4_u32_u8, v_dot8_u32_u4 instructions"
 >;
 
 def FeatureDot8Insts : SubtargetFeature<"dot8-insts",
@@ -596,6 +596,12 @@
   "Has v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16 instructions"
 >;
 
+def FeatureDot10Insts : SubtargetFeature<"dot10-insts",
+  "HasDot10Insts",
+  "true",
+  "Has v_dot2_f32_f16 instruction"
+>;
+
 def FeatureMAIInsts : SubtargetFeature<"mai-insts",
   "HasMAIInsts",
   "true",
@@ -1081,6 +1087,7 @@
FeatureDot1Insts,
FeatureDot2Insts,
FeatureDot7Insts,
+   FeatureDot10Insts,
FeatureSupportsSRAMECC,
FeatureImageGather4D16Bug]>;
 
@@ -1101,6 +1108,7 @@
FeatureDot5Insts,
FeatureDot6Insts,
FeatureDot7Insts,
+   FeatureDot10Insts,
FeatureMAIInsts,
FeaturePkFmacF16Inst,
FeatureAtomicFaddNoRtnInsts,
@@ -1133,6 +1141,7 @@
FeatureDot5Insts,
FeatureDot6Insts,
FeatureDot7Insts,
+   FeatureDot10Insts,
Feature64BitDPP,
FeaturePackedFP32Ops,
FeatureMAIInsts,
@@ -1172,6 +1181,7 @@
FeatureDot5Insts,
FeatureDot6Insts,
FeatureDot7Insts,
+   FeatureDot10Insts,
Feature64BitDPP,
FeaturePackedFP32Ops,
FeatureMAIInsts,
@@ -1233,6 +1243,7 @@
  FeatureDot5Insts,
  FeatureDot6Insts,
  FeatureDot7Insts,
+ FeatureDot10Insts,
  FeatureNSAEncoding,
  FeatureNSAMaxSize5,
  FeatureWavefrontSize32,
@@ -1256,6 +1267,7 @@
  FeatureDot5Insts,
  FeatureDot6Insts,
  FeatureDot7Insts,
+ FeatureDot10Insts,
  FeatureNSAEncoding,
  FeatureNSAMaxSize5,
  FeatureWavefrontSize32,
@@ -1300,6 +1312,7 @@
FeatureDot5Insts,
FeatureDot6Insts,
FeatureDot7Insts,
+   FeatureDot10Insts,
FeatureNSAEncoding,
FeatureNSAMaxSize13,
FeatureWavefrontSize32,
@@ -1314,6 +1327,7 @@
FeatureDot7Insts,
FeatureDot8Insts,
FeatureDot9Insts,
+   FeatureDot10Insts,
FeatureNSAEncoding,
FeatureNSAMaxSize5,
FeatureWavefrontSize32,
@@ -1766,6 +1780,9 @@
 def HasDot9Insts : Predicate<"Subtarget->hasDot9Insts()">,
   AssemblerPredicate<(all_of FeatureDot9Insts)>;
 
+def HasDot10Insts : Predicate<"Subtarget->hasDot10Insts()">,
+  AssemblerPredicate<(all_of FeatureDot10Insts)>;
+
 def HasGetWaveIdInst : Predicate<"Subtarget->hasGetWaveIdInst()">,
   AssemblerPredicate<(all_of FeatureGetWaveIdInst)>;
 
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
@@ -16,8 +16,8 @@
 short2 

[PATCH] D142407: [AMDGPU] Split dot8 feature

2023-01-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG870b92977e89: [AMDGPU] Split dot8 feature (authored by 
rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142407/new/

https://reviews.llvm.org/D142407

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/VOP3Instructions.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td

Index: llvm/lib/Target/AMDGPU/VOP3PInstructions.td
===
--- llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+++ llvm/lib/Target/AMDGPU/VOP3PInstructions.td
@@ -363,12 +363,12 @@
   let HasSrc1Mods = 1;
 }
 
-let SubtargetPredicate = HasDot8Insts  in {
+let SubtargetPredicate = HasDot9Insts  in {
 
 defm V_DOT2_F32_BF16 : VOP3PInst<"v_dot2_f32_bf16", DOT2_BF16_Profile,
   int_amdgcn_fdot2_f32_bf16, 1>;
 
-} // End SubtargetPredicate = HasDot8Insts
+} // End SubtargetPredicate = HasDot9Insts
 
 } // End let IsDOT = 1
 
Index: llvm/lib/Target/AMDGPU/VOP3Instructions.td
===
--- llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -758,7 +758,7 @@
   defm V_CVT_PK_U16_F32 : VOP3Inst<"v_cvt_pk_u16_f32", VOP3_Profile>;
 } // End SubtargetPredicate = isGFX11Plus
 
-let SubtargetPredicate = HasDot8Insts, IsDOT=1 in {
+let SubtargetPredicate = HasDot9Insts, IsDOT=1 in {
   defm V_DOT2_F16_F16 :   VOP3Inst<"v_dot2_f16_f16",   VOP3_DOT_Profile, int_amdgcn_fdot2_f16_f16>;
   defm V_DOT2_BF16_BF16 : VOP3Inst<"v_dot2_bf16_bf16", VOP3_DOT_Profile, int_amdgcn_fdot2_bf16_bf16>;
 }
Index: llvm/lib/Target/AMDGPU/GCNSubtarget.h
===
--- llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -145,6 +145,7 @@
   bool HasDot6Insts = false;
   bool HasDot7Insts = false;
   bool HasDot8Insts = false;
+  bool HasDot9Insts = false;
   bool HasMAIInsts = false;
   bool HasFP8Insts = false;
   bool HasPkFmacF16Inst = false;
@@ -733,6 +734,10 @@
 return HasDot8Insts;
   }
 
+  bool hasDot9Insts() const {
+return HasDot9Insts;
+  }
+
   bool hasMAIInsts() const {
 return HasMAIInsts;
   }
Index: llvm/lib/Target/AMDGPU/AMDGPU.td
===
--- llvm/lib/Target/AMDGPU/AMDGPU.td
+++ llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -587,8 +587,13 @@
 def FeatureDot8Insts : SubtargetFeature<"dot8-insts",
   "HasDot8Insts",
   "true",
-  "Has v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16, "
-  "v_dot4_i32_iu8, v_dot8_i32_iu4 instructions"
+  "Has v_dot4_i32_iu8, v_dot8_i32_iu4 instructions"
+>;
+
+def FeatureDot9Insts : SubtargetFeature<"dot9-insts",
+  "HasDot9Insts",
+  "true",
+  "Has v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16 instructions"
 >;
 
 def FeatureMAIInsts : SubtargetFeature<"mai-insts",
@@ -1308,6 +1313,7 @@
FeatureDot5Insts,
FeatureDot7Insts,
FeatureDot8Insts,
+   FeatureDot9Insts,
FeatureNSAEncoding,
FeatureNSAMaxSize5,
FeatureWavefrontSize32,
@@ -1757,6 +1763,9 @@
 def HasDot8Insts : Predicate<"Subtarget->hasDot8Insts()">,
   AssemblerPredicate<(all_of FeatureDot8Insts)>;
 
+def HasDot9Insts : Predicate<"Subtarget->hasDot9Insts()">,
+  AssemblerPredicate<(all_of FeatureDot9Insts)>;
+
 def HasGetWaveIdInst : Predicate<"Subtarget->hasGetWaveIdInst()">,
   AssemblerPredicate<(all_of FeatureGetWaveIdInst)>;
 
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl
@@ -19,12 +19,12 @@
   fOut[0] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, false);  // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
   fOut[1] = __builtin_amdgcn_fdot2(v2hA, v2hB, fC, true);   // expected-error {{'__builtin_amdgcn_fdot2' needs target feature dot7-insts}}
 
-  hOut[0] = __builtin_amdgcn_fdot2_f16_f16(v2hA, v2hB, hC);   // expected-error {{'__builtin_amdgcn_fdot2_f16_f16' needs target feature dot8-insts}}
+  hOut[0] = __builtin_amdgcn_fdot2_f16_f16(v2hA, v2hB, hC);   // expected-error {{'__builtin_amdgcn_fdot2_f16_f16' needs target feature dot9-insts}}
 
-  sOut[0] = __builtin_amdgcn_fdot2_bf16_bf16(v2ssA, v2ssB, sC);   // expected-error {{'__builtin_amdgcn_fdot2_bf16_bf16' needs target feature dot8-insts}}
+  sOut[0] = __builtin_amdgcn_fdot2_bf16_bf16(v2ssA, v2ssB, sC);   // 

[PATCH] D142493: [AMDGPU] Remove dot1 and dot6 features from clang for gfx11

2023-01-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG4ab2246d486b: [AMDGPU] Remove dot1 and dot6 features from 
clang for gfx11 (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142493/new/

https://reviews.llvm.org/D142493

Files:
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl


Index: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-gfx11.cl
@@ -14,14 +14,10 @@
// CHECK: call i16 @llvm.amdgcn.fdot2.bf16.bf16(<2 x i16> %v2ssA, <2 x i16> %v2ssB, i16 %sC)
// CHECK: call float @llvm.amdgcn.fdot2.f32.bf16(<2 x i16> %v2ssA, <2 x i16> %v2ssB, float %fC, i1 false)
// CHECK: call float @llvm.amdgcn.fdot2.f32.bf16(<2 x i16> %v2ssA, <2 x i16> %v2ssB, float %fC, i1 true)
-// CHECK: call i32 @llvm.amdgcn.sdot4(i32 %siA, i32 %siB, i32 %siC, i1 false)
-// CHECK: call i32 @llvm.amdgcn.sdot4(i32 %siA, i32 %siB, i32 %siC, i1 true)
 // CHECK: call i32 @llvm.amdgcn.udot4(i32 %uiA, i32 %uiB, i32 %uiC, i1 false)
 // CHECK: call i32 @llvm.amdgcn.udot4(i32 %uiA, i32 %uiB, i32 %uiC, i1 true)
// CHECK: call i32 @llvm.amdgcn.sudot4(i1 true, i32 %A, i1 false, i32 %B, i32 %C, i1 false)
// CHECK: call i32 @llvm.amdgcn.sudot4(i1 false, i32 %A, i1 true, i32 %B, i32 %C, i1 true)
-// CHECK: call i32 @llvm.amdgcn.sdot8(i32 %siA, i32 %siB, i32 %siC, i1 false)
-// CHECK: call i32 @llvm.amdgcn.sdot8(i32 %siA, i32 %siB, i32 %siC, i1 true)
 // CHECK: call i32 @llvm.amdgcn.udot8(i32 %uiA, i32 %uiB, i32 %uiC, i1 false)
 // CHECK: call i32 @llvm.amdgcn.udot8(i32 %uiA, i32 %uiB, i32 %uiC, i1 true)
// CHECK: call i32 @llvm.amdgcn.sudot8(i1 false, i32 %A, i1 true, i32 %B, i32 %C, i1 false)
@@ -44,18 +40,12 @@
   fOut[3] = __builtin_amdgcn_fdot2_f32_bf16(v2ssA, v2ssB, fC, false);
   fOut[4] = __builtin_amdgcn_fdot2_f32_bf16(v2ssA, v2ssB, fC, true);
 
-  siOut[2] = __builtin_amdgcn_sdot4(siA, siB, siC, false);
-  siOut[3] = __builtin_amdgcn_sdot4(siA, siB, siC, true);
-
   uiOut[2] = __builtin_amdgcn_udot4(uiA, uiB, uiC, false);
   uiOut[3] = __builtin_amdgcn_udot4(uiA, uiB, uiC, true);
 
   iOut[0] = __builtin_amdgcn_sudot4(true, A, false, B, C, false);
   iOut[1] = __builtin_amdgcn_sudot4(false, A, true, B, C, true);
 
-  siOut[4] = __builtin_amdgcn_sdot8(siA, siB, siC, false);
-  siOut[5] = __builtin_amdgcn_sdot8(siA, siB, siC, true);
-
   uiOut[4] = __builtin_amdgcn_udot8(uiA, uiB, uiC, false);
   uiOut[5] = __builtin_amdgcn_udot8(uiA, uiB, uiC, true);
 
Index: clang/test/CodeGenOpenCL/amdgpu-features.cl
===
--- clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -86,10 +86,10 @@
// GFX1034: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
// GFX1035: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
// GFX1036: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
-// GFX1100: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1101: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1102: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1103: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1103-W64: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dot8-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64"
+// GFX1100: 

[PATCH] D133966: [AMDGPU] Added __builtin_amdgcn_ds_bvh_stack_rtn

2022-09-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGe540965915a4: [AMDGPU] Added 
__builtin_amdgcn_ds_bvh_stack_rtn (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133966/new/

https://reviews.llvm.org/D133966

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11-err.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl


Index: clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
@@ -6,6 +6,8 @@
 
 typedef unsigned int uint;
 typedef unsigned long ulong;
+typedef uint uint2 __attribute__((ext_vector_type(2)));
+typedef uint uint4 __attribute__((ext_vector_type(4)));
 
 // CHECK-LABEL: @test_s_sendmsg_rtn(
 // CHECK: call i32 @llvm.amdgcn.s.sendmsg.rtn.i32(i32 0)
@@ -18,3 +20,14 @@
 void test_s_sendmsg_rtnl(global ulong* out) {
   *out = __builtin_amdgcn_s_sendmsg_rtnl(0);
 }
+
+// CHECK-LABEL: @test_ds_bvh_stack_rtn(
+// CHECK: %0 = tail call { i32, i32 } @llvm.amdgcn.ds.bvh.stack.rtn(i32 %addr, i32 %data, <4 x i32> %data1, i32 128)
+// CHECK: %1 = extractvalue { i32, i32 } %0, 0
+// CHECK: %2 = extractvalue { i32, i32 } %0, 1
+// CHECK: %3 = insertelement <2 x i32> poison, i32 %1, i64 0
+// CHECK: %4 = insertelement <2 x i32> %3, i32 %2, i64 1
+void test_ds_bvh_stack_rtn(global uint2* out, uint addr, uint data, uint4 data1)
+{
+  *out = __builtin_amdgcn_ds_bvh_stack_rtn(addr, data, data1, 128);
+}
Index: clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11-err.cl
===
--- /dev/null
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11-err.cl
@@ -0,0 +1,11 @@
+// REQUIRES: amdgpu-registered-target
+
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1100 -verify -S -emit-llvm -o - %s
+
+typedef unsigned int uint;
+typedef uint uint2 __attribute__((ext_vector_type(2)));
+typedef uint uint4 __attribute__((ext_vector_type(4)));
+
+kernel void builtins_amdgcn_bvh_err(global uint2* out, uint addr, uint data, 
uint4 data1, uint offset) {
+  *out = __builtin_amdgcn_ds_bvh_stack_rtn(addr, data, data1, offset); // expected-error {{'__builtin_amdgcn_ds_bvh_stack_rtn' must be a constant integer}}
+}
Index: clang/lib/CodeGen/CGBuiltin.cpp
===
--- clang/lib/CodeGen/CGBuiltin.cpp
+++ clang/lib/CodeGen/CGBuiltin.cpp
@@ -16897,6 +16897,21 @@
   RayInverseDir, TextureDescr});
   }
 
+  case AMDGPU::BI__builtin_amdgcn_ds_bvh_stack_rtn: {
+    SmallVector<Value *, 4> Args;
+    for (int i = 0, e = E->getNumArgs(); i != e; ++i)
+      Args.push_back(EmitScalarExpr(E->getArg(i)));
+
+    Function *F = CGM.getIntrinsic(Intrinsic::amdgcn_ds_bvh_stack_rtn);
+    Value *Call = Builder.CreateCall(F, Args);
+    Value *Rtn = Builder.CreateExtractValue(Call, 0);
+    Value *A = Builder.CreateExtractValue(Call, 1);
+    llvm::Type *RetTy = ConvertType(E->getType());
+    Value *I0 = Builder.CreateInsertElement(PoisonValue::get(RetTy), Rtn,
+                                            (uint64_t)0);
+    return Builder.CreateInsertElement(I0, A, 1);
+  }
+
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w32:
Index: clang/include/clang/Basic/BuiltinsAMDGPU.def
===
--- clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -281,6 +281,8 @@
 TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtn, "UiUIi", "n", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_sendmsg_rtnl, "UWiUIi", "n", "gfx11-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_ds_bvh_stack_rtn, "V2UiUiUiV4UiIi", "n", "gfx11-insts")
+
 
 //===----------------------------------------------------------------------===//
 // Special builtins.
 //===----------------------------------------------------------------------===//


Index: clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
===
--- clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
+++ clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
@@ -6,6 +6,8 @@
 
 typedef unsigned int uint;
 typedef unsigned long ulong;
+typedef uint uint2 __attribute__((ext_vector_type(2)));
+typedef uint uint4 __attribute__((ext_vector_type(4)));
 
 // CHECK-LABEL: @test_s_sendmsg_rtn(
 // CHECK: call i32 @llvm.amdgcn.s.sendmsg.rtn.i32(i32 0)
@@ -18,3 +20,14 @@
 void test_s_sendmsg_rtnl(global ulong* out) {
   *out = 

[PATCH] D129908: [AMDGPU] Support for gfx940 fp8 smfmac

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG523a99c0eb03: [AMDGPU] Support for gfx940 fp8 smfmac 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129908/new/

https://reviews.llvm.org/D129908

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -392,3 +392,51 @@
 
 # GFX940: v_smfmac_i32_32x32x32_i8 v[10:25], v[2:3], v[4:7], v3 abid:15 ; encoding: [0x0a,0x78,0xec,0xd3,0x02,0x09,0x0e,0x04]
 0x0a,0x78,0xec,0xd3,0x02,0x09,0x0e,0x04
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xf8,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xf8,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_bf8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xf8,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xf8,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_fp8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xf9,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xf9,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_bf8_fp8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xf9,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xf9,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfa,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfa,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_bf8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfa,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfa,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_fp8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfb,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfb,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_16x16x64_fp8_fp8 a[0:3], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfb,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfb,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_bf8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfc,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfc,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_bf8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfc,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfc,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_fp8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfd,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfd,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_bf8_fp8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfd,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfd,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_bf8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xfe,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xfe,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_bf8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xfe,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xfe,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_fp8 v[0:15], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xff,0xd3,0x02,0x09,0x06,0x0c]
+0x00,0x0b,0xff,0xd3,0x02,0x09,0x06,0x0c
+
+# GFX940: v_smfmac_f32_32x32x32_fp8_fp8 a[0:15], v[2:3], a[4:7], v1 ; encoding: [0x00,0x80,0xff,0xd3,0x02,0x09,0x06,0x14]
+0x00,0x80,0xff,0xd3,0x02,0x09,0x06,0x14
Index: llvm/test/MC/AMDGPU/mai-gfx940.s
===
--- llvm/test/MC/AMDGPU/mai-gfx940.s
+++ llvm/test/MC/AMDGPU/mai-gfx940.s
@@ -616,6 +616,70 @@
 // GFX940: v_smfmac_i32_32x32x32_i8 a[10:25], v[2:3], a[4:7], v11 ; encoding: [0x0a,0x80,0xec,0xd3,0x02,0x09,0x2e,0x14]
 // GFX90A: error: instruction not supported on this GPU
 
+v_smfmac_f32_16x16x64_bf8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1
+// GFX940: v_smfmac_f32_16x16x64_bf8_bf8 v[0:3], a[2:3], v[4:7], v1 cbsz:3 abid:1 ; encoding: [0x00,0x0b,0xf8,0xd3,0x02,0x09,0x06,0x0c]
+// GFX90A: error: instruction not supported on this GPU
+
+v_smfmac_f32_16x16x64_bf8_bf8 a[0:3], v[2:3], a[4:7], v1
+// GFX940: 

[PATCH] D129906: [AMDGPU] Support for gfx940 fp8 mfma

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2695f0a688e9: [AMDGPU] Support for gfx940 fp8 mfma (authored 
by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129906/new/

https://reviews.llvm.org/D129906

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -63,6 +63,78 @@
 # GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04
 
+# GFX940: v_mfma_f32_16x16x32_bf8_bf8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf0,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf0,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf0,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_16x16x32_bf8_fp8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf1,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf1,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_bf8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf1,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_16x16x32_fp8_bf8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf2,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf2,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_bf8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf2,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_16x16x32_fp8_fp8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xf3,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf3,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_16x16x32_fp8_fp8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf3,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_32x32x16_bf8_bf8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xf4,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf4,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf4,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_32x32x16_bf8_fp8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xf5,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf5,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_fp8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_bf8_fp8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xf5,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_f32_32x32x16_fp8_bf8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xf6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xf6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_fp8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xf6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xf6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_f32_32x32x16_fp8_bf8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xf6,0xd3,0x02,0x09,0x02,0xa4]

[PATCH] D129902: [AMDGPU] Support for gfx940 fp8 conversions

2022-07-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9fa5a6b7e8a2: [AMDGPU] Support for gfx940 fp8 conversions 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129902/new/

https://reviews.llvm.org/D129902

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/VOP1Instructions.td
  llvm/lib/Target/AMDGPU/VOP3Instructions.td
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll
  llvm/test/MC/AMDGPU/gfx940_asm_features.s
  llvm/test/MC/AMDGPU/gfx940_err.s
  llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt

Index: llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
+++ llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
@@ -263,3 +263,159 @@
 
 # GFX940: buffer_atomic_min_f64 v[4:5], off, s[8:11], s3 sc1 ; encoding: [0x00,0x80,0x40,0xe1,0x00,0x04,0x02,0x03]
 0x00,0x80,0x40,0xe1,0x00,0x04,0x02,0x03
+
+# GFX940: v_cvt_f32_bf8_e32 v1, s3; encoding: [0x03,0xaa,0x02,0x7e]
+0x03,0xaa,0x02,0x7e
+
+# GFX940: v_cvt_f32_bf8_e32 v1, 3 ; encoding: [0x83,0xaa,0x02,0x7e]
+0x83,0xaa,0x02,0x7e
+
+# GFX940: v_cvt_f32_bf8_e32 v1, v3; encoding: [0x03,0xab,0x02,0x7e]
+0x03,0xab,0x02,0x7e
+
+# GFX940: v_cvt_f32_bf8_sdwa v1, s3 src0_sel:BYTE_1 ; encoding: [0xf9,0xaa,0x02,0x7e,0x03,0x06,0x81,0x00]
+0xf9,0xaa,0x02,0x7e,0x03,0x06,0x81,0x00
+
+# GFX940: v_cvt_f32_bf8_dpp v1, v3 quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xaa,0x02,0x7e,0x03,0x58,0x00,0xff]
+0xfa,0xaa,0x02,0x7e,0x03,0x58,0x00,0xff
+
+# GFX940: v_cvt_f32_bf8_e64 v1, s3 mul:2  ; encoding: [0x01,0x00,0x95,0xd1,0x03,0x00,0x00,0x08]
+0x01,0x00,0x95,0xd1,0x03,0x00,0x00,0x08
+
+# GFX940: v_cvt_f32_bf8_sdwa v1, s3 clamp mul:2 src0_sel:BYTE_1 ; encoding: [0xf9,0xaa,0x02,0x7e,0x03,0x66,0x81,0x00]
+0xf9,0xaa,0x02,0x7e,0x03,0x66,0x81,0x00
+
+# GFX940: v_cvt_f32_bf8_e64 v1, s3 clamp  ; encoding: [0x01,0x80,0x95,0xd1,0x03,0x00,0x00,0x00]
+0x01,0x80,0x95,0xd1,0x03,0x00,0x00,0x00
+
+# GFX940: v_cvt_f32_fp8_e32 v1, s3; encoding: [0x03,0xa8,0x02,0x7e]
+0x03,0xa8,0x02,0x7e
+
+# GFX940: v_cvt_f32_fp8_e32 v1, 3 ; encoding: [0x83,0xa8,0x02,0x7e]
+0x83,0xa8,0x02,0x7e
+
+# GFX940: v_cvt_f32_fp8_e32 v1, v3; encoding: [0x03,0xa9,0x02,0x7e]
+0x03,0xa9,0x02,0x7e
+
+# GFX940: v_cvt_f32_fp8_sdwa v1, s3 src0_sel:BYTE_1 ; encoding: [0xf9,0xa8,0x02,0x7e,0x03,0x06,0x81,0x00]
+0xf9,0xa8,0x02,0x7e,0x03,0x06,0x81,0x00
+
+# GFX940: v_cvt_f32_fp8_dpp v1, v3 quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xa8,0x02,0x7e,0x03,0x58,0x00,0xff]
+0xfa,0xa8,0x02,0x7e,0x03,0x58,0x00,0xff
+
+# GFX940: v_cvt_f32_fp8_e64 v1, s3 mul:2  ; encoding: [0x01,0x00,0x94,0xd1,0x03,0x00,0x00,0x08]
+0x01,0x00,0x94,0xd1,0x03,0x00,0x00,0x08
+
+# GFX940: v_cvt_f32_fp8_sdwa v1, s3 clamp mul:2 src0_sel:BYTE_1 ; encoding: [0xf9,0xa8,0x02,0x7e,0x03,0x66,0x81,0x00]
+0xf9,0xa8,0x02,0x7e,0x03,0x66,0x81,0x00
+
+# GFX940: v_cvt_f32_fp8_e64 v1, s3 clamp  ; encoding: [0x01,0x80,0x94,0xd1,0x03,0x00,0x00,0x00]
+0x01,0x80,0x94,0xd1,0x03,0x00,0x00,0x00
+
+# GFX940: v_cvt_f32_fp8_sdwa v1, 3 src0_sel:BYTE_1 ; encoding: [0xf9,0xa8,0x02,0x7e,0x83,0x06,0x81,0x00]
+0xf9,0xa8,0x02,0x7e,0x83,0x06,0x81,0x00
+
+# GFX940: v_cvt_pk_f32_bf8_e32 v[2:3], s3 ; encoding: [0x03,0xae,0x04,0x7e]
+0x03,0xae,0x04,0x7e
+
+# GFX940: v_cvt_pk_f32_bf8_e32 v[2:3], 3  ; encoding: [0x83,0xae,0x04,0x7e]
+0x83,0xae,0x04,0x7e
+
+# GFX940: v_cvt_pk_f32_bf8_e32 v[2:3], v3 ; encoding: [0x03,0xaf,0x04,0x7e]
+0x03,0xaf,0x04,0x7e
+
+# GFX940: v_cvt_pk_f32_bf8_sdwa v[2:3], s3 src0_sel:WORD_1 ; encoding: [0xf9,0xae,0x04,0x7e,0x03,0x06,0x85,0x00]
+0xf9,0xae,0x04,0x7e,0x03,0x06,0x85,0x00
+
+# GFX940: v_cvt_pk_f32_bf8_dpp v[0:1], v3 quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf ; encoding: [0xfa,0xae,0x00,0x7e,0x03,0x58,0x00,0xff]
+0xfa,0xae,0x00,0x7e,0x03,0x58,0x00,0xff
+
+# GFX940: v_cvt_pk_f32_bf8_e64 v[2:3], s3 mul:2   ; encoding: [0x02,0x00,0x97,0xd1,0x03,0x00,0x00,0x08]
+0x02,0x00,0x97,0xd1,0x03,0x00,0x00,0x08
+
+# GFX940: v_cvt_pk_f32_bf8_sdwa v[2:3], s3 clamp mul:2 src0_sel:WORD_1 ; encoding: [0xf9,0xae,0x04,0x7e,0x03,0x66,0x85,0x00]
+0xf9,0xae,0x04,0x7e,0x03,0x66,0x85,0x00
+
+# GFX940: v_cvt_pk_f32_bf8_e64 v[2:3], s3 clamp   ; encoding: 

[PATCH] D128952: [AMDGPU] Add WMMA clang builtins

2022-06-30 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128952/new/

https://reviews.llvm.org/D128952

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D127904: [AMDGPU] gfx11 new dot instruction codegen support

2022-06-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127904/new/

https://reviews.llvm.org/D127904



[PATCH] D127904: [AMDGPU] gfx11 new dot instruction codegen support

2022-06-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl:1
 // REQUIRES: amdgpu-registered-target
 

Also need positive tests like in builtins-amdgcn-dl-insts.cl.



Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:1926
 
+// f16 %r = llvm.amdgcn.fdot2.f16.f16(v2f16 %a, v2f16 %b, f16 %c, i1 %clamp)
+//   %r = %a[0] * %b[0] + %a[1] * %b[1] + %c

I do not see clamp in the definition. Make a separate comment for the last 2?
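For readers following the semantics comment quoted above, the fdot2 operation can be modeled in plain Python. This is an illustrative sketch only: the function name and the saturating interpretation of the clamp flag are assumptions for this sketch, not code from the patch, and real hardware operates in IEEE half precision.

```python
def fdot2(a, b, c, clamp=False):
    # r = a[0] * b[0] + a[1] * b[1] + c, per the intrinsic comment above.
    r = a[0] * b[0] + a[1] * b[1] + c
    if clamp:
        # Clamp modeled as saturation to the largest finite f16 value (65504).
        f16_max = 65504.0
        r = max(-f16_max, min(f16_max, r))
    return r

print(fdot2((1.0, 2.0), (3.0, 4.0), 0.5))  # 11.5
```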


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127904/new/

https://reviews.llvm.org/D127904



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-05-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

In D124700#3483715 , @kerbowa wrote:

> In D124700#3483633 , @rampitec 
> wrote:
>
>> In D124700#3483609 , @kerbowa 
>> wrote:
>>
>>> In D124700#3483556 , @rampitec 
>>> wrote:
>>>
 You do not handle masks other than 0 yet?
>>>
>>> We handle 0 and 1 only.
>>
>> Do you mean 1 is supported simply because it has side effects? If I 
>> understand it right you will need to remove this to support more flexible 
>> masks, right?
>
> Yes.

LGTM given that. But change imm to i32 before committing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D124700#3483609 , @kerbowa wrote:

> In D124700#3483556 , @rampitec 
> wrote:
>
>> You do not handle masks other than 0 yet?
>
> We handle 0 and 1 only.

Do you mean 1 is supported simply because it has side effects? If I understand 
it right you will need to remove this to support more flexible masks, right?




Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:219
+// MASK = 0: No instructions may be scheduled across SCHED_BARRIER.
+// MASK = 1: Non-memory, non-side-effect producing instructions may be
+//   scheduled across SCHED_BARRIER, i.e. allow ALU instructions to pass.

kerbowa wrote:
> rampitec wrote:
> > Since you are going to extend it I'd suggest this is -1. Then you will
> > start carving bits out of it. That way, if someone starts to use it, it will
> > still work after the update.
> Since the most common use case will be to block all instruction types I 
> thought having that be MASK = 0 made the most sense. After that, we carve out 
> bits for types of instructions that should be scheduled across it.
> 
> There may be modes where we restrict certain types of memops, so we cannot 
> have MASK = 1 above changed to -1. Since this (MASK = 1) is allowing all ALU 
> across we could define which bits mean VALU/SALU/MFMA etc and use that mask 
> if you think it's better. I'm worried we won't be able to anticipate all the 
> types that we could want to be maskable. It might be better to just have a 
> single bit that can mean all ALU, or all MemOps, and so on to avoid this 
> problem.
Ok. Let it be 1.
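To make the mask semantics settled in this exchange concrete, here is a toy Python model. The helper and argument names are invented for illustration; only the MASK = 0 and MASK = 1 meanings come from the thread, and further bit assignments were still undecided at this point.

```python
ALLOW_ALU = 0x1  # MASK bit 0: non-memory, non-side-effecting ALU may pass

def may_cross_sched_barrier(mask, is_alu, touches_mem_or_side_effects):
    # MASK = 0: no instructions may be scheduled across the barrier.
    if mask == 0:
        return False
    # MASK = 1: only pure ALU instructions may be scheduled across it.
    if mask & ALLOW_ALU:
        return is_alu and not touches_mem_or_side_effects
    return False

print(may_cross_sched_barrier(0, True, False))   # False
print(may_cross_sched_barrier(1, True, False))   # True
print(may_cross_sched_barrier(1, True, True))    # False
```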


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D124700: [AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic

2022-04-29 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

You do not handle masks other than 0 yet?




Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:219
+// MASK = 0: No instructions may be scheduled across SCHED_BARRIER.
+// MASK = 1: Non-memory, non-side-effect producing instructions may be
+//   scheduled across SCHED_BARRIER, i.e. allow ALU instructions to pass.

Since you are going to extend it I'd suggest this is -1. Then you will start
carving bits out of it. That way, if someone starts to use it, it will still
work after the update.



Comment at: llvm/include/llvm/IR/IntrinsicsAMDGPU.td:222
+def int_amdgcn_sched_barrier : GCCBuiltin<"__builtin_amdgcn_sched_barrier">,
+  Intrinsic<[], [llvm_i16_ty], [ImmArg<ArgIndex<0>>, IntrNoMem,
+IntrHasSideEffects, IntrConvergent, IntrWillReturn]>;

Why not full i32? This is an immediate anyway, but you will have more bits for
the future.



Comment at: llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp:213
+OutStreamer->emitRawComment(" sched_barrier mask(" +
+Twine(MI->getOperand(0).getImm()) + ")");
+  }

Use hex?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124700/new/

https://reviews.llvm.org/D124700



[PATCH] D123825: clang/AMDGPU: Define macro for -munsafe-fp-atomics

2022-04-14 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123825/new/

https://reviews.llvm.org/D123825



[PATCH] D122191: [AMDGPU] Support gfx940 smfmac instructions

2022-03-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG6e3e14f600af: [AMDGPU] Support gfx940 smfmac instructions 
(authored by rampitec).
Herald added subscribers: cfe-commits, hsmhsm.
Herald added a project: clang.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122191/new/

https://reviews.llvm.org/D122191

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
  llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
  llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
  llvm/lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/SIRegisterInfo.td
  llvm/lib/Target/AMDGPU/SISchedule.td
  llvm/lib/Target/AMDGPU/VOP3Instructions.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/lib/Target/AMDGPU/VOPInstructions.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -62,3 +62,45 @@
 
 # GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_smfmac_f32_16x16x32_f16 v[10:13], a[2:3], v[4:7], v0 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe2,0xd3,0x02,0x09,0x02,0x0c]
+0x0a,0x0b,0xe2,0xd3,0x02,0x09,0x02,0x0c
+
+# GFX940: v_smfmac_f32_16x16x32_f16 a[10:13], v[2:3], a[4:7], v1 ; encoding: [0x0a,0x80,0xe2,0xd3,0x02,0x09,0x06,0x14]
+0x0a,0x80,0xe2,0xd3,0x02,0x09,0x06,0x14
+
+# GFX940: v_smfmac_f32_32x32x16_f16 v[10:25], a[2:3], v[4:7], v2 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe4,0xd3,0x02,0x09,0x0a,0x0c]
+0x0a,0x0b,0xe4,0xd3,0x02,0x09,0x0a,0x0c
+
+# GFX940: v_smfmac_f32_32x32x16_f16 a[10:25], v[2:3], a[4:7], v3 ; encoding: [0x0a,0x80,0xe4,0xd3,0x02,0x09,0x0e,0x14]
+0x0a,0x80,0xe4,0xd3,0x02,0x09,0x0e,0x14
+
+# GFX940: v_smfmac_f32_16x16x32_bf16 v[10:13], a[2:3], v[4:7], v4 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe6,0xd3,0x02,0x09,0x12,0x0c]
+0x0a,0x0b,0xe6,0xd3,0x02,0x09,0x12,0x0c
+
+# GFX940: v_smfmac_f32_16x16x32_bf16 a[10:13], v[2:3], a[4:7], v5 ; encoding: [0x0a,0x80,0xe6,0xd3,0x02,0x09,0x16,0x14]
+0x0a,0x80,0xe6,0xd3,0x02,0x09,0x16,0x14
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 v[10:25], a[2:3], v[4:7], v6 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x1a,0x0c]
+0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x1a,0x0c
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 a[10:25], v[2:3], a[4:7], v7 ; encoding: [0x0a,0x80,0xe8,0xd3,0x02,0x09,0x1e,0x14]
+0x0a,0x80,0xe8,0xd3,0x02,0x09,0x1e,0x14
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 v[10:25], a[2:3], v[4:7], v8 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x22,0x0c]
+0x0a,0x0b,0xe8,0xd3,0x02,0x09,0x22,0x0c
+
+# GFX940: v_smfmac_f32_32x32x16_bf16 a[10:25], v[2:3], a[4:7], v9 ; encoding: [0x0a,0x80,0xe8,0xd3,0x02,0x09,0x26,0x14]
+0x0a,0x80,0xe8,0xd3,0x02,0x09,0x26,0x14
+
+# GFX940: v_smfmac_i32_16x16x64_i8 v[10:13], a[2:3], v[4:7], v10 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xea,0xd3,0x02,0x09,0x2a,0x0c]
+0x0a,0x0b,0xea,0xd3,0x02,0x09,0x2a,0x0c
+
+# GFX940: v_smfmac_i32_16x16x64_i8 a[10:13], v[2:3], a[4:7], v11 ; encoding: [0x0a,0x80,0xea,0xd3,0x02,0x09,0x2e,0x14]
+0x0a,0x80,0xea,0xd3,0x02,0x09,0x2e,0x14
+
+# GFX940: v_smfmac_i32_32x32x32_i8 v[10:25], a[2:3], v[4:7], v12 cbsz:3 abid:1 ; encoding: [0x0a,0x0b,0xec,0xd3,0x02,0x09,0x32,0x0c]
+0x0a,0x0b,0xec,0xd3,0x02,0x09,0x32,0x0c
+
+# GFX940: v_smfmac_i32_32x32x32_i8 a[10:25], v[2:3], a[4:7], v13 ; encoding: [0x0a,0x80,0xec,0xd3,0x02,0x09,0x36,0x14]
+0x0a,0x80,0xec,0xd3,0x02,0x09,0x36,0x14
Index: llvm/test/MC/AMDGPU/mai-gfx940.s
===
--- llvm/test/MC/AMDGPU/mai-gfx940.s
+++ llvm/test/MC/AMDGPU/mai-gfx940.s
@@ -459,3 +459,51 @@
 v_mfma_f32_32x32x4xf32 a[0:15], v[2:3], v[4:5], a[18:33]
 // GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[18:33] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x4a,0x04]
 // GFX90A: error: instruction not supported on this GPU
+
+v_smfmac_f32_16x16x32_f16 v[10:13], a[2:3], v[4:7], v0 cbsz:3 abid:1
+// GFX940: 

[PATCH] D122044: [AMDGPU] New gfx940 mfma instructions

2022-03-24 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG27439a764230: [AMDGPU] New gfx940 mfma instructions 
(authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122044/new/

https://reviews.llvm.org/D122044

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl
  clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/lib/Target/AMDGPU/SISchedule.td
  llvm/lib/Target/AMDGPU/VOP3PInstructions.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll
  llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll
  llvm/test/MC/AMDGPU/mai-gfx940.s
  llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

Index: llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
+++ llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
@@ -3,6 +3,24 @@
 # GFX940: v_accvgpr_write_b32 a10, s20 ; encoding: [0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18]
 0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18
 
+# GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4
+
+# GFX940: v_mfma_i32_16x16x32_i8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04]
+0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04
+
+# GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4]
+0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4
+
 # GFX940: v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[2:33] ; encoding: [0x00,0x00,0xdd,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x00,0xdd,0xd3,0x02,0x09,0x0a,0x04
 
@@ -32,3 +50,15 @@
 
 # GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]
 0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_16x16x8_xf32 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_16x16x8_xf32 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_32x32x4_xf32 v[0:15], v[2:3], v[4:5], v[2:17] ; encoding: [0x00,0x00,0xbf,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x00,0xbf,0xd3,0x02,0x09,0x0a,0x04
+
+# GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
+0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04
Index: llvm/test/MC/AMDGPU/mai-gfx940.s
===
--- llvm/test/MC/AMDGPU/mai-gfx940.s
+++ llvm/test/MC/AMDGPU/mai-gfx940.s
@@ -262,6 +262,54 @@
 v_mfma_f32_32x32x1f32 v[0:31], v0, v1, v[34:65] blgp:7
 // GFX940: v_mfma_f32_32x32x1_2b_f32 v[0:31], v0, v1, v[34:65] blgp:7 ; encoding: [0x00,0x00,0xc0,0xd3,0x00,0x03,0x8a,0xe4]
 
+v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15]
+// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
+// GFX90A: error: instruction not supported on this GPU
+
+v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5
+// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] 

[PATCH] D121172: [AMDGPU] Set noclobber metadata on loads instead of cast to constant

2022-03-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9eabea396814: [AMDGPU] Set noclobber metadata on loads 
instead of cast to constant (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121172/new/

https://reviews.llvm.org/D121172

Files:
  clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
  llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp
  llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll

Index: llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
===
--- llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
+++ llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
@@ -11,15 +11,11 @@
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds float**, float** addrspace(1)* [[ARG:%.*]], i32 [[I]]
-; CHECK-NEXT:[[P1_CONST:%.*]] = addrspacecast float** addrspace(1)* [[P1]] to float** addrspace(4)*
-; CHECK-NEXT:[[P2:%.*]] = load float**, float** addrspace(4)* [[P1_CONST]], align 8
-; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float** [[P2]] to float* addrspace(1)*
-; CHECK-NEXT:[[TMP1:%.*]] = addrspacecast float* addrspace(1)* [[TMP0]] to float**
-; CHECK-NEXT:[[P2_FLAT:%.*]] = addrspacecast float* addrspace(1)* [[TMP0]] to float**
-; CHECK-NEXT:[[P2_CONST:%.*]] = addrspacecast float** [[TMP1]] to float* addrspace(4)*
-; CHECK-NEXT:[[P3:%.*]] = load float*, float* addrspace(4)* [[P2_CONST]], align 8
-; CHECK-NEXT:[[TMP2:%.*]] = addrspacecast float* [[P3]] to float addrspace(1)*
-; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[TMP2]], align 4
+; CHECK-NEXT:[[P2:%.*]] = load float**, float** addrspace(1)* [[P1]], align 8, !amdgpu.noclobber !0
+; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast float** [[P2]] to float* addrspace(1)*
+; CHECK-NEXT:[[P3:%.*]] = load float*, float* addrspace(1)* [[P2_GLOBAL]], align 8, !amdgpu.noclobber !0
+; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast float* [[P3]] to float addrspace(1)*
+; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[P3_GLOBAL]], align 4
 ; CHECK-NEXT:ret void
 ;
 entry:
@@ -41,11 +37,9 @@
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds float*, float* addrspace(1)* [[ARG_GLOBAL]], i32 [[I]]
 ; CHECK-NEXT:[[P1_CAST:%.*]] = bitcast float* addrspace(1)* [[P1]] to i32* addrspace(1)*
-; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast i32* addrspace(1)* [[P1_CAST]] to i32**
-; CHECK-NEXT:[[P1_CAST_CONST:%.*]] = addrspacecast i32** [[TMP0]] to i32* addrspace(4)*
-; CHECK-NEXT:[[P2:%.*]] = load i32*, i32* addrspace(4)* [[P1_CAST_CONST]], align 8
-; CHECK-NEXT:[[TMP1:%.*]] = addrspacecast i32* [[P2]] to i32 addrspace(1)*
-; CHECK-NEXT:store i32 0, i32 addrspace(1)* [[TMP1]], align 4
+; CHECK-NEXT:[[P2:%.*]] = load i32*, i32* addrspace(1)* [[P1_CAST]], align 8, !amdgpu.noclobber !0
+; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast i32* [[P2]] to i32 addrspace(1)*
+; CHECK-NEXT:store i32 0, i32 addrspace(1)* [[P2_GLOBAL]], align 4
 ; CHECK-NEXT:ret void
 ;
 entry:
@@ -66,11 +60,10 @@
 ; CHECK-LABEL: @ptr_in_struct(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[P:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], [[STRUCT_S]] addrspace(1)* [[ARG:%.*]], i64 0, i32 0
-; CHECK-NEXT:[[P_CONST:%.*]] = addrspacecast float* addrspace(1)* [[P]] to float* addrspace(4)*
-; CHECK-NEXT:[[P1:%.*]] = load float*, float* addrspace(4)* [[P_CONST]], align 8
-; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float* [[P1]] to float addrspace(1)*
+; CHECK-NEXT:[[P1:%.*]] = load float*, float* addrspace(1)* [[P]], align 8, !amdgpu.noclobber !0
+; CHECK-NEXT:[[P1_GLOBAL:%.*]] = addrspacecast float* [[P1]] to float addrspace(1)*
 ; CHECK-NEXT:[[ID:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
-; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds float, float addrspace(1)* [[TMP0]], i32 [[ID]]
+; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds float, float addrspace(1)* [[P1_GLOBAL]], i32 [[ID]]
 ; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[ARRAYIDX]], align 4
 ; CHECK-NEXT:ret void
 ;
@@ -97,26 +90,22 @@
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[IDXPROM:%.*]] = zext i32 [[I]] to i64
 ; CHECK-NEXT:[[ARRAYIDX10:%.*]] = getelementptr inbounds float*, float* addrspace(1)* [[ARG_GLOBAL]], i64 [[IDXPROM]]
-; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float* addrspace(1)* [[ARRAYIDX10]] to float**
-; CHECK-NEXT:[[ARRAYIDX10_CONST:%.*]] = addrspacecast float** [[TMP0]] to float* addrspace(4)*
-; 

[PATCH] D121028: [AMDGPU] new gfx940 fp atomics

2022-03-07 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG932f628121d8: [AMDGPU] new gfx940 fp atomics (authored by 
rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121028/new/

https://reviews.llvm.org/D121028

Files:
  clang/include/clang/Basic/BuiltinsAMDGPU.def
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/test/CodeGenOpenCL/builtins-amdgcn-fp-atomics-gfx90a-err.cl
  clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
  llvm/include/llvm/IR/IntrinsicsAMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
  llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td
  llvm/lib/Target/AMDGPU/DSInstructions.td
  llvm/lib/Target/AMDGPU/FLATInstructions.td
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/GlobalISel/fp-atomics-gfx940.ll
  llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll
  llvm/test/MC/AMDGPU/gfx940_asm_features.s
  llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt

Index: llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
===
--- llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
+++ llvm/test/MC/Disassembler/AMDGPU/gfx940_dasm_features.txt
@@ -15,6 +15,78 @@
 # GFX940: buffer_load_dword v5, off, s[8:11], s3 sc0 nt sc1 ; encoding: [0x00,0xc0,0x52,0xe0,0x00,0x05,0x02,0x03]
 0x00,0xc0,0x52,0xe0,0x00,0x05,0x02,0x03
 
+# GFX940: flat_atomic_add_f32 v[2:3], v1  ; encoding: [0x00,0x00,0x34,0xdd,0x02,0x01,0x00,0x00]
+0x00,0x00,0x34,0xdd,0x02,0x01,0x00,0x00
+
+# GFX940: flat_atomic_add_f32 v[2:3], a1  ; encoding: [0x00,0x00,0x34,0xdd,0x02,0x01,0x80,0x00]
+0x00,0x00,0x34,0xdd,0x02,0x01,0x80,0x00
+
+# GFX940: flat_atomic_add_f32 v4, v[2:3], v1 sc0  ; encoding: [0x00,0x00,0x35,0xdd,0x02,0x01,0x00,0x04]
+0x00,0x00,0x35,0xdd,0x02,0x01,0x00,0x04
+
+# GFX940: flat_atomic_add_f32 a4, v[2:3], a1 sc0  ; encoding: [0x00,0x00,0x35,0xdd,0x02,0x01,0x80,0x04]
+0x00,0x00,0x35,0xdd,0x02,0x01,0x80,0x04
+
+# GFX940: flat_atomic_pk_add_f16 v4, v[2:3], v1 sc0 ; encoding: [0x00,0x00,0x39,0xdd,0x02,0x01,0x00,0x04]
+0x00,0x00,0x39,0xdd,0x02,0x01,0x00,0x04
+
+# GFX940: flat_atomic_pk_add_f16 a4, v[2:3], a1 sc0 ; encoding: [0x00,0x00,0x39,0xdd,0x02,0x01,0x80,0x04]
+0x00,0x00,0x39,0xdd,0x02,0x01,0x80,0x04
+
+# GFX940: flat_atomic_pk_add_f16 v[2:3], v1   ; encoding: [0x00,0x00,0x38,0xdd,0x02,0x01,0x00,0x00]
+0x00,0x00,0x38,0xdd,0x02,0x01,0x00,0x00
+
+# GFX940: flat_atomic_pk_add_f16 v[2:3], a1   ; encoding: [0x00,0x00,0x38,0xdd,0x02,0x01,0x80,0x00]
+0x00,0x00,0x38,0xdd,0x02,0x01,0x80,0x00
+
+# GFX940: flat_atomic_pk_add_bf16 v4, v[2:3], v1 sc0 ; encoding: [0x00,0x00,0x49,0xdd,0x02,0x01,0x00,0x04]
+0x00,0x00,0x49,0xdd,0x02,0x01,0x00,0x04
+
+# GFX940: flat_atomic_pk_add_bf16 a4, v[2:3], a1 sc0 ; encoding: [0x00,0x00,0x49,0xdd,0x02,0x01,0x80,0x04]
+0x00,0x00,0x49,0xdd,0x02,0x01,0x80,0x04
+
+# GFX940: flat_atomic_pk_add_bf16 v[2:3], v1  ; encoding: [0x00,0x00,0x48,0xdd,0x02,0x01,0x00,0x00]
+0x00,0x00,0x48,0xdd,0x02,0x01,0x00,0x00
+
+# GFX940: flat_atomic_pk_add_bf16 v[2:3], a1  ; encoding: [0x00,0x00,0x48,0xdd,0x02,0x01,0x80,0x00]
+0x00,0x00,0x48,0xdd,0x02,0x01,0x80,0x00
+
+# GFX940: global_atomic_pk_add_bf16 v4, v[2:3], v1, off sc0 ; encoding: [0x00,0x80,0x49,0xdd,0x02,0x01,0x7f,0x04]
+0x00,0x80,0x49,0xdd,0x02,0x01,0x7f,0x04
+
+# GFX940: global_atomic_pk_add_bf16 a4, v[2:3], a1, off sc0 ; encoding: [0x00,0x80,0x49,0xdd,0x02,0x01,0xff,0x04]
+0x00,0x80,0x49,0xdd,0x02,0x01,0xff,0x04
+
+# GFX940: global_atomic_pk_add_bf16 v[2:3], v1, off ; encoding: [0x00,0x80,0x48,0xdd,0x02,0x01,0x7f,0x00]
+0x00,0x80,0x48,0xdd,0x02,0x01,0x7f,0x00
+
+# GFX940: global_atomic_pk_add_bf16 v[2:3], a1, off ; encoding: [0x00,0x80,0x48,0xdd,0x02,0x01,0xff,0x00]
+0x00,0x80,0x48,0xdd,0x02,0x01,0xff,0x00
+
+# GFX940: ds_pk_add_f16 v2, v1; encoding: [0x00,0x00,0x2e,0xd8,0x02,0x01,0x00,0x00]
+0x00,0x00,0x2e,0xd8,0x02,0x01,0x00,0x00
+
+# GFX940: ds_pk_add_f16 v2, a1; encoding: [0x00,0x00,0x2e,0xda,0x02,0x01,0x00,0x00]
+0x00,0x00,0x2e,0xda,0x02,0x01,0x00,0x00
+
+# GFX940: ds_pk_add_rtn_f16 v3, v2, v1; encoding: [0x00,0x00,0x6e,0xd9,0x02,0x01,0x00,0x03]
+0x00,0x00,0x6e,0xd9,0x02,0x01,0x00,0x03
+
+# GFX940: ds_pk_add_rtn_f16 a3, v2, a1; encoding: [0x00,0x00,0x6e,0xdb,0x02,0x01,0x00,0x03]
+0x00,0x00,0x6e,0xdb,0x02,0x01,0x00,0x03
+
+# GFX940: ds_pk_add_bf16 v2, v1   ; encoding: [0x00,0x00,0x30,0xd8,0x02,0x01,0x00,0x00]
+0x00,0x00,0x30,0xd8,0x02,0x01,0x00,0x00
+
+# GFX940: ds_pk_add_bf16 v2, a1   ; encoding: [0x00,0x00,0x30,0xda,0x02,0x01,0x00,0x00]
+0x00,0x00,0x30,0xda,0x02,0x01,0x00,0x00
+
+# GFX940: ds_pk_add_rtn_bf16 v3, v2, v1   ; 

[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120846/new/

https://reviews.llvm.org/D120846

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Looks like cuda-bad-arch.cu does not have any gfx10. Let's fix this in a 
separate followup patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120846/new/

https://reviews.llvm.org/D120846


[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

You also need to rebase it; I have just landed the gfx940 target.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120846/new/

https://reviews.llvm.org/D120846


[PATCH] D120846: [AMDGPU] Add gfx1036 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Please also update these 2 files:

clang/test/Driver/cuda-bad-arch.cu
openmp/libomptarget/DeviceRTL/CMakeLists.txt

In fact the last one was not updated before either, so the last target there is 
gfx1031.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120846/new/

https://reviews.llvm.org/D120846


[PATCH] D120688: [AMDGPU] Add gfx940 target

2022-03-02 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2e2e64df4a4f: [AMDGPU] Add gfx940 target (authored by 
rampitec).
Herald added projects: clang, OpenMP.
Herald added subscribers: openmp-commits, cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120688/new/

https://reviews.llvm.org/D120688

Files:
  clang/include/clang/Basic/Cuda.h
  clang/lib/Basic/Cuda.cpp
  clang/lib/Basic/Targets/AMDGPU.cpp
  clang/lib/Basic/Targets/NVPTX.cpp
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/CodeGenOpenCL/amdgpu-features.cl
  clang/test/Driver/amdgpu-macros.cl
  clang/test/Driver/amdgpu-mcpu.cl
  clang/test/Driver/cuda-bad-arch.cu
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/docs/AMDGPUUsage.rst
  llvm/include/llvm/BinaryFormat/ELF.h
  llvm/include/llvm/Support/TargetParser.h
  llvm/lib/Object/ELFObjectFile.cpp
  llvm/lib/ObjectYAML/ELFYAML.cpp
  llvm/lib/Support/TargetParser.cpp
  llvm/lib/Target/AMDGPU/AMDGPU.td
  llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
  llvm/lib/Target/AMDGPU/GCNProcessors.td
  llvm/lib/Target/AMDGPU/GCNSubtarget.h
  llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
  llvm/test/CodeGen/AMDGPU/directive-amdgcn-target.ll
  llvm/test/CodeGen/AMDGPU/elf-header-flags-mach.ll
  llvm/test/CodeGen/AMDGPU/elf-header-flags-sramecc.ll
  llvm/test/CodeGen/AMDGPU/tid-code-object-v2-backwards-compatibility.ll
  llvm/test/MC/AMDGPU/hsa-gfx940-v3.s
  llvm/test/Object/AMDGPU/elf-header-flags-mach.yaml
  llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
  llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
  llvm/tools/llvm-readobj/ELFDumper.cpp
  openmp/libomptarget/DeviceRTL/CMakeLists.txt

Index: openmp/libomptarget/DeviceRTL/CMakeLists.txt
===
--- openmp/libomptarget/DeviceRTL/CMakeLists.txt
+++ openmp/libomptarget/DeviceRTL/CMakeLists.txt
@@ -96,7 +96,7 @@
   endif()
 endforeach()
 
-set(amdgpu_mcpus gfx700 gfx701 gfx801 gfx803 gfx900 gfx902 gfx906 gfx908 gfx90a gfx90c gfx1010 gfx1030 gfx1031)
+set(amdgpu_mcpus gfx700 gfx701 gfx801 gfx803 gfx900 gfx902 gfx906 gfx908 gfx90a gfx90c gfx940 gfx1010 gfx1030 gfx1031)
 if (DEFINED LIBOMPTARGET_AMDGCN_GFXLIST)
   set(amdgpu_mcpus ${LIBOMPTARGET_AMDGCN_GFXLIST})
 endif()
Index: llvm/tools/llvm-readobj/ELFDumper.cpp
===
--- llvm/tools/llvm-readobj/ELFDumper.cpp
+++ llvm/tools/llvm-readobj/ELFDumper.cpp
@@ -1527,6 +1527,7 @@
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX909),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX90A),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX90C),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX940),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1010),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1011),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1012),
@@ -1581,6 +1582,7 @@
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX909),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX90A),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX90C),
+  LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX940),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1010),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1011),
   LLVM_READOBJ_ENUM_ENT(ELF, EF_AMDGPU_MACH_AMDGCN_GFX1012),
Index: llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
===
--- llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
+++ llvm/test/tools/llvm-readobj/ELF/amdgpu-elf-headers.test
@@ -196,6 +196,15 @@
 # RUN: yaml2obj %s -o %t -DABI_VERSION=2 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX90C
 # RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=2 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX90C -DFLAG_VALUE=0x32
 
+# RUN: yaml2obj %s -o %t -DABI_VERSION=0 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX940
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=0 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX940 -DFLAG_VALUE=0x40
+
+# RUN: yaml2obj %s -o %t -DABI_VERSION=1 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX940
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=1 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX940 -DFLAG_VALUE=0x40
+
+# RUN: yaml2obj %s -o %t -DABI_VERSION=2 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX940
+# RUN: llvm-readobj -h %t | FileCheck %s --check-prefixes=ALL,KNOWN-ABI-VERSION,SINGLE-FLAG --match-full-lines -DABI_VERSION=2 -DFILE=%t -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX940 -DFLAG_VALUE=0x40
+
 # RUN: yaml2obj %s -o %t -DABI_VERSION=0 -DFLAG_NAME=EF_AMDGPU_MACH_AMDGCN_GFX1010
 # RUN: 

[PATCH] D119886: [AMDGPU] Promote recursive loads from kernel argument to constant

2022-02-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGb0aa1946dfe1: [AMDGPU] Promote recursive loads from kernel 
argument to constant (authored by rampitec).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119886/new/

https://reviews.llvm.org/D119886

Files:
  clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu
  llvm/lib/Target/AMDGPU/AMDGPUPromoteKernelArguments.cpp
  llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll

Index: llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
===
--- llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
+++ llvm/test/CodeGen/AMDGPU/promote-kernel-arguments.ll
@@ -11,11 +11,15 @@
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds float**, float** addrspace(1)* [[ARG:%.*]], i32 [[I]]
-; CHECK-NEXT:[[P2:%.*]] = load float**, float** addrspace(1)* [[P1]], align 8
-; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast float** [[P2]] to float* addrspace(1)*
-; CHECK-NEXT:[[P3:%.*]] = load float*, float* addrspace(1)* [[P2_GLOBAL]], align 8
-; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast float* [[P3]] to float addrspace(1)*
-; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[P3_GLOBAL]], align 4
+; CHECK-NEXT:[[P1_CONST:%.*]] = addrspacecast float** addrspace(1)* [[P1]] to float** addrspace(4)*
+; CHECK-NEXT:[[P2:%.*]] = load float**, float** addrspace(4)* [[P1_CONST]], align 8
+; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float** [[P2]] to float* addrspace(1)*
+; CHECK-NEXT:[[TMP1:%.*]] = addrspacecast float* addrspace(1)* [[TMP0]] to float**
+; CHECK-NEXT:[[P2_FLAT:%.*]] = addrspacecast float* addrspace(1)* [[TMP0]] to float**
+; CHECK-NEXT:[[P2_CONST:%.*]] = addrspacecast float** [[TMP1]] to float* addrspace(4)*
+; CHECK-NEXT:[[P3:%.*]] = load float*, float* addrspace(4)* [[P2_CONST]], align 8
+; CHECK-NEXT:[[TMP2:%.*]] = addrspacecast float* [[P3]] to float addrspace(1)*
+; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[TMP2]], align 4
 ; CHECK-NEXT:ret void
 ;
 entry:
@@ -37,9 +41,11 @@
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds float*, float* addrspace(1)* [[ARG_GLOBAL]], i32 [[I]]
 ; CHECK-NEXT:[[P1_CAST:%.*]] = bitcast float* addrspace(1)* [[P1]] to i32* addrspace(1)*
-; CHECK-NEXT:[[P2:%.*]] = load i32*, i32* addrspace(1)* [[P1_CAST]], align 8
-; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast i32* [[P2]] to i32 addrspace(1)*
-; CHECK-NEXT:store i32 0, i32 addrspace(1)* [[P2_GLOBAL]], align 4
+; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast i32* addrspace(1)* [[P1_CAST]] to i32**
+; CHECK-NEXT:[[P1_CAST_CONST:%.*]] = addrspacecast i32** [[TMP0]] to i32* addrspace(4)*
+; CHECK-NEXT:[[P2:%.*]] = load i32*, i32* addrspace(4)* [[P1_CAST_CONST]], align 8
+; CHECK-NEXT:[[TMP1:%.*]] = addrspacecast i32* [[P2]] to i32 addrspace(1)*
+; CHECK-NEXT:store i32 0, i32 addrspace(1)* [[TMP1]], align 4
 ; CHECK-NEXT:ret void
 ;
 entry:
@@ -60,10 +66,11 @@
 ; CHECK-LABEL: @ptr_in_struct(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[P:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], [[STRUCT_S]] addrspace(1)* [[ARG:%.*]], i64 0, i32 0
-; CHECK-NEXT:[[P1:%.*]] = load float*, float* addrspace(1)* [[P]], align 8
-; CHECK-NEXT:[[P1_GLOBAL:%.*]] = addrspacecast float* [[P1]] to float addrspace(1)*
+; CHECK-NEXT:[[P_CONST:%.*]] = addrspacecast float* addrspace(1)* [[P]] to float* addrspace(4)*
+; CHECK-NEXT:[[P1:%.*]] = load float*, float* addrspace(4)* [[P_CONST]], align 8
+; CHECK-NEXT:[[TMP0:%.*]] = addrspacecast float* [[P1]] to float addrspace(1)*
 ; CHECK-NEXT:[[ID:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
-; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds float, float addrspace(1)* [[P1_GLOBAL]], i32 [[ID]]
+; CHECK-NEXT:[[ARRAYIDX:%.*]] = getelementptr inbounds float, float addrspace(1)* [[TMP0]], i32 [[ID]]
 ; CHECK-NEXT:store float 0.00e+00, float addrspace(1)* [[ARRAYIDX]], align 4
 ; CHECK-NEXT:ret void
 ;
@@ -80,7 +87,14 @@
 
 ; GCN-LABEL: flat_ptr_arg:
 ; GCN-COUNT-2: global_load_dwordx2
-; GCN: global_load_dwordx4
+
+; FIXME: First load is in the constant address space and second is in global
+;because it is clobbered by store. GPU load store vectorizer cannot
+;combine them. Note, this does not happen with -O3 because loads are
+;vectorized in pairs earlier and stay in the global address space.
+
+; GCN: global_load_dword v{{[0-9]+}}, [[PTR:v\[[0-9:]+\]]], off{{$}}
+; GCN: global_load_dwordx3 v[{{[0-9:]+}}], 

[PATCH] D115032: [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args

2021-12-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115032/new/

https://reviews.llvm.org/D115032


[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-20 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D112041#3074418, @hliao wrote:

> In D112041#3073676, @rampitec wrote:
>
>> In D112041#3073637, @hliao wrote:
>>
>>> In D112041#3073560, @rampitec wrote:
>>>
 Is there anything to remove assume() call after address space is inferred? 
 We do not need it anymore.
>>>
>>> along with a few other intrinsics, assume intrinsic is discarded in SDAG 
>>> and GISel.
>>
>> We may want to discard these earlier for the sake of Value::hasOneUse(). 
>> These are really not needed after casts are inserted.
>
> That sounds reasonable, but it may be limited by the fact that we run 
> addrspace inferring several times (if my memory is right, 3 times if the 
> backend one is counted). Between those runs, we expect pointer values to be 
> promoted from memory to registers so that we can infer the addrspace for them 
> further. If we remove these assume intrinsics earlier, there is a risk that 
> later addrspace inferring may not be able to leverage those assumptions.
> NVPTX runs that inferring just once. But after that, there would be too many 
> optimizations at the IR level.
>
> I checked most hasOneUse() usages in IR passes. Most of them are not 
> applied to pointer arithmetic. Only a few cases are applied to pointers, and 
> those also have quite limited conditions. I would expect we may need to 
> enhance them case by case if we find real cases where the extra use from the 
> assume intrinsic makes code quality worse.

Thanks, that sounds reasonable. If we start seeing problems because of this we 
can always remove these later. Conceptually LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112041/new/

https://reviews.llvm.org/D112041


[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D112041#3073637, @hliao wrote:

> In D112041#3073560, @rampitec wrote:
>
>> Is there anything to remove assume() call after address space is inferred? 
>> We do not need it anymore.
>
> along with a few other intrinsics, assume intrinsic is discarded in SDAG and 
> GISel.

We may want to discard these earlier for the sake of Value::hasOneUse(). These 
are really not needed after casts are inserted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112041/new/

https://reviews.llvm.org/D112041


[PATCH] D112041: [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.

2021-10-19 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Is there anything to remove assume() call after address space is inferred? We 
do not need it anymore.
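
For reference, the pattern in question looks roughly like the following
hypothetical sketch (function and pointer names are illustrative, not taken
from the patch):

```llvm
; An assume() on an address-space predicate: once InferAddressSpaces has
; consumed the hint and rewritten the users of %p to addrspace(3), the two
; calls below no longer carry any useful information and only add a use of %p.
define void @store_to_lds(i8* %p) {
  %pred = call i1 @llvm.amdgcn.is.shared(i8* %p)
  call void @llvm.assume(i1 %pred)
  %lds = addrspacecast i8* %p to i8 addrspace(3)*
  store i8 0, i8 addrspace(3)* %lds, align 1
  ret void
}

declare i1 @llvm.amdgcn.is.shared(i8* nocapture)
declare void @llvm.assume(i1)
```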


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112041/new/

https://reviews.llvm.org/D112041


[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12194
 
-  return (fpModeMatchesGlobalFPAtomicMode(RMW) ||
-  RMW->getFunction()
-  ->getFnAttribute("amdgpu-unsafe-fp-atomics")
-  .getValueAsString() == "true")
- ? AtomicExpansionKind::None
- : AtomicExpansionKind::CmpXChg;
+  if (fpModeMatchesGlobalFPAtomicMode(RMW) ||
+  RMW->getFunction()

```
  if (fpModeMatchesGlobalFPAtomicMode(RMW))
return AtomicExpansionKind::None;

  return RMW->getFunction()
 ->getFnAttribute("amdgpu-unsafe-fp-atomics")
 .getValueAsString() == "true"
 ? ReportUnsafeHWInst(AtomicExpansionKind::None)
 : AtomicExpansionKind::CmpXChg;
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150


[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D108150#2950479, @gandhi21299 wrote:

> My understanding is that since we are reporting unsafe expansion into hw 
> instructions, `fpModeMatchesGlobalFPAtomicMode(RMW)` must be false to match 
> the logic.

Please run check-llvm before updating the patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150


[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D108150#2950458, @gandhi21299 wrote:

> @rampitec Which part of the logic is wrong?

Still the same around LDS.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150


[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

Logic is still wrong.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150


[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-18 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12195
 
+  if (!fpModeMatchesGlobalFPAtomicMode(RMW))
+return reportUnsafeHWInst(RMW, AtomicExpansionKind::None);

gandhi21299 wrote:
> rampitec wrote:
> > rampitec wrote:
> > > This is wrong. Condition is inverted and essentially tests should fail. 
> > > Make sure you can pass testing before posting a diff.
> > Unresolved.
> Remarks are produced if `fpModeMatchesGlobalFPAtomicMode(RMW) == false`
But you have changed what the function was doing. It was returning CmpXChg.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150


[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:9
 
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu 
gfx90a \
+// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \

gandhi21299 wrote:
> rampitec wrote:
> > You are compiling 2 functions with 2 different sets of options. Essentially 
> > it is unclear what are you checking because either half skips half of the 
> > remarks. Either compile a single function differently or make 2 different 
> > tests.
> I will create 2 seperate tests
You do not need to change this file anymore.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12122
 
+TargetLowering::AtomicExpansionKind SITargetLowering::reportUnsafeHWInst(
+AtomicRMWInst *RMW, TargetLowering::AtomicExpansionKind Kind) const {

It does not need to be a method, just a static function or, better, even a lambda.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12195
 
+  if (!fpModeMatchesGlobalFPAtomicMode(RMW))
+return reportUnsafeHWInst(RMW, AtomicExpansionKind::None);

rampitec wrote:
> This is wrong. Condition is inverted and essentially tests should fail. Make 
> sure you can pass testing before posting a diff.
Unresolved.



Comment at: llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll:108
+
+; GFX90A-HW: Hardware instruction generated for atomic fadd operation at memory scope system due to an unsafe request.
+; GFX90A-HW: Hardware instruction generated for atomic fadd operation at memory scope agent due to an unsafe request.

gandhi21299 wrote:
> rampitec wrote:
> > Does it print a function name before the diagnostics? Label checks would be 
> > useful.
> Nope, it does not.
That's a pity. Then this file needs to be split too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150



[PATCH] D108150: [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions

2021-08-17 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:9
 
+// RUN: %clang_cc1 -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \
+// RUN: -Rpass=si-lower -munsafe-fp-atomics %s -S -o - 2>&1 | \

You are compiling 2 functions with 2 different sets of options. Essentially it 
is unclear what you are checking because either half skips half of the remarks. 
Either compile a single function differently or make 2 different tests.



Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:13
+
+// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -triple=amdgcn-amd-amdhsa -target-cpu gfx90a \
+// RUN: -Rpass=si-lower -S -emit-llvm -o - 2>&1 | \

This line does not have -munsafe-fp-atomics option...



Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:60
+// GFX90A-HW-LABEL: @atomic_unsafe_hw
+// GFX90A-HW:   atomicrmw fadd float addrspace(1)* %{{.*}}, float %{{.*}} syncscope("workgroup-one-as") monotonic, align 4
+// GFX90A-HW:   atomicrmw fadd float addrspace(1)* %{{.*}}, float %{{.*}} syncscope("agent-one-as") monotonic, align 4

... therefore these checks must fail.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12195
 
+  if (!fpModeMatchesGlobalFPAtomicMode(RMW))
+return reportUnsafeHWInst(RMW, AtomicExpansionKind::None);

This is wrong. Condition is inverted and essentially tests should fail. Make 
sure you can pass testing before posting a diff.



Comment at: llvm/test/CodeGen/AMDGPU/atomics-remarks-gfx90a.ll:108
+
+; GFX90A-HW: Hardware instruction generated for atomic fadd operation at memory scope system due to an unsafe request.
+; GFX90A-HW: Hardware instruction generated for atomic fadd operation at memory scope agent due to an unsafe request.

Does it print a function name before the diagnostics? Label checks would be 
useful.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150



[PATCH] D108150: [Remarks] Emit optimization remarks for atomics generating hardware instructions

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

- Add [AMDGPU] to the title.
- Rebase on top of D106891.
- Add tests to atomics-remarks-gfx90a.ll as well, including LDS with matching 
and non-matching rounding mode.




Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12194
   if (!Ty->isDoubleTy())
-return AtomicExpansionKind::None;
+return reportUnsafeHWInst(RMW, AtomicExpansionKind::None);
 

This is safe.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12200
   .getValueAsString() == "true")
- ? AtomicExpansionKind::None
+ ? reportUnsafeHWInst(RMW, AtomicExpansionKind::None)
  : AtomicExpansionKind::CmpXChg;

This is safe if `fpModeMatchesGlobalFPAtomicMode(RMW)` returned true and unsafe 
if not.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108150/new/

https://reviews.llvm.org/D108150



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.

LGTM, but please wait for others too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:175
 
+  ORE = std::make_unique();
   auto  = TPC->getTM();

gandhi21299 wrote:
> rampitec wrote:
> > Is there a reason to construct it upfront and not just use a local variable 
> > only when needed? Like in StackProtector.cpp for example.
> We can certainly implement it as a local variable as long as we have access 
> to the function this pass is operating on. I was thinking of its potential 
> use throughout this pass in the future.
You have access to the function, AI->getParent()->getParent().
You also will not need to pass ORE everywhere in the subsequent patch; just 
construct it in the target in place.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-16 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:175
 
+  ORE = std::make_unique();
   auto  = TPC->getTM();

Is there a reason to construct it upfront and not just use a local variable 
only when needed? Like in StackProtector.cpp for example.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:33
+float atomic_cas(__global atomic_float *d, float a) {
+  return __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group);
+}

Just combine all the calls into a single function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-15 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl:32
+// GFX90A-CAS: atomicrmw fadd float addrspace(1)* {{.*}} syncscope("workgroup-one-as") monotonic
+float atomic_cas_system(__global atomic_float *d, float a) {
+  return __opencl_atomic_fetch_add(d, a, memory_order_relaxed, memory_scope_work_group);

It is not system as the test name suggests. Just rename it to atomic_cas and add 
calls with all other scopes into the same function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Please restore opencl test.




Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:622
+return OptimizationRemark(DEBUG_TYPE, "Passed", AI->getFunction())
+   << "A compare and swap loop was generated for an "
+   << AI->getOperationName(AI->getOperation()) << " operation at "

gandhi21299 wrote:
> arsenm wrote:
> > Missing word atomic?
> Its already part of the OperationName
Matt is right, the word "atomic" is missing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

Please retitle it without AMDGPU and remove the changes to pass ORE to targets. 
It is not a part of this change, it is a part of the follow-up target-specific 
change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > gandhi21299 wrote:
> > > > > rampitec wrote:
> > > > > > Need tests for all scopes.
> > > > > `__atomic_fetch_add` does not take scope as an argument, how could I 
> > > > > add tests with different scopes?
> > > > At least in the IR test.
> > > What do you mean by that?
> > You need to test all of that. If you cannot write a proper .cu test, then 
> > write an IR test and run llc.
> Should I discard this test then since the test fp-atomics-remarks-gfx90a.ll 
> already satisfies?
CU test is still needed. You also need it in the .cl test below.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:585
+  TLI->shouldExpandAtomicRMWInIR(AI, ORE);
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", AI->getFunction());
+  switch (Kind) {

What should this "Passed" do, and why wouldn't you just declare it where you use it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > Need tests for all scopes.
> > > `__atomic_fetch_add` does not take scope as an argument, how could I add 
> > > tests with different scopes?
> > At least in the IR test.
> What do you mean by that?
You need to test all of that. If you cannot write a proper .cu test, then write 
an IR test and run llc.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618
   expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun);
+  Ctx.getSyncScopeNames(SSNs);
+  auto MemScope = SSNs[AI->getSyncScopeID()].empty()

gandhi21299 wrote:
> rampitec wrote:
> > Only if SSNs.empty().
> Sorry, what do you mean? SSN will be empty at that point.
I thought you wanted to cache it. But really, just declare it here.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:624
+Remark << "A compare and swap loop was generated for an "
+   << AI->getOpcodeName() << "operation at " << MemScope
+   << " memory scope";

gandhi21299 wrote:
> rampitec wrote:
> > I believe getOpcodeName() will return "atomicrmw" instead of the operation. 
> > Also missing space after it.
> getOpcodeName() returns `atomicrmwoperation`, as per the tests the spacing 
> looks correct to me.
The operation to report is AI->getOperation(). Spacing is wrong, "operation" is 
your text.
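
To illustrate the spacing and naming point above, here is a tiny self-contained sketch of assembling the remark text from an operation name. getOperationName here is a simplified stand-in for AtomicRMWInst::getOperationName (which maps the BinOp enum to a short name such as "fadd"), not the real LLVM API:

```cpp
#include <cassert>
#include <string>

// Simplified stand-in: the real AtomicRMWInst::getOperationName maps the
// BinOp enum (e.g. FAdd) to a short name such as "fadd".
std::string getOperationName(int Op) { return Op == 0 ? "fadd" : "fsub"; }

// The fix under discussion: name the operation via getOperationName and keep
// the spacing explicit, instead of getOpcodeName() (which yields "atomicrmw")
// glued directly to the literal "operation".
std::string buildRemark(int Op, const std::string &MemScope) {
  return "A compare and swap loop was generated for an atomic " +
         getOperationName(Op) + " operation at " + MemScope + " memory scope";
}
```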


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

gandhi21299 wrote:
> rampitec wrote:
> > Need tests for all scopes.
> `__atomic_fetch_add` does not take scope as an argument, how could I add 
> tests with different scopes?
At least in the IR test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-13 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:618
   expandAtomicRMWToCmpXchg(AI, createCmpXchgInstFun);
+  Ctx.getSyncScopeNames(SSNs);
+  auto MemScope = SSNs[AI->getSyncScopeID()].empty()

Only if SSNs.empty().



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:624
+Remark << "A compare and swap loop was generated for an "
+   << AI->getOpcodeName() << "operation at " << MemScope
+   << " memory scope";

I believe getOpcodeName() will return "atomicrmw" instead of the operation. 
Also missing space after it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > gandhi21299 wrote:
> > > > > rampitec wrote:
> > > > > > That does not help with target defined scope names, such as our 
> > > > > > "one-as" for example.
> > > > > How can I get target defined scope names?
> > > > It is right on the instruction:
> > > >   %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
> > > > 
> > > Sorry, I meant from the LLVM API.
> > LLVMContext::getSyncScopeNames()
> I think that gives me all sync scopes available for the target. If not, which 
> sync scope in the vector corresponds to the instruction I am dealing with?
https://llvm.org/doxygen/MachineOperand_8cpp_source.html#l00474
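
A minimal self-contained sketch of the lookup pattern referenced here: LLVMContext::getSyncScopeNames() fills a vector indexed by SyncScopeID, and the system scope is registered with an empty name, hence the empty-name fallback in the reviewed code. The types and sample IDs below are illustrative stand-ins, not the real LLVM registration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: the vector is indexed by SyncScopeID, as filled by
// LLVMContext::getSyncScopeNames(); the "system" scope has an empty name.
using SyncScopeID = unsigned;

std::string memScopeName(const std::vector<std::string> &SSNs, SyncScopeID ID) {
  // Mirrors the reviewed pattern: fall back to "system" when the registered
  // name is empty.
  return SSNs[ID].empty() ? std::string("system") : SSNs[ID];
}
```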


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > That does not help with target defined scope names, such as our 
> > > > "one-as" for example.
> > > How can I get target defined scope names?
> > It is right on the instruction:
> >   %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
> > 
> Sorry, I meant from the LLVM API.
LLVMContext::getSyncScopeNames()


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

gandhi21299 wrote:
> rampitec wrote:
> > That does not help with target defined scope names, such as our "one-as" 
> > for example.
> How can I get target defined scope names?
It is right on the instruction:
  %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:629
+  Remark << "A compare and swap loop was generated for an atomic "
+"operation "
+"at "

Need to name the operation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

You also need to retitle it now, it is not about AMDGPU and not about FP.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-12 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2940411, @gandhi21299 wrote:

> - eliminated unsafe hardware remarks in SIISelLowering.cpp

Most of this patch is not needed now. You do not need to pass ORE to targets, 
it is a part of the next patch.




Comment at: clang/test/CodeGenCUDA/fp-atomics-optremarks.cu:10
+
+// GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+// GFX90A-CAS-LABEL: _Z14atomic_add_casPf

Need tests for all scopes.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:587
+OptimizationRemark Remark) {
+  ORE->emit([&]() { return Remark; });
+  return Kind;

I do not see why you need this function and all its arguments now. You can 
just call ORE->emit() directly.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:631
+"at "
+ << (AI->getSyncScopeID() ? "system" : "single thread")
+ << " memory scope");

That does not help with target defined scope names, such as our "one-as" for 
example.



Comment at: llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll:4
+
+; GFX90A-CAS: A compare and swap loop was generated for an atomic operation at system memory scope
+; GFX90A-CAS-LABEL: _Z14atomic_add_casPf:

You need to write tests for all scopes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-11 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2938128, @gandhi21299 wrote:

> @rampitec besides the remarks, am I missing anything else in the patch?

You should not use AMD specific code in the common code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-10 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:628
+  AI, Kind,
+  Remark << "A hardware CAS loop generated: if the memory is "
+"known to be coarse-grain allocated then a hardware "

Still the same problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-09 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120
+   OptimizationRemarkEmitter *ORE,
+   OptimizationRemark OptRemark) {
+  ORE->emit([&]() { return OptRemark; });

gandhi21299 wrote:
> gandhi21299 wrote:
> > rampitec wrote:
> > > gandhi21299 wrote:
> > > > rampitec wrote:
> > > > > gandhi21299 wrote:
> > > > > > rampitec wrote:
> > > > > > > Why OptRemark and not just StringRef? I really want to see as 
> > > > > > > little churn as possible at the call site.
> > > > > > With only StringRef, we would also have to pass in RMW since 
> > > > > > OptimizationRemark constructor depends on that.
> > > > > It seems better to be a lambda where you can just capture most of the stuff.
> > > > What would the type be for the following lambda?
> > > > 
> > > > ```
> > > > [&](){
> > > > return OptimizationRemark(...);
> > > > }
> > > > ```
> > > You need to return TargetLowering::AtomicExpansionKind, not 
> > > OptimizationRemark.
> > This is what goes into ORE->emit() as an argument though. I suspect the 
> > remark won't be emitted if I return AtomicExpansionKind within the lambda.
> @rampitec I don't think there is a way to pass the ORE->emit argument lambda 
> expression into the reportAtomicExpand() function because of its capture. It 
> needs access to RMW from the enclosing scope. (Well, I could define a copy of 
> RMW and somehow pass it into the lambda but that seems too much for this task)
Do not pass it there. Turn reportAtomicExpand into a lambda.
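
As a rough sketch of the lambda refactor suggested here: declare a local lambda inside shouldExpandAtomicRMWInIR that captures the context it needs, emits the remark, and passes the chosen expansion kind through. Everything below (the Remarks vector, the report name, the Unsafe flag) is an illustrative stand-in for the LLVM ORE machinery, not the real API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for TargetLowering::AtomicExpansionKind.
enum class AtomicExpansionKind { None, CmpXChg };

AtomicExpansionKind shouldExpand(bool Unsafe,
                                 std::vector<std::string> &Remarks) {
  // The lambda captures its surroundings, so no extra arguments need to be
  // threaded through a helper function.
  auto report = [&](AtomicExpansionKind Kind, const std::string &Msg) {
    Remarks.push_back(Msg); // stands in for ORE->emit([&] { ... })
    return Kind;            // pass the chosen kind straight through
  };
  if (Unsafe)
    return report(AtomicExpansionKind::None, "unsafe HW atomic generated");
  return AtomicExpansionKind::CmpXChg;
}
```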


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120
+   OptimizationRemarkEmitter *ORE,
+   OptimizationRemark OptRemark) {
+  ORE->emit([&]() { return OptRemark; });

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > Why OptRemark and not just StringRef? I really want to see as little 
> > > > churn as possible at the call site.
> > > With only StringRef, we would also have to pass in RMW since 
> > > OptimizationRemark constructor depends on that.
> > It seems better to be a lambda where you can just capture most of the stuff.
> What would the type be for the following lambda?
> 
> ```
> [&](){
> return OptimizationRemark(...);
> }
> ```
You need to return TargetLowering::AtomicExpansionKind, not OptimizationRemark.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added inline comments.
This revision now requires changes to proceed.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:588
+  Remark
+  << "A hardware CAS loop generated: if the memory is "
+ "known to be coarse-grain allocated then a hardware 
floating-point"

This is target specific. Besides, there can be different reasons for a loop.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12123
+OptRemark
+<< "A hardware floating-point atomic instruction generated: "
+   "only safe if the memory is known to be coarse-grain allocated";

This is not the only reason. You cannot use a fixed string here.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120
+   OptimizationRemarkEmitter *ORE,
+   OptimizationRemark OptRemark) {
+  ORE->emit([&]() { return OptRemark; });

gandhi21299 wrote:
> rampitec wrote:
> > Why OptRemark and not just StringRef? I really want to see as little churn 
> > as possible at the call site.
> With only StringRef, we would also have to pass in RMW since 
> OptimizationRemark constructor depends on that.
It seems better to be a lambda where you can just capture most of the stuff.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/fp-atomics-optremarks-gfx90a.cl:23
+
+// GFX90A-HW: A floating-point atomic instruction will generate an unsafe hardware instruction which may fail to update memory [-Rpass=si-lower]
+// GFX90A-HW-LABEL: test_atomic_add

Should check other cases too. Essentially a check per every distinct emitted 
remark.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12120
+   OptimizationRemarkEmitter *ORE,
+   OptimizationRemark OptRemark) {
+  ORE->emit([&]() { return OptRemark; });

Why OptRemark and not just StringRef? I really want to see as little churn as 
possible at the call site.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12173
+  if (RMW->use_empty()) {
+if (RMW->getFunction()
+->getFnAttribute("amdgpu-unsafe-fp-atomics")

No need to check the attribute. Everything below the amdgpu-unsafe-fp-atomics 
check to the end of the block is unsafe. Just revert to the original return and 
call reportAtomicExpand() for the AtomicExpansionKind::None case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12118
 
-TargetLowering::AtomicExpansionKind
-SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
+TargetLowering::AtomicExpansionKind SITargetLowering::reportAtomicExpand(
+AtomicRMWInst *RMW, TargetLowering::AtomicExpansionKind Kind,

Just static, no need to expose it from SITargetLowering. Maybe even a functor 
inside shouldExpandAtomicRMWInIR itself, capturing ORE and RMW to pass fewer 
arguments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12165
 
-return AtomicExpansionKind::None;
+ORE->emit([&] {
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());

You need to remove all of that now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12155
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "A floating-point atomic instruction with no following use"
+" will generate an unsafe hardware instruction";

gandhi21299 wrote:
> rampitec wrote:
> > I do not understand this message about the use. We are checking the use 
> > below simply because there was no return version of global_atomic_add_f32 
> > on gfx908, so we are forced to expand it.
> right, I forgot to erase that part. How does the following look:
> 
> "A floating-point atomic instruction will generate an unsafe hardware 
> instruction"
> 
> I am not sure what other details I could put in here
In this place it might fail to update memory. But it is difficult to read and 
understand with all of those big ORE->emit blobs all over the place.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12155
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "A floating-point atomic instruction with no following use"
+" will generate an unsafe hardware instruction";

I do not understand this message about the use. We are checking the use below 
simply because there was no return version of global_atomic_add_f32 on gfx908, 
so we are forced to expand it.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12165
 
-  return RMW->use_empty() ? AtomicExpansionKind::None
-  : AtomicExpansionKind::CmpXChg;
+  if (RMW->use_empty()) {
+if (RMW->getFunction()

That's a lot of churn. Please create a function returning AtomicExpansionKind, 
pass what you are going to return into that function, return that argument from 
the function, and also pass a string for diagnostics to emit from there. 
Replace returns here with its calls, like:

`return reportAtomicExpand(AtomicExpansionKind::None, ORE, "Produced HW atomic 
is unsafe and might not update memory");`
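A minimal stand-alone sketch of the pass-through helper shape being requested; the enum and the remark sink are mocked here, and in the real patch the helper would take the `OptimizationRemarkEmitter` and the `AtomicRMWInst`:

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class AtomicExpansionKind { None, CmpXChg };

// Mock remark sink standing in for OptimizationRemarkEmitter.
static std::vector<std::string> EmittedRemarks;

// Pass-through helper: emit the diagnostic, then return the first argument
// unchanged, so call sites read
//   return reportAtomicExpand(AtomicExpansionKind::None, "...");
AtomicExpansionKind reportAtomicExpand(AtomicExpansionKind Kind,
                                       const std::string &Msg) {
  EmittedRemarks.push_back(Msg);
  return Kind;
}
```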



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12198
+  Remark
+  << "A floating-point atomic instruction will generate an unsafe"
+ " hardware instruction";

This one might be unsafe not because of the cache it works on, but because it 
might not follow denorm mode.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec accepted this revision.
rampitec added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12146
+OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+Remark << "A floating-point atomic instruction will generate an unsafe"
+  " hardware instruction";

It would not necessarily generate a HW instruction. There are still cases where 
we return CmpXChg.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-05 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx7.cl:8
+}
\ No newline at end of file


Add a newline at the end of the file.



Comment at: clang/test/CodeGenOpenCL/unsupported-fadd2f16-gfx908.cl:1
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -O0 -cl-std=CL2.0 -triple amdgcn-amd-amdhsa -target-cpu 
gfx908 \

Combine all of these gfx908 error tests into a single file. For example like in 
the builtins-amdgcn-dl-insts-err.cl. It is also better to rename these test 
filenames to follow the existing pattern: builtins-amdgcn-*-err.cl


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics in GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

The title should not mention gfx90a; it is not accurate.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics in GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12139
+OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+Remark << "A hardware instruction was generated";
+return Remark;

gandhi21299 wrote:
> rampitec wrote:
> > It was not generated. We have multiple returns below this point. Some of 
> > them return None and some CmpXChg for various reasons. The request was to 
> > report when we produce the instruction *if* it is unsafe, not just that we 
> > are about to produce an instruction.
> > 
> > Then to make it useful a remark should tell what was the reason to either 
> > produce an instruction or expand it. Looking at a stream of remarks in a 
> > big program one would also want to understand what exactly was expanded and 
> > what was left as is. A stream of messages "A hardware instruction was 
> > generated" is unlikely to help one understand what was done.
> Will the hardware instruction be generated in the end of this function then?
It will not be generated here to begin with. If the function returns None the 
atomicrmw will be just left as is and then later selected into the instruction. 
But if you read the function, it has many returns for many different reasons, 
and that is exactly what a useful remark shall report.
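The downstream behavior described above can be sketched loosely; this is a simplified model of how the pass acts on the target's answer, not the real AtomicExpandPass:

```cpp
#include <cassert>
#include <string>

enum class AtomicExpansionKind { None, CmpXChg };

// None: the atomicrmw is left untouched in the IR and is later selected
// into a hardware instruction. CmpXChg: the pass rewrites the atomicrmw
// into a compare-exchange (CAS) loop.
std::string lowerAtomicRMWSketch(AtomicExpansionKind Kind) {
  switch (Kind) {
  case AtomicExpansionKind::None:
    return "atomicrmw kept; selected to a HW instruction later";
  case AtomicExpansionKind::CmpXChg:
    return "atomicrmw rewritten into a cmpxchg loop";
  }
  return "unreachable";
}
```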


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:201
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1f", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_v2f16, "V2hV2h*1V2h", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fmin_f64, "dd*1d", "t", 
"gfx90a-insts")

gandhi21299 wrote:
> I tried add target feature gfx908-insts for this builtin but the frontend 
> complains that it should have target feature gfx90a-insts.
That was for global_atomic_fadd_f32, but as per the discussion we are going to 
use the builtin only starting from gfx90a because of the noret problem. Comments 
in the review are off their positions after multiple patch updates.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:210
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3d", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3f", "t", "gfx8-insts")
+

This needs tests with a gfx8 target and a negative test with gfx7.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl:13
+// GFX90A:  global_atomic_add_f64
+void test_global_add(__global double *addr, double x) {
+  double *rtn;

Use _f64 or _double in the test name.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl:31
+// GFX90A:  global_atomic_min_f64
+void test_global_global_min(__global double *addr, double x){
+  double *rtn;

Same here and in other double tests, use a suffix f64 or double.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl:67
+// GFX90A:  flat_atomic_min_f64
+void test_flat_min_constant(__generic double *addr, double x){
+  double *rtn;

'constant' is wrong. It is flat. Here and everywhere.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics-unsupported-gfx908.cl:7
+  double *rtn;
+  *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x); // 
expected-error{{'__builtin_amdgcn_global_atomic_fadd_f64' needs target feature 
gfx90a-insts}}
+}

Need to check all other builtins too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

JBTW, the patch title is way too long.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2925692, @gandhi21299 wrote:

> - eliminated the scope argument as per discussion
> - added more tests

You have updated the wrong patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:595
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "A hardware instruction was generated";
+  return Remark;

Nothing was generated just yet; the pass just left the IR instruction untouched. 
In the general case we cannot say what an abstract BE will do about it later.



Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12139
+OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+Remark << "A hardware instruction was generated";
+return Remark;

It was not generated. We have multiple returns below this point. Some of them 
return None and some CmpXChg for various reasons. The request was to report 
when we produce the instruction *if* it is unsafe, not just that we are about 
to produce an instruction.

Then to make it useful a remark should tell what was the reason to either 
produce an instruction or expand it. Looking at a stream of remarks in a big 
program one would also want to understand what exactly was expanded and what 
was left as is. A stream of messages "A hardware instruction was generated" 
is unlikely to help one understand what was done.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

It still does not do anything useful and still produces useless, wrong and 
misleading remarks for all targets.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270
+llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
+return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
+  }

gandhi21299 wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > Should we map flags since we already have them?
> > > Do you mean the memory order flag?
> > All 3: ordering, scope and volatile.
> Following the discussion, what change is required here?
Keep zeroes, drop the immediate argument of the builtins.
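To illustrate the two options weighed here (hypothetical names and flag encoding, not the actual CGBuiltin code): the intrinsic takes trailing immediate operands for ordering, scope, and volatility, which could either be hard-coded to zero or mapped from the builtin's arguments; the review settles on keeping the zeroes:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the intrinsic's trailing immediates
// {ordering, scope, isVolatile}. MapFlags=false models the resolution
// in this review: keep hard-coded zeroes and drop the builtin's
// immediate argument entirely.
std::vector<int> buildAtomicIntrinsicArgs(bool MapFlags, int Ordering,
                                          int Scope, bool IsVolatile) {
  if (!MapFlags)
    return {0, 0, 0};
  return {Ordering, Scope, IsVolatile ? 1 : 0};
}
```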


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-04 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106909#2923724, @gandhi21299 wrote:

> @rampitec what should I be testing exactly in the IR test?

The produced call to the intrinsic. All of the tests there do that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106909#2922567, @gandhi21299 wrote:

> @rampitec how do I handle the following?
>
>   builtins-fp-atomics.cl:38:10: error: 
> '__builtin_amdgcn_global_atomic_fadd_f64' needs target feature 
> atomic-fadd-insts
> *rtn = __builtin_amdgcn_global_atomic_fadd_f64(addr, x, 
> memory_order_relaxed);
>^

It is f64, it needs gfx90a-insts. atomic-fadd-insts is for global f32.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270
+llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
+return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
+  }

gandhi21299 wrote:
> rampitec wrote:
> > Should we map flags since we already have them?
> Do you mean the memory order flag?
All 3: ordering, scope and volatile.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2921108, @gandhi21299 wrote:

> How can I construct an ORE to start off with? I don't think it's appropriate 
> to construct it in `shouldExpandAtomicRMWInsts(RMW)`

You have already constructed it. You can just pass it to 
`shouldExpandAtomicRMWInsts`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2921096, @gandhi21299 wrote:

> @rampitec Since remarks cannot be emitted in SIISelLowering because it isn't 
> a pass, in what form can I emit the diagnostics in SIISelLowering?

You could pass ORE to the TLI.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16270
+llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
+return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
+  }

Should we map flags since we already have them?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > gandhi21299 wrote:
> > > > > rampitec wrote:
> > > > > > You do not need any of that code. You can directly map a builtin to 
> > > > > > intrinsic in the IntrinsicsAMDGPU.td.
> > > > > Sorry, I looked around for several days but I could not figure this 
> > > > > out. Is there a concrete example?
> > > > Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.
> > > This is not true if the intrinsic requires type mangling. GCCBuiltin is 
> > > too simple to handle it
> > Yes, but these do not need it. All of these builtins are specific.
> These intrinsics are all mangled based on the FP type
Ah, right. Intrinsics are mangled, builtins not. True. OK, this shall be code 
then.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

arsenm wrote:
> rampitec wrote:
> > gandhi21299 wrote:
> > > rampitec wrote:
> > > > You do not need any of that code. You can directly map a builtin to 
> > > > intrinsic in the IntrinsicsAMDGPU.td.
> > > Sorry, I looked around for several days but I could not figure this out. 
> > > Is there a concrete example?
> > Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.
> This is not true if the intrinsic requires type mangling. GCCBuiltin is too 
> simple to handle it
Yes, but these do not need it. All of these builtins are specific.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

gandhi21299 wrote:
> rampitec wrote:
> > You do not need any of that code. You can directly map a builtin to 
> > intrinsic in the IntrinsicsAMDGPU.td.
> Sorry, I looked around for several days but I could not figure this out. Is 
> there a concrete example?
Every instantiation of `GCCBuiltin` in the `IntrinsicsAMDGPU.td`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec added a comment.

In D106891#2921048, @gandhi21299 wrote:

> @rampitec should the unsafe check go in some pass later in the pipeline then?

No. The only place which has all the knowledge is 
`SITargetLowering::shouldExpandAtomicRMWInIR()`. That is where diagnostics 
shall be emitted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

You cannot do it in generic llvm code; it simply has no knowledge of the reason 
for the BE's choice.




Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:598
+  OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction());
+  Remark << "An unsafe hardware instruction was generated.";
+  return Remark;

arsenm wrote:
> Unsafe is misleading, plus this is being too specific to AMDGPU
Having UnsafeFPAtomicFlag does not automatically mean the HW instruction produced 
is unsafe. Moreover, you simply cannot know here why this or that decision was 
made by a target method.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106891/new/

https://reviews.llvm.org/D106891



[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Stanislav Mekhanoshin via Phabricator via cfe-commits
rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

Needs an IR test, a test for different supported targets, and a negative test 
for unsupported features.




Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:199
 
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f64, "dd*1di", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1fi", "t", 
"gfx90a-insts")

Correct attribute for this one is atomic-fadd-insts. In particular, it was first 
added in gfx908 and you would need to test it too.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:205
+
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_f64, "dd*1di", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmin_f64, "dd*1di", "t", 
"gfx90a-insts")

Flat address space is 0.



Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:210
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3di", "t", 
"gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3fi", "t", 
"gfx90a-insts")
+

This is available since gfx8. Attribute gfx8-insts.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+Intrinsic::ID IID;
+llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());

You do not need any of that code. You can directly map a builtin to intrinsic 
in the IntrinsicsAMDGPU.td.



Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics.cl:112
+kernel void test_flat_global_max(__global double *addr, double x){
+  __builtin_amdgcn_flat_atomic_fmax_f64(addr, x, memory_order_relaxed);
+}

arsenm wrote:
> gandhi21299 wrote:
> > arsenm wrote:
> > > If you're going to bother testing the ISA, is it worth testing rtn and no 
> > > rtn versions?
> > Sorry, what do you mean by rtn version?
> Most atomics can be optimized if they don't return the in memory value if the 
> value is unused
Certainly yes, because global_atomic_add_f32 did not have a return version on 
gfx908.
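The return/no-return distinction behind this exchange can be sketched as follows (a simplified model with assumed names; on gfx908 the returning form of global_atomic_add_f32 does not exist, so such atomics must be expanded):

```cpp
#include <cassert>
#include <string>

// If the atomic's result has no uses (use_empty in the real code), the
// cheaper no-return encoding suffices; otherwise a returning form is
// needed, or the atomic must be expanded when the target lacks one.
std::string pickGlobalAtomicAddF32(bool ResultHasUses,
                                   bool TargetHasRtnForm) {
  if (!ResultHasUses)
    return "global_atomic_add_f32 (no-return form)";
  if (TargetHasRtnForm)
    return "global_atomic_add_f32 (returning form)";
  return "expand to a cmpxchg loop"; // e.g. gfx908, which lacks the rtn form
}
```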


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909


