[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Changpeng Fang via cfe-commits

changpeng wrote:

I am going to propose to rename intrinsics and remove f16/bf16 versions of 
builtins/intrinsics

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Changpeng Fang via cfe-commits

https://github.com/changpeng closed 
https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Changpeng Fang via cfe-commits

changpeng wrote:

[AMD Official Use Only - General]

I am fine to remove f16/bf16 versions. Enumerating all possible types could be 
very painful. For example we gave up enumerating for B64, and ended up using 
v2i32 only. What do others think removing f16/bf16 versions? Thanks

Get Outlook for iOS

From: Matt Arsenault ***@***.***>
Sent: Friday, March 22, 2024 3:45:53 AM
To: llvm/llvm-project ***@***.***>
Cc: Fang, Changpeng ***@***.***>; Author ***@***.***>
Subject: Re: [llvm/llvm-project] AMDGPU: Rename and add bf16 support for 
global_load_tr builtins (PR #86202)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.


@arsenm commented on this pull request.



In 
clang/include/clang/Basic/BuiltinsAMDGPU.def:

> -TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
> "gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")


Do we really need the f16/bf16 versions? You can always bitcast the i16 
versions.

—
Reply to this email directly, view it on 
GitHub,
 or 
unsubscribe.
You are receiving this because you authored the thread.Message ID: ***@***.***>


https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Matt Arsenault via cfe-commits


@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", 
"n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")

arsenm wrote:

Do we really need the f16/bf16 versions? You can always bitcast the i16 
versions. 

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Matt Arsenault via cfe-commits

arsenm wrote:

> > I don't think intrinsics are meant for users. Builtins are the user-facing 
> > front. :-)
> 
> Depending on who you consider an user. Are folks writing MLIR generators 
> users?

They're consumers of an unstable API, changing intrinsics is fine 

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Piotr Sobczak via cfe-commits

piotrAMD wrote:

The change LG - thanks for adding support for bf16.

Agreed that the intrinsics should match the builtins for consistency (now or in 
a follow-up commit).
These intrinsics were added for the upcoming generation - it should be fine to 
rename them at this stage.

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-22 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> I don't think intrinsics are meant for users. Builtins are the user-facing 
> front. :-)

Depending on who you consider an user. Are folks writing MLIR generators users?

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Changpeng Fang via cfe-commits

changpeng wrote:

> I don't think intrinsics are meant for users. Builtins are the user-facing 
> front. :-)

Then renaing the intrinsics should be relatively at a lower priority. We may do 
it in a separate patch once we have reached an agreement.

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Shilei Tian via cfe-commits

shiltian wrote:

> > > > Do you want to rename intrinsics as well? Because now intrinsic names 
> > > > do not match builtin names.
> > > 
> > > 
> > > Do we have to match builtins with intrinsics? Renaming intrinsics here 
> > > means we will have to duplicate the intrinsics.
> > 
> > 
> > Is that because of the mangling?
> > Right.  It was originally suggested to use  a single instrinsic "load_lr".  
> > But eventually we use global_load_tr to indicate this is in global address 
> > space.  If we want to rename intrinsics here, it should be 
> > global_load_tr_b64 and global_load_tr_b128.
> 
> We should rename intrinsic if users can use intrinsics directly. I think 
> use-friendly is more important.

I don't think intrinsics are meant for users. Builtins are the user-facing 
front. :-)

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Changpeng Fang via cfe-commits

changpeng wrote:

> > > Do you want to rename intrinsics as well? Because now intrinsic names do 
> > > not match builtin names.
> > 
> > 
> > Do we have to match builtins with intrinsics? Renaming intrinsics here 
> > means we will have to duplicate the intrinsics.
> 
> Is that because of the mangling?
Right.  It was originally suggested to use  a single instrinsic "load_lr".  But 
eventually we use global_load_tr to indicate this is in global address space.  
If we want to rename intrinsics here, it should be global_load_tr_b64 and 
global_load_tr_b128. 

We should rename intrinsic if users can use intrinsics directly. I think 
use-friendly is more important.

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Stanislav Mekhanoshin via cfe-commits

rampitec wrote:

> > Do you want to rename intrinsics as well? Because now intrinsic names do 
> > not match builtin names.
> 
> Do we have to match builtins with intrinsics? Renaming intrinsics here means 
> we will have to duplicate the intrinsics.

Is that because of the mangling?

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Changpeng Fang via cfe-commits

changpeng wrote:

> Do you want to rename intrinsics as well? Because now intrinsic names do not 
> match builtin names.

Do we have to match builtins with intrinsics? Renaming intrinsics here means we 
will have to duplicate the intrinsics. 

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Stanislav Mekhanoshin via cfe-commits


@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", 
"n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")

rampitec wrote:

There should not be legacy yet.

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Shilei Tian via cfe-commits


@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", 
"n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")

shiltian wrote:

Do we still want to keep the old builtins to maintain compatibility, though I 
doubt there is any legacy code using them?

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Stanislav Mekhanoshin via cfe-commits

https://github.com/rampitec commented:

Do you want to rename intrinsics as well? Because now intrinsic names do not 
match builtin names.

https://github.com/llvm/llvm-project/pull/86202
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread Changpeng Fang via cfe-commits

https://github.com/changpeng created 
https://github.com/llvm/llvm-project/pull/86202

  Make the name of a clang builtin as close to the mnemonic instruction name as 
possible. The data type suffix may not be enough to tell what instruction the 
builtin is going to produce.
  This patch also add the bf16 support for global_load_tr_b128 builtins.

>From a65bd5bd52db208d9aa9c22cbb834787aff978d4 Mon Sep 17 00:00:00 2001
From: Changpeng Fang 
Date: Thu, 21 Mar 2024 14:24:43 -0700
Subject: [PATCH] AMDGPU: Rename and add bf16 support for global_load_tr
 builtins

  Make the name of a clang builtin as close to the mnemonic
instruction name as possible. The data type suffix may not be
enough to tell what instruction the builtin is going to produce.

  This patch also add the bf16 support for global_load_tr_b128
builtins.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  | 16 
 clang/lib/CodeGen/CGBuiltin.cpp   | 34 +++--
 ...uiltins-amdgcn-global-load-tr-gfx11-err.cl | 25 ++--
 ...ins-amdgcn-global-load-tr-gfx12-w32-err.cl | 11 +++---
 ...ins-amdgcn-global-load-tr-gfx12-w64-err.cl | 11 +++---
 .../builtins-amdgcn-global-load-tr-w32.cl | 38 +--
 .../builtins-amdgcn-global-load-tr-w64.cl | 38 +--
 7 files changed, 94 insertions(+), 79 deletions(-)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 61ec8b79bf054d..4153b316c22b1d 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", 
"n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")
 
 
//===--===//
 // WMMA builtins.
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index e14e8908828218..2eaceeba617700 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18531,35 +18531,45 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
 return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
   }
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_i32:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v2i32:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v4f16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v4i16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v8f16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v8i16: {
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4f16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8f16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8i16: {
 
 llvm::Type *ArgTy;
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_load_tr_i32:
+case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   ArgTy = llvm::Type::getInt32Ty(getLLVMContext());
   break;
-case 

[clang] AMDGPU: Rename and add bf16 support for global_load_tr builtins (PR #86202)

2024-03-21 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Changpeng Fang (changpeng)


Changes

  Make the name of a clang builtin as close to the mnemonic instruction name as 
possible. The data type suffix may not be enough to tell what instruction the 
builtin is going to produce.
  This patch also add the bf16 support for global_load_tr_b128 builtins.

---
Full diff: https://github.com/llvm/llvm-project/pull/86202.diff


7 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+9-7) 
- (modified) clang/lib/CodeGen/CGBuiltin.cpp (+22-12) 
- (modified) 
clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-tr-gfx11-err.cl (+13-12) 
- (modified) 
clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-tr-gfx12-w32-err.cl (+6-5) 
- (modified) 
clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-tr-gfx12-w64-err.cl (+6-5) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-tr-w32.cl 
(+19-19) 
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-tr-w64.cl 
(+19-19) 


``diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 61ec8b79bf054d..4153b316c22b1d 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -432,13 +432,15 @@ TARGET_BUILTIN(__builtin_amdgcn_s_wakeup_barrier, "vi", 
"n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_barrier_leave, "b", "n", "gfx12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_s_get_barrier_state, "Uii", "n", "gfx12-insts")
 
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
-
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
-TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_v2i32, "V2iV2i*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8i16, "V8sV8s*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8f16, "V8hV8h*1", "nc", 
"gfx12-insts,wavefrontsize32")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v8bf16, "V8yV8y*1", "nc", 
"gfx12-insts,wavefrontsize32")
+
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b64_i32, "ii*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4i16, "V4sV4s*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4f16, "V4hV4h*1", "nc", 
"gfx12-insts,wavefrontsize64")
+TARGET_BUILTIN(__builtin_amdgcn_global_load_tr_b128_v4bf16, "V4yV4y*1", "nc", 
"gfx12-insts,wavefrontsize64")
 
 
//===--===//
 // WMMA builtins.
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index e14e8908828218..2eaceeba617700 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18531,35 +18531,45 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 llvm::Function *F = CGM.getIntrinsic(IID, {ArgTy});
 return Builder.CreateCall(F, {Addr, Val, ZeroI32, ZeroI32, ZeroI1});
   }
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_i32:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v2i32:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v4f16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v4i16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v8f16:
-  case AMDGPU::BI__builtin_amdgcn_global_load_tr_v8i16: {
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4f16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8f16:
+  case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v8i16: {
 
 llvm::Type *ArgTy;
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_load_tr_i32:
+case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   ArgTy = llvm::Type::getInt32Ty(getLLVMContext());
   break;
-case AMDGPU::BI__builtin_amdgcn_global_load_tr_v2i32:
+case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   ArgTy = llvm::FixedVectorType::get(
   llvm::Type::getInt32Ty(getLLVMContext()), 2);
   break;
-case AMDGPU::BI__builtin_amdgcn_global_load_tr_v4f16:
+case