[PATCH] D125378: [Attribute] Introduce shuffle attribute to be used for __shfl_sync like cross-lane APIs

2022-05-11 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added inline comments.



Comment at: clang/test/CodeGenHIP/shuffle-attr-verify.hip:34
+// CHECK-DAG: attributes #[[attr1]] = { {{[^}]*}}shuffle{{[^}]*}} }
\ No newline at end of file


Add a new line.



Comment at: clang/test/CodeGenHIP/shuffle-noundef-attr.hip:49
+}
\ No newline at end of file


Ditto


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125378/new/

https://reviews.llvm.org/D125378

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

2022-02-25 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas created this revision.
cdevadas added reviewers: rjmccall, Anastasia, yaxunl, arsenm.
Herald added subscribers: Naghasan, ldrumm, kerbowa, t-tye, tpr, dstuttard, 
jvesely, kzhuravl.
cdevadas requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

In OpenCL, a kernel is allowed to call other kernels as if
they are regular functions. To support it, clang emits
amdgpu_kernel calling convention for both caller and callee.
A backend pass in our downstream compiler alters such calls
by introducing regular function bodies which are clones of
the callee kernels. This implementation currently limits us
in certain ways. For instance, the restriction to not use
byref attribute for callee kernels.

To avoid such limitations, this patch brings in those
cloned functions early on and prevents clang from generating
amdgpu_kernel call sites. A new function body will be added
for each kernel in the compilation unit expecting that the
unused clones will get removed at link time.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D120566

Files:
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/lib/CodeGen/TargetInfo.h
  clang/test/CodeGenOpenCL/amdgpu-kernel-calls.cl
  clang/test/CodeGenOpenCL/visibility.cl

Index: clang/test/CodeGenOpenCL/visibility.cl
===
--- clang/test/CodeGenOpenCL/visibility.cl
+++ clang/test/CodeGenOpenCL/visibility.cl
@@ -94,23 +94,6 @@
 ext_func_default();
 }
 
-// FVIS-DEFAULT: declare amdgpu_kernel void @ext_kern()
-// FVIS-PROTECTED: declare protected amdgpu_kernel void @ext_kern()
-// FVIS-HIDDEN: declare protected amdgpu_kernel void @ext_kern()
-
-// FVIS-DEFAULT: declare protected amdgpu_kernel void @ext_kern_hidden()
-// FVIS-PROTECTED: declare protected amdgpu_kernel void @ext_kern_hidden()
-// FVIS-HIDDEN: declare protected amdgpu_kernel void @ext_kern_hidden()
-
-// FVIS-DEFAULT: declare protected amdgpu_kernel void @ext_kern_protected()
-// FVIS-PROTECTED: declare protected amdgpu_kernel void @ext_kern_protected()
-// FVIS-HIDDEN: declare protected amdgpu_kernel void @ext_kern_protected()
-
-// FVIS-DEFAULT: declare amdgpu_kernel void @ext_kern_default()
-// FVIS-PROTECTED: declare amdgpu_kernel void @ext_kern_default()
-// FVIS-HIDDEN: declare amdgpu_kernel void @ext_kern_default()
-
-
 // FVIS-DEFAULT: declare void @ext_func()
 // FVIS-PROTECTED: declare protected void @ext_func()
 // FVIS-HIDDEN: declare hidden void @ext_func()
@@ -126,3 +109,21 @@
 // FVIS-DEFAULT: declare void @ext_func_default()
 // FVIS-PROTECTED: declare void @ext_func_default()
 // FVIS-HIDDEN: declare void @ext_func_default()
+
+// A kernel call will be emitted as a call to its cloned function
+// of non-kernel convention.
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_kernel_body()
+
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_hidden_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_hidden_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_hidden_kernel_body()
+
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_protected_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_protected_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_protected_kernel_body()
+
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_default_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_default_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_default_kernel_body()
Index: clang/test/CodeGenOpenCL/amdgpu-kernel-calls.cl
===
--- /dev/null
+++ clang/test/CodeGenOpenCL/amdgpu-kernel-calls.cl
@@ -0,0 +1,60 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s
+
+// AMDGPU disallows kernel callsites from another kernels. For each kernel, clang codegen will introduce
+// a cloned function body with a non-kernel calling convention and amdgpu_kernel callsites will get
+// transformed to call appropriate clones.
+
+extern kernel void test_extern_kernel_callee(global int *in);
+
+// CHECK: define dso_local amdgpu_kernel void @test_kernel_callee(i32 addrspace(1)* noundef align 4 %in)
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[IN_ADDR:%.*]] = alloca i32 addrspace(1)*, align 8, addrspace(5)
+// CHECK-NEXT:store i32 addrspace(1)* [[IN:%.*]], i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
+// CHECK-NEXT:[[TMP0:%.*]] = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
+// CHECK-NEXT:store i32 10, i32 addrspace(1)* [[TMP0]], align 4
+// CHECK-NEXT:ret void
+//
+kernel void test_kernel_callee(global int *in) {
+  *in = (int)(10);
+}
+
+// CHECK: define 

[PATCH] D80931: AMDGPU: Fix clang side null pointer value for private

2020-06-02 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas accepted this revision.
cdevadas added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80931/new/

https://reviews.llvm.org/D80931



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64563: Updated the signature for some stack related intrinsics (CLANG)

2019-07-22 Thread Christudasan Devadasan via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL366683: Updated the signature for some stack related 
intrinsics (CLANG) (authored by cdevadas, committed by ).
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D64563?vs=210648=211064#toc

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64563/new/

https://reviews.llvm.org/D64563

Files:
  cfe/trunk/lib/CodeGen/CGBuiltin.cpp
  cfe/trunk/lib/CodeGen/CGException.cpp
  cfe/trunk/test/CodeGen/builtin-sponentry.c
  cfe/trunk/test/CodeGen/exceptions-seh.c
  cfe/trunk/test/CodeGen/integer-overflow.c
  cfe/trunk/test/CodeGen/ms-intrinsics.c
  cfe/trunk/test/CodeGen/ms-setjmp.c
  cfe/trunk/test/CodeGenOpenCL/builtins-generic-amdgcn.cl

Index: cfe/trunk/test/CodeGenOpenCL/builtins-generic-amdgcn.cl
===
--- cfe/trunk/test/CodeGenOpenCL/builtins-generic-amdgcn.cl
+++ cfe/trunk/test/CodeGenOpenCL/builtins-generic-amdgcn.cl
@@ -14,3 +14,8 @@
 {
   *out = __builtin_clzl(a);
 }
+
+// CHECK: tail call i8 addrspace(5)* @llvm.frameaddress.p5i8(i32 0)
+void test_builtin_frame_address(int *out) {
+*out = __builtin_frame_address(0);
+}
Index: cfe/trunk/test/CodeGen/exceptions-seh.c
===
--- cfe/trunk/test/CodeGen/exceptions-seh.c
+++ cfe/trunk/test/CodeGen/exceptions-seh.c
@@ -51,7 +51,7 @@
 // 32-bit SEH needs this filter to save the exception code.
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@safe_div@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[fp]], i32 0)
 // X86: load i8*, i8**
@@ -103,7 +103,7 @@
 // ARM64: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@filter_expr_capture@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
Index: cfe/trunk/test/CodeGen/ms-intrinsics.c
===
--- cfe/trunk/test/CodeGen/ms-intrinsics.c
+++ cfe/trunk/test/CodeGen/ms-intrinsics.c
@@ -142,7 +142,7 @@
   return _AddressOfReturnAddress();
 }
 // CHECK-INTEL-LABEL: define dso_local i8* @test_AddressOfReturnAddress()
-// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress()
+// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress.p0i8()
 // CHECK-INTEL: ret i8*
 #endif
 
Index: cfe/trunk/test/CodeGen/integer-overflow.c
===
--- cfe/trunk/test/CodeGen/integer-overflow.c
+++ cfe/trunk/test/CodeGen/integer-overflow.c
@@ -99,8 +99,8 @@
 
   // PR24256: don't instrument __builtin_frame_address.
   __builtin_frame_address(0 + 0);
-  // DEFAULT:  call i8* @llvm.frameaddress(i32 0)
-  // WRAPV:call i8* @llvm.frameaddress(i32 0)
-  // TRAPV:call i8* @llvm.frameaddress(i32 0)
-  // CATCH_UB: call i8* @llvm.frameaddress(i32 0)
+  // DEFAULT:  call i8* @llvm.frameaddress.p0i8(i32 0)
+  // WRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // TRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // CATCH_UB: call i8* @llvm.frameaddress.p0i8(i32 0)
 }
Index: cfe/trunk/test/CodeGen/ms-setjmp.c
===
--- cfe/trunk/test/CodeGen/ms-setjmp.c
+++ cfe/trunk/test/CodeGen/ms-setjmp.c
@@ -20,12 +20,12 @@
   // I386-NEXT:  ret i32 %[[call]]
 
   // X64-LABEL: define dso_local i32 @test_setjmp
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmp(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmp
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
@@ -33,12 +33,12 @@
 int test_setjmpex() {
   return _setjmpex(jb);
   // X64-LABEL: define dso_local i32 @test_setjmpex
-  // X64:   %[[addr:.*]] = call i8* 

[PATCH] D64563: Updated the signature for some stack related intrinsics (CLANG)

2019-07-18 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas updated this revision to Diff 210648.
cdevadas added a comment.
Herald added a subscriber: jvesely.

updated test/CodeGenOpenCL/builtins-generic-amdgcn.cl to vaildate the address 
space.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64563/new/

https://reviews.llvm.org/D64563

Files:
  lib/CodeGen/CGBuiltin.cpp
  lib/CodeGen/CGException.cpp
  test/CodeGen/builtin-sponentry.c
  test/CodeGen/exceptions-seh.c
  test/CodeGen/integer-overflow.c
  test/CodeGen/ms-intrinsics.c
  test/CodeGen/ms-setjmp.c
  test/CodeGenOpenCL/builtins-generic-amdgcn.cl

Index: test/CodeGenOpenCL/builtins-generic-amdgcn.cl
===
--- test/CodeGenOpenCL/builtins-generic-amdgcn.cl
+++ test/CodeGenOpenCL/builtins-generic-amdgcn.cl
@@ -14,3 +14,8 @@
 {
   *out = __builtin_clzl(a);
 }
+
+// CHECK: tail call i8 addrspace(5)* @llvm.frameaddress.p5i8(i32 0)
+void test_builtin_frame_address(int *out) {
+*out = __builtin_frame_address(0);
+}
Index: test/CodeGen/ms-setjmp.c
===
--- test/CodeGen/ms-setjmp.c
+++ test/CodeGen/ms-setjmp.c
@@ -20,12 +20,12 @@
   // I386-NEXT:  ret i32 %[[call]]
 
   // X64-LABEL: define dso_local i32 @test_setjmp
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmp(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmp
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
@@ -33,12 +33,12 @@
 int test_setjmpex() {
   return _setjmpex(jb);
   // X64-LABEL: define dso_local i32 @test_setjmpex
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmpex
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
Index: test/CodeGen/ms-intrinsics.c
===
--- test/CodeGen/ms-intrinsics.c
+++ test/CodeGen/ms-intrinsics.c
@@ -142,7 +142,7 @@
   return _AddressOfReturnAddress();
 }
 // CHECK-INTEL-LABEL: define dso_local i8* @test_AddressOfReturnAddress()
-// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress()
+// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress.p0i8()
 // CHECK-INTEL: ret i8*
 #endif
 
Index: test/CodeGen/integer-overflow.c
===
--- test/CodeGen/integer-overflow.c
+++ test/CodeGen/integer-overflow.c
@@ -99,8 +99,8 @@
 
   // PR24256: don't instrument __builtin_frame_address.
   __builtin_frame_address(0 + 0);
-  // DEFAULT:  call i8* @llvm.frameaddress(i32 0)
-  // WRAPV:call i8* @llvm.frameaddress(i32 0)
-  // TRAPV:call i8* @llvm.frameaddress(i32 0)
-  // CATCH_UB: call i8* @llvm.frameaddress(i32 0)
+  // DEFAULT:  call i8* @llvm.frameaddress.p0i8(i32 0)
+  // WRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // TRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // CATCH_UB: call i8* @llvm.frameaddress.p0i8(i32 0)
 }
Index: test/CodeGen/exceptions-seh.c
===
--- test/CodeGen/exceptions-seh.c
+++ test/CodeGen/exceptions-seh.c
@@ -51,7 +51,7 @@
 // 32-bit SEH needs this filter to save the exception code.
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@safe_div@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[fp]], i32 0)
 // X86: load i8*, i8**
@@ -103,7 +103,7 @@
 // ARM64: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@filter_expr_capture@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] 

[PATCH] D64563: Updated the signature for some stack related intrinsics (CLANG)

2019-07-12 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas updated this revision to Diff 209467.
cdevadas added a comment.

Added alloca address space.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64563/new/

https://reviews.llvm.org/D64563

Files:
  lib/CodeGen/CGBuiltin.cpp
  lib/CodeGen/CGException.cpp
  test/CodeGen/builtin-sponentry.c
  test/CodeGen/exceptions-seh.c
  test/CodeGen/integer-overflow.c
  test/CodeGen/ms-intrinsics.c
  test/CodeGen/ms-setjmp.c

Index: test/CodeGen/ms-setjmp.c
===
--- test/CodeGen/ms-setjmp.c
+++ test/CodeGen/ms-setjmp.c
@@ -20,12 +20,12 @@
   // I386-NEXT:  ret i32 %[[call]]
 
   // X64-LABEL: define dso_local i32 @test_setjmp
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmp(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmp
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
@@ -33,12 +33,12 @@
 int test_setjmpex() {
   return _setjmpex(jb);
   // X64-LABEL: define dso_local i32 @test_setjmpex
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmpex
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
Index: test/CodeGen/ms-intrinsics.c
===
--- test/CodeGen/ms-intrinsics.c
+++ test/CodeGen/ms-intrinsics.c
@@ -142,7 +142,7 @@
   return _AddressOfReturnAddress();
 }
 // CHECK-INTEL-LABEL: define dso_local i8* @test_AddressOfReturnAddress()
-// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress()
+// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress.p0i8()
 // CHECK-INTEL: ret i8*
 #endif
 
Index: test/CodeGen/integer-overflow.c
===
--- test/CodeGen/integer-overflow.c
+++ test/CodeGen/integer-overflow.c
@@ -99,8 +99,8 @@
 
   // PR24256: don't instrument __builtin_frame_address.
   __builtin_frame_address(0 + 0);
-  // DEFAULT:  call i8* @llvm.frameaddress(i32 0)
-  // WRAPV:call i8* @llvm.frameaddress(i32 0)
-  // TRAPV:call i8* @llvm.frameaddress(i32 0)
-  // CATCH_UB: call i8* @llvm.frameaddress(i32 0)
+  // DEFAULT:  call i8* @llvm.frameaddress.p0i8(i32 0)
+  // WRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // TRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // CATCH_UB: call i8* @llvm.frameaddress.p0i8(i32 0)
 }
Index: test/CodeGen/exceptions-seh.c
===
--- test/CodeGen/exceptions-seh.c
+++ test/CodeGen/exceptions-seh.c
@@ -51,7 +51,7 @@
 // 32-bit SEH needs this filter to save the exception code.
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@safe_div@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[fp]], i32 0)
 // X86: load i8*, i8**
@@ -103,7 +103,7 @@
 // ARM64: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@filter_expr_capture@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
Index: test/CodeGen/builtin-sponentry.c
===
--- test/CodeGen/builtin-sponentry.c
+++ test/CodeGen/builtin-sponentry.c
@@ -4,5 +4,5 @@
   return __builtin_sponentry();
 }
 // CHECK-LABEL: define dso_local i8* @test_sponentry()
-// CHECK: = tail call i8* @llvm.sponentry()
+// CHECK: = tail call i8* 

[PATCH] D64563: Updated the signature for some stack related intrinsics (CLANG)

2019-07-11 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas created this revision.
cdevadas added a reviewer: arsenm.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

This patch includes the intrinsics
int_addressofreturnaddress,
int_frameaddress & int_sponentry.
This patch should go along with the 
changes in https://reviews.llvm.org/D64561


Repository:
  rC Clang

https://reviews.llvm.org/D64563

Files:
  lib/CodeGen/CGBuiltin.cpp
  lib/CodeGen/CGException.cpp
  test/CodeGen/builtin-sponentry.c
  test/CodeGen/exceptions-seh.c
  test/CodeGen/integer-overflow.c
  test/CodeGen/ms-intrinsics.c
  test/CodeGen/ms-setjmp.c

Index: test/CodeGen/ms-setjmp.c
===
--- test/CodeGen/ms-setjmp.c
+++ test/CodeGen/ms-setjmp.c
@@ -20,12 +20,12 @@
   // I386-NEXT:  ret i32 %[[call]]
 
   // X64-LABEL: define dso_local i32 @test_setjmp
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmp(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmp
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
@@ -33,12 +33,12 @@
 int test_setjmpex() {
   return _setjmpex(jb);
   // X64-LABEL: define dso_local i32 @test_setjmpex
-  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress(i32 0)
+  // X64:   %[[addr:.*]] = call i8* @llvm.frameaddress.p0i8(i32 0)
   // X64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // X64-NEXT:  ret i32 %[[call]]
 
   // AARCH64-LABEL: define dso_local i32 @test_setjmpex
-  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry()
+  // AARCH64:   %[[addr:.*]] = call i8* @llvm.sponentry.p0i8()
   // AARCH64:   %[[call:.*]] = call i32 @_setjmpex(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @jb, i64 0, i64 0), i8* %[[addr]])
   // AARCH64-NEXT:  ret i32 %[[call]]
 }
Index: test/CodeGen/ms-intrinsics.c
===
--- test/CodeGen/ms-intrinsics.c
+++ test/CodeGen/ms-intrinsics.c
@@ -142,7 +142,7 @@
   return _AddressOfReturnAddress();
 }
 // CHECK-INTEL-LABEL: define dso_local i8* @test_AddressOfReturnAddress()
-// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress()
+// CHECK-INTEL: = tail call i8* @llvm.addressofreturnaddress.p0i8()
 // CHECK-INTEL: ret i8*
 #endif
 
Index: test/CodeGen/integer-overflow.c
===
--- test/CodeGen/integer-overflow.c
+++ test/CodeGen/integer-overflow.c
@@ -99,8 +99,8 @@
 
   // PR24256: don't instrument __builtin_frame_address.
   __builtin_frame_address(0 + 0);
-  // DEFAULT:  call i8* @llvm.frameaddress(i32 0)
-  // WRAPV:call i8* @llvm.frameaddress(i32 0)
-  // TRAPV:call i8* @llvm.frameaddress(i32 0)
-  // CATCH_UB: call i8* @llvm.frameaddress(i32 0)
+  // DEFAULT:  call i8* @llvm.frameaddress.p0i8(i32 0)
+  // WRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // TRAPV:call i8* @llvm.frameaddress.p0i8(i32 0)
+  // CATCH_UB: call i8* @llvm.frameaddress.p0i8(i32 0)
 }
Index: test/CodeGen/exceptions-seh.c
===
--- test/CodeGen/exceptions-seh.c
+++ test/CodeGen/exceptions-seh.c
@@ -51,7 +51,7 @@
 // 32-bit SEH needs this filter to save the exception code.
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@safe_div@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 (i32, i32, i32*)* @safe_div to i8*), i8* %[[fp]], i32 0)
 // X86: load i8*, i8**
@@ -103,7 +103,7 @@
 // ARM64: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
 // X86-LABEL: define internal i32 @"?filt$0@0@filter_expr_capture@@"()
-// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress(i32 1)
+// X86: %[[ebp:[^ ]*]] = call i8* @llvm.frameaddress.p0i8(i32 1)
 // X86: %[[fp:[^ ]*]] = call i8* @llvm.eh.recoverfp(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[ebp]])
 // X86: call i8* @llvm.localrecover(i8* bitcast (i32 ()* @filter_expr_capture to i8*), i8* %[[fp]], i32 0)
 //
Index: test/CodeGen/builtin-sponentry.c
===
--- test/CodeGen/builtin-sponentry.c
+++ test/CodeGen/builtin-sponentry.c
@@ -4,5 

[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-07-10 Thread Christudasan Devadasan via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL365643: [AMDGPU] Increased the number of implicit argument 
bytes for both OpenCL and… (authored by cdevadas, committed by ).
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D63756?vs=206377=208969#toc

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756

Files:
  cfe/trunk/lib/CodeGen/TargetInfo.cpp
  cfe/trunk/test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
  cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl


Index: cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
===
--- cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
+++ cfe/trunk/test/CodeGenOpenCL/amdgpu-attrs.cl
@@ -158,30 +158,30 @@
 // CHECK-NOT: "amdgpu-num-sgpr"="0"
 // CHECK-NOT: "amdgpu-num-vgpr"="0"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline 
nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2,4"
+// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent 
noinline nounwind optnone 

[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-07-05 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added a comment.

The backend changes, D63886  are in the 
upstream now with revision rL365217 .
Please review this patch.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-27 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added a comment.

I have created the review for adding the metadata in the backend. 
https://reviews.llvm.org/D63886 
Marking the current review depended on D63886 .


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-25 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas added a comment.

Hi Sam,
The compiler generates metadata for the first 48 bytes. I compiled a sample 
code and verified it. The backend does nothing for the extra bytes now.
I will soon submit the backend patch to generate the new metadata.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63756/new/

https://reviews.llvm.org/D63756



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D63756: [AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG).

2019-06-25 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas created this revision.
cdevadas added reviewers: yaxunl, b-sumner.
Herald added subscribers: cfe-commits, t-tye, Anastasia, tpr, dstuttard, 
nhaehnle, wdng, jvesely, kzhuravl.
Herald added a project: clang.

To enable a new implicit kernel argument, increasing the number of argument 
bytes from 48 to 56.


Repository:
  rC Clang

https://reviews.llvm.org/D63756

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
  test/CodeGenOpenCL/amdgpu-attrs.cl


Index: test/CodeGenOpenCL/amdgpu-attrs.cl
===
--- test/CodeGenOpenCL/amdgpu-attrs.cl
+++ test/CodeGenOpenCL/amdgpu-attrs.cl
@@ -158,30 +158,30 @@
 // CHECK-NOT: "amdgpu-num-sgpr"="0"
 // CHECK-NOT: "amdgpu-num-vgpr"="0"
 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_64_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="64,64" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_16_128]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="16,128" 
"amdgpu-implicitarg-num-bytes"="48" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[NUM_VGPR_64]] = { convergent noinline nounwind 
optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_SGPR_32]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
-// CHECK-DAG: attributes [[FLAT_WORK_GROUP_SIZE_32_64_NUM_VGPR_64]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-sgpr"="32" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes [[NUM_SGPR_32_NUM_VGPR_64]] = { convergent noinline 
nounwind optnone "amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-num-vgpr"="64" 
-
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_SGPR_32]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-waves-per-eu"="2,4"
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_4_NUM_VGPR_64]] = { convergent 
noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-vgpr"="64" 
"amdgpu-waves-per-eu"="2,4"
-
-// CHECK-DAG: attributes 
[[FLAT_WORK_GROUP_SIZE_32_64_WAVES_PER_EU_2_NUM_SGPR_32_NUM_VGPR_64]] = { 
convergent noinline nounwind optnone "amdgpu-flat-work-group-size"="32,64" 
"amdgpu-implicitarg-num-bytes"="48" "amdgpu-num-sgpr"="32" 
"amdgpu-num-vgpr"="64" "amdgpu-waves-per-eu"="2"
-// 

[PATCH] D62244: [AMDGPU] Enable the implicit arguments for HIP (CLANG)

2019-06-10 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas updated this revision to Diff 203975.
cdevadas added a comment.

simplified the check in the test case.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62244/new/

https://reviews.llvm.org/D62244

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu


Index: test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
===
--- /dev/null
+++ test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -x hip -o - %s | 
FileCheck %s
+#include "Inputs/cuda.h"
+
+__global__ void hip_kernel_temp() {
+}
+
+// CHECK: attributes #0 = { noinline nounwind optnone 
"amdgpu-implicitarg-num-bytes"="48"
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7853,7 +7853,8 @@
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
 FD->getAttr() : nullptr;
 
-  if (M.getLangOpts().OpenCL && FD->hasAttr() &&
+  if (((M.getLangOpts().OpenCL && FD->hasAttr()) ||
+  (M.getLangOpts().HIP && FD->hasAttr())) &&
   (M.getTriple().getOS() == llvm::Triple::AMDHSA))
 F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 


Index: test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
===
--- /dev/null
+++ test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -x hip -o - %s | FileCheck %s
+#include "Inputs/cuda.h"
+
+__global__ void hip_kernel_temp() {
+}
+
+// CHECK: attributes #0 = { noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48"
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7853,7 +7853,8 @@
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
 FD->getAttr() : nullptr;
 
-  if (M.getLangOpts().OpenCL && FD->hasAttr() &&
+  if (((M.getLangOpts().OpenCL && FD->hasAttr()) ||
+  (M.getLangOpts().HIP && FD->hasAttr())) &&
   (M.getTriple().getOS() == llvm::Triple::AMDHSA))
 F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62244: [AMDGPU] Enable the implicit arguments for HIP (CLANG)

2019-05-22 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas updated this revision to Diff 200870.
cdevadas added a comment.
Herald added subscribers: nhaehnle, jvesely.

Moved the test to CodeGenCUDA directory.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62244/new/

https://reviews.llvm.org/D62244

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu


Index: test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
===
--- /dev/null
+++ test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -x hip -o - %s | 
FileCheck %s
+#include "Inputs/cuda.h"
+
+__global__ void hip_kernel_temp() {
+}
+
+// CHECK-DAG: attributes #0 = { noinline nounwind optnone 
"amdgpu-implicitarg-num-bytes"="48"
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7853,7 +7853,8 @@
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
 FD->getAttr() : nullptr;
 
-  if (M.getLangOpts().OpenCL && FD->hasAttr() &&
+  if (((M.getLangOpts().OpenCL && FD->hasAttr()) ||
+  (M.getLangOpts().HIP && FD->hasAttr())) &&
   (M.getTriple().getOS() == llvm::Triple::AMDHSA))
 F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 


Index: test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
===
--- /dev/null
+++ test/CodeGenCUDA/amdgpu-hip-implicit-kernarg.cu
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -x hip -o - %s | FileCheck %s
+#include "Inputs/cuda.h"
+
+__global__ void hip_kernel_temp() {
+}
+
+// CHECK-DAG: attributes #0 = { noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48"
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7853,7 +7853,8 @@
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
 FD->getAttr() : nullptr;
 
-  if (M.getLangOpts().OpenCL && FD->hasAttr() &&
+  if (((M.getLangOpts().OpenCL && FD->hasAttr()) ||
+  (M.getLangOpts().HIP && FD->hasAttr())) &&
   (M.getTriple().getOS() == llvm::Triple::AMDHSA))
 F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62244: [AMDGPU] Enable the implicit arguments for HIP (CLANG)

2019-05-22 Thread Christudasan Devadasan via Phabricator via cfe-commits
cdevadas created this revision.
cdevadas added reviewers: b-sumner, yaxunl.
Herald added subscribers: cfe-commits, t-tye, Anastasia, tpr, dstuttard, wdng, 
kzhuravl.
Herald added a project: clang.

Enable 48-bytes of implicit arguments for HIP as well. Earlier it was enabled 
for OpenCL. This code is specific to AMDGPU target.


Repository:
  rC Clang

https://reviews.llvm.org/D62244

Files:
  lib/CodeGen/TargetInfo.cpp
  test/CodeGenHIP/Inputs/hip.h
  test/CodeGenHIP/implicit-kernarg.cpp


Index: test/CodeGenHIP/implicit-kernarg.cpp
===
--- /dev/null
+++ test/CodeGenHIP/implicit-kernarg.cpp
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -x hip -o - %s | 
FileCheck %s
+#include "Inputs/hip.h"
+
+__global__ void hip_kernel_temp() {
+}
+
+// CHECK-DAG: attributes #0 = { noinline nounwind optnone 
"amdgpu-implicitarg-num-bytes"="48"
Index: test/CodeGenHIP/Inputs/hip.h
===
--- /dev/null
+++ test/CodeGenHIP/Inputs/hip.h
@@ -0,0 +1,3 @@
+/* Minimal declarations for HIP support.  Testing purposes only. */
+
+#define __global__ __attribute__((global))
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7853,7 +7853,8 @@
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
 FD->getAttr() : nullptr;
 
-  if (M.getLangOpts().OpenCL && FD->hasAttr() &&
+  if (((M.getLangOpts().OpenCL && FD->hasAttr()) ||
+  (M.getLangOpts().HIP && FD->hasAttr())) &&
   (M.getTriple().getOS() == llvm::Triple::AMDHSA))
 F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 


Index: test/CodeGenHIP/implicit-kernarg.cpp
===
--- /dev/null
+++ test/CodeGenHIP/implicit-kernarg.cpp
@@ -0,0 +1,7 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -x hip -o - %s | FileCheck %s
+#include "Inputs/hip.h"
+
+__global__ void hip_kernel_temp() {
+}
+
+// CHECK-DAG: attributes #0 = { noinline nounwind optnone "amdgpu-implicitarg-num-bytes"="48"
Index: test/CodeGenHIP/Inputs/hip.h
===
--- /dev/null
+++ test/CodeGenHIP/Inputs/hip.h
@@ -0,0 +1,3 @@
+/* Minimal declarations for HIP support.  Testing purposes only. */
+
+#define __global__ __attribute__((global))
Index: lib/CodeGen/TargetInfo.cpp
===
--- lib/CodeGen/TargetInfo.cpp
+++ lib/CodeGen/TargetInfo.cpp
@@ -7853,7 +7853,8 @@
   const auto *ReqdWGS = M.getLangOpts().OpenCL ?
 FD->getAttr() : nullptr;
 
-  if (M.getLangOpts().OpenCL && FD->hasAttr() &&
+  if (((M.getLangOpts().OpenCL && FD->hasAttr()) ||
+  (M.getLangOpts().HIP && FD->hasAttr())) &&
   (M.getTriple().getOS() == llvm::Triple::AMDHSA))
 F->addFnAttr("amdgpu-implicitarg-num-bytes", "48");
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits