[PATCH] D74144: [OPENMP50]Add basic support for array-shaping operation.

2023-02-21 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.
Herald added a subscriber: steakhal.
Herald added a reviewer: NoQ.
Herald added a project: All.

In D74144#2308856 , @ABataev wrote:

> In D74144#2308796 , @cchen wrote:
>
>> In D74144#2307494 , @ABataev wrote:
>>
>>> In D74144#2307454 , @cchen wrote:
>>>
 @ABataev, the test below is extracted from the Sollve test suite, and Clang now
 emits:

   test.c:17:35: error: subscripted value is not an array or pointer
   #pragma omp target update to( (([N][N])foo)[1:M] )
 ^
   test.c:17:5: error: expected at least one 'to' clause or 'from' clause 
 specified to '#pragma omp target update'
   #pragma omp target update to( (([N][N])foo)[1:M] )

 This error message comes from `ActOnOMPArraySectionExpr`, which is
 called inside `ParsePostfixExpressionSuffix`. The issue is that the base
 expression in `ActOnOMPArraySectionExpr` looks like:

   ParenExpr 0x122859be0 '' lvalue
   `-OMPArrayShapingExpr 0x122859b98 '' lvalue
     |-IntegerLiteral 0x122859b38 'int' 5
     |-IntegerLiteral 0x122859b58 'int' 5
     `-DeclRefExpr 0x122859b78 'int *' lvalue Var 0x1228599d0 'foo' 'int *'

 which is not a base that we would expect in an array section expr. I've
 tried relaxing the base type check in `ActOnOMPArraySectionExpr`, but I'm not
 sure that's the way to go. (Or should I just extract the DeclRefExpr from the
 OMPArrayShapingExpr before calling `ActOnOMPArraySectionExpr`?)

   #define N 5
   #define M 3
   
   int main(void) {
   int tmp[N][N];
   for(int i=0; i<N; i++)
     for(int j=0; j<N; j++)
       tmp[i][j] = N*i + j;
   
   int *foo = &tmp[0][0];
   
   // This compiles just fine
   //#pragma omp target update to( ([N][N])foo )
   
   // This is rejected by the compiler
   #pragma omp target update to( (([N][N])foo)[1:M] )
   }
>>>
>>> I don't think it is allowed by the standard.
>>>
>>> According to the standard, the shape-operator can appear only in clauses
>>> where it is explicitly allowed.
>>> In this case, array shaping is used as the base expression of an array section
>>> (or subscript) expression, which does not meet the standard. The array
>>> shaping operation is not used in a clause; instead, it is used as a base
>>> subexpression of another expression.
>>
>> In OpenMP 5.0 [2.12.6, target update construct, Restrictions, C/C++, p.1]:
>> "The list items that appear in the to or from clauses may use shape-operators."
>> Also, the array shaping section of https://github.com/OpenMP/Examples
>> illustrates the same usage:
>>
>>   ...
>>   S-17 // update boundary points (two columns of 2D array) on the host
>>   S-18 // pointer is shaped to 2D array using the shape-operator
>>   S-19 #pragma omp target update from( (([nx][ny+2])a)[0:nx][1], (([nx][ny+2])a)[0:nx][ny] )
>>   ...
>
> Then we just need to fix it, if the examples document has this example.

Was this ever followed up on and fixed?
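
For reference, here is a minimal, self-contained variant of the usage in question, i.e. the shape-operator applied to a pointer inside a `target update` clause as in the OpenMP Examples document. The sizes and data setup are illustrative (not taken from the Sollve test), and per the reproducer above current Clang still rejects the shaped-and-sectioned form:

```
#define NX 4
#define NY 4

int main() {
  // Flat device allocation that is only *viewed* as 2D via the shape-operator.
  double *a = new double[NX * (NY + 2)]();
  #pragma omp target enter data map(to : a[0 : NX * (NY + 2)])

  // OpenMP 5.0 allows shape-operators on 'to'/'from' list items of
  // 'target update'; this updates one column of the shaped 2D view.
  #pragma omp target update from((([NX][NY + 2])a)[0:NX][1])

  #pragma omp target exit data map(delete : a[0 : NX * (NY + 2)])
  delete[] a;
  return 0;
}
```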


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74144/new/

https://reviews.llvm.org/D74144

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74970: [OpenMP] Refactor the analysis in checkMapClauseBaseExpression using StmtVisitor class.

2023-02-21 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added inline comments.
Herald added subscribers: sstefan1, yaxunl.
Herald added a project: All.



Comment at: clang/lib/Sema/SemaOpenMP.cpp:15450
   }
+  bool VisitCXXThisExpr(CXXThisExpr *CTE) { return true; }
+  bool VisitStmt(Stmt *) {

cchen wrote:
> ABataev wrote:
> > Do you really need this function?
> Removed the function.
Was this function intended to be removed? As far as I can tell it was not, and
it seems to be the source of an issue I am having:

expected addressable lvalue in 'map' clause
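
For context, here is a small self-contained sketch (the class and member names are hypothetical, not my actual code) of the pattern that runs into this diagnostic when `CXXThisExpr` is not accepted as a map-clause base expression:

```
struct Kernel {
  int Data[128];

  void run() {
    // Mapping a member through 'this' makes CXXThisExpr the innermost base
    // expression that checkMapClauseBaseExpression has to accept.
    #pragma omp target map(tofrom : this->Data[0 : 128])
    for (int I = 0; I < 128; ++I)
      this->Data[I] += I;
  }
};

int main() {
  Kernel K{};
  K.run();
  return 0;
}
```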


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74970/new/

https://reviews.llvm.org/D74970

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D109885: [MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm

2021-12-16 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D109885#3194819 , @ronlieb wrote:

> @estewart08 thoughts on a good CMake variable to allow users to define the
> equivalent of /opt/rocm, and not use an environment variable inside the
> CMake file?

I would be ok with the following, without the check for ENV{ROCM_PATH}. The 
user has the option to set -DROCM_PATH=$ROCM_PATH or -DCMAKE_PREFIX_PATH.

  find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATH ${ROCM_PATH})


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109885/new/

https://reviews.llvm.org/D109885

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-14 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D115661#3193157 , @arsenm wrote:

> In D115661#3193152 , @yaxunl wrote:
>
>> In D115661#3192983 , @estewart08 
>> wrote:
>>
>>> In D115661#3190477 , @yaxunl 
>>> wrote:
>>>
 This may cause perf regressions for HIP.
>>>
>>> Do you have a test that would show such a regression? Emitting a store to 
>>> address space (4) in a constructor seems the wrong thing to do.
>>
>> The two lit tests which changed from addr space 4 to 1 demonstrated that. In 
>> alias analysis, if a variable is in addr space 4, the backend knows that it 
>> is constant and can do optimizations on it. After changing to addr space 1, 
>> those optimizations are gone.
>
> The backend also knows because the constant flag is set on the global 
> variable. Addrspace(4) is a kludge which is largely redundant with other 
> mechanisms for indicating constants

If I am understanding you correctly, putting things in address space (4) has 
little to no performance benefit. @yaxunl seems to think otherwise. I agree 
that we can further constrain the address space (1) criteria, but I am getting 
conflicting viewpoints here on performance.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115661/new/

https://reviews.llvm.org/D115661

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-14 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D115661#3190477 , @yaxunl wrote:

> This may cause perf regressions for HIP.

Do you have a test that would show such a regression? Emitting a store to 
address space (4) in a constructor seems the wrong thing to do.




Comment at: clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp:10
 // X86: @_ZN1A3FooE ={{.*}} constant i32 123, align 4
-// AMD: @_ZN1A3FooE ={{.*}} addrspace(4) constant i32 123, align 4
+// AMD: @_ZN1A3FooE ={{.*}} addrspace(1) constant i32 123, align 4
 const int *p = &A::Foo; // emit available_externally

yaxunl wrote:
> Do you know why this is not treated as constant initialization?
> 
There is no initialization here:
```
const int A::Foo;
```

It seems the compiler ignores the 123 in the struct.

I see address space (4) when the test is written like this:
```
struct A {
  static const int Foo;
};
const int A::Foo = 123;
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115661/new/

https://reviews.llvm.org/D115661

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-13 Thread Ethan Stewart via Phabricator via cfe-commits
This revision was not accepted when it landed; it landed in state "Needs 
Revision".
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd1327f8a574a: [clang][amdgpu] - Choose when to promote 
VarDecl to address space 4. (authored by estewart08).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115661/new/

https://reviews.llvm.org/D115661

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
  clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp


Index: clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
===
--- clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
+++ clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
@@ -7,7 +7,7 @@
   static const int Foo = 123;
 };
 // X86: @_ZN1A3FooE ={{.*}} constant i32 123, align 4
-// AMD: @_ZN1A3FooE ={{.*}} addrspace(4) constant i32 123, align 4
+// AMD: @_ZN1A3FooE ={{.*}} addrspace(1) constant i32 123, align 4
 const int *p = &A::Foo; // emit available_externally
 const int A::Foo;   // convert to full definition
 
@@ -37,7 +37,7 @@
   // CXX11X86: @_ZN3Foo21ConstexprStaticMemberE = available_externally 
constant i32 42,
   // CXX17X86: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr constant i32 42,
   // CXX11AMD: @_ZN3Foo21ConstexprStaticMemberE = available_externally 
addrspace(4) constant i32 42,
-  // CXX17AMD: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr addrspace(4) 
constant i32 42,
+  // CXX17AMD: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr addrspace(4) 
constant i32 42, comdat, align 4
   static constexpr int ConstexprStaticMember = 42;
   // X86: @_ZN3Foo17ConstStaticMemberE = available_externally constant i32 43,
   // AMD: @_ZN3Foo17ConstStaticMemberE = available_externally addrspace(4) 
constant i32 43,
Index: clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
===
--- clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -78,12 +78,12 @@
 // X86: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
global [2 x i32] zeroinitializer, align 4
 // X86: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
 // AMDGCN: @_ZN15partly_constant1kE ={{.*}} addrspace(1) global i32 0, align 4
-// AMDGCN: @_ZN15partly_constant2ilE ={{.*}} addrspace(4) global {{.*}} null, 
align 8
-// AMDGCN: @[[PARTLY_CONSTANT_OUTER:_ZGRN15partly_constant2ilE_]] = internal 
addrspace(4) global {{.*}} zeroinitializer, align 8
-// AMDGCN: @[[PARTLY_CONSTANT_INNER:_ZGRN15partly_constant2ilE0_]] = internal 
addrspace(4) global [3 x {{.*}}] zeroinitializer, align 8
-// AMDGCN: @[[PARTLY_CONSTANT_FIRST:_ZGRN15partly_constant2ilE1_]] = internal 
addrspace(4) constant [3 x i32] [i32 1, i32 2, i32 3], align 4
-// AMDGCN: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
addrspace(4) global [2 x i32] zeroinitializer, align 4
-// AMDGCN: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
addrspace(4) constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
+// AMDGCN: @_ZN15partly_constant2ilE ={{.*}} addrspace(1) global {{.*}} null, 
align 8
+// AMDGCN: @[[PARTLY_CONSTANT_OUTER:_ZGRN15partly_constant2ilE_]] = internal 
addrspace(1) global {{.*}} zeroinitializer, align 8
+// AMDGCN: @[[PARTLY_CONSTANT_INNER:_ZGRN15partly_constant2ilE0_]] = internal 
addrspace(1) global [3 x {{.*}}] zeroinitializer, align 8
+// AMDGCN: @[[PARTLY_CONSTANT_FIRST:_ZGRN15partly_constant2ilE1_]] = internal 
addrspace(1) constant [3 x i32] [i32 1, i32 2, i32 3], align 4
+// AMDGCN: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
addrspace(1) global [2 x i32] zeroinitializer, align 4
+// AMDGCN: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
addrspace(1) constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
 
 // X86: @[[REFTMP1:.*]] = private constant [2 x i32] [i32 42, i32 43], align 4
 // X86: @[[REFTMP2:.*]] = private constant [3 x %{{.*}}] [%{{.*}} { i32 1 }, 
%{{.*}} { i32 2 }, %{{.*}} { i32 3 }], align 4
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -9362,7 +9362,9 @@
   if (AddrSpace != LangAS::Default)
 return AddrSpace;
 
-  if (CGM.isTypeConstant(D->getType(), false)) {
+  // Only promote to address space 4 if VarDecl has constant initialization.
+  if (CGM.isTypeConstant(D->getType(), false) &&
+  D->hasConstantInitialization()) {
 if (auto ConstAS = CGM.getTarget().getConstantAddressSpace())
   return ConstAS.getValue();
   }


Index: clang/test/CodeGenCXX/cxx11-extern-conste

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-13 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 updated this revision to Diff 394002.
estewart08 added a comment.

Resubmit patch with lint.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115661/new/

https://reviews.llvm.org/D115661

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
  clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp


Index: clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
===
--- clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
+++ clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
@@ -7,7 +7,7 @@
   static const int Foo = 123;
 };
 // X86: @_ZN1A3FooE ={{.*}} constant i32 123, align 4
-// AMD: @_ZN1A3FooE ={{.*}} addrspace(4) constant i32 123, align 4
+// AMD: @_ZN1A3FooE ={{.*}} addrspace(1) constant i32 123, align 4
 const int *p = &A::Foo; // emit available_externally
 const int A::Foo;   // convert to full definition
 
@@ -37,7 +37,7 @@
   // CXX11X86: @_ZN3Foo21ConstexprStaticMemberE = available_externally 
constant i32 42,
   // CXX17X86: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr constant i32 42,
   // CXX11AMD: @_ZN3Foo21ConstexprStaticMemberE = available_externally 
addrspace(4) constant i32 42,
-  // CXX17AMD: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr addrspace(4) 
constant i32 42,
+  // CXX17AMD: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr addrspace(4) 
constant i32 42, comdat, align 4
   static constexpr int ConstexprStaticMember = 42;
   // X86: @_ZN3Foo17ConstStaticMemberE = available_externally constant i32 43,
   // AMD: @_ZN3Foo17ConstStaticMemberE = available_externally addrspace(4) 
constant i32 43,
Index: clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
===
--- clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -78,12 +78,12 @@
 // X86: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
global [2 x i32] zeroinitializer, align 4
 // X86: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
 // AMDGCN: @_ZN15partly_constant1kE ={{.*}} addrspace(1) global i32 0, align 4
-// AMDGCN: @_ZN15partly_constant2ilE ={{.*}} addrspace(4) global {{.*}} null, 
align 8
-// AMDGCN: @[[PARTLY_CONSTANT_OUTER:_ZGRN15partly_constant2ilE_]] = internal 
addrspace(4) global {{.*}} zeroinitializer, align 8
-// AMDGCN: @[[PARTLY_CONSTANT_INNER:_ZGRN15partly_constant2ilE0_]] = internal 
addrspace(4) global [3 x {{.*}}] zeroinitializer, align 8
-// AMDGCN: @[[PARTLY_CONSTANT_FIRST:_ZGRN15partly_constant2ilE1_]] = internal 
addrspace(4) constant [3 x i32] [i32 1, i32 2, i32 3], align 4
-// AMDGCN: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
addrspace(4) global [2 x i32] zeroinitializer, align 4
-// AMDGCN: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
addrspace(4) constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
+// AMDGCN: @_ZN15partly_constant2ilE ={{.*}} addrspace(1) global {{.*}} null, 
align 8
+// AMDGCN: @[[PARTLY_CONSTANT_OUTER:_ZGRN15partly_constant2ilE_]] = internal 
addrspace(1) global {{.*}} zeroinitializer, align 8
+// AMDGCN: @[[PARTLY_CONSTANT_INNER:_ZGRN15partly_constant2ilE0_]] = internal 
addrspace(1) global [3 x {{.*}}] zeroinitializer, align 8
+// AMDGCN: @[[PARTLY_CONSTANT_FIRST:_ZGRN15partly_constant2ilE1_]] = internal 
addrspace(1) constant [3 x i32] [i32 1, i32 2, i32 3], align 4
+// AMDGCN: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
addrspace(1) global [2 x i32] zeroinitializer, align 4
+// AMDGCN: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
addrspace(1) constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
 
 // X86: @[[REFTMP1:.*]] = private constant [2 x i32] [i32 42, i32 43], align 4
 // X86: @[[REFTMP2:.*]] = private constant [3 x %{{.*}}] [%{{.*}} { i32 1 }, 
%{{.*}} { i32 2 }, %{{.*}} { i32 3 }], align 4
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -9362,7 +9362,9 @@
   if (AddrSpace != LangAS::Default)
 return AddrSpace;
 
-  if (CGM.isTypeConstant(D->getType(), false)) {
+  // Only promote to address space 4 if VarDecl has constant initialization.
+  if (CGM.isTypeConstant(D->getType(), false) &&
+  D->hasConstantInitialization()) {
 if (auto ConstAS = CGM.getTarget().getConstantAddressSpace())
   return ConstAS.getValue();
   }


Index: clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
===
--- clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
+++ clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
@@ -7,7 +7,7 @@
   static const int Foo = 123;
 };
 // X8

[PATCH] D115661: [clang][amdgpu] - Choose when to promote VarDecl to address space 4.

2021-12-13 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 created this revision.
estewart08 added a reviewer: JonChesterfield.
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, kzhuravl.
estewart08 requested review of this revision.
Herald added subscribers: cfe-commits, wdng.
Herald added a project: clang.

There are instances where clang codegen creates stores to
address space 4 in ctors, which causes a crash in llc.
This store was being optimized out at opt levels > 0.

For example:

#pragma omp declare target
static const double log_smallx = log2(smallx);
#pragma omp end declare target

This patch ensures that any global const that does not
have constant initialization stays in address space 1.

Note: a second patch is in the works where all global
constants are placed in address space 1 during
codegen, and then the opt pass InferAddressSpaces
will promote to address space 4 where necessary.
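
For reference, a slightly expanded, compilable sketch of that pattern (the seed value and the use site are made up for illustration). Because `log2` is not a constant expression here, `log_smallx` requires a dynamic initializer and therefore must not be promoted to the constant address space:

```
#include <cmath>

#pragma omp declare target
static const double smallx = 1.0e-12;               // illustrative value
static const double log_smallx = std::log2(smallx); // dynamic initialization
#pragma omp end declare target

int main() {
  double v = 0.0;
  #pragma omp target map(tofrom : v)
  { v = log_smallx; }
  return v < 0.0 ? 0 : 1; // log2(1e-12) is negative, so this returns 0
}
```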


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D115661

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
  clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp


Index: clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
===
--- clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
+++ clang/test/CodeGenCXX/cxx11-extern-constexpr.cpp
@@ -7,7 +7,7 @@
   static const int Foo = 123;
 };
 // X86: @_ZN1A3FooE ={{.*}} constant i32 123, align 4
-// AMD: @_ZN1A3FooE ={{.*}} addrspace(4) constant i32 123, align 4
+// AMD: @_ZN1A3FooE ={{.*}} addrspace(1) constant i32 123, align 4
 const int *p = &A::Foo; // emit available_externally
 const int A::Foo;   // convert to full definition
 
@@ -37,7 +37,7 @@
   // CXX11X86: @_ZN3Foo21ConstexprStaticMemberE = available_externally 
constant i32 42,
   // CXX17X86: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr constant i32 42,
   // CXX11AMD: @_ZN3Foo21ConstexprStaticMemberE = available_externally 
addrspace(4) constant i32 42,
-  // CXX17AMD: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr addrspace(4) 
constant i32 42,
+  // CXX17AMD: @_ZN3Foo21ConstexprStaticMemberE = linkonce_odr addrspace(4) 
constant i32 42, comdat, align 4
   static constexpr int ConstexprStaticMember = 42;
   // X86: @_ZN3Foo17ConstStaticMemberE = available_externally constant i32 43,
   // AMD: @_ZN3Foo17ConstStaticMemberE = available_externally addrspace(4) 
constant i32 43,
Index: clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
===
--- clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
+++ clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp
@@ -78,12 +78,12 @@
 // X86: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
global [2 x i32] zeroinitializer, align 4
 // X86: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
 // AMDGCN: @_ZN15partly_constant1kE ={{.*}} addrspace(1) global i32 0, align 4
-// AMDGCN: @_ZN15partly_constant2ilE ={{.*}} addrspace(4) global {{.*}} null, 
align 8
-// AMDGCN: @[[PARTLY_CONSTANT_OUTER:_ZGRN15partly_constant2ilE_]] = internal 
addrspace(4) global {{.*}} zeroinitializer, align 8
-// AMDGCN: @[[PARTLY_CONSTANT_INNER:_ZGRN15partly_constant2ilE0_]] = internal 
addrspace(4) global [3 x {{.*}}] zeroinitializer, align 8
-// AMDGCN: @[[PARTLY_CONSTANT_FIRST:_ZGRN15partly_constant2ilE1_]] = internal 
addrspace(4) constant [3 x i32] [i32 1, i32 2, i32 3], align 4
-// AMDGCN: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
addrspace(4) global [2 x i32] zeroinitializer, align 4
-// AMDGCN: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
addrspace(4) constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
+// AMDGCN: @_ZN15partly_constant2ilE ={{.*}} addrspace(1) global {{.*}} null, 
align 8
+// AMDGCN: @[[PARTLY_CONSTANT_OUTER:_ZGRN15partly_constant2ilE_]] = internal 
addrspace(1) global {{.*}} zeroinitializer, align 8
+// AMDGCN: @[[PARTLY_CONSTANT_INNER:_ZGRN15partly_constant2ilE0_]] = internal 
addrspace(1) global [3 x {{.*}}] zeroinitializer, align 8
+// AMDGCN: @[[PARTLY_CONSTANT_FIRST:_ZGRN15partly_constant2ilE1_]] = internal 
addrspace(1) constant [3 x i32] [i32 1, i32 2, i32 3], align 4
+// AMDGCN: @[[PARTLY_CONSTANT_SECOND:_ZGRN15partly_constant2ilE2_]] = internal 
addrspace(1) global [2 x i32] zeroinitializer, align 4
+// AMDGCN: @[[PARTLY_CONSTANT_THIRD:_ZGRN15partly_constant2ilE3_]] = internal 
addrspace(1) constant [4 x i32] [i32 5, i32 6, i32 7, i32 8], align 4
 
 // X86: @[[REFTMP1:.*]] = private constant [2 x i32] [i32 42, i32 43], align 4
 // X86: @[[REFTMP2:.*]] = private constant [3 x %{{.*}}] [%{{.*}} { i32 1 }, 
%{{.*}} { i32 2 }, %{{.*}} { i32 3 }], align 4
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@

[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D104904#2913983 , @ye-luo wrote:

> how to get this moving?

We are working on some additions to this patch. The lit failure noted above has 
been fixed locally. I would expect an update here very soon.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-21 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added inline comments.



Comment at: clang/test/Headers/Inputs/include/cstdlib:29
 float fabs(float __x) { return __builtin_fabs(__x); }
+#endif
 

jdoerfert wrote:
> JonChesterfield wrote:
> > jdoerfert wrote:
> > > estewart08 wrote:
> > > > JonChesterfield wrote:
> > > > > jdoerfert wrote:
> > > > > > That seems to be fundamentally broken then, but let's see, maybe it 
> > > > > > will somehow work anyway.
> > > > > I thought fabs was in math, not stdlib. Not sure what this file is 
> > > > > doing but the functions above are inline and fabs isn't
> > > > I am afraid this is just a workaround to get 
> > > > openmp_device_math_isnan.cpp to pass for AMDGCN. This stems from not 
> > > > having an #ifndef OPENMP_AMDGCN in __clang_hip_cmath.h where 'using 
> > > > ::fabs' is present. Currently, OPENMP_AMDGCN uses all of the overloaded 
> > > > functions created by the HIP macros where NVPTX does not use the CUDA 
> > > > overloads. This may be a new topic of discussion.
> > > > 
> > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_cmath.h#L191
> > > > 
> > > > By using this ifndef, it seems NVPTX loses quite a few overloaded 
> > > > functions. Are these meant to eventually be present in 
> > > > openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with 
> > > > D75788.
> > > > By using this ifndef, it seems NVPTX loses quite a few overloaded 
> > > > functions. Are these meant to eventually be present in 
> > > > openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with 
> > > > D75788.
> > > 
> > > Can you provide an example that shows how we "loose" something? So an 
> > > input and command line that should work but doesn't, or that should be 
> > > compiled to something else. That would help me a lot.
> > TLDR, I think nvptx works here, but it's hard to be certain. I've put a few 
> > minutes into looking for something that doesn't work, then much longer 
> > trying to trace where the various functions come from, and have concluded 
> > that the hip cmath header diverging from the cuda cmath header is the root 
> > cause.
> > 
> > The list of functions near the top of `__clang_cuda_cmath.h` is a subset of 
> > libm, e.g.
> > ```
> > __DEVICE__ float acos(float __x) { return ::acosf(__x); }
> > but no acosh
> > ```
> > Later on in the file are:
> > ```
> > __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh)
> > ```
> > but these are guarded by `#ifndef __OPENMP_NVPTX__`, which suggests they 
> > are not included when using the header from openmp.
> > 
> > However, openmp_wrappers/cmath does include `__DEVICE__ float acosh(float 
> > __x) { return ::acoshf(__x); }` under the comment
> > `// Overloads not provided by the CUDA wrappers by by the CUDA system 
> > headers`
> > 
> > Finally there are some functions that are not in either list, such as 
> > fma(float,float,float), but which are nevertheless resolved, at a guess in 
> > a glibc header.
> > 
> > My current theory is that nvptx gets the set of functions right through a 
> > combination of cuda headers, clang cuda headers, clang openmp headers, 
> > system headers. At least, the half dozen I've tried work, and iirc it 
> > passes the OvO suite which I believe calls all of them. 
> > Wimplicit-float-conversion complains about a few but that seems minor.
> > 
> > Further, I think hip does not get this right, because the hip cmath header 
> > has diverged from the cuda one, and the amdgpu openmp implementation that 
> > tries to use the hip headers does not pass the OvO suite without some hacks.
> >   >  By using this ifndef, it seems NVPTX loses quite a few overloaded 
> > functions. Are these meant to eventually be present in 
> > openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788.
> 
> > Can you provide an example that shows how we "loose" something? So an input 
> > and command line that should work but doesn't, or that should be compiled 
> > to something else. That would help me a lot.
> 
> @estewart08 Feel free to provide me with something that doesn't work even as 
> this goes in. It sounded you had some ideas and I'd like to look into that.
> TLDR, I think nvptx works here, but it's hard to be certain. I've put a few 
> minutes into looking for something that doesn't work, then much longer trying 
> to trace where the various functions come from, and have concluded that the 
> hip cmath header diverging from the cuda cmath header is the root cause.
> 
> The list of functions near the top of `__clang_cuda_cmath.h` is a subset of 
> libm, e.g.
> ```
> __DEVICE__ float acos(float __x) { return ::acosf(__x); }
> but no acosh
> ```
> Later on in the file are:
> ```
> __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh)
> ```
> but these are guarded by `#ifndef __OPENMP_NVPTX__`, which suggests they are 
> not included when using the header from openmp.
> 
> However, openmp_wrappers/cmath does include `__DEVICE__ float acosh(floa

[PATCH] D104904: [OpenMP][AMDGCN] Initial math headers support

2021-07-09 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added inline comments.



Comment at: clang/test/Headers/Inputs/include/cstdlib:29
 float fabs(float __x) { return __builtin_fabs(__x); }
+#endif
 

JonChesterfield wrote:
> jdoerfert wrote:
> > That seems to be fundamentally broken then, but let's see, maybe it will 
> > somehow work anyway.
> I thought fabs was in math, not stdlib. Not sure what this file is doing but 
> the functions above are inline and fabs isn't
I am afraid this is just a workaround to get openmp_device_math_isnan.cpp to
pass for AMDGCN. This stems from not having an #ifndef OPENMP_AMDGCN in
__clang_hip_cmath.h where 'using ::fabs' is present. Currently, OPENMP_AMDGCN
uses all of the overloaded functions created by the HIP macros, whereas NVPTX
does not use the CUDA overloads. This may be a new topic of discussion.

https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_cmath.h#L191

By using this ifndef, it seems NVPTX loses quite a few overloaded functions.
Are these meant to eventually be present in openmp_wrappers/cmath? Not sure
what issues @jdoerfert ran into with D75788.
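
To make the suggestion concrete, the guard would look roughly like the sketch below. This is hypothetical: `__OPENMP_AMDGCN__` is the assumed macro name (by analogy with `__OPENMP_NVPTX__`), and the namespace and `main` are only there to keep the snippet compilable outside the real header:

```
#include <math.h>

namespace hip_cmath_sketch {
// In __clang_hip_cmath.h this would guard the 'using ::fabs' declaration so
// the HIP overload set is not pulled in during the OpenMP AMDGCN device pass.
#ifndef __OPENMP_AMDGCN__
using ::fabs;
#endif
} // namespace hip_cmath_sketch

int main() { return hip_cmath_sketch::fabs(-1.5) > 0.0 ? 0 : 1; }
```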


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104904/new/

https://reviews.llvm.org/D104904

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.

2021-06-23 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added inline comments.



Comment at: clang/test/Headers/hip-header.hip:21
+// RUN:   -target-cpu gfx906 -emit-llvm %s -fcuda-is-device -o - \
+// RUN:   -D__HIPCC_RTC__ -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s 
-check-prefixes=AMD_INT_RETURN
+// RUN: %clang_cc1 -include __clang_hip_runtime_wrapper.h \

yaxunl wrote:
> where is this macro used and how does it affect HIP? Thanks.
https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/clang/test/Headers/Inputs/include/cmath#L85

For testing purposes we can enable certain return types for isnan.
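
Concretely, the test input header linked above toggles the declaration roughly as in this paraphrased sketch (see the link for the actual file contents):

```
// Paraphrased sketch of the test's cmath input: the macro selects which
// isnan declaration the test compiles against.
#ifdef USE_ISNAN_WITH_INT_RETURN
int isnan(double __x) { return __builtin_isnan(__x); }
#else
bool isnan(double __x) { return __builtin_isnan(__x); }
#endif

int main() { return isnan(0.0) ? 1 : 0; }
```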


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104677/new/

https://reviews.llvm.org/D104677

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn.

2021-06-22 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 updated this revision to Diff 353773.
estewart08 added a comment.

  Add test_isnan function to hip-header.hip.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104677/new/

https://reviews.llvm.org/D104677

Files:
  clang/lib/Headers/__clang_hip_cmath.h
  clang/test/Headers/hip-header.hip
  clang/test/Headers/openmp_device_math_isnan.cpp

Index: clang/test/Headers/openmp_device_math_isnan.cpp
===
--- clang/test/Headers/openmp_device_math_isnan.cpp
+++ clang/test/Headers/openmp_device_math_isnan.cpp
@@ -1,11 +1,19 @@
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=BOOL_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=AMD_BOOL_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=BOOL_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=AMD_BOOL_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=INT_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=AMD_INT_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvi

[PATCH] D104677: [OpenMP] Apply fix for isnan, isinf and isinfinite for amdgcn.

2021-06-21 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 updated this revision to Diff 353527.
estewart08 added a comment.

Attempt to use clang-format.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104677/new/

https://reviews.llvm.org/D104677

Files:
  clang/lib/Headers/__clang_hip_cmath.h
  clang/test/Headers/openmp_device_math_isnan.cpp

Index: clang/test/Headers/openmp_device_math_isnan.cpp
===
--- clang/test/Headers/openmp_device_math_isnan.cpp
+++ clang/test/Headers/openmp_device_math_isnan.cpp
@@ -1,11 +1,19 @@
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=BOOL_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=AMD_BOOL_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=BOOL_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=AMD_BOOL_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=INT_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=AMD_INT_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fo

[PATCH] D104677: [OpenMP] Apply fix for isnan, isinf and isinfinite for amdgcn.

2021-06-21 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 created this revision.
Herald added subscribers: guansong, yaxunl.
estewart08 requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

This fixes issues with various return types (bool/int) and was already
in place for the nvptx headers; it is adjusted here to work for amdgcn.
Similar to D85879 .


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D104677

Files:
  clang/lib/Headers/__clang_hip_cmath.h
  clang/test/Headers/openmp_device_math_isnan.cpp

Index: clang/test/Headers/openmp_device_math_isnan.cpp
===
--- clang/test/Headers/openmp_device_math_isnan.cpp
+++ clang/test/Headers/openmp_device_math_isnan.cpp
@@ -1,11 +1,19 @@
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=BOOL_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=AMD_BOOL_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=BOOL_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=AMD_BOOL_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
 // RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=INT_RETURN
+// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=AMD_INT_RETURN
 // RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN
+// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_R

[PATCH] D101911: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-07 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 accepted this revision.
estewart08 added a comment.
This revision is now accepted and ready to land.

LGTM as a temporary workaround until SPMD properly assigns team private 
variables.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101911/new/

https://reviews.llvm.org/D101911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D101911: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-07 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D101911#2738994 , @ABataev wrote:

> Hi Ethan, try this patch if it fixes the issue.

Tested this on gfx906 and V100; the main reproducer now passes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101911/new/

https://reviews.llvm.org/D101911

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-05 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2736981 , @ABataev wrote:

> In D99432#2736970 , @estewart08 
> wrote:
>
>> In D99432#2728788 , @ABataev wrote:
>>
>>> In D99432#2726997 , @estewart08 
>>> wrote:
>>>
 In D99432#2726845 , @ABataev 
 wrote:

> In D99432#2726588 , @estewart08 
> wrote:
>
>> In D99432#2726391 , @ABataev 
>> wrote:
>>
>>> In D99432#2726337 , 
>>> @estewart08 wrote:
>>>
 In D99432#2726060 , @ABataev 
 wrote:

> In D99432#2726050 , 
> @estewart08 wrote:
>
>> In D99432#2726025 , 
>> @ABataev wrote:
>>
>>> In D99432#2726019 , 
>>> @estewart08 wrote:
>>>
 In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do 
 not see how this helps SPMD mode with team privatization of 
 declarations in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although it was not with latest 
>> llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed but from what I see in the LLVM IR it 
> shall pass. Do you have a debug log, does it crashes or produces 
> incorrect results?

 This is on an AMDGPU but I assume the behavior would be similar for 
 NVPTX.

 It produces incorrect/incomplete results in the dist[0] index after a 
 manual reduction and in turn the final global gpu_results array is 
 incorrect.
 When thread 0 does a reduction into dist[0], it has no knowledge of
 dist[1] having been updated by thread 1, which tells me the array is
 still thread-private.
 Adding some printfs, looking at one teams' output:

 SPMD

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 0  // This should be 1
   After reduction into dist[0]: 1  // This should be 2
   gpu_results = [1,1]  // [2,2] expected

 Generic Mode:

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 1   
   After reduction into dist[0]: 2
   gpu_results = [2,2]
>>>
>>> Hmm, I would expect a crash if the array was allocated in the local 
>>> memory. Could you try to add some more printfs (with data and addresses 
>>> of the array) to check the results? Maybe there is a data race 
>>> somewhere in the code?
>>
>> As a reminder, each thread updates a unique index in the dist array and 
>> each team updates a unique index in gpu_results.
>>
>> SPMD - shows each thread has a unique address for dist array
>>
>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>>   --
>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>>   
>>   gpu_results[0]: 1
>>   gpu_results[1]: 1
>>
>> Generic - shows each team shares dist array address amongst threads
>>
>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>>   --
>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 2
>>   Team 1 Thread 0: gpu_results address: 0x7fabc500
>
> Co

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-04 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2728788 , @ABataev wrote:

> In D99432#2726997 , @estewart08 
> wrote:
>
>> In D99432#2726845 , @ABataev wrote:
>>
>>> In D99432#2726588 , @estewart08 
>>> wrote:
>>>
 In D99432#2726391 , @ABataev 
 wrote:

> In D99432#2726337 , @estewart08 
> wrote:
>
>> In D99432#2726060 , @ABataev 
>> wrote:
>>
>>> In D99432#2726050 , 
>>> @estewart08 wrote:
>>>
 In D99432#2726025 , @ABataev 
 wrote:

> In D99432#2726019 , 
> @estewart08 wrote:
>
>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do 
>> not see how this helps SPMD mode with team privatization of 
>> declarations in-between target teams and parallel regions.
>
> Did you try the reproducer with the applied patch?

 Yes, I still saw the test fail, although it was not with latest 
 llvm-project. Are you saying the reproducer passes for you?
>>>
>>> I don't have CUDA installed but from what I see in the LLVM IR it shall 
>>> pass. Do you have a debug log, does it crashes or produces incorrect 
>>> results?
>>
>> This is on an AMDGPU but I assume the behavior would be similar for 
>> NVPTX.
>>
>> It produces incorrect/incomplete results in the dist[0] index after a 
>> manual reduction and in turn the final global gpu_results array is 
>> incorrect.
>> When thread 0 does a reduction into dist[0], it has no knowledge of
>> dist[1] having been updated by thread 1, which tells me the array is
>> still thread-private.
>> Adding some printfs, looking at one teams' output:
>>
>> SPMD
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 0  // This should be 1
>>   After reduction into dist[0]: 1  // This should be 2
>>   gpu_results = [1,1]  // [2,2] expected
>>
>> Generic Mode:
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 1   
>>   After reduction into dist[0]: 2
>>   gpu_results = [2,2]
>
> Hmm, I would expect a crash if the array was allocated in the local 
> memory. Could you try to add some more printfs (with data and addresses 
> of the array) to check the results? Maybe there is a data race somewhere 
> in the code?

 As a reminder, each thread updates a unique index in the dist array and 
 each team updates a unique index in gpu_results.

 SPMD - shows each thread has a unique address for dist array

   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
   
   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
   
   Team 0 Thread 0: After reduction into dist[0]: 1
   Team 0 Thread 0: gpu_results address: 0x7f92a500
   --
   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
   
   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
   
   Team 1 Thread 0: After reduction into dist[0]: 1
   Team 1 Thread 0: gpu_results address: 0x7f92a500
   
   gpu_results[0]: 1
   gpu_results[1]: 1

 Generic - shows each team shares dist array address amongst threads

   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
   
   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
   
   Team 0 Thread 0: After reduction into dist[0]: 2
   Team 0 Thread 0: gpu_results address: 0x7fabc500
   --
   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
   
   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
   
   Team 1 Thread 0: After reduction into dist[0]: 2
   Team 1 Thread 0: gpu_results address: 0x7fabc500
>>>
>>> Could you check if it works with `-fno-openmp-cuda-parallel-target-regions` 
>>> option?
>>
>> Unfortunately that crashes:
>> llvm-project/llvm/lib/IR/Instructions.cpp:495: void 
>> llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, 
>> llvm::ArrayRef<llvm::Value*>,
>> llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&):
>> Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == 
>> Args[i]->getType()) && "Calling a fun

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726845 , @ABataev wrote:

> In D99432#2726588 , @estewart08 
> wrote:
>
>> In D99432#2726391 , @ABataev wrote:
>>
>>> In D99432#2726337 , @estewart08 
>>> wrote:
>>>
 In D99432#2726060 , @ABataev 
 wrote:

> In D99432#2726050 , @estewart08 
> wrote:
>
>> In D99432#2726025 , @ABataev 
>> wrote:
>>
>>> In D99432#2726019 , 
>>> @estewart08 wrote:
>>>
 In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not 
 see how this helps SPMD mode with team privatization of declarations 
 in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although it was not with latest 
>> llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed but from what I see in the LLVM IR it shall 
> pass. Do you have a debug log, does it crashes or produces incorrect 
> results?

 This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

 It produces incorrect/incomplete results in the dist[0] index after a 
 manual reduction and in turn the final global gpu_results array is 
 incorrect.
 When thread 0 does a reduction into dist[0], it has no knowledge of dist[1]
 having been updated by thread 1, which tells me the array is still
 thread-private.
 Adding some printfs, looking at one teams' output:

 SPMD

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 0  // This should be 1
   After reduction into dist[0]: 1  // This should be 2
   gpu_results = [1,1]  // [2,2] expected

 Generic Mode:

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 1   
   After reduction into dist[0]: 2
   gpu_results = [2,2]
>>>
>>> Hmm, I would expect a crash if the array was allocated in the local memory. 
>>> Could you try to add some more printfs (with data and addresses of the 
>>> array) to check the results? Maybe there is a data race somewhere in the 
>>> code?
>>
>> As a reminder, each thread updates a unique index in the dist array and each 
>> team updates a unique index in gpu_results.
>>
>> SPMD - shows each thread has a unique address for dist array
>>
>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>>   --
>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>>   
>>   gpu_results[0]: 1
>>   gpu_results[1]: 1
>>
>> Generic - shows each team shares dist array address amongst threads
>>
>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>>   --
>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 2
>>   Team 1 Thread 0: gpu_results address: 0x7fabc500
>
> Could you check if it works with the 
> `-fno-openmp-cuda-parallel-target-regions` option?

Unfortunately that crashes:
llvm-project/llvm/lib/IR/Instructions.cpp:495: void 
llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, 
llvm::ArrayRef<llvm::Value*>, 
llvm::ArrayRef<llvm::OperandBundleDef>, const llvm::Twine&): 
Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == 
Args[i]->getType()) && "Calling a function with a bad signature!"' failed.
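
For anyone trying to reproduce the crash, the option can be passed on top of a 
normal AMDGPU offload invocation along these lines (a sketch; the gfx906 target 
arch and the test.c file name are assumptions, adjust for your GPU):

  clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
        -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 \
        -fno-openmp-cuda-parallel-target-regions test.c -o test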


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726391 , @ABataev wrote:

> In D99432#2726337 , @estewart08 
> wrote:
>
>> In D99432#2726060 , @ABataev wrote:
>>
>>> In D99432#2726050 , @estewart08 
>>> wrote:
>>>
 In D99432#2726025 , @ABataev 
 wrote:

> In D99432#2726019 , @estewart08 
> wrote:
>
>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not 
>> see how this helps SPMD mode with team privatization of declarations 
>> in-between target teams and parallel regions.
>
> Did you try the reproducer with the applied patch?

 Yes, I still saw the test fail, although it was not with the latest 
 llvm-project. Are you saying the reproducer passes for you?
>>>
>>> I don't have CUDA installed but from what I see in the LLVM IR it shall 
>>> pass. Do you have a debug log? Does it crash or produce incorrect 
>>> results?
>>
>> This is on an AMDGPU but I assume the behavior would be similar for NVPTX.
>>
>> It produces incorrect/incomplete results in the dist[0] index after a manual 
>> reduction, and in turn the final global gpu_results array is incorrect.
>> When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
>> having been updated by thread 1, which tells me the array is still 
>> thread-private.
>> Adding some printfs and looking at one team's output:
>>
>> SPMD
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 0  // This should be 1
>>   After reduction into dist[0]: 1  // This should be 2
>>   gpu_results = [1,1]  // [2,2] expected
>>
>> Generic Mode:
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 1   
>>   After reduction into dist[0]: 2
>>   gpu_results = [2,2]
>
> Hmm, I would expect a crash if the array was allocated in the local memory. 
> Could you try to add some more printfs (with data and addresses of the array) 
> to check the results? Maybe there is a data race somewhere in the code?

As a reminder, each thread updates a unique index in the dist array and each 
team updates a unique index in gpu_results.
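
The dumps below come from instrumentation along these lines, added inside the 
"parallel" region of the reproducer (a sketch of the printfs, not the exact 
patch; the loop bound and variable names are assumptions):

  // Every thread prints the value and address of each dist element it sees:
  for (int i = 0; i < 2; ++i)
    printf("Team %d Thread %d: dist[%d]: %d, %p\n",
           omp_get_team_num(), omp_get_thread_num(), i, dist[i],
           (void *)&dist[i]);
  // Thread 0 of each team also prints where the mapped gpu_results buffer is:
  if (omp_get_thread_num() == 0)
    printf("Team %d Thread %d: gpu_results address: %p\n",
           omp_get_team_num(), omp_get_thread_num(), (void *)gpu_results);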

SPMD - shows each thread has a unique address for dist array

  Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
  Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
  
  Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
  Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
  
  Team 0 Thread 0: After reduction into dist[0]: 1
  Team 0 Thread 0: gpu_results address: 0x7f92a500
  --
  Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
  Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
  
  Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
  Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
  
  Team 1 Thread 0: After reduction into dist[0]: 1
  Team 1 Thread 0: gpu_results address: 0x7f92a500
  
  gpu_results[0]: 1
  gpu_results[1]: 1

Generic - shows each team shares dist array address amongst threads

  Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
  Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
  
  Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
  Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
  
  Team 0 Thread 0: After reduction into dist[0]: 2
  Team 0 Thread 0: gpu_results address: 0x7fabc500
  --
  Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
  Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
  
  Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
  Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
  
  Team 1 Thread 0: After reduction into dist[0]: 2
  Team 1 Thread 0: gpu_results address: 0x7fabc500


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726060 , @ABataev wrote:

> In D99432#2726050 , @estewart08 
> wrote:
>
>> In D99432#2726025 , @ABataev wrote:
>>
>>> In D99432#2726019 , @estewart08 
>>> wrote:
>>>
 In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see 
 how this helps SPMD mode with team privatization of declarations 
 in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although it was not with the latest 
>> llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed but from what I see in the LLVM IR it shall pass. 
> Do you have a debug log? Does it crash or produce incorrect results?

This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in the dist[0] index after a manual 
reduction, and in turn the final global gpu_results array is incorrect.
When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
having been updated by thread 1, which tells me the array is still 
thread-private.
Adding some printfs and looking at one team's output (a minimal sketch of this 
kind of reproducer follows the output below):

SPMD

  Thread 0: dist[0]: 1
  Thread 0: dist[1]: 0  // This should be 1
  After reduction into dist[0]: 1  // This should be 2
  gpu_results = [1,1]  // [2,2] expected

Generic Mode:

  Thread 0: dist[0]: 1
  Thread 0: dist[1]: 1   
  After reduction into dist[0]: 2
  gpu_results = [2,2]
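
To make the discussion concrete, here is a minimal sketch of this kind of 
reproducer (assumed shape, not the exact test from PR48851; the constants and 
the map clause are illustrative). With working team privatization of dist it 
should print gpu_results = [2,2]; the SPMD failure mode above prints [1,1]:

  #include <omp.h>
  #include <stdio.h>

  #define NUM_TEAMS 2
  #define NUM_THREADS 2

  int main(void) {
    int gpu_results[NUM_TEAMS] = {0, 0};

  #pragma omp target teams num_teams(NUM_TEAMS) thread_limit(NUM_THREADS) \
      map(tofrom : gpu_results[0:NUM_TEAMS])
    {
      int dist[NUM_THREADS]; // declared between "target teams" and "parallel"
  #pragma omp parallel num_threads(NUM_THREADS)
      {
        int tid = omp_get_thread_num();
        dist[tid] = 1; // each thread writes its own unique index
  #pragma omp barrier
        if (tid == 0) {
          int sum = 0; // manual reduction by thread 0
          for (int i = 0; i < NUM_THREADS; ++i)
            sum += dist[i];
          gpu_results[omp_get_team_num()] = sum; // expect 2 per team
        }
      }
    }

    printf("gpu_results = [%d,%d]\n", gpu_results[0], gpu_results[1]);
    return 0;
  }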


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726025 , @ABataev wrote:

> In D99432#2726019 , @estewart08 
> wrote:
>
>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see 
>> how this helps SPMD mode with team privatization of declarations in-between 
>> target teams and parallel regions.
>
> Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with the latest llvm-project. 
Are you saying the reproducer passes for you?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how 
this helps SPMD mode with team privatization of declarations in-between target 
teams and parallel regions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits