[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-22 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D79526#2042680 , @tra wrote:

> Reduced test case:
>
>   struct a {
> __attribute__((device)) a(short);
> __attribute__((device)) operator unsigned() const;
> __attribute__((device)) operator int() const;
>   };
>   struct b {
> a d;
>   };
>   void f(b g) { b e = g; }
>
>
> Failure:
>
>   $ bin/clang++ -x cuda aten.cc -fsyntax-only  
> --cuda-path=$HOME/local/cuda-10.1 --cuda-device-only --cuda-gpu-arch=sm_60 
> -stdlib=libc++ -std=c++17 -ferror-limit=1
>  
>   aten.cc:6:8: error: conversion from 'const a' to 'short' is ambiguous
>   struct b {
>  ^
>   aten.cc:9:21: note: in implicit copy constructor for 'b' first required here
>   void f(b g) { b e = g; }
>   ^
>   aten.cc:3:27: note: candidate function
> __attribute__((device)) operator unsigned() const;
> ^
>   aten.cc:4:27: note: candidate function
> __attribute__((device)) operator int() const;
> ^
>   aten.cc:2:34: note: passing argument to parameter here
> __attribute__((device)) a(short);
>^
>   1 error generated when compiling for sm_60.
>
>
> The same code compiles fine in C++ and I would expect it to work on device 
> side the same way.


a and b both have an implicit HD copy ctor. In device compilation, copy ctor of 
b is calling copy ctor of a. There are two candidates: implicit HD copy ctor of 
a, and device ctor a(short).

In my previous fix, I made H and implicit HD candidate equal, however I forgot 
about the relation between D candidate and HD candidate. I incorrectly made D 
favored over HD and H. This caused inferior device candidate a(short) chosen 
over copy ctor of a.

I have a fix for this https://reviews.llvm.org/D80450


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

Reduced test case:

  struct a {
__attribute__((device)) a(short);
__attribute__((device)) operator unsigned() const;
__attribute__((device)) operator int() const;
  };
  struct b {
a d;
  };
  void f(b g) { b e = g; }

Failure:

  $ bin/clang++ -x cuda aten.cc -fsyntax-only  
--cuda-path=$HOME/local/cuda-10.1 --cuda-device-only --cuda-gpu-arch=sm_60 
-stdlib=libc++ -std=c++17 -ferror-limit=1
  
  aten.cc:6:8: error: conversion from 'const a' to 'short' is ambiguous
  struct b {
 ^
  aten.cc:9:21: note: in implicit copy constructor for 'b' first required here
  void f(b g) { b e = g; }
  ^
  aten.cc:3:27: note: candidate function
__attribute__((device)) operator unsigned() const;
^
  aten.cc:4:27: note: candidate function
__attribute__((device)) operator int() const;
^
  aten.cc:2:34: note: passing argument to parameter here
__attribute__((device)) a(short);
   ^
  1 error generated when compiling for sm_60.

The same code compiles fine in C++ and I would expect it to work on device side 
the same way.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-18 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

e03394c6a6ff5832aa43259d4b8345f40ca6a22c 
 Still 
breaks some of the existing CUDA code (got failures in pytorch and Eigen). I'll 
revert the patch and will send you a reduced reproducer.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-12 Thread Yaxun Liu via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
yaxunl marked 3 inline comments as done.
Closed by commit rGe03394c6a6ff: [CUDA][HIP] Workaround for resolving host 
device function against wrong-sided… (authored by yaxunl).
Herald added a project: clang.

Changed prior to commit:
  https://reviews.llvm.org/D79526?vs=263268=263408#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -1,8 +1,8 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
-// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -std=c++14 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
+// RUN: %clang_cc1 -std=c++14 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
 
 #include "Inputs/cuda.h"
 
@@ -14,6 +14,13 @@
 struct HostDeviceReturnTy {};
 struct TemplateReturnTy {};
 
+struct CorrectOverloadRetTy{};
+#if __CUDA_ARCH__
+// expected-note@-2 {{candidate constructor (the implicit copy constructor) not viable: no known conversion from 'IncorrectOverloadRetTy' to 'const CorrectOverloadRetTy &' for 1st argument}}
+// expected-note@-3 {{candidate constructor (the implicit move constructor) not viable: no known conversion from 'IncorrectOverloadRetTy' to 'CorrectOverloadRetTy &&' for 1st argument}}
+#endif
+struct IncorrectOverloadRetTy{};
+
 typedef HostReturnTy (*HostFnPtr)();
 typedef DeviceReturnTy (*DeviceFnPtr)();
 typedef HostDeviceReturnTy (*HostDeviceFnPtr)();
@@ -331,9 +338,6 @@
 // If we have a mix of HD and H-only or D-only candidates in the overload set,
 // normal C++ overload resolution rules apply first.
 template  TemplateReturnTy template_vs_hd_function(T arg)
-#ifdef __CUDA_ARCH__
-//expected-note@-2 {{declared here}}
-#endif
 {
   return TemplateReturnTy();
 }
@@ -342,11 +346,13 @@
 }
 
 __host__ __device__ void test_host_device_calls_hd_template() {
-  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
-  TemplateReturnTy ret2 = template_vs_hd_function(1);
 #ifdef __CUDA_ARCH__
-  // expected-error@-2 {{reference to __host__ function 'template_vs_hd_function' in __host__ __device__ function}}
+  typedef HostDeviceReturnTy ExpectedReturnTy;
+#else
+  typedef TemplateReturnTy ExpectedReturnTy;
 #endif
+  HostDeviceReturnTy ret1 = template_vs_hd_function(1.0f);
+  ExpectedReturnTy ret2 = template_vs_hd_function(1);
 }
 
 __host__ void test_host_calls_hd_template() {
@@ -367,14 +373,14 @@
 __device__ DeviceReturnTy device_only_function(int arg) { return DeviceReturnTy(); }
 __device__ DeviceReturnTy2 device_only_function(float arg) { return DeviceReturnTy2(); }
 #ifndef __CUDA_ARCH__
-  // expected-note@-3 {{'device_only_function' declared here}}
-  // expected-note@-3 {{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
+  // expected-note@-3 2{{'device_only_function' declared here}}
 #endif
 __host__ HostReturnTy host_only_function(int arg) { return HostReturnTy(); }
 __host__ HostReturnTy2 host_only_function(float arg) { return HostReturnTy2(); }
 #ifdef __CUDA_ARCH__
-  // expected-note@-3 {{'host_only_function' declared here}}
-  // expected-note@-3 {{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
+  // expected-note@-3 2{{'host_only_function' declared here}}
 #endif
 
 __host__ __device__ void test_host_device_single_side_overloading() {
@@ -392,6 +398,37 @@
 #endif
 }
 
+// wrong-sided overloading should not cause diagnostic unless it is emitted.
+// This inline function is not emitted.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_no_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+  HostReturnTy ret3 = host_only_function(1);
+  HostReturnTy2 ret4 = host_only_function(1.0f);
+}
+
+// wrong-sided overloading should cause diagnostic if it is emitted.
+// This inline function is emitted since it is called by an emitted function.
+inline __host__ __device__ void test_host_device_wrong_side_overloading_inline_diag() {
+  DeviceReturnTy ret1 = device_only_function(1);
+  DeviceReturnTy2 ret2 = device_only_function(1.0f);
+#ifndef __CUDA_ARCH__
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ function}}
+  // expected-error@-3 {{reference to __device__ function 'device_only_function' in __host__ __device__ 

[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-11 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 6 inline comments as done.
yaxunl added inline comments.



Comment at: clang/test/SemaCUDA/function-overload.cu:479
+namespace ImplicitHostDeviceVsWrongSided {
+inline CorrectOverloadRetTy callee(double x);
+#pragma clang force_cuda_host_device begin

tra wrote:
> Is `inline` necessary in these new tests? Please remove it where it's not 
> needed.
It is not needed by callee but needed by caller to make sure it causes deferred 
diagnostics. Will remove it from callees.



Comment at: clang/test/SemaCUDA/function-overload.cu:493
+namespace ImplicitHostDeviceVsSameSide {
+inline InCorrectOverloadRetTy callee(int x);
+#pragma clang force_cuda_host_device begin

tra wrote:
> Nit: `Incorrect` should not have `C` capitalized as it's one word.
will fix.



Comment at: clang/test/SemaCUDA/function-overload.cu:502-529
+// In the implicit host device function 'caller', the second 'callee' should be
+// since it has better match, even though it is an implicit host device 
function
+// whereas the first 'callee' is a host function. A diagnostic will be emitted
+// if the first 'callee' is chosen since deduced return type cannot be used
+// before it is defined.
+namespace ImplicitHostDeviceByConstExpr {
+template  a b;

tra wrote:
> Please move this test below the other two as keeping them together is useful 
> to illustrate the differences in behavior of overloading in explicit HD vs 
> implicit HD functions.
will do


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-11 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.

LGTM, modulo cosmetic test changes mentioned below.




Comment at: clang/test/SemaCUDA/function-overload.cu:479
+namespace ImplicitHostDeviceVsWrongSided {
+inline CorrectOverloadRetTy callee(double x);
+#pragma clang force_cuda_host_device begin

Is `inline` necessary in these new tests? Please remove it where it's not 
needed.



Comment at: clang/test/SemaCUDA/function-overload.cu:493
+namespace ImplicitHostDeviceVsSameSide {
+inline InCorrectOverloadRetTy callee(int x);
+#pragma clang force_cuda_host_device begin

Nit: `Incorrect` should not have `C` capitalized as it's one word.



Comment at: clang/test/SemaCUDA/function-overload.cu:502-529
+// In the implicit host device function 'caller', the second 'callee' should be
+// since it has better match, even though it is an implicit host device 
function
+// whereas the first 'callee' is a host function. A diagnostic will be emitted
+// if the first 'callee' is chosen since deduced return type cannot be used
+// before it is defined.
+namespace ImplicitHostDeviceByConstExpr {
+template  a b;

Please move this test below the other two as keeping them together is useful to 
illustrate the differences in behavior of overloading in explicit HD vs 
implicit HD functions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-11 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 263268.
yaxunl marked 3 inline comments as done.
yaxunl added a comment.

revised by Artem's comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -1,8 +1,8 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 
-// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
-// RUN: %clang_cc1 -std=c++11 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -std=c++14 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
+// RUN: %clang_cc1 -std=c++14 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
 
 #include "Inputs/cuda.h"
 
@@ -14,6 +14,13 @@
 struct HostDeviceReturnTy {};
 struct TemplateReturnTy {};
 
+struct CorrectOverloadRetTy{};
+#if __CUDA_ARCH__
+// expected-note@-2 {{candidate constructor (the implicit copy constructor) not viable: no known conversion from 'InCorrectOverloadRetTy' to 'const CorrectOverloadRetTy &' for 1st argument}}
+// expected-note@-3 {{candidate constructor (the implicit move constructor) not viable: no known conversion from 'InCorrectOverloadRetTy' to 'CorrectOverloadRetTy &&' for 1st argument}}
+#endif
+struct InCorrectOverloadRetTy{};
+
 typedef HostReturnTy (*HostFnPtr)();
 typedef DeviceReturnTy (*DeviceFnPtr)();
 typedef HostDeviceReturnTy (*HostDeviceFnPtr)();
@@ -463,3 +470,74 @@
 void foo() {
   __test();
 }
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// In device compilation, implicit host device caller choose implicit host
+// device candidate and wrong-sided candidate with equal preference.
+// Resolution result should not change with/without pragma.
+namespace ImplicitHostDeviceVsWrongSided {
+inline CorrectOverloadRetTy callee(double x);
+#pragma clang force_cuda_host_device begin
+inline InCorrectOverloadRetTy callee(int x);
+inline CorrectOverloadRetTy implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// Test resolving implicit host device candidate vs same-sided candidate.
+// In host compilation, implicit host device caller choose implicit host
+// device candidate and same-sided candidate with equal preference.
+// Resolution result should not change with/without pragma.
+namespace ImplicitHostDeviceVsSameSide {
+inline InCorrectOverloadRetTy callee(int x);
+#pragma clang force_cuda_host_device begin
+inline CorrectOverloadRetTy callee(double x);
+inline CorrectOverloadRetTy implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// In the implicit host device function 'caller', the second 'callee' should be
+// since it has better match, even though it is an implicit host device function
+// whereas the first 'callee' is a host function. A diagnostic will be emitted
+// if the first 'callee' is chosen since deduced return type cannot be used
+// before it is defined.
+namespace ImplicitHostDeviceByConstExpr {
+template  a b;
+auto callee(...);
+template  constexpr auto callee(d) -> decltype(0);
+struct e {
+  template  static auto g(ad, f...) {
+return h)...>;
+  }
+  struct i {
+template  static constexpr auto caller(f... k) {
+  return callee(k...);
+}
+  };
+  template  static auto h() {
+return i::caller;
+  }
+};
+class l {
+  l() {
+e::g([] {}, this);
+  }
+};
+}
+
+// Test resolving explicit host device candidate vs. wrong-sided candidate.
+// Explicit host device caller favors host device candidate against wrong-sided
+// candidate.
+namespace ExplicitHostDeviceVsWrongSided {
+inline CorrectOverloadRetTy callee(double x);
+inline __host__ __device__ InCorrectOverloadRetTy callee(int x);
+inline __host__ __device__ CorrectOverloadRetTy explicit_hd_caller() {
+  return callee(1.0);
+#if __CUDA_ARCH__
+  // expected-error@-2 {{no viable conversion from returned value of type 'InCorrectOverloadRetTy' to function return type 'CorrectOverloadRetTy'}}
+#endif
+}
+}
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9517,11 +9517,27 @@
   // in global variable initializers once proper context is added.
   if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
 if (FunctionDecl *Caller = dyn_cast(S.CurContext)) {
+  bool IsCallerImplicitHD = Sema::IsCUDAImplicitHostDeviceFunction(Caller);
+  bool IsCand1ImplicitHD =
+  Sema::IsCUDAImplicitHostDeviceFunction(Cand1.Function);
+  bool 

[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-11 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 6 inline comments as done.
yaxunl added inline comments.



Comment at: clang/include/clang/Sema/Sema.h:11670
 
+  bool IsCUDAImplicitHostDeviceFunction(const FunctionDecl *D);
+

tra wrote:
> I think this can be `static` as it does not need Sema's state.
will do



Comment at: clang/lib/Sema/SemaCUDA.cpp:217-220
+  if (auto *A = D->getAttr())
+if (A->isImplicit())
+  return true;
+  return D->isImplicit();

tra wrote:
> Is it possible for us to ever end up here with an explicitly set attribute 
> but with an implicit function? If that were to happen, we'd return true and 
> that would be incorrect.
> Perhaps add an assert to make sure it does not happen or always return 
> `A->isImplicit()` if an attribute is already set.
will return A->isImplicit()



Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end

tra wrote:
> yaxunl wrote:
> > tra wrote:
> > > These tests only veryfy that the code compiled, but it does not guarantee 
> > > that we've picked the correct overload.
> > > You should give callees different return types and assign the result to a 
> > > variable of intended type.  See `test_host_device_calls_hd_template() ` 
> > > on line 341 for an example.
> > they have different return types. The right one returns double and the 
> > wrong one returns void. If the wrong one is chosen, there is syntax error 
> > since the caller returns double.
> Ah. I've missed it. Could you change the types to `struct 
> CorrectOverloadRetTy`/`struct IncorrectOverloadRetTy` to make it more obvious?
will do


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments.



Comment at: clang/include/clang/Sema/Sema.h:11670
 
+  bool IsCUDAImplicitHostDeviceFunction(const FunctionDecl *D);
+

I think this can be `static` as it does not need Sema's state.



Comment at: clang/lib/Sema/SemaCUDA.cpp:217-220
+  if (auto *A = D->getAttr())
+if (A->isImplicit())
+  return true;
+  return D->isImplicit();

Is it possible for us to ever end up here with an explicitly set attribute but 
with an implicit function? If that were to happen, we'd return true and that 
would be incorrect.
Perhaps add an assert to make sure it does not happen or always return 
`A->isImplicit()` if an attribute is already set.



Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end

yaxunl wrote:
> tra wrote:
> > These tests only veryfy that the code compiled, but it does not guarantee 
> > that we've picked the correct overload.
> > You should give callees different return types and assign the result to a 
> > variable of intended type.  See `test_host_device_calls_hd_template() ` on 
> > line 341 for an example.
> they have different return types. The right one returns double and the wrong 
> one returns void. If the wrong one is chosen, there is syntax error since the 
> caller returns double.
Ah. I've missed it. Could you change the types to `struct 
CorrectOverloadRetTy`/`struct IncorrectOverloadRetTy` to make it more obvious?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-10 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D79526#2027552 , @tra wrote:

> In D79526#2027470 , @yaxunl wrote:
>
> > For implicit host device functions, since they are not guaranteed to work 
> > in device compilation, we can only resolve them as if they are host 
> > functions. This causes asymmetry but implicit host device functions are 
> > originally host functions so it is biased toward host compilation in the 
> > beginning.
>
>
> I don't think that the assertion that `implicit host device functions are 
> originally host functions` is always true. While in practice most such 
> functions may indeed come from the existing host code (e.g. the standard 
> library), I don't see any inherent reason why they can't come from the code 
> written for GPU. E.g. thrust is likely to have some implicitly HD functions 
> in the code that was not intended for CPUs and your assumption will be wrong. 
> Even if such case may not exist now, it would not be unreasonable for users 
> to have such code on device. 
>  This overload resolution difference is observable and it will likely create 
> new corner cases in convoluted enough C++ code.


I agree currently it is possible to force a device function to be implicitly 
host device by pragma. However it is arguable whether we should have special 
handling of overload resolution in this case. We do special handling of 
overload resolution because we can not modify some system headers which are 
intended for host originally. If a function was originally device function, it 
is CUDA/HIP code and it should follow normal overloading resolution rule and 
should be fixed if issues occur when it is marked as a host device function.

> I think we need something more principled than "happens to work for existing 
> code".
> 
>> Only the original resolution guarantees no other issues.  For example, in 
>> the failed compilation in TF, some ctor of std::atomic becomes implicit host 
>> device function because it is constexpr. We should treated as wrong-sided in 
>> device compilation, but we should treated as same-sided in host compilation, 
>> otherwise it changes the resolution in host compilation and causes other 
>> issues.
> 
> It may be true for atomic, where we do need to have GPU-specific 
> implementation. However, I can also see classes with constexpr constructors 
> that are prefectly usable on both sides and do not have to be treated as the 
> wrong-side.

Before this patch (together with the reverted commit), the device host 
candidates are always treated with the same preference as wrong-sided 
candidates in device compilation, so a wrong-sided candidate may hide a viable 
host device candidate. This patch fixes that for most cases, including: 1. host 
compilation 2. explicit host device caller 3. explicit host device callee. Only 
in device compilation when an implicit host device caller calls an implicit 
host device callee we apply the special 'incorrect' overloading resolution 
rule. If the special handling causes undesirable effect on users code, users 
can either mark the caller or callee to be explicit host device to bypass the 
special handling.

> TBH, I do not see any reasonable way to deal with this with the current 
> implementation of how HD functions are treated. This patch and its base do 
> improve things somewhat, but it all comes at the cost of further complexity 
> and potentially paints us even deeper into a corner. Current behavior is 
> already rather hard to explain.
> 
> Some time back @wash from NVIDIA was asking about improving HD function 
> handling. Maybe it's time for all interested parties to figure out whether 
> it's time to come up with a better solution. Not in this patch, obviously.

This patch is trying to fix the incorrect overloading resolution rule about 
host device callee in host device caller. It should be favored over wrong-sided 
callee but currently it is not.

If we reject this patch, we have to bear with the incorrect overloading rule 
until a better fix is implemented.

The complexity introduced by this patch is that it needs to have special rule 
for implicit host device caller and implicit host device callee in device 
compilation, where implicit host device callee is not favored over wrong-sided 
callee to preserve the overloading resolution result as if they are both host 
callees. This is to allow some functions in system headers becoming implicitly 
host device functions without causing undeferrable diagnostics.

The complexity introduced in the compiler code is not significant: a new 
function Sema::IsCUDAImplicitHostDeviceFunction is introduced and used in 
isBetterOverloadCandidate to detect the special situation that needs special 
handling. The code for special handling is trivial.

The complexity introduced in the overloading resolution rule is somehow 
concerning.

Before this patch, the rule is: same sided candidates are favored over wrong 

[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-10 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl marked 4 inline comments as done.
yaxunl added a comment.

In D79526#2027695 , @tra wrote:

> This one is just a FYI. I've managed to reduce the failure in the first 
> version of this patch and it looks rather odd because the reduced test case 
> has nothing to do with CUDA. Instead it appears to introduce a difference in 
> compilation of regular host-only C++ code with `-x cuda` vs -x `c++`. I'm not 
> sure how/why first version caused this and why the latest one fixes it. It 
> may be worth double checking that we're not missing something here.
>
>   template  a b;
>   auto c(...);
>   template  constexpr auto c(d) -> decltype(0);
>   struct e {
> template  static auto g(ad, f...) {
>   h)...>;
> }
> struct i {
>   template  static constexpr auto j(f... k) { c(k...); 
> }
> };
> template  static auto h() { i::j; }
>   };
>   class l {
> l() {
>   e::g([] {}, this);
> }
>   };
>


function j is an implicit host device function, it calls function c. There are 
two candidates: the first one is a host function, the second one is an implicit 
host device function.

Assuming this code is originally C++ code, the author intends the second to be 
chosen since it is a better match. The code will fail to compile if the first 
one is chosen since its return type cannot be deduced.

Now we compile it as CUDA code and constexpr functions automatically become 
implicit host device function. In host compilation we do not need special 
handling since host device candidates and same-sided candidates are both 
viable. There was a bug which used special handling of implicit host device 
function in host compilation, which was fixed by my last update.

Basically we only need special handling for implicit host device function in 
device compilation. In host compilation we always use the normal overloading 
resolution. For explicit host device functions we always use the normal 
overloading resolution.




Comment at: clang/include/clang/Sema/Sema.h:11663
+bool IgnoreImplicitHDAttr = false,
+bool *IsImplicitHDAttr = nullptr);
   CUDAFunctionTarget IdentifyCUDATarget(const ParsedAttributesView );

tra wrote:
> Plumbing an optional output argument it through multiple levels of callers as 
> an output argument is rather hard to follow, especially considering that it's 
> not set in all code paths. Perhaps we can turn IsImplicitHDAttr into a 
> separate function and call it from isBetterOverloadCandidate().
will do



Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end

tra wrote:
> These tests only veryfy that the code compiled, but it does not guarantee 
> that we've picked the correct overload.
> You should give callees different return types and assign the result to a 
> variable of intended type.  See `test_host_device_calls_hd_template() ` on 
> line 341 for an example.
they have different return types. The right one returns double and the wrong 
one returns void. If the wrong one is chosen, there is syntax error since the 
caller returns double.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-10 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 263041.
yaxunl marked 2 inline comments as done.
yaxunl added a comment.

introduce Sema::IsCUDAImplicitHostDeviceFunction() and remove changes to 
IdentifyCUDATarget and IdentifyCUDAPreference. Added one more test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -1,8 +1,8 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 
-// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
-// RUN: %clang_cc1 -std=c++11 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
+// RUN: %clang_cc1 -std=c++14 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify %s
+// RUN: %clang_cc1 -std=c++14 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify %s
 
 #include "Inputs/cuda.h"
 
@@ -463,3 +463,72 @@
 void foo() {
   __test();
 }
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// In device compilation, implicit host device caller choose implicit host
+// device candidate and wrong-sided candidate with equal preference.
+namespace ImplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// In host compilation, implicit host device caller choose implicit host
+// device candidate and same-sided candidate with equal preference.
+namespace ImplicitHostDeviceVsSameSide {
+inline void callee(int x);
+#pragma clang force_cuda_host_device begin
+inline double callee(double x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// In the implicit host device function 'caller', the second 'callee' should be
+// since it has better match, even though it is an implicit host device function
+// whereas the first 'callee' is a host function. A diagnostic will be emitted
+// if the first 'callee' is chosen since deduced return type cannot be used
+// before it is defined.
+namespace ImplicitHostDeviceByConstExpr {
+template  a b;
+auto callee(...);
+template  constexpr auto callee(d) -> decltype(0);
+struct e {
+  template  static auto g(ad, f...) {
+return h)...>;
+  }
+  struct i {
+template  static constexpr auto caller(f... k) {
+  return callee(k...);
+}
+  };
+  template  static auto h() {
+return i::caller;
+  }
+};
+class l {
+  l() {
+e::g([] {}, this);
+  }
+};
+}
+
+// Test resolving explicit host device candidate vs. wrong-sided candidate.
+// Explicit host device caller favors host device candidate against wrong-sided
+// candidate.
+namespace ExplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+inline __host__ __device__ void callee(int x);
+inline __host__ __device__ double explicit_hd_caller() {
+  return callee(1.0);
+#if __CUDA_ARCH__
+  // expected-error@-2 {{cannot initialize return object of type 'double' with an rvalue of type 'void'}}
+#endif
+}
+}
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9517,11 +9517,27 @@
   // in global variable initializers once proper context is added.
   if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
 if (FunctionDecl *Caller = dyn_cast(S.CurContext)) {
+  bool IsCallerImplicitHD = S.IsCUDAImplicitHostDeviceFunction(Caller);
+  bool IsCand1ImplicitHD =
+  S.IsCUDAImplicitHostDeviceFunction(Cand1.Function);
+  bool IsCand2ImplicitHD =
+  S.IsCUDAImplicitHostDeviceFunction(Cand2.Function);
   auto P1 = S.IdentifyCUDAPreference(Caller, Cand1.Function);
   auto P2 = S.IdentifyCUDAPreference(Caller, Cand2.Function);
   assert(P1 != Sema::CFP_Never && P2 != Sema::CFP_Never);
-  auto Cand1Emittable = P1 > Sema::CFP_WrongSide;
-  auto Cand2Emittable = P2 > Sema::CFP_WrongSide;
+  // The implicit HD function may be a function in a system header which
+  // is forced by pragma. In device compilation, if we prefer HD candidates
+  // over wrong-sided candidates, overloading resolution may change, which
+  // may result in non-deferrable diagnostics. As a workaround, we let
+  // implicit HD candidates take equal preference as wrong-sided candidates.
+  // This will preserve the overloading resolution.
+  auto EmitThreshold =
+  

[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

This one is just a FYI. I've managed to reduce the failure in the first version 
of this patch and it looks rather odd because the reduced test case has nothing 
to do with CUDA. Instead it appears to introduce a difference in compilation of 
regular host-only C++ code with `-x cuda` vs -x `c++`. I'm not sure how/why 
first version caused this and why the latest one fixes it. It may be worth 
double checking that we're not missing something here.

  template  a b;
  auto c(...);
  template  constexpr auto c(d) -> decltype(0);
  struct e {
template  static auto g(ad, f...) {
  h)...>;
}
struct i {
  template  static constexpr auto j(f... k) { c(k...); }
};
template  static auto h() { i::j; }
  };
  class l {
l() {
  e::g([] {}, this);
}
  };

The latest version of this patch works, but previous one failed with an error, 
when the example was compiled as CUDA, but not, when it was compiled as C++:

  $ bin/clang++ -x cuda argmax.cc -ferror-limit=1 -fsyntax-only 
--cuda-host-only -nocudalib -nocudainc -fsized-deallocation -std=c++17
  
  argmax.cc:9:68: error: function 'c' with deduced return type cannot be used 
before it is defined
  template  static constexpr auto j(f... k) { c(k...); }
 ^
  argmax.cc:11:53: note: in instantiation of function template specialization 
'e::i::j' requested here
template  static auto h() { i::j; }
  ^
  argmax.cc:6:5: note: in instantiation of function template specialization 
'e::h' requested here
  h)...>;
  ^
  argmax.cc:15:8: note: in instantiation of function template specialization 
'e::g<(lambda at argmax.cc:15:10), l *>' requested here
  e::g([] {}, this);
 ^
  argmax.cc:2:6: note: 'c' declared here
  auto c(...);
   ^
  fatal error: too many errors emitted, stopping now [-ferror-limit=]
  2 errors generated when compiling for host.



  $ bin/clang++ -x c++ argmax.cc -ferror-limit=1 -fsyntax-only --cuda-host-only 
-nocudalib -nocudainc -fsized-deallocation -std=c++17
  
  clang-11: warning: argument unused during compilation: '-nocudainc' 
[-Wunused-command-line-argument]
  argmax.cc:11:50: warning: expression result unused [-Wunused-value]
template  static auto h() { i::j; }
   ^~~
  argmax.cc:6:5: note: in instantiation of function template specialization 
'e::h' requested here
  h)...>;
  ^
  argmax.cc:15:8: note: in instantiation of function template specialization 
'e::g<(lambda at argmax.cc:15:10), l *>' requested here
  e::g([] {}, this);
 ^
  argmax.cc:6:5: warning: expression result unused [-Wunused-value]
  h)...>;
  ^~~
  argmax.cc:15:8: note: in instantiation of function template specialization 
'e::g<(lambda at argmax.cc:15:10), l *>' requested here
  e::g([] {}, this);
 ^
  argmax.cc:3:35: warning: inline function 'c' is not defined 
[-Wundefined-inline]
  template  constexpr auto c(d) -> decltype(0);
^
  argmax.cc:9:68: note: used here
  template  static constexpr auto j(f... k) { c(k...); }
 ^
  3 warnings generated.




Comment at: clang/include/clang/Sema/Sema.h:11663
+bool IgnoreImplicitHDAttr = false,
+bool *IsImplicitHDAttr = nullptr);
   CUDAFunctionTarget IdentifyCUDATarget(const ParsedAttributesView );

Plumbing an optional output argument it through multiple levels of callers as 
an output argument is rather hard to follow, especially considering that it's 
not set in all code paths. Perhaps we can turn IsImplicitHDAttr into a separate 
function and call it from isBetterOverloadCandidate().


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: wash.
tra added a comment.

In D79526#2027470 , @yaxunl wrote:

> For implicit host device functions, since they are not guaranteed to work in 
> device compilation, we can only resolve them as if they are host functions. 
> This causes asymmetry but implicit host device functions are originally host 
> functions so it is biased toward host compilation in the beginning.


I don't think that the assertion that `implicit host device functions are 
originally host functions` is always true. While in practice most such 
functions may indeed come from the existing host code (e.g. the standard 
library), I don't see any inherent reason why they can't come from the code 
written for GPU. E.g. thrust is likely to have some implicitly HD functions in 
the code that was not intended for CPUs and your assumption will be wrong. Even 
if such case may not exist now, it would not be unreasonable for users to have 
such code on device. 
This overload resolution difference is observable and it will likely create new 
corner cases in convoluted enough C++ code.

I think we need something more principled than "happens to work for existing 
code".

> Only the original resolution guarantees no other issues.  For example, in the 
> failed compilation in TF, some ctor of std::atomic becomes implicit host 
> device function because it is constexpr. We should treated as wrong-sided in 
> device compilation, but we should treated as same-sided in host compilation, 
> otherwise it changes the resolution in host compilation and causes other 
> issues.

It may be true for atomic, where we do need to have GPU-specific 
implementation. However, I can also see classes with constexpr constructors 
that are prefectly usable on both sides and do not have to be treated as the 
wrong-side.

TBH, I do not see any reasonable way to deal with this with the current 
implementation of how HD functions are treated. This patch and its base do 
improve things somewhat, but it all comes at the cost of further complexity and 
potentially paints us even deeper into a corner. Current behavior is already 
rather hard to explain.

Some time back @wash from NVIDIA was asking about improving HD function 
handling. Maybe it's time for all interested parties to figure out whether it's 
time to come up with a better solution. Not in this patch, obviously.




Comment at: clang/test/SemaCUDA/function-overload.cu:471-477
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end

These tests only veryfy that the code compiled, but it does not guarantee that 
we've picked the correct overload.
You should give callees different return types and assign the result to a 
variable of intended type.  See `test_host_device_calls_hd_template() ` on line 
341 for an example.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D79526#2027242 , @tra wrote:

> The latest version of the patch works well enough to compile tensorflow. 
> That's the good news.
>
> In D79526#2026857 , @yaxunl wrote:
>
> > Looks like we went overboard to treat implicit host device candidate as 
> > inferior. They should be treated
> >  as inferior in device compilation, not in host compilation. Here because 
> > they are treated as inferior
> >  to same-sided candidate in host compilation, they changed overload 
> > resolution in host compilation
> >  therefore caused the failure in host compilation.
> >
> > I have updated the patch to treat implicit host device candidate as 
> > inferior in device compilation.
>
>
> I'm concerned that this creates inconsistency in how overload resolution 
> works during host and device compilation.
>  In general they should behave the same. I.e. a test where this change is 
> needed during device-side compilation will require the same change on the 
> host side, if you swap H and D attributes on the functions in the test.
>
> Speaking of tests, it would be great to add a test illustrating this scenario.


I added a test at line 483 for the situation.

For implicit host device functions, since they are not guaranteed to work in 
device compilation, we can only resolve them as if they are host functions. 
This causes asymmetry but implicit host device functions are originally host 
functions so it is biased toward host compilation in the beginning. Only the 
original resolution guarantees no other issues.  For example, in the failed 
compilation in TF, some ctor of std::atomic becomes implicit host device 
function because it is constexpr. We should treated as wrong-sided in device 
compilation, but we should treated as same-sided in host compilation, otherwise 
it changes the resolution in host compilation and causes other issues.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

The latest version of the patch works well enough to compile tensorflow. That's 
the good news.

In D79526#2026857 , @yaxunl wrote:

> Looks like we went overboard to treat implicit host device candidate as 
> inferior. They should be treated
>  as inferior in device compilation, not in host compilation. Here because 
> they are treated as inferior
>  to same-sided candidate in host compilation, they changed overload 
> resolution in host compilation
>  therefore caused the failure in host compilation.
>
> I have updated the patch to treat implicit host device candidate as inferior 
> in device compilation.


I'm concerned that this creates inconsistency in how overload resolution works 
during host and device compilation.
In general they should behave the same. I.e. a test where this change is needed 
during device-side compilation will require the same change on the host side, 
if you swap H and D attributes on the functions in the test.

Speaking of tests, it would be great to add a test illustrating this scenario.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl added a comment.

In D79526#2025761 , @tra wrote:

> I've tested the patch on our sources and it still breaks tensorflow 
> compilation, though in a different way:
>
>   In file included from 
> third_party/tensorflow/core/kernels/slice_op_gpu.cu.cc:22:
>   In file included from 
> ./third_party/tensorflow/core/framework/register_types.h:20:
>   In file included from 
> ./third_party/tensorflow/core/framework/numeric_types.h:28:
>   In file included from ./third_party/tensorflow/core/platform/types.h:22:
>   In file included from ./third_party/tensorflow/core/platform/tstring.h:24:
>   In file included from ./third_party/tensorflow/core/platform/cord.h:23:
>   In file included from 
> ./third_party/tensorflow/core/platform/google/cord.h:19:
>   In file included from ./third_party/absl/strings/cord.h:89:
>   ./third_party/absl/strings/internal/cord_internal.h:34:16: error: no 
> matching constructor for initialization of 'std::atomic' (aka 
> 'atomic')
> Refcount() : count_{1} {}
>  ^ ~~~
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1778:8:
>  note: candidate constructor (the implicit copy constructor) not viable: no 
> known conversion from 'int' to 'const std::__u::atomic' for 1st argument
>   struct atomic
>  ^
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1784:5:
>  note: candidate constructor not viable: requires 0 arguments, but 1 was 
> provided
>   atomic() _NOEXCEPT _LIBCPP_DEFAULT
>   ^
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1807:52:
>  error: call to deleted constructor of 
> '__atomic_base'
>   _LIBCPP_CONSTEXPR atomic(_Tp* __d) _NOEXCEPT : __base(__d) {}
>  ^  ~~~
>   ./third_party/absl/base/internal/thread_identity.h:162:66: note: in 
> instantiation of member function 
> 'std::__u::atomic::atomic' requested here
>   std::atomic bound_schedulable{nullptr};
>^
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1675:5:
>  note: '__atomic_base' has been explicitly marked deleted here
>   __atomic_base(const __atomic_base&) = delete;
>   ^
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1786:51:
>  error: call to implicitly-deleted copy constructor of '__atomic_base'
>   _LIBCPP_CONSTEXPR atomic(_Tp __d) _NOEXCEPT : __base(__d) {}
> ^  ~~~
>   ./third_party/absl/synchronization/mutex.h:927:25: note: in instantiation 
> of member function 'std::__u::atomic::atomic' requested here
>   inline Mutex::Mutex() : mu_(0) {
>   ^
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1698:7:
>  note: copy constructor of '__atomic_base' is implicitly deleted 
> because base class '__atomic_base' has a deleted copy constructor
>   : public __atomic_base<_Tp, false>
> ^
>   
> third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1675:5:
>  note: '__atomic_base' has been explicitly marked deleted here
>   __atomic_base(const __atomic_base&) = delete;
>   ^
>  
>


Looks like we went overboard to treat implicit host device candidate as 
inferior. They should be treated
as inferior in device compilation, not in host compilation. Here because they 
are treated as inferior
to same-sided candidate in host compilation, they changed overload resolution 
in host compilation
therefore caused the failure in host compilation.

I have updated the patch to treat implicit host device candidate as inferior in 
device compilation.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-08 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl updated this revision to Diff 262858.
yaxunl edited the summary of this revision.
yaxunl added a comment.

fix regression. only treat implicit host device candidate inferior in device 
compilation.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -463,3 +463,43 @@
 void foo() {
   __test();
 }
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// In device compilation, implicit host device caller choose implicit host
+// device candidate and wrong-sided candidate with equal preference.
+namespace ImplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// In host compilation, implicit host device caller choose implicit host
+// device candidate and same-sided candidate with equal preference.
+namespace ImplicitHostDeviceVsWrongSided2 {
+inline void callee(int x);
+#pragma clang force_cuda_host_device begin
+inline double callee(double x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// Test resolving explicit host device candidate vs. wrong-sided candidate.
+// Explicit host device caller favors host device candidate against wrong-sided
+// candidate.
+namespace ExplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+inline __host__ __device__ void callee(int x);
+inline __host__ __device__ double explicit_hd_caller() {
+  return callee(1.0);
+#if __CUDA_ARCH__
+  // expected-error@-2 {{cannot initialize return object of type 'double' with an rvalue of type 'void'}}
+#endif
+}
+}
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9517,11 +9517,29 @@
   // in global variable initializers once proper context is added.
   if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
 if (FunctionDecl *Caller = dyn_cast(S.CurContext)) {
-  auto P1 = S.IdentifyCUDAPreference(Caller, Cand1.Function);
-  auto P2 = S.IdentifyCUDAPreference(Caller, Cand2.Function);
+  bool IsCallerImplicitHD = false;
+  bool IsCand1ImplicitHD = false;
+  bool IsCand2ImplicitHD = false;
+  S.IdentifyCUDATarget(Caller, /*IgnoreImplicitHD=*/false,
+   );
+  auto P1 =
+  S.IdentifyCUDAPreference(Caller, Cand1.Function, );
+  auto P2 =
+  S.IdentifyCUDAPreference(Caller, Cand2.Function, );
   assert(P1 != Sema::CFP_Never && P2 != Sema::CFP_Never);
-  auto Cand1Emittable = P1 > Sema::CFP_WrongSide;
-  auto Cand2Emittable = P2 > Sema::CFP_WrongSide;
+  // The implicit HD function may be a function in a system header which
+  // is forced by pragma. In device compilation, if we prefer HD candidates
+  // over wrong-sided candidates, overloading resolution may change, which
+  // may result in non-deferrable diagnostics. As a workaround, we let
+  // implicit HD candidates take equal preference as wrong-sided candidates.
+  // This will preserve the overloading resolution.
+  auto EmitThreshold =
+  (S.getLangOpts().CUDAIsDevice && IsCallerImplicitHD &&
+   (IsCand1ImplicitHD || IsCand2ImplicitHD))
+  ? Sema::CFP_HostDevice
+  : Sema::CFP_WrongSide;
+  auto Cand1Emittable = P1 > EmitThreshold;
+  auto Cand2Emittable = P2 > EmitThreshold;
   if (Cand1Emittable && !Cand2Emittable)
 return true;
   if (!Cand1Emittable && Cand2Emittable)
Index: clang/lib/Sema/SemaCUDA.cpp
===
--- clang/lib/Sema/SemaCUDA.cpp
+++ clang/lib/Sema/SemaCUDA.cpp
@@ -95,17 +95,25 @@
   return CFT_Host;
 }
 
-template 
-static bool hasAttr(const FunctionDecl *D, bool IgnoreImplicitAttr) {
-  return D->hasAttrs() && llvm::any_of(D->getAttrs(), [&](Attr *Attribute) {
-   return isa(Attribute) &&
-  !(IgnoreImplicitAttr && Attribute->isImplicit());
- });
+template 
+static bool hasAttr(const FunctionDecl *D, bool IgnoreImplicitAttr,
+bool *IsImplicitHDAttr = nullptr) {
+  if (auto *A = D->getAttr()) {
+if (A->isImplicit()) {
+  if (IsImplicitHDAttr)
+*IsImplicitHDAttr = true;
+  if (IgnoreImplicitAttr)
+return false;
+}
+return true;
+  }

[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

I've tested the patch on our sources and it still breaks tensorflow 
compilation, though in a different way:

  In file included from 
third_party/tensorflow/core/kernels/slice_op_gpu.cu.cc:22:
  In file included from 
./third_party/tensorflow/core/framework/register_types.h:20:
  In file included from 
./third_party/tensorflow/core/framework/numeric_types.h:28:
  In file included from ./third_party/tensorflow/core/platform/types.h:22:
  In file included from ./third_party/tensorflow/core/platform/tstring.h:24:
  In file included from ./third_party/tensorflow/core/platform/cord.h:23:
  In file included from ./third_party/tensorflow/core/platform/google/cord.h:19:
  In file included from ./third_party/absl/strings/cord.h:89:
  ./third_party/absl/strings/internal/cord_internal.h:34:16: error: no matching 
constructor for initialization of 'std::atomic' (aka 'atomic')
Refcount() : count_{1} {}
 ^ ~~~
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1778:8:
 note: candidate constructor (the implicit copy constructor) not viable: no 
known conversion from 'int' to 'const std::__u::atomic' for 1st argument
  struct atomic
 ^
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1784:5:
 note: candidate constructor not viable: requires 0 arguments, but 1 was 
provided
  atomic() _NOEXCEPT _LIBCPP_DEFAULT
  ^
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1807:52:
 error: call to deleted constructor of 
'__atomic_base'
  _LIBCPP_CONSTEXPR atomic(_Tp* __d) _NOEXCEPT : __base(__d) {}
 ^  ~~~
  ./third_party/absl/base/internal/thread_identity.h:162:66: note: in 
instantiation of member function 
'std::__u::atomic::atomic' requested here
  std::atomic bound_schedulable{nullptr};
   ^
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1675:5:
 note: '__atomic_base' has been explicitly marked deleted here
  __atomic_base(const __atomic_base&) = delete;
  ^
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1786:51:
 error: call to implicitly-deleted copy constructor of '__atomic_base'
  _LIBCPP_CONSTEXPR atomic(_Tp __d) _NOEXCEPT : __base(__d) {}
^  ~~~
  ./third_party/absl/synchronization/mutex.h:927:25: note: in instantiation of 
member function 'std::__u::atomic::atomic' requested here
  inline Mutex::Mutex() : mu_(0) {
  ^
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1698:7:
 note: copy constructor of '__atomic_base' is implicitly deleted 
because base class '__atomic_base' has a deleted copy constructor
  : public __atomic_base<_Tp, false>
^
  
third_party/crosstool/v18/llvm_unstable/toolchain/bin/../include/c++/v1/atomic:1675:5:
 note: '__atomic_base' has been explicitly marked deleted here
  __atomic_base(const __atomic_base&) = delete;
  ^


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79526/new/

https://reviews.llvm.org/D79526



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79526: [CUDA][HIP] Workaround for resolving host device function against wrong-sided function

2020-05-06 Thread Yaxun Liu via Phabricator via cfe-commits
yaxunl created this revision.
yaxunl added reviewers: tra, rjmccall.

https://reviews.llvm.org/D77954 caused regressions due to diagnostics in 
implicit
host device functions.

The implicit host device functions are often functions in system headers forced 
to be device host by pragmas.

Some of them are valid host device functions that can be emitted in both host 
and device compilation.

Some of them are valid host functions but invalid device functions. In device 
compilation they incur
diagnostics. However as long as these diagnostics are deferred and these 
functions are not emitted
this is fine.

Before D77954 , in host device callers, host 
device candidates are not favored against wrong-sided candidates,
which preserves the overloading resolution result as if the caller and the 
candidates are host functions.
This makes sure the callee does not cause other issues, e.g. type mismatch, 
const-ness issues, etc. If the
selected function is a host device function, then it is a viable callee. If the 
selected function is a host
function, then the caller is not a valid host device function, and it results 
in a diagnostic but it can be deferred.

The problem is that we have to give host device candidates equal preference 
with wrong-sided candidates. If
the users really intend to favor host device candidate against wrong-sided 
candidate, they cannot get the
expected selection.

Ideally we should be able to defer all diagnostics for functions not sure to be 
emitted. In that case we can
have correct preference. If diagnostics occur due to overloading resolution 
change, as long as the function
is not emitted, it is fine.

Unfortunately it is not a trivial work to defer all diagnostics. Even deferring 
only overloading resolution related
diagnostics is not a simple work.

For now, it seems the most feasible workaround is to treat implicit host device 
function and explicit host
device function differently. Basically for implicit host device functions, keep 
the old behavior, i.e. give
host device candidates and wrong-sided candidates equal preference. For 
explicit host device functions,
favor host device candidates against wrong-sided candidates.

The rationale is that explicit host device functions are blessed by the user to 
be valid host device functions,
that is, they should not cause diagnostics in both host and device compilation. 
If diagnostics occur, user is
able to fix them. However, there is no guarantee that implicit host device 
function can be compiled in
device compilation, therefore we need to preserve its overloading resolution in 
device compilation.


https://reviews.llvm.org/D79526

Files:
  clang/include/clang/Sema/Sema.h
  clang/lib/Sema/SemaCUDA.cpp
  clang/lib/Sema/SemaOverload.cpp
  clang/test/SemaCUDA/function-overload.cu

Index: clang/test/SemaCUDA/function-overload.cu
===
--- clang/test/SemaCUDA/function-overload.cu
+++ clang/test/SemaCUDA/function-overload.cu
@@ -463,3 +463,30 @@
 void foo() {
   __test();
 }
+
+// Test resolving implicit host device candidate vs wrong-sided candidate.
+// Implicit host device caller choose implicit host device candidate and
+// wrong-sided candidate with equal preference.
+#ifdef __CUDA_ARCH__
+namespace ImplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+#pragma clang force_cuda_host_device begin
+inline void callee(int x);
+inline double implicit_hd_caller() {
+  return callee(1.0);
+}
+#pragma clang force_cuda_host_device end
+}
+
+// Test resolving explicit host device candidate vs. wrong-sided candidate.
+// Explicit host device caller favors host device candidate against wrong-sided
+// candidate.
+namespace ExplicitHostDeviceVsWrongSided {
+inline double callee(double x);
+inline __host__ __device__ void callee(int x);
+inline __host__ __device__ double explicit_hd_caller() {
+  return callee(1.0);
+  // expected-error@-1 {{cannot initialize return object of type 'double' with an rvalue of type 'void'}}
+}
+}
+#endif
Index: clang/lib/Sema/SemaOverload.cpp
===
--- clang/lib/Sema/SemaOverload.cpp
+++ clang/lib/Sema/SemaOverload.cpp
@@ -9517,11 +9517,28 @@
   // in global variable initializers once proper context is added.
   if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
 if (FunctionDecl *Caller = dyn_cast(S.CurContext)) {
-  auto P1 = S.IdentifyCUDAPreference(Caller, Cand1.Function);
-  auto P2 = S.IdentifyCUDAPreference(Caller, Cand2.Function);
+  bool IsCallerImplicitHD = false;
+  bool IsCand1ImplicitHD = false;
+  bool IsCand2ImplicitHD = false;
+  S.IdentifyCUDATarget(Caller, /*IgnoreImplicitHD=*/false,
+   );
+  auto P1 =
+  S.IdentifyCUDAPreference(Caller, Cand1.Function, );
+  auto P2 =
+  S.IdentifyCUDAPreference(Caller, Cand2.Function, );
   assert(P1 !=