[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4094238 , @srj wrote:

> In D141861#4094084 , @srj wrote:
>
>> In D141861#4094079 , @jhuber6 
>> wrote:
>>
>>> In D141861#4094063 , @srj wrote:
>>>
>>>> Yes please!
>>>
>>> Let me know if this fixes anything rG9f64fbb882dc.
>>
>> Testing now
>
> So far, so good, let me just verify a few more things

Looks good! Thanks for the fix!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4094084 , @srj wrote:

> In D141861#4094079 , @jhuber6 wrote:
>
>> In D141861#4094063 , @srj wrote:
>>
>>> Yes please!
>>
>> Let me know if this fixes anything rG9f64fbb882dc.
>
> Testing now

So far, so good, let me just verify a few more things


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4094079 , @jhuber6 wrote:

> In D141861#4094063 , @srj wrote:
>
>> Yes please!
>
> Let me know if this fixes anything rG9f64fbb882dc.

Testing now


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4094063 , @srj wrote:

> Yes please!

Let me know if this fixes anything rG9f64fbb882dc.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4094059 , @jhuber6 wrote:

> In D141861#4094058 , @srj wrote:
>
>> In D141861#4094043 , @jhuber6 
>> wrote:
>>
>>> Would this just require checking `LLVM_BUILD_32_BITS`? Should be an easy 
>>> change.
>>
>> I think so. (It might be tempting to check `if (CMAKE_SIZEOF_VOID_P EQUAL 
>> 8)` but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)
>
> SG, want me to push that real quick?

Yes please!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4094058 , @srj wrote:

> In D141861#4094043 , @jhuber6 wrote:
>
>> Would this just require checking `LLVM_BUILD_32_BITS`? Should be an easy 
>> change.
>
> I think so. (It might be tempting to check `if (CMAKE_SIZEOF_VOID_P EQUAL 8)` 
> but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)

SG, want me to push that real quick?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4094043 , @jhuber6 wrote:

> Would this just require checking `LLVM_BUILD_32_BITS`? Should be an easy 
> change.

I think so. (It might be tempting to check `if (CMAKE_SIZEOF_VOID_P EQUAL 8)` 
but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)
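
A minimal sketch of what that guard could look like in 
clang/tools/nvptx-arch/CMakeLists.txt, assuming the CUDAToolkit-based logic 
already in this patch (only the added condition is new, and it is untested 
here):

  # Sketch: treat any explicit 32-bit build as "do not link the driver
  # directly" and fall back to the dlopen path, regardless of whether
  # FindCUDAToolkit located a (64-bit) installation.
  if (CUDAToolkit_FOUND AND NOT LLVM_BUILD_32_BITS)
    target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
  else()
    target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")
  endif()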


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4094036 , @srj wrote:

> In D141861#4092237 , @tra wrote:
>
>> For what it's worth, NVIDIA started deprecating 32-bit binaries long ago 
>> (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) 
>> and the process finally came to an end with the release of CUDA-12:
>
> Hmm... maybe the right answer then is to just always use the dynamic-loading 
> path when doing any kind of 32-bit build.

That's probably the best option. I don't think we have much pretense of 
supporting 32-bit offloading right now. Would this just require checking 
`LLVM_BUILD_32_BITS`? Should be an easy change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-31 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4092237 , @tra wrote:

> For what it's worth, NVIDIA started deprecating 32-bit binaries long ago 
> (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) 
> and the process finally came to an end with the release of CUDA-12:

Hmm... maybe the right answer then is to just always use the dynamic-loading 
path when doing any kind of 32-bit build.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

For what it's worth, NVIDIA started deprecating 32-bit binaries long ago 
(https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) 
and the process finally came to an end with the release of CUDA-12:

CUDA-12 release notes say: 


> 32-bit compilation native and cross-compilation is removed from CUDA 12.0 and 
> later Toolkit. 
> Use the CUDA Toolkit from earlier releases for 32-bit compilation. CUDA 
> Driver will continue 
> to support running existing 32-bit applications on existing GPUs except 
> Hopper. 
> Hopper does not support 32-bit applications. Ada will be the last 
> architecture with driver support for 32-bit applications.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4092190 , @jhuber6 wrote:

> In D141861#4092182 , @srj wrote:
>
>> In D141861#4092096 , @srj wrote:
>>
>>> Update: I may have a way to make this work from my side; testing now.
>>
>> Alas, that didn't work, still broken.
>
> Interesting. It's definitely a bad problem that we find the 64-bit version 
> and try to link with it for a cross-compile. I don't want to revert the code 
> because the old `FindCUDA` is officially deprecated. I figured there was 
> surely a way to check if LLVM was cross-compiling.

I'll see if I can get our CMake expert to weigh in :-)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4092182 , @srj wrote:

> In D141861#4092096 , @srj wrote:
>
>> Update: I may have a way to make this work from my side; testing now.
>
> Alas, that didn't work, still broken.

Interesting. It's definitely a bad problem that we find the 64-bit version and 
try to link with it for a cross-compile. I don't want to revert the code 
because the old `FindCUDA` is officially deprecated. I figured there was 
surely a way to check if LLVM was cross-compiling.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4092096 , @srj wrote:

> Update: I may have a way to make this work from my side; testing now.

Alas, that didn't work, still broken.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4092034 , @srj wrote:

> In D141861#4091987 , @jhuber6 wrote:
>
>> Can you let me know if adding this fixes it.
>
> Unfortunately, no. (That is: It does not fix it. CMAKE_CROSSCOMPILING is 
> unfortunately a subtle and flaky beast, see e.g. 
> https://gitlab.kitware.com/cmake/cmake/-/issues/21744)

Update: I may have a way to make this work from my side; testing now.
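
For context on why the `CMAKE_CROSSCOMPILING` check may not help in this setup 
(my understanding of CMake's behavior, not verified against this particular 
bot): CMake only sets the flag when a system name is given explicitly, usually 
from a toolchain file, so a 32-bit build driven purely by compiler flags still 
looks like a native build.

  # Illustration only: CMAKE_CROSSCOMPILING becomes TRUE when CMAKE_SYSTEM_NAME
  # is set manually (e.g. from a toolchain file). Configuring with
  # -DLLVM_BUILD_32_BITS=ON merely adds -m32 to the flags, so the variable
  # stays FALSE and a guard based on it never fires.
  message(STATUS "CMAKE_CROSSCOMPILING = ${CMAKE_CROSSCOMPILING}")
  message(STATUS "CMAKE_SYSTEM_NAME    = ${CMAKE_SYSTEM_NAME}")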


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4091987 , @jhuber6 wrote:

> Can you let me know if adding this fixes it.

Unfortunately, no.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4091961 , @srj wrote:

> In D141861#4091949 , @jhuber6 wrote:
>
>> In D141861#4091922 , @srj wrote:
>>
>>> Crosscompiling to x86-32 on an x86-64 host doesn't strike me as 
>>> particularly weird at all (especially on Windows), but apparently it is 
>>> quite weird for LLVM at this point in time as we keep getting a lot of 
>>> different things broken there :-)
>>
>> I'm not very familiar with this type of build. Are there any variables we 
>> could pick up to just disable this if it's not building for the host system? 
>> Something like `CMAKE_CROSSCOMPILING`?
>
> I'm not an expert on the LLVM build system, so I'm not entirely sure, but I'd 
> start by examining the CMake setting `LLVM_BUILD_32_BITS` (which we set to ON 
> in this case)

Can you let me know if adding this fixes it.

  diff --git a/clang/tools/nvptx-arch/CMakeLists.txt b/clang/tools/nvptx-arch/CMakeLists.txt
  index 95c25dc75847..ccdba5ed69a7 100644
  --- a/clang/tools/nvptx-arch/CMakeLists.txt
  +++ b/clang/tools/nvptx-arch/CMakeLists.txt
  @@ -12,7 +12,7 @@ add_clang_tool(nvptx-arch NVPTXArch.cpp)
   find_package(CUDAToolkit QUIET)
  
   # If we found the CUDA library directly we just dynamically link against it.
  -if (CUDAToolkit_FOUND)
  +if (CUDAToolkit_FOUND AND NOT CMAKE_CROSSCOMPILING)
     target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
   else()
     target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4091949 , @jhuber6 wrote:

> In D141861#4091922 , @srj wrote:
>
>> Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly 
>> weird at all (especially on Windows), but apparently it is quite weird for 
>> LLVM at this point in time as we keep getting a lot of different things 
>> broken there :-)
>
> I'm not very familiar with this type of build. Are there any variables we 
> could pick up to just disable this if it's not building for the host system? 
> Something like `CMAKE_CROSSCOMPILING`?

I'm not an expert on the LLVM build system, so I'm not entirely sure, but I'd 
start by examining the CMake setting `LLVM_BUILD_32_BITS` (which we set to ON 
in this case)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4091922 , @srj wrote:

> Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly 
> weird at all (especially on Windows), but apparently it is quite weird for 
> LLVM at this point in time as we keep getting a lot of different things 
> broken there :-)

I'm not very familiar with this type of build. Are there any variables we could 
pick up to just disable this if it's not building for the host system? 
Something like `CMAKE_CROSSCOMPILING`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4091903 , @jhuber6 wrote:

> In D141861#4091897 , @srj wrote:
>
>> It's finding a 64-bit CUDAToolkit, which it can't link against because the 
>> rest of the build is 32-bit.
>
> Wondering why it didn't find it before then. But that's definitely a weird 
> configuration. Not sure what a good generic solution is. We could always make 
> it `dlopen` all the time.

Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly 
weird at all (especially on Windows), but apparently it is quite weird for LLVM 
at this point in time as we keep getting a lot of different things broken there 
:-)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4091897 , @srj wrote:

> It's finding a 64-bit CUDAToolkit, which it can't link against because the 
> rest of the build is 32-bit.

Wondering why it didn't find it before then. But that's definitely a weird 
configuration. Not sure what a good generic solution is. We could always make 
it `dlopen` all the time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4091869 , @jhuber6 wrote:

> In D141861#4091851 , @srj wrote:
>
>>> https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515.
>>>  Does it work if you revert this? I wouldn't think it would affect 
>>> anything. That's the only change that happened after the 16 release as far 
>>> as I'm aware.
>>
>> Reverting this (well, actually just monkey-patching in the old code) does 
>> indeed make things build correctly again -- a bit surprising to me too, but 
>> that's CMake for you. Reverting the change seems appropriate pending a 
>> tested fix.
>
> That's bizarre, that means it's finding `CUDAToolkit` but can't link against 
> it?

It's finding a 64-bit CUDAToolkit, which it can't link against because the rest 
of the build is 32-bit.
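
One configure-time way to catch that kind of mismatch (a sketch only, assuming 
a failed test link is an acceptable signal; the result variable name is made up 
and this is not what the patch currently does):

  # Sketch: only opt into direct linking if a trivial program actually links
  # against the driver stub that FindCUDAToolkit located; a 64-bit stub in a
  # 32-bit build fails this check, so we could fall back to the dlopen path.
  include(CheckCXXSourceCompiles)
  set(CMAKE_REQUIRED_LIBRARIES CUDA::cuda_driver)
  check_cxx_source_compiles("
    extern \"C\" int cuInit(unsigned int);
    int main() { return cuInit(0); }" NVPTX_ARCH_CAN_LINK_CUDA_DRIVER)
  unset(CMAKE_REQUIRED_LIBRARIES)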


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4091851 , @srj wrote:

>> https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515.
>>  Does it work if you revert this? I wouldn't think it would affect 
>> anything. That's the only change that happened after the 16 release as far 
>> as I'm aware.
>
> Reverting this (well, actually just monkey-patching in the old code) does 
> indeed make things build correctly again -- a bit surprising to me too, but 
> that's CMake for you. Reverting the change seems appropriate pending a tested 
> fix.

That's bizarre, that means it's finding `CUDAToolkit` but can't link against it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

> https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515.
>  Does it work if you revert this? I wouldn't think it would affect 
> anything. That's the only change that happened after the 16 release as far as 
> I'm aware.

Reverting this (well, actually just monkey-patching in the old code) does 
indeed make things build correctly again -- a bit surprising to me too, but 
that's CMake for you. Reverting the change seems appropriate pending a tested 
fix.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4091408 , @srj wrote:

> In D141861#4091403 , @jhuber6 wrote:
>
>> In D141861#4091383 , @srj wrote:
>>
>>> It looks like this change (but not the rG4ce454c654bd) is in the 17 branch, 
>>> as the latter is now failing in the same way for crosscompiles.
>>
>> It looks like it's there for me, see 
>> https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20.
>>  What is the issue? I made a slight tweak a few days ago on the 17 branch 
>> that updated how we find the CUDA driver.
>
> We just started testing with the 17 branch this morning, can you point me at 
> your tweak?

https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515.
 Does it work if you revert this? I wouldn't think it would affect anything. 
That's the only change that happened after the 16 release as far as I'm aware.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4091403 , @jhuber6 wrote:

> In D141861#4091383 , @srj wrote:
>
>> It looks like this change (but not the rG4ce454c654bd) is in the 17 branch, 
>> as the latter is now failing in the same way for crosscompiles.
>
> It looks like it's there for me, see 
> https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20.
>  What is the issue? I made a slight tweak a few days ago on the 17 branch 
> that updated how we find the CUDA driver.

We just started testing with the 17 branch this morning, can you point me at 
your tweak?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4091383 , @srj wrote:

> It looks like this change (but not the rG4ce454c654bd) is in the 17 branch, 
> as the latter is now failing in the same way for crosscompiles.

It looks like it's there for me, see 
https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20.
 What is the issue? I made a slight tweak a few days ago on the 17 branch that 
updated how we find the CUDA driver.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-30 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

In D141861#4060100 , @jhuber6 wrote:

> In D141861#4060028 , @srj wrote:
>
>> This change appears to have broken the build when crosscompiling to x86-32 
>> on a Linux x86-64 system; on the Halide buildbots, we now fail at link time 
>> with
>>
>>   FAILED: bin/nvptx-arch 
>>   : && /usr/bin/g++-7  -m32 -Wno-psabi -fPIC -fno-semantic-interposition 
>> -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra 
>> -Wno-unused-parameter -Wwrite-strings -Wcast-qual 
>> -Wno-missing-field-initializers -pedantic -Wno-long-long 
>> -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type 
>> -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation 
>> -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common 
>> -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG 
>> -Wl,-rpath-link,/home/halidenightly/build_bot/worker/llvm-16-x86-32-linux/llvm-build/./lib
>>   -Wl,--gc-sections 
>> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o -o 
>> bin/nvptx-arch  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMSupport.a  
>> -lpthread  -lrt  -ldl  -lpthread  -lm  lib/libLLVMDemangle.a && :
>>   /usr/bin/ld: 
>> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
>> function `handleError(cudaError_enum)':
>>   NVPTXArch.cpp:(.text._ZL11handleError14cudaError_enum+0x2b): undefined 
>> reference to `cuGetErrorString'
>>   /usr/bin/ld: 
>> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
>> function `main':
>>   NVPTXArch.cpp:(.text.startup.main+0xcf): undefined reference to `cuInit'
>>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0xf9): undefined reference 
>> to `cuDeviceGetCount'
>>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x11e): undefined reference 
>> to `cuDeviceGet'
>>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x131): undefined reference 
>> to `cuDeviceGetAttribute'
>>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x146): undefined reference 
>> to `cuDeviceGetAttribute'
>>   collect2: error: ld returned 1 exit status
>>
>> I'm guessing that the problem here is that the build machine has Cuda 
>> installed (so the headers are found), but no 32-bit version of Cuda (so 
>> linking fails).
>>
>> Probably easy to fix, but as of right now, our 32-bit testing is dead in the 
>> water; could someone please revert this pending a proper fix?
>
> Can you let me know if rG4ce454c654bd solves it? I'm guessing the problem is 
> arising when we find the libraries at build configure time but not at build 
> time, so we might need another check as well.

It looks like this change (but not the rG4ce454c654bd) is in the 17 branch, as 
the latter is now failing in the same way for crosscompiles.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-17 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4060028 , @srj wrote:

> This change appears to have broken the build when crosscompiling to x86-32 on 
> a Linux x86-64 system; on the Halide buildbots, we now fail at link time with
>
>   FAILED: bin/nvptx-arch 
>   : && /usr/bin/g++-7  -m32 -Wno-psabi -fPIC -fno-semantic-interposition 
> -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra 
> -Wno-unused-parameter -Wwrite-strings -Wcast-qual 
> -Wno-missing-field-initializers -pedantic -Wno-long-long 
> -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type 
> -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation 
> -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common 
> -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG 
> -Wl,-rpath-link,/home/halidenightly/build_bot/worker/llvm-16-x86-32-linux/llvm-build/./lib
>   -Wl,--gc-sections 
> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o -o 
> bin/nvptx-arch  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMSupport.a  -lpthread 
>  -lrt  -ldl  -lpthread  -lm  lib/libLLVMDemangle.a && :
>   /usr/bin/ld: 
> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
> function `handleError(cudaError_enum)':
>   NVPTXArch.cpp:(.text._ZL11handleError14cudaError_enum+0x2b): undefined 
> reference to `cuGetErrorString'
>   /usr/bin/ld: 
> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
> function `main':
>   NVPTXArch.cpp:(.text.startup.main+0xcf): undefined reference to `cuInit'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0xf9): undefined reference 
> to `cuDeviceGetCount'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x11e): undefined reference 
> to `cuDeviceGet'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x131): undefined reference 
> to `cuDeviceGetAttribute'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x146): undefined reference 
> to `cuDeviceGetAttribute'
>   collect2: error: ld returned 1 exit status
>
> I'm guessing that the problem here is that the build machine has Cuda 
> installed (so the headers are found), but no 32-bit version of Cuda (so 
> linking fails).
>
> Probably easy to fix, but as of right now, our 32-bit testing is dead in the 
> water; could someone please revert this pending a proper fix?

Can you let me know if rG4ce454c654bd solves it? I'm guessing the problem is 
arising when we find the libraries at build configure time but not at build 
time, so we might need another check as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-17 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D141861#4060028 , @srj wrote:

> This change appears to have broken the build when crosscompiling to x86-32 on 
> a Linux x86-64 system; on the Halide buildbots, we now fail at link time with
>
>   FAILED: bin/nvptx-arch 
>   : && /usr/bin/g++-7  -m32 -Wno-psabi -fPIC -fno-semantic-interposition 
> -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra 
> -Wno-unused-parameter -Wwrite-strings -Wcast-qual 
> -Wno-missing-field-initializers -pedantic -Wno-long-long 
> -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type 
> -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation 
> -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common 
> -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG 
> -Wl,-rpath-link,/home/halidenightly/build_bot/worker/llvm-16-x86-32-linux/llvm-build/./lib
>   -Wl,--gc-sections 
> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o -o 
> bin/nvptx-arch  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMSupport.a  -lpthread 
>  -lrt  -ldl  -lpthread  -lm  lib/libLLVMDemangle.a && :
>   /usr/bin/ld: 
> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
> function `handleError(cudaError_enum)':
>   NVPTXArch.cpp:(.text._ZL11handleError14cudaError_enum+0x2b): undefined 
> reference to `cuGetErrorString'
>   /usr/bin/ld: 
> tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
> function `main':
>   NVPTXArch.cpp:(.text.startup.main+0xcf): undefined reference to `cuInit'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0xf9): undefined reference 
> to `cuDeviceGetCount'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x11e): undefined reference 
> to `cuDeviceGet'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x131): undefined reference 
> to `cuDeviceGetAttribute'
>   /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x146): undefined reference 
> to `cuDeviceGetAttribute'
>   collect2: error: ld returned 1 exit status

I'm guessing here it's finding the `cuda.h` header but not the libraries? Maybe 
we should determine this via CMake and add a definition instead of 
`__has_include`.
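
A rough sketch of that idea against the CMakeLists as it stood here (names and 
values are illustrative; the actual follow-up commit may look different):

  # Sketch: let the configure step be the single source of truth and pass the
  # result to the source as a definition, instead of probing with __has_include.
  if (CUDA_FOUND AND cuda-library)
    target_compile_definitions(nvptx-arch PRIVATE CUDA_HEADER_FOUND=1)
    target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
    target_link_libraries(nvptx-arch PRIVATE ${cuda-library})
  else()
    target_compile_definitions(nvptx-arch PRIVATE CUDA_HEADER_FOUND=0)
  endif()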


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-17 Thread Steven Johnson via Phabricator via cfe-commits
srj added a comment.

This change appears to have broken the build when crosscompiling to x86-32 on a 
Linux x86-64 system; on the Halide buildbots, we now fail at link time with

  FAILED: bin/nvptx-arch 
  : && /usr/bin/g++-7  -m32 -Wno-psabi -fPIC -fno-semantic-interposition 
-fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra 
-Wno-unused-parameter -Wwrite-strings -Wcast-qual 
-Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough 
-Wno-maybe-uninitialized -Wno-noexcept-type -Wdelete-non-virtual-dtor 
-Wno-comment -Wno-misleading-indentation -fdiagnostics-color 
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual 
-fno-strict-aliasing -O3 -DNDEBUG 
-Wl,-rpath-link,/home/halidenightly/build_bot/worker/llvm-16-x86-32-linux/llvm-build/./lib
  -Wl,--gc-sections 
tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o -o 
bin/nvptx-arch  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMSupport.a  -lpthread  
-lrt  -ldl  -lpthread  -lm  lib/libLLVMDemangle.a && :
  /usr/bin/ld: 
tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
function `handleError(cudaError_enum)':
  NVPTXArch.cpp:(.text._ZL11handleError14cudaError_enum+0x2b): undefined 
reference to `cuGetErrorString'
  /usr/bin/ld: 
tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in 
function `main':
  NVPTXArch.cpp:(.text.startup.main+0xcf): undefined reference to `cuInit'
  /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0xf9): undefined reference to 
`cuDeviceGetCount'
  /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x11e): undefined reference to 
`cuDeviceGet'
  /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x131): undefined reference to 
`cuDeviceGetAttribute'
  /usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x146): undefined reference to 
`cuDeviceGetAttribute'
  collect2: error: ld returned 1 exit status


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-16 Thread Joseph Huber via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG9954516ffb10: [nvptx-arch] Dynamically load the CUDA runtime 
if not found during the build (authored by jhuber6).

Changed prior to commit:
  https://reviews.llvm.org/D141861?vs=489585&id=489609#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

Files:
  clang/tools/nvptx-arch/CMakeLists.txt
  clang/tools/nvptx-arch/NVPTXArch.cpp

Index: clang/tools/nvptx-arch/NVPTXArch.cpp
===
--- clang/tools/nvptx-arch/NVPTXArch.cpp
+++ clang/tools/nvptx-arch/NVPTXArch.cpp
@@ -11,6 +11,12 @@
 //
 //===--===//
 
+#include "llvm/Support/DynamicLibrary.h"
+#include "llvm/Support/Error.h"
+#include 
+#include 
+#include 
+
 #if defined(__has_include)
 #if __has_include("cuda.h")
 #include "cuda.h"
@@ -23,11 +29,53 @@
 #endif
 
 #if !CUDA_HEADER_FOUND
-int main() { return 1; }
-#else
+typedef enum cudaError_enum {
+  CUDA_SUCCESS = 0,
+  CUDA_ERROR_NO_DEVICE = 100,
+} CUresult;
 
-#include 
-#include 
+typedef enum CUdevice_attribute_enum {
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
+} CUdevice_attribute;
+
+typedef uint32_t CUdevice;
+
+CUresult (*cuInit)(unsigned int);
+CUresult (*cuDeviceGetCount)(int *);
+CUresult (*cuGetErrorString)(CUresult, const char **);
+CUresult (*cuDeviceGet)(CUdevice *, int);
+CUresult (*cuDeviceGetAttribute)(int *, CUdevice_attribute, CUdevice);
+
+constexpr const char *DynamicCudaPath = "libcuda.so";
+
+llvm::Error loadCUDA() {
+  std::string ErrMsg;
+  auto DynlibHandle = std::make_unique<llvm::sys::DynamicLibrary>(
+  llvm::sys::DynamicLibrary::getPermanentLibrary(DynamicCudaPath, &ErrMsg));
+  if (!DynlibHandle->isValid()) {
+return llvm::createStringError(llvm::inconvertibleErrorCode(),
+   "Failed to 'dlopen' %s\n", DynamicCudaPath);
+  }
+#define DYNAMIC_INIT(SYMBOL)   \
+  {\
+void *SymbolPtr = DynlibHandle->getAddressOfSymbol(#SYMBOL);   \
+if (!SymbolPtr)\
+  return llvm::createStringError(llvm::inconvertibleErrorCode(),   \
+ "Failed to 'dlsym' " #SYMBOL);\
+SYMBOL = reinterpret_cast<decltype(SYMBOL)>(SymbolPtr);\
+  }
+  DYNAMIC_INIT(cuInit);
+  DYNAMIC_INIT(cuDeviceGetCount);
+  DYNAMIC_INIT(cuGetErrorString);
+  DYNAMIC_INIT(cuDeviceGet);
+  DYNAMIC_INIT(cuDeviceGetAttribute);
+#undef DYNAMIC_INIT
+  return llvm::Error::success();
+}
+#else
+llvm::Error loadCUDA() { return llvm::Error::success(); }
+#endif
 
 static int handleError(CUresult Err) {
   const char *ErrStr = nullptr;
@@ -38,7 +86,13 @@
   return EXIT_FAILURE;
 }
 
-int main() {
+int main(int argc, char *argv[]) {
+  // Attempt to load the NVPTX driver runtime.
+  if (llvm::Error Err = loadCUDA()) {
+logAllUnhandledErrors(std::move(Err), llvm::errs());
+return EXIT_FAILURE;
+  }
+
   if (CUresult Err = cuInit(0)) {
 if (Err == CUDA_ERROR_NO_DEVICE)
   return EXIT_SUCCESS;
@@ -68,5 +122,3 @@
   }
   return EXIT_SUCCESS;
 }
-
-#endif
Index: clang/tools/nvptx-arch/CMakeLists.txt
===
--- clang/tools/nvptx-arch/CMakeLists.txt
+++ clang/tools/nvptx-arch/CMakeLists.txt
@@ -6,6 +6,8 @@
 # //
 # //======//
 
+set(LLVM_LINK_COMPONENTS Support)
+add_clang_tool(nvptx-arch NVPTXArch.cpp)
 
 # TODO: This is deprecated. Since CMake 3.17 we can use FindCUDAToolkit instead.
 find_package(CUDA QUIET)
@@ -15,14 +17,8 @@
   find_library(cuda-library NAMES cuda HINTS "${CUDA_LIBDIR}/stubs")
 endif()
 
-if (NOT CUDA_FOUND OR NOT cuda-library)
-  message(STATUS "Not building nvptx-arch: cuda runtime not found")
-  return()
+# If we found the CUDA library directly we just dynamically link against it.
+if (CUDA_FOUND AND cuda-library)
+  target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
+  target_link_libraries(nvptx-arch PRIVATE ${cuda-library})
 endif()
-
-add_clang_tool(nvptx-arch NVPTXArch.cpp)
-
-set_target_properties(nvptx-arch PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
-target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
-
-target_link_libraries(nvptx-arch PRIVATE ${cuda-library})
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-16 Thread Shilei Tian via Phabricator via cfe-commits
tianshilei1992 accepted this revision.
tianshilei1992 added a comment.
This revision is now accepted and ready to land.

Yeah, otherwise I suppose there will be some errors when compiling an OpenMP 
program when there is no CUDA installed.




Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:89
 
 int main() {
+  // Attempt to load the NVPTX driver runtime.

unrelated: I always prefer:
```
int main(int argc, char *argv[]) {
  return 0;
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141861/new/

https://reviews.llvm.org/D141861

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D141861: [nvptx-arch] Dynamically load the CUDA runtime if not found during the build

2023-01-16 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 created this revision.
jhuber6 added reviewers: jdoerfert, tianshilei1992, tra, yaxunl, 
JonChesterfield.
Herald added subscribers: mattd, gchakrabarti, asavonic.
Herald added a project: All.
jhuber6 requested review of this revision.
Herald added subscribers: cfe-commits, jholewinski.
Herald added a project: clang.

Much like the changes in D141859, this patch allows the `nvptx-arch`
tool to be built and provided with every distribution of LLVM / Clang.
This will make it more reliable for our toolchains to depend on. The
changes here configure a version that dynamically loads CUDA if it was
not found at build time.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D141861

Files:
  clang/tools/nvptx-arch/CMakeLists.txt
  clang/tools/nvptx-arch/NVPTXArch.cpp

Index: clang/tools/nvptx-arch/NVPTXArch.cpp
===
--- clang/tools/nvptx-arch/NVPTXArch.cpp
+++ clang/tools/nvptx-arch/NVPTXArch.cpp
@@ -11,6 +11,12 @@
 //
 //===--===//
 
+#include "llvm/Support/DynamicLibrary.h"
+#include "llvm/Support/Error.h"
+#include 
+#include 
+#include 
+
 #if defined(__has_include)
 #if __has_include("cuda.h")
 #include "cuda.h"
@@ -23,11 +29,53 @@
 #endif
 
 #if !CUDA_HEADER_FOUND
-int main() { return 1; }
-#else
+typedef enum cudaError_enum {
+  CUDA_SUCCESS = 0,
+  CUDA_ERROR_NO_DEVICE = 100,
+} CUresult;
 
-#include 
-#include 
+typedef enum CUdevice_attribute_enum {
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
+} CUdevice_attribute;
+
+typedef uint32_t CUdevice;
+
+CUresult (*cuInit)(unsigned int);
+CUresult (*cuDeviceGetCount)(int *);
+CUresult (*cuGetErrorString)(CUresult, const char **);
+CUresult (*cuDeviceGet)(CUdevice *, int);
+CUresult (*cuDeviceGetAttribute)(int *, CUdevice_attribute, CUdevice);
+
+constexpr const char *DynamicCudaPath = "libcuda.so";
+
+llvm::Error loadCUDA() {
+  std::string ErrMsg;
+  auto DynlibHandle = std::make_unique<llvm::sys::DynamicLibrary>(
+  llvm::sys::DynamicLibrary::getPermanentLibrary(DynamicCudaPath, &ErrMsg));
+  if (!DynlibHandle->isValid()) {
+return llvm::createStringError(llvm::inconvertibleErrorCode(),
+   "Failed to 'dlopen' %s\n", DynamicCudaPath);
+  }
+#define DYNAMIC_INIT(SYMBOL)   \
+  {\
+void *SymbolPtr = DynlibHandle->getAddressOfSymbol(#SYMBOL);   \
+if (!SymbolPtr)\
+  return llvm::createStringError(llvm::inconvertibleErrorCode(),   \
+ "Failed to 'dlsym' " #SYMBOL);\
+SYMBOL = reinterpret_cast<decltype(SYMBOL)>(SymbolPtr);\
+  }
+  DYNAMIC_INIT(cuInit);
+  DYNAMIC_INIT(cuDeviceGetCount);
+  DYNAMIC_INIT(cuGetErrorString);
+  DYNAMIC_INIT(cuDeviceGet);
+  DYNAMIC_INIT(cuDeviceGetAttribute);
+#undef DYNAMIC_INIT
+  return llvm::Error::success();
+}
+#else
+llvm::Error loadCUDA() { return llvm::Error::success(); }
+#endif
 
 static int handleError(CUresult Err) {
   const char *ErrStr = nullptr;
@@ -39,6 +87,12 @@
 }
 
 int main() {
+  // Attempt to load the NVPTX driver runtime.
+  if (llvm::Error Err = loadCUDA()) {
+logAllUnhandledErrors(std::move(Err), llvm::errs());
+return EXIT_FAILURE;
+  }
+
   if (CUresult Err = cuInit(0)) {
 if (Err == CUDA_ERROR_NO_DEVICE)
   return EXIT_SUCCESS;
@@ -68,5 +122,3 @@
   }
   return EXIT_SUCCESS;
 }
-
-#endif
Index: clang/tools/nvptx-arch/CMakeLists.txt
===
--- clang/tools/nvptx-arch/CMakeLists.txt
+++ clang/tools/nvptx-arch/CMakeLists.txt
@@ -6,6 +6,8 @@
 # //
 # //======//
 
+set(LLVM_LINK_COMPONENTS Support)
+add_clang_tool(nvptx-arch NVPTXArch.cpp)
 
 # TODO: This is deprecated. Since CMake 3.17 we can use FindCUDAToolkit instead.
 find_package(CUDA QUIET)
@@ -15,14 +17,8 @@
   find_library(cuda-library NAMES cuda HINTS "${CUDA_LIBDIR}/stubs")
 endif()
 
-if (NOT CUDA_FOUND OR NOT cuda-library)
-  message(STATUS "Not building nvptx-arch: cuda runtime not found")
-  return()
+# If we found the CUDA library directly we just dynamically link against it.
+if (CUDA_FOUND AND cuda-library)
+  target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
+  target_link_libraries(nvptx-arch PRIVATE ${cuda-library})
 endif()
-
-add_clang_tool(nvptx-arch NVPTXArch.cpp)
-
-set_target_properties(nvptx-arch PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
-target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
-
-target_link_libraries(nvptx-arch PRIVATE ${cuda-library})
___
cfe-commits mailing list