[PATCH] D114326: Update the list of CUDA versions up to 11.5

Mojca Miklavec via Phabricator via cfe-commits Wed, 24 Nov 2021 11:47:49 -0800

mojca added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Cuda.cpp:131
+  std::initializer_list<const char *> Versions = {
+      "11.5", "11.4", "11.3", "11.2", "11.1", "11.0", "10.2", "10.1",
+      "10.0", "9.2",  "9.1",  "9.0",  "8.0",  "7.5",  "7.0"};
----------------
tra wrote:
> tra wrote:
> > mojca wrote:
> > > carlosgalvezp wrote:
> > > > mojca wrote:
> > > > > tra wrote:
> > > > > > tra wrote:
> > > > > > > mojca wrote:
> > > > > > > > kadircet wrote:
> > > > > > > > > looks like the list is getting big and hard to maintain. 
> > > > > > > > > considering that this is done only once per compiler 
> > > > > > > > > invocation (and we check for existence of directories down in 
> > > > > > > > > the loop anyway). what about throwing in an extra directory 
> > > > > > > > > listing to base-directories mentioned down below and populate 
> > > > > > > > > `Candidates` while preserving the newest-version-first order?
> > > > > > > > I totally agree with the sentiment, and that was my initial 
> > > > > > > > thought as well, but with zero experience I was too scared to 
> > > > > > > > make any more significant changes.
> > > > > > > > 
> > > > > > > > I can try to come up with a new patch (that doesn't need 
> > > > > > > > further maintenance whenever a new CUDA version gets released) 
> > > > > > > > if that's what you are suggesting. I would nevertheless merge 
> > > > > > > > this one, and prepare a new more advanced patch separately, but 
> > > > > > > > that's finally your call.
> > > > > > > > 
> > > > > > > > What's your suggestion about D.SysRoot + "Program Files/..."? 
> > > > > > > > At the time when this function gets called it looks like 
> > > > > > > > D.SysRoot is empty (or at least my debugger says so) and in my 
> > > > > > > > case it resolves to D: while the CUDA support is installed 
> > > > > > > > under C:.
> > > > > > > > 
> > > > > > > > Is there any special LLVM-specific/preferrable way to iterate 
> > > > > > > > through directories?
> > > > > > > > 
> > > > > > > > (What I also miss a bit in the whole process in an option to 
> > > > > > > > simply say "I want CUDA 11.1" without the need to explicitly 
> > > > > > > > spell out the full path.)
> > > > > > > > 
> > > > > > > > If you provide me give some general guidelines, I'll prepare 
> > > > > > > > another, hopefully more future-proof patch.
> > > > > > > > 
> > > > > > > > (Side note: I'm not sure if I'm calling clang-format correctly, 
> > > > > > > > but if I call it, it keeps reformatting the rest of this file.)
> > > > > > > This whole list may no longer be particularly useful. The most 
> > > > > > > common use case on Linux, AFAICT, is to install only one CUDA 
> > > > > > > version using system-provided package manager.
> > > > > > > E.g. 
> > > > > > > https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist
> > > > > > > 
> > > > > > > TBH, I'm tempted to limit autodetection to only that one 
> > > > > > > system-default version and require user to use --cuda-path if 
> > > > > > > they need something else.
> > > > > > I think on windows (I mean the windows environment itself, not 
> > > > > > WSL), CUDA installer sets an environment variable which could be 
> > > > > > used to detect the default CUDA version, so it may warrant a 
> > > > > > windows-specific way to find it. 
> > > > > On Windows this is certainly not the case. Unless the installation is 
> > > > > changed manually, one always gets the new version installed into a 
> > > > > new directory.
> > > > > 
> > > > > I really do need multiple versions on Windows (and the ability to 
> > > > > pick an older one) if I want to compile a binary that works on 
> > > > > someone else's computer (if I compile against the latest CUDA, users 
> > > > > need "the latest" drivers that may sometimes not even be available 
> > > > > for their machine).
> > > > > 
> > > > > (That said, at least when using CMake, the selection needs to be done 
> > > > > by CMake anyway, and maybe CMake could be forced to specify the 
> > > > > correct flag automatically.)
> > > > > 
> > > > > So even if the functionality gets removed from non-Windows platforms, 
> > > > > it would be really nice to keep it for Windows.
> > > > > 
> > > > > Now, there are three "conflicting" feedbacks/suggestions above. I can 
> > > > > try to improve support, but it would be really helpful to reach the 
> > > > > consensus of what needs to be done before coding it.
> > > > > one always gets the new version installed into a new directory.
> > > > A similar thing happens on Linux.
> > > > 
> > > > > users need "the latest" drivers
> > > > Since CUDA 10.2, there's some "[[ 
> > > > https://docs.nvidia.com/deploy/cuda-compatibility/ | compatibility mode 
> > > > ]]" that allows to run newer CUDA on older drivers. As long as you are 
> > > > not using the latest features, of course.
> > > > 
> > > > I'm personally all up for removing redundancy and duplication. 
> > > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html 
> > > right now and the NVIDIA's "official packages" for Ubuntu get installed 
> > > under `/usr/local/cuda-11.x`.
> > > 
> > > That sounds significant enough to me to argue against the removal of 
> > > versioned folders from search.
> > > I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html 
> > > right now and the NVIDIA's "official packages" for Ubuntu get installed 
> > > under /usr/local/cuda-11.x.
> > 
> > OK, that's something that will be used often enough, so still need to deal 
> > with versioned directories. :-(
> > 
> > > That sounds significant enough to me to argue against the removal of 
> > > versioned folders from search.
> > 
> > I'm not against probing for multiple versions in principle.
> > 
> > What I'm saying is that:
> > - blindly increasing the number of probed directories comes with a price 
> > and should be done cautiously.
> > - it may be a good time to revisit how we detect CUDA installations and 
> > make sure it makes sense now.
> > 
> > There are two considerations.
> > 
> > First is the overhead it adds to compiler driver. While for most users 
> > running locally it's negligible, it would be noticed for the users who may 
> > have /usr/local mounted over NFS, which is not that unusual in 
> > institutional environments. Another thing to consider that it will be 
> > noticed by *all* such users, even if they don't use CUDA. E.g. all C++ 
> > compilations will be probing for those directories.
> > 
> > The other issue is that we should differentiate between finding the 
> > canonical location of CUDA SDK, vs picking one out of many CUDA SDK 
> > versions that may be installed simultaneously. I'd argue that in case of 
> > having multiple versions compiler has no business picking one version over 
> > another (though we could make an attempt to use the most recent one as we 
> > do now). It's up to the user to explicitly specify the one they want. When 
> > we allow to pick one version out of many, it makes it way too easy for a 
> > user to end up with a mix of the default version and the version they want, 
> > all they need to do is to forget `--cuda-path` somewhere in their build.
> > 
> > 
> > For this patch, I propose to drop the 9.x and 8.x so we keep the number of 
> > probed paths under control.
> > Since CUDA 10.2, there's some "compatibility mode" that allows to run newer 
> > CUDA on older drivers.
> 
> This only works with very specific versions of the drivers and those are not 
> very common on the end-user machines. It's mostly for the datacenter use.
> I think on windows (I mean the windows environment itself, not WSL), CUDA 
> installer sets an environment variable which could be used to detect the 
> default CUDA version, so it may warrant a windows-specific way to find it.

I see that I have
```
CUDA_PATH       = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
CUDA_PATH_V11_4 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
CUDA_PATH_V11_3 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3
```
etc. Using those directly might in fact be a good idea.

> The other issue is that we should differentiate between finding the canonical 
> location of CUDA SDK, vs picking one out of many CUDA SDK versions that may 
> be installed simultaneously.

Agreed. (I also wish clang would accept something like `--cuda-version=11.4` 
that would automatically work on Windows and Linux whenever CUDA resides under 
`/usr/local/cuda-x.y`)

I wanted to add another strange observation. Today I installed the latest CUDA 
(11.5) and clang ("14") under Ubuntu 20.04 inside WSL 2 on Windows 11 
(equivalent to a regular Ubuntu, I would say). I had to explicitly provide the 
`-L` flag despite using `--cuda-path` (else the linker fails) which sounds 
relatively strange to me:
```
clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart -L 
/usr/local/cuda-11.5/lib64 hello.c -o hello
```

> First is the overhead it adds to compiler driver. While for most users 
> running locally it's negligible, it would be noticed for the users who may 
> have /usr/local mounted over NFS, which is not that unusual in institutional 
> environments. Another thing to consider that it will be noticed by *all* such 
> users, even if they don't use CUDA. E.g. all C++ compilations will be probing 
> for those directories.

Are you saying that clang executes this portion of the code even when it knows 
that it's not compiling *.cu files? Why does it need to know anything at all 
about directories related to CUDA? (I'm curious, but there's no real need to 
answer, it's outside the scope of this ticket ;)

> For this patch, I propose to drop the 9.x and 8.x so we keep the number of 
> probed paths under control.

Do you want me to upload another patch that only keeps versions 10.0 up to 11.5 
on the list, and then open a new ticket in the tracker describing what the 
rework should be, and potentially start working towards that goal as a separate 
patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114326/new/

https://reviews.llvm.org/D114326

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D114326: Update the list of CUDA versions up to 11.5

Reply via email to