jhuber6 added inline comments.

Comment at: clang/tools/nvptx-arch/CMakeLists.txt:19
+if (NOT CUDA_FOUND OR NOT cuda-library)
+  message(STATUS "Not building nvptx-arch: cuda runtime not found")
+  return()
tra wrote:
> Nit: libcuda.so is part of the NVIDIA driver, which provides the NVIDIA
> driver API. It has nothing to do with the CUDA runtime.
> Here, it's actually not even libcuda.so itself that's not found, but its
> stub.
> I think a sensible error here should say "Failed to find stubs/libcuda.so in 
Good point. Never thought about the difference because they're both called 
`cuda` somewhere.

Comment at: clang/tools/nvptx-arch/CMakeLists.txt:25
+set_target_properties(nvptx-arch PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
+target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})
tra wrote:
> Does it mean that the executable will have RPATH pointing to 
> CUDA_LIBDIR/stubs?
> This should not be necessary. The stub shipped with CUDA comes as 
> "libcuda.so" only. Its SONAME is libcuda.so.1, but there's no symlink with 
> that name in stubs, so an RPATH pointing there will do nothing. At runtime, 
> the dynamic linker will attempt to open libcuda.so.1, and it will only be 
> found among the actual libraries installed by the NVIDIA drivers.
Interesting, I can probably delete it. Another thing I mostly just copied from 
the existing tool.

Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:26
+int main() { return 1; }
tra wrote:
> How do we distinguish "we didn't have CUDA at build time" reported here from 
> "some driver API failed with CUDA_ERROR_INVALID_VALUE=1" ?
I guess the latter would print an error message. We do the same thing with 
`amdgpu-arch`, so I just copied it.

Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:34
+  const char *ErrStr = nullptr;
+  CUresult Result = cuGetErrorString(Err, &ErrStr);
+  if (Result != CUDA_SUCCESS)
tra wrote:
> One problem with this approach is that `nvptx-arch` will fail to run on a 
> machine without NVIDIA drivers installed because the dynamic linker will not 
> find `libcuda.so.1`.
> Ideally we want it to run on any machine and fail the way we want.
> A typical way to achieve that is to dlopen("libcuda.so.1"), and obtain the 
> pointers to the functions we need via `dlsym()`.
We do this in the OpenMP runtime. I mostly copied this approach from the 
existing `amdgpu-arch` but we could change both to use this method.
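
A minimal sketch of the dlopen()/dlsym() approach suggested above, assuming a 
POSIX system. This is not the code from the patch or from the OpenMP runtime. 
`DriverAPI`, `loadDriver`, and the `CUresultInt` alias are hypothetical names 
introduced for illustration; the function pointer types are written by hand to 
match the `cuInit` and `cuDeviceGetCount` declarations so that the sketch 
builds without `cuda.h`.

```cpp
#include <dlfcn.h>

#include <cstdio>

// Hypothetical stand-in: in cuda.h, CUresult is an enum. A plain int is used
// here only so the sketch compiles without the CUDA headers.
using CUresultInt = int;
using cuInit_t = CUresultInt (*)(unsigned int Flags);
using cuDeviceGetCount_t = CUresultInt (*)(int *Count);

// Hypothetical holder for the library handle and the resolved entry points.
struct DriverAPI {
  void *Handle = nullptr;
  cuInit_t Init = nullptr;
  cuDeviceGetCount_t DeviceGetCount = nullptr;
};

// Tries to open the driver library at runtime instead of linking against it.
// Returns true only if the library and both symbols were found; otherwise
// prints a diagnostic and leaves API in its default (empty) state.
bool loadDriver(const char *LibName, DriverAPI &API) {
  API = DriverAPI();
  void *Handle = dlopen(LibName, RTLD_NOW);
  if (!Handle) {
    const char *Msg = dlerror();
    std::fprintf(stderr, "Failed to load %s: %s\n", LibName,
                 Msg ? Msg : "unknown error");
    return false;
  }

  auto Init = reinterpret_cast<cuInit_t>(dlsym(Handle, "cuInit"));
  auto Count =
      reinterpret_cast<cuDeviceGetCount_t>(dlsym(Handle, "cuDeviceGetCount"));
  if (!Init || !Count) {
    std::fprintf(stderr, "Missing driver API symbols in %s\n", LibName);
    dlclose(Handle);
    return false;
  }

  API.Handle = Handle;
  API.Init = Init;
  API.DeviceGetCount = Count;
  return true;
}
```

With something like this, the tool could call `loadDriver("libcuda.so.1", API)` 
at startup and exit with its own error message when the driver is absent, 
rather than failing in the dynamic linker before `main` runs. On older glibc 
versions the link step may also need `-ldl`.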

Comment at: clang/tools/nvptx-arch/NVPTXArch.cpp:63
+    printf("sm_%d%d\n", Major, Minor);
+  }
tra wrote:
> jhuber6 wrote:
> > tianshilei1992 wrote:
> > > Do we want to include device number here?
> > For `amdgpu-arch` and here we just have it implicitly in the order, so the 
> > n-th line is the n-th device, i.e.
> > ```
> > sm_70 // device 0
> > sm_80 // device 1
> > sm_70 // device 2
> > ```
> NVIDIA GPU enumeration order is more or less arbitrary. By default it's 
> arranged by "sort of fastest GPU first", but can be rearranged in order of 
> PCI(e) bus IDs or in an arbitrary user-specified order using 
> `CUDA_VISIBLE_DEVICES`. Printing compute capability in the enumeration order 
> is pretty much all the user needs. If we want to print something uniquely 
> identifying the device, we would need to print the device UUID, similarly to 
> what `nvidia-smi -L` does. Or PCIe bus IDs. In other words -- we can uniquely 
> identify devices, but there's no such thing as inherent canonical order among 
> the devices.
I think it's mostly just important that it prints a valid GPU. Most of the uses 
for this tool will just be "Give me a valid GPU I can run on this machine".
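
As a rough illustration of the two output styles discussed above, the sketch 
below formats a compute capability as `sm_XY` and renders a 16-byte device 
UUID as text. `formatArch` and `formatUuid` are hypothetical helpers, not part 
of the patch. The `GPU-` prefix and the 8-4-4-4-12 hex grouping are an 
assumption modeled on what `nvidia-smi -L` prints, and the raw bytes are 
assumed to come from the driver API (for example, the `bytes` field of a 
`CUuuid`).

```cpp
#include <cstdio>
#include <string>

// Formats a compute capability as an architecture name, e.g. (7, 0) -> sm_70.
std::string formatArch(int Major, int Minor) {
  return "sm_" + std::to_string(Major) + std::to_string(Minor);
}

// Hypothetical helper: renders 16 raw UUID bytes as lowercase hex in
// 8-4-4-4-12 groups with an assumed "GPU-" prefix.
std::string formatUuid(const unsigned char (&Bytes)[16]) {
  std::string Out = "GPU-";
  char Buf[3];
  for (int I = 0; I < 16; ++I) {
    // Group boundaries fall before bytes 4, 6, 8, and 10.
    if (I == 4 || I == 6 || I == 8 || I == 10)
      Out += '-';
    std::snprintf(Buf, sizeof(Buf), "%02x", static_cast<unsigned>(Bytes[I]));
    Out += Buf;
  }
  return Out;
}
```

Printing only `formatArch(Major, Minor)` per device matches the current 
behavior, where the line position is the enumeration index. Appending the UUID 
would be one way to identify a device independently of that order.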

  rG LLVM Github Monorepo


