On Mon, Apr 5, 2021 at 7:33 PM Jeff Hammond <jeff.scie...@gmail.com> wrote:
> NVCC has supported multi-versioned "fat" binaries since I worked for
> Argonne. Libraries should figure out what the oldest hardware they care
> about is and then compile for everything from that point forward. Kepler
> (3.5) is the oldest version any reasonable person should be thinking about
> at this point. The oldest thing I know of in the DOE HPC fleet is Pascal
> (6.x). Volta and Turing are 7.x and Ampere is 8.x.
>
> The biggest architectural changes came with unified memory
> (https://developer.nvidia.com/blog/unified-memory-in-cuda-6/) and
> cooperative groups (https://developer.nvidia.com/blog/cooperative-groups/,
> in CUDA 9), but Kokkos doesn't use the latter. Both features can be used
> on quite old GPU architectures, although the performance is better on
> newer ones.
>
> I haven't dug into what Kokkos and PETSc are doing, but the direct use of
> this stuff in CUDA is well documented, certainly as well as the CPU
> switches for x86 binaries in the Intel compiler are.
>
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
>
>     Devices with the same major revision number are of the same core
>     architecture. The major revision number is 8 for devices based on the
>     NVIDIA Ampere GPU architecture, 7 for devices based on the Volta
>     architecture, 6 for devices based on the Pascal architecture, 5 for
>     devices based on the Maxwell architecture, 3 for devices based on the
>     Kepler architecture, 2 for devices based on the Fermi architecture,
>     and 1 for devices based on the Tesla architecture.

Kokkos has config options Kokkos_ARCH_TURING75, Kokkos_ARCH_VOLTA70,
Kokkos_ARCH_VOLTA72. Any idea how one can map compute capability versions
to arch names?
> https://docs.nvidia.com/cuda/pascal-compatibility-guide/index.html#building-pascal-compatible-apps-using-cuda-8-0
> https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html#building-volta-compatible-apps-using-cuda-9-0
> https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#building-ampere-compatible-apps-using-cuda-11-0
>
> Programmatic querying can be done with the following
> (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html):
>
> cudaDeviceGetAttribute
>   - cudaDevAttrComputeCapabilityMajor
>     <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd220ff111a6616ab512e229d8f2f8bf87>:
>     Major compute capability version number;
>   - cudaDevAttrComputeCapabilityMinor
>     <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gg49e2f8c2c0bd6fe264f2fc970912e5cd2c981c76c9de58d39502e483a7b484c7>:
>     Minor compute capability version number;
>
> The compiler help tells me this, which can be cross-referenced with the
> CUDA documentation above.
>
> $ /usr/local/cuda-10.0/bin/nvcc -h
>
> Usage  : nvcc [options] <inputfile>
>
> ...
>
> Options for steering GPU code generation.
> =========================================
>
> --gpu-architecture <arch>                       (-arch)
>         Specify the name of the class of NVIDIA 'virtual' GPU architecture
>         for which the CUDA input files must be compiled.
>         With the exception as described for the shorthand below, the
>         architecture specified with this option must be a 'virtual'
>         architecture (such as compute_50).
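The runtime query Jeff mentions would look something like this minimal sketch (assumes a CUDA toolkit and a visible device; error handling abbreviated):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int major = 0, minor = 0;
    const int dev = 0;  // query the first device
    cudaError_t err =
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev);
    // e.g. a V100 reports 7.0, a T4 reports 7.5
    printf("compute capability: %d.%d\n", major, minor);
    return 0;
}
```

A configure script could compile and run a probe like this on the build box, with the caveat Satish raises below: the build box might not have the GPU used for runs.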
>         Normally, this option alone does not trigger assembly of the
>         generated PTX for a 'real' architecture (that is the role of nvcc
>         option '--gpu-code', see below); rather, its purpose is to control
>         preprocessing and compilation of the input to PTX.
>         For convenience, in case of simple nvcc compilations, the following
>         shorthand is supported. If no value for option '--gpu-code' is
>         specified, then the value of this option defaults to the value of
>         '--gpu-architecture'. In this situation, as only exception to the
>         description above, the value specified for '--gpu-architecture'
>         may be a 'real' architecture (such as sm_50), in which case nvcc
>         uses the specified 'real' architecture and its closest 'virtual'
>         architecture as effective architecture values. For example,
>         'nvcc --gpu-architecture=sm_50' is equivalent to
>         'nvcc --gpu-architecture=compute_50 --gpu-code=sm_50,compute_50'.
>         Allowed values for this option: 'compute_30','compute_32','compute_35',
>         'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>         'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>         'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>         'sm_75'.
>
> --gpu-code <code>,...                           (-code)
>         Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
>         nvcc embeds a compiled code image in the resulting executable for
>         each specified <code> architecture, which is a true binary load
>         image for each 'real' architecture (such as sm_50), and PTX code
>         for the 'virtual' architecture (such as compute_50).
>         During runtime, such embedded PTX code is dynamically compiled by
>         the CUDA runtime system if no binary load image is found for the
>         'current' GPU.
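Putting the -arch/-code machinery above together, a fat binary covering Pascal through Turing, with PTX embedded as a forward-compatibility fallback for newer parts, might be built along these lines (illustrative only; `app.cu` is a hypothetical source file):

```
# Native code for sm_60, sm_70, sm_75, plus compute_70 PTX so newer
# GPUs (e.g. Ampere) can JIT-compile at load time.
nvcc -gencode arch=compute_60,code=sm_60 \
     -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_75,code=sm_75 \
     -gencode arch=compute_70,code=compute_70 \
     -o app app.cu
```

Each extra -gencode pair grows the binary and the compile time, which is the trade-off libraries make when deciding how far back to support.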
>         Architectures specified for options '--gpu-architecture' and
>         '--gpu-code' may be 'virtual' as well as 'real', but the <code>
>         architectures must be compatible with the <arch> architecture.
>         When the '--gpu-code' option is used, the value for the
>         '--gpu-architecture' option must be a 'virtual' PTX architecture.
>         For instance, '--gpu-architecture=compute_35' is not compatible
>         with '--gpu-code=sm_30', because the earlier compilation stages
>         will assume the availability of 'compute_35' features that are not
>         present on 'sm_30'.
>         Allowed values for this option: 'compute_30','compute_32','compute_35',
>         'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>         'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>         'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>         'sm_75'.
>
> --generate-code <specification>,...             (-gencode)
>         This option provides a generalization of the
>         '--gpu-architecture=<arch> --gpu-code=<code>,...' option
>         combination for specifying nvcc behavior with respect to code
>         generation. Where use of the previous options generates code for
>         different 'real' architectures with the PTX for the same 'virtual'
>         architecture, option '--generate-code' allows multiple PTX
>         generations for different 'virtual' architectures. In fact,
>         '--gpu-architecture=<arch> --gpu-code=<code>,...' is equivalent to
>         '--generate-code arch=<arch>,code=<code>,...'.
>         '--generate-code' options may be repeated for different virtual
>         architectures.
>         Allowed keywords for this option: 'arch','code'.
>
> On Mon, Apr 5, 2021 at 1:19 PM Satish Balay via petsc-dev
> <petsc-dev@mcs.anl.gov> wrote:
>
>> This is nvidia mess-up.
>> Why isn't there a command that gives me these values [if they insist on
>> this interface for nvcc]?
>>
>> I see Barry wants configure to do something here - but whatever we do, we
>> would be shifting the problem around.
>> [Even if we detect stuff - the build box might not have the GPU used for
>> runs.]
>>
>> We have --with-cuda-arch - which I tried to remove from configure - but
>> it's come back in a different form (--with-cuda-gencodearch).
>>
>> And I see other packages:
>>
>> --with-kokkos-cuda-arch
>>
>> Wrt spack - I'm having to do:
>>
>> spack install xsdk+cuda ^magma cuda_arch=60
>>
>> [magma uses the CudaPackage() infrastructure in spack.]
>>
>> Satish
>>
>> On Mon, 5 Apr 2021, Mills, Richard Tran via petsc-dev wrote:
>>
>>> You raise a good point, Barry. I've been completely mystified by what
>>> some of these names even mean. What does "PASCAL60" vs. "PASCAL61" even
>>> mean? Do you know where this is even documented? I can't really find
>>> anything about it in the Kokkos documentation. The only thing I can
>>> really find is an issue or two about "hey, shouldn't our CMake stuff
>>> figure this out automatically" and then some posts about why it can't
>>> really do that. Not encouraging.
>>>
>>> --Richard
>>>
>>> On 4/3/21 8:42 PM, Barry Smith wrote:
>>>
>>>> It would be very nice to NOT require PETSc users to provide this flag;
>>>> how the heck will they know what it should be when we cannot automate
>>>> it ourselves?
>>>>
>>>> Any ideas of how this can be determined based on the current system?
>>>> NVIDIA does not help, since these "advertising" names don't seem to
>>>> trivially map to information you can get from a particular GPU when
>>>> you are logged into it. For example, nvidia-smi doesn't use these
>>>> names directly. Is there some mapping from nvidia-smi to these names
>>>> we could use?
>>>> If we are serious about having a non-trivial number of users utilizing
>>>> GPUs, which we need to be for the future, we cannot have these absurd
>>>> demands in our installation process.
>>>>
>>>> Barry
>>>>
>>>> Does spack have some magic for this we could use?
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/