Source: kokkos
Severity: wishlist
X-Debbugs-Cc: c...@slerp.xyz

Dear Maintainer,

The kokkos project has an AMD ROCm backend for hardware acceleration on
AMD GPUs. It would be good to enable this functionality. The AMD backend
depends on the following packages:

hipcc
librocthrust-dev

The kokkos library uses an unusual set of options to choose the AMD GPU
targets. Please see this table from the Spack package [1] that translates
from the LLVM target name to the kokkos architecture suffix:

    amdgpu_arch_map = {
        "gfx900": "vega900",
        "gfx906": "vega906",
        "gfx908": "vega908",
        "gfx90a": "vega90A",
        "gfx940": "amd_gfx940",
        "gfx942": "amd_gfx942",
        "gfx1030": "navi1030",
        "gfx1100": "navi1100",
    }

The names are somewhat misleading because gfx908 and gfx90a are not
vega. The gfx908 and gfx90a architectures were arcturus and aldebaran,
respectively. The option name doesn't affect anything, but perhaps that
clarification will help avoid confusion in the future.

I would suggest enabling all of the architectures listed above, except
perhaps gfx940, as that architecture was only used for early MI300
engineering samples. All retail MI300 hardware is gfx942.

The kokkos architecture names would be upper-cased and used with the
Kokkos_ARCH_ prefix (CMake option names are case-sensitive), so I
believe this would result in the following flags:

-DKokkos_ARCH_VEGA900=ON
-DKokkos_ARCH_VEGA906=ON
-DKokkos_ARCH_VEGA908=ON
-DKokkos_ARCH_VEGA90A=ON
-DKokkos_ARCH_AMD_GFX942=ON
-DKokkos_ARCH_NAVI1030=ON
-DKokkos_ARCH_NAVI1100=ON

You may also need:
-DKokkos_ENABLE_HIP=ON
-DKokkos_ENABLE_ROCTHRUST=ON
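
As a rough sketch of how these flags could be generated from the Spack
table rather than typed by hand, something like the following might
work. The helper name kokkos_arch_flags is hypothetical, and
upper-casing the suffix is my assumption about how the option name is
spelled; please check it against the kokkos CMake options before
relying on it.

```python
# LLVM target -> kokkos architecture suffix, copied from the Spack table above.
AMDGPU_ARCH_MAP = {
    "gfx900": "vega900",
    "gfx906": "vega906",
    "gfx908": "vega908",
    "gfx90a": "vega90A",
    "gfx940": "amd_gfx940",
    "gfx942": "amd_gfx942",
    "gfx1030": "navi1030",
    "gfx1100": "navi1100",
}


def kokkos_arch_flags(targets):
    """Return one -DKokkos_ARCH_<NAME>=ON flag per LLVM target (hypothetical helper)."""
    flags = []
    for target in targets:
        suffix = AMDGPU_ARCH_MAP[target]  # raises KeyError for an unknown target
        # Assumption: the CMake option uses the upper-cased suffix.
        flags.append(f"-DKokkos_ARCH_{suffix.upper()}=ON")
    return flags


# Every target in the table except gfx940 (engineering samples only).
targets = [t for t in AMDGPU_ARCH_MAP if t != "gfx940"]
flags = kokkos_arch_flags(targets) + [
    "-DKokkos_ENABLE_HIP=ON",
    "-DKokkos_ENABLE_ROCTHRUST=ON",
]
print(" ".join(flags))
```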

Compiling custom kernels will require the use of hipcc as the C++
compiler. This can be done with CXX=hipcc before calling into CMake. Any
options that are incompatible with device code can be restricted to host
code by prefixing -Xarch_host. For an example of this, see the rocthrust
package [2].
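
To illustrate the -Xarch_host idea, here is a minimal sketch of the
transformation on a flag string. The function name restrict_to_host is
hypothetical, and treating -fcf-protection as a host-only flag is only
an illustrative assumption; the real set of incompatible flags would
need to be determined from the build.

```python
def restrict_to_host(cxxflags, host_only):
    """Prefix each flag found in host_only with -Xarch_host (hypothetical helper).

    -Xarch_host applies to the single argument that follows it, so every
    host-only flag gets its own prefix token.
    """
    out = []
    for flag in cxxflags.split():
        if flag in host_only:
            out.append("-Xarch_host")
        out.append(flag)
    return " ".join(out)


# Assumption for illustration: this flag is not accepted for device code.
HOST_ONLY = {"-fcf-protection"}

print(restrict_to_host("-g -O2 -fcf-protection -Wall", HOST_ONLY))
```

In debian/rules the same effect could presumably be achieved with a
$(subst ...) on CXXFLAGS before exporting CXX=hipcc.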

I would suggest adding -DROCPRIM_USE_ARCH_CONVERSION to the
DEB_CXXFLAGS_MAINT_PREPEND flags when building kokkos. This is a rocprim
build option that affects rocthrust. It causes gfx902, gfx909, and
gfx90c to use gfx900 code paths and causes gfx1031, gfx1032, gfx1033,
gfx1034, gfx1035, and gfx1036 to use gfx1030 code paths. It may or may
not be sufficient to extend kokkos support to those architectures, but
it will at least be necessary.
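
For reference, the fallback described above can be written down as a
small lookup. This only models the mapping as I have described it; it is
not how rocprim implements the option, and the names ARCH_CONVERSION and
code_path_for are hypothetical.

```python
# Base architecture -> architectures that reuse its code paths when
# ROCPRIM_USE_ARCH_CONVERSION is defined (as described above).
ARCH_CONVERSION = {
    "gfx900": ("gfx902", "gfx909", "gfx90c"),
    "gfx1030": ("gfx1031", "gfx1032", "gfx1033", "gfx1034", "gfx1035", "gfx1036"),
}


def code_path_for(target):
    """Return the architecture whose code paths `target` would use."""
    for base, aliases in ARCH_CONVERSION.items():
        if target == base or target in aliases:
            return base
    # Architectures outside the table are left unchanged.
    return target


for gpu in ("gfx90c", "gfx1036", "gfx906"):
    print(gpu, "->", code_path_for(gpu))
```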

If you require access to compatible hardware to test this functionality,
please reach out privately and I may be able to help.

Sincerely,
Cory Bloor

[1]: https://github.com/spack/spack/blob/0191e15a6a0c86520152694d2a75c3e595763d0a/var/spack/repos/builtin/packages/kokkos/package.py#L163-L172
[2]: https://salsa.debian.org/rocm-team/rocthrust/-/blob/debian/5.7.1-3/debian/rules#L8-10

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.10.11-amd64 (SMP w/32 CPU threads; PREEMPT)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect
