Hello,

Ludovic Courtès <ludovic.cour...@inria.fr> writes:

> Yeah, we could think about a transformation option. Maybe
> ‘--with-configure-flags=python-pytorch=-DAMDGPU_TARGETS=xyz’ would work,
> and if not, we can come up with a specific transformation and/or a
> procedure that takes a list of architectures and returns a package.

I think that would work for python-pytorch itself, but it would need to
be set for all ROCm dependencies as well. It would be good to make sure
that the targets for a package are a subset of the intersection of the
targets specified for its dependencies.

>>>> - Many tests assume a GPU to be present, so they need to be disabled.
>>>
>>> Yes. I/we’d like to eventually support that. (There’d need to be some
>>> annotation in derivations or packages specifying what hardware is
>>> required, and ‘cuirass remote-worker’, ‘guix offload’, etc. would need
>>> to honor that.)
>>
>> That sounds like a good idea. Could this also include CPU ISA
>> extensions, such as AVX2 and AVX-512?
>
> That’d be great, yes. Don’t hold your breath though, as I/we haven’t
> scheduled work on this yet. If you’re interested in working on it, we
> can discuss it of course.

I am definitely interested, but I am not familiar with Cuirass. Would
this also require support from the build daemon to determine which
hardware is available?

>> I think the issue is simply that elf-file? just checks the magic bytes
>> and has-elf-header? checks for the entire header. If the former returns
>> #t and the latter #f, an error is raised by parse-elf in guix/elf.scm.
>> It seems some ROCm (or Tensile?) ELF files have another header format.
>
> Uh, never came across such a situation. What’s so special about those
> ELF files? How are they created?

After checking again, I noticed that the error actually only occurs for
rocblas. :) Here, the problematic ELF files are generated by Tensile [1]
and installed in lib/rocblas/library (by library/src/CMakeLists.txt,
which calls a CMake function from the Tensile package).
They are shared object libraries for the GPU architecture(s) [2].
Tensile uses `clang-offload-bundler` (from rocm-toolchain) to create the
files, and it seems to me that the "ELF" header comes from there, but I
don't know why it is special.

Thanks,
David

[1] https://github.com/ROCm/Tensile/blob/17df881bde80fc20f997dfb290f4bb4b0e05a7e9/Tensile/TensileCreateLibrary.py#L283
[2] https://github.com/ROCm/Tensile/wiki/TensileCreateLibrary#code-object-libraries
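For illustration only: the consistency rule suggested above (the targets for a package should be a subset of the intersection of the targets specified for its dependencies) reduces to a simple set check. Below is a minimal Python sketch, assuming targets are plain strings such as gfx90a. The function name `unsupported_targets` and the target sets in the example are hypothetical, and within Guix such a check would be written in Guile Scheme rather than Python.

```python
# Hypothetical sketch of the target-consistency rule: a package should only
# request AMDGPU targets that every one of its ROCm dependencies was built for.
# The dependency names and target sets below are made up for illustration.
from typing import Dict, Set


def unsupported_targets(targets: Set[str], deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the requested targets missing from at least one dependency.

    An empty result means *targets* is a subset of the intersection of the
    dependencies' target sets. With no dependencies there is no constraint.
    """
    if not deps:
        return set()
    # Targets that every dependency supports.
    common = set.intersection(*deps.values())
    return targets - common


deps = {
    "rocblas": {"gfx90a", "gfx1030", "gfx1100"},
    "miopen": {"gfx90a", "gfx1100"},
}
print(sorted(unsupported_targets({"gfx90a", "gfx1030"}, deps)))  # ['gfx1030']
```

Returning the offending targets instead of a plain boolean makes it easy to report which architecture is missing from which set of dependencies.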
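The two-stage check described earlier can be mimicked in a few lines of Python. In that check, a cheap magic-byte test (as attributed to elf-file?) passes, and a stricter header validation (as attributed to has-elf-header?) then fails. This is a hypothetical illustration, not the code in guix/elf.scm. The e_ident fields that the strict check validates, and especially its OS ABI allow-list, are assumptions. The sample header uses EI_OSABI = 64, which LLVM's AMDGPU documentation assigns to ELFOSABI_AMDGPU_HSA. It shows one way a file can pass the first test and fail the second. It is not a diagnosis of the rocblas files.

```python
# Hypothetical sketch: a cheap ELF magic test versus a stricter e_ident check.
# This is NOT the implementation in guix/elf.scm; the fields checked and the
# OS ABI allow-list below are assumptions made for illustration.

ELF_MAGIC = b"\x7fELF"
EI_NIDENT = 16  # size of the e_ident array at the start of every ELF header


def has_elf_magic(data: bytes) -> bool:
    """Cheap test: only look at the four magic bytes."""
    return data[:4] == ELF_MAGIC


def has_valid_ident(data: bytes, accepted_osabi=frozenset({0, 3})) -> bool:
    """Stricter test over e_ident (assumed allow-list: 0 = SYSV, 3 = GNU/Linux)."""
    if len(data) < EI_NIDENT or not has_elf_magic(data):
        return False
    ei_class, ei_data, ei_version, ei_osabi = data[4], data[5], data[6], data[7]
    return (
        ei_class in (1, 2)        # ELFCLASS32 or ELFCLASS64
        and ei_data in (1, 2)     # ELFDATA2LSB or ELFDATA2MSB
        and ei_version == 1       # EV_CURRENT
        and ei_osabi in accepted_osabi
    )


def parse_ident(data: bytes) -> int:
    """Mimic the failure mode: magic matches, but the stricter check does not."""
    if not has_elf_magic(data):
        raise ValueError("not an ELF file")
    if not has_valid_ident(data):
        raise ValueError("ELF magic present, but header rejected")
    return data[4]  # the ELF class, as a stand-in for real parsing


# 16-byte e_ident samples: 64-bit, little-endian, EV_CURRENT, then EI_OSABI.
sysv_ident = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
amdgpu_hsa_ident = ELF_MAGIC + bytes([2, 1, 1, 64]) + bytes(8)

print(has_elf_magic(amdgpu_hsa_ident), has_valid_ident(amdgpu_hsa_ident))  # True False
```

Here, calling `parse_ident(amdgpu_hsa_ident)` raises `ValueError`, while `parse_ident(sysv_ident)` returns 2 (ELFCLASS64).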