[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
The same problem can be reproduced with the sbsa 510 driver from the NVIDIA CUDA repository (see https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=arm64-sbsa=Native=Ubuntu_version=22.04_type=deb_network)

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1971978

Title:
  Driver binaries fail to load on arm64 through LXD

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-510/+bug/1971978/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
> Separately i'm failing to get even as far as the bug description gets
> as i get failures on missing newuidmap binary and what not:

The log looks exactly like what you should see. The warning about the missing newuidmap/newgidmap binaries is expected, because LXD writes the uidmap for the container directly and doesn't use the two tools to do that.

The first two errors are what we look for:

  lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
  lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:lxc_setup:4400 - Failed to run mount hooks

Any error after these is just a consequence of the start process failing here.

If you have the NVIDIA drivers installed, as Stephane mentions, and also set

  $ lxc config set c0 raw.lxc lxc.log.level=0

you will get the error as given in the description. If you don't enable trace logging, you will only ever see

  lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:lxc_setup:4400 - Failed to run mount hooks
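Since only the first errors in the log matter and everything after them is fallout, a small filter can cut the log down to those lines. This is a minimal sketch, not something posted in the bug: the function name first_lxc_errors is made up for illustration, and it assumes the log level is separated from the timestamp by a space, as in the lines quoted above.

```shell
# Hypothetical helper: print the first N ERROR lines of an LXC log
# read from stdin (N defaults to 2), skipping WARN lines and ignoring
# all later errors, which are only consequences of the first ones.
first_lxc_errors() {
    awk -v n="${1:-2}" '/ ERROR/ { print; if (++count >= n) exit }'
}

# Usage (container name c0 as in the message above):
#   lxc info --show-log c0 | first_lxc_errors
```

Passing a number, for example first_lxc_errors 4, widens the window if trace logging adds more detail ahead of the mount hook failure.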
[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
For nvidia.runtime=true to work, you need an NVIDIA driver as well as the CUDA library on the host.

The libnvidia-container part is identical on both architectures and has been used by Anbox before, so we're pretty confident it works. Just not on 22.04 hosts.
[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
Separately, I'm failing to get even as far as the bug description gets, as I get failures about a missing newuidmap binary and whatnot:

$ lxc info --show-log local:evident-oyster
Name: evident-oyster
Status: STOPPED
Type: container
Architecture: aarch64
Created: 2022/05/09 15:41 UTC
Last Used: 2022/05/09 15:42 UTC

Log:

lxc evident-oyster 20220509154201.159 WARN conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc evident-oyster 20220509154201.160 WARN conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc evident-oyster 20220509154201.165 WARN conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc evident-oyster 20220509154201.165 WARN conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:lxc_setup:4400 - Failed to run mount hooks
lxc evident-oyster 20220509154201.271 ERROR start - start.c:do_start:1275 - Failed to setup container "evident-oyster"
lxc evident-oyster 20220509154201.271 ERROR sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc evident-oyster 20220509154201.278 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth9d667e66"
lxc evident-oyster 20220509154201.278 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc evident-oyster 20220509154201.278 ERROR start - start.c:__lxc_start:2074 - Failed to spawn container "evident-oyster"
lxc evident-oyster 20220509154201.278 WARN start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 5868
lxc evident-oyster 20220509154206.359 WARN conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc evident-oyster 20220509154206.359 WARN conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220509154206.394 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220509154206.394 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

As a precaution, can we double-check and compare how different the LXD snaps are between the x86_64 and aarch64 builds? For example, to exclude differences in staged binaries, packages, etc.
[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
Right, nvidia-container-cli is specifically designed to use files from the host (outside of the snap environment), as the files it loads (through dlopen) cannot be bundled (CUDA, driver files, ...).

nvidia-container-cli has logic to effectively chroot before performing any of the dlopen calls. The expectation is that the driver libraries on the host are generally built conservatively and can be loaded even by a slightly older userspace.

It's true that moving LXD to core22 would certainly solve the error here, though it would trade it for another problem: those same libraries, which nvidia-container-cli passes through to the container, would then only work in containers running 22.04 or later.

On amd64 22.04, this all works, including passing through the driver libraries and binaries from the host to a container as old as Ubuntu 18.04. So something weird happened with the equivalent arm64 build here.
[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
The LXD snap itself isn't the problem. It only includes the nvidia-container-cli utility (see https://github.com/NVIDIA/libnvidia-container), which works as expected but fails to map the driver binaries from the host into the container environment due to the incompatible symbols. Mapping the host binaries this way is what makes, e.g., bionic LXD containers work with the host NVIDIA driver binaries from jammy.

This works well with the instructions given above on amd64 on Ubuntu 22.04 with the same 510 driver from the archive. For whatever reason, the arm64 binaries are different, though, and depend on symbols not available in releases older than 22.04.
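One way to compare the amd64 and arm64 driver builds is to look at which versioned symbols each binary requires. The sketch below is an assumption-laden illustration rather than anything from this bug: it assumes the incompatible symbols are versioned glibc symbols (the thread does not say which symbols are involved), the function name max_glibc_version is made up, and the library path in the usage line is a placeholder.

```shell
# Hypothetical helper: read `objdump -T` output on stdin and print the
# highest GLIBC_x.y[.z] symbol version the binary references.
# Assumes the relevant symbols are glibc-versioned, which is unconfirmed.
max_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' \
        | sed 's/^GLIBC_//' \
        | sort -t. -k1,1n -k2,2n -k3,3n \
        | tail -n 1
}

# Usage (placeholder path; substitute a driver library from the host):
#   objdump -T /path/to/driver-library.so | max_glibc_version
```

Running this against the same library from the amd64 and arm64 packages would show whether the arm64 build needs a newer glibc than the one available in the older environment that loads it.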
[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD
It would be worth checking with the LXD team whether the LXD snap is building compatible NVIDIA container bits on arm64.

Because it looks like one may need an LXD snap built on the core22 base for this all to work. Alternatively, wait for us to provide the 510 NVIDIA drivers on arm64 on focal, and use those with the focal-based LXD snap.