[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-18 Thread Simon Fels
The same problem can be reproduced with the SBSA 510 driver from the
NVIDIA CUDA repository (see
https://developer.nvidia.com/cuda-downloads?target_os=Linux_arch=arm64-sbsa=Native=Ubuntu_version=22.04_type=deb_network)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1971978

Title:
  Driver binaries fail to load on arm64 through LXD

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-510/+bug/1971978/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-09 Thread Simon Fels
> Separately i'm failing to get even as far as the bug description gets
> as i get failures on missing newuidmap binary and what not:

The log looks exactly like what you should see. The warnings about missing
newuidmap/newgidmap are expected, as LXD writes the uid/gid maps for the
container directly and doesn't use those two tools. The first two errors
are what we're looking for:

lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:lxc_setup:4400 - Failed to run mount hooks

Any error after this is just a consequence of the start process failing
at this point.
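
To pull just those lines out of a long log, a small filter can help. This
is a minimal shell sketch, not part of LXD: first_lxc_errors is a
hypothetical helper name, and it assumes each log entry sits on a single
line with the level surrounded by whitespace, as in unwrapped
lxc info --show-log output.

```shell
# Hypothetical helper (not part of LXD): print the first N ERROR entries
# (default 2) from an LXC log read on stdin, skipping WARN noise such as
# the expected newuidmap/newgidmap messages.
first_lxc_errors() {
  grep -E '[[:space:]]ERROR[[:space:]]' | head -n "${1:-2}"
}
```

Usage, with the container name from this thread:

$ lxc info --show-log local:evident-oyster | first_lxc_errors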

If you have the NVIDIA drivers installed, as Stéphane mentions, and also set

$ lxc config set c0 raw.lxc lxc.log.level=0

you will get the error given in the description. Without trace logging
enabled, you will only ever see

lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:lxc_setup:4400 - Failed to run mount hooks

[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-09 Thread Stéphane Graber
For nvidia.runtime=true to work, you need an NVIDIA driver as well as
the CUDA library on the host.

The libnvidia-container part is identical on both architectures and has
been used by Anbox before, so we're pretty confident it works. Just not
on 22.04 hosts.
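
Before setting nvidia.runtime=true, it can help to confirm that the host's
dynamic linker cache actually lists the driver and CUDA libraries. This is a
minimal sketch under an assumption: that libcuda.so.1 (the CUDA driver
library) and libnvidia-ml.so.1 (NVML) are the libraries worth checking for.
has_nvidia_host_libs is a hypothetical helper, and its library list is
illustrative, not what nvidia-container-cli itself checks.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed only if
# every library below is listed. ASSUMPTION: these two sonames stand in for
# "an NVIDIA driver as well as the CUDA library"; adjust for your setup.
has_nvidia_host_libs() {
  local cache lib
  cache=$(cat)
  for lib in libcuda.so.1 libnvidia-ml.so.1; do
    if ! printf '%s\n' "$cache" | grep -qF "$lib"; then
      echo "missing on host: $lib" >&2
      return 1
    fi
  done
}
```

Usage, with the container name used elsewhere in this thread:

$ ldconfig -p | has_nvidia_host_libs && lxc config set c0 nvidia.runtime true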

[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-09 Thread Dimitri John Ledkov
Separately, I'm not even getting as far as the bug description does,
as I get failures about a missing newuidmap binary and whatnot:

$ lxc info --show-log local:evident-oyster
Name: evident-oyster
Status: STOPPED
Type: container
Architecture: aarch64
Created: 2022/05/09 15:41 UTC
Last Used: 2022/05/09 15:42 UTC

Log:

lxc evident-oyster 20220509154201.159 WARN conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc evident-oyster 20220509154201.160 WARN conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc evident-oyster 20220509154201.165 WARN conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc evident-oyster 20220509154201.165 WARN conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:run_buffer:321 - Script exited with status 1
lxc evident-oyster 20220509154201.271 ERROR conf - conf.c:lxc_setup:4400 - Failed to run mount hooks
lxc evident-oyster 20220509154201.271 ERROR start - start.c:do_start:1275 - Failed to setup container "evident-oyster"
lxc evident-oyster 20220509154201.271 ERROR sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc evident-oyster 20220509154201.278 WARN network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth9d667e66"
lxc evident-oyster 20220509154201.278 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc evident-oyster 20220509154201.278 ERROR start - start.c:__lxc_start:2074 - Failed to spawn container "evident-oyster"
lxc evident-oyster 20220509154201.278 WARN start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 5868
lxc evident-oyster 20220509154206.359 WARN conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc evident-oyster 20220509154206.359 WARN conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220509154206.394 ERROR af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220509154206.394 ERROR commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"


As a precaution, can we double-check how the lxd snap differs between its
x86_64 and aarch64 builds? For example, to rule out differences in staged
binaries, packages, etc.
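
One way to make that comparison is to diff the file lists of the two builds.
This is a minimal sketch, assuming you already have both .snap files and have
saved the unsquashfs -l output of each. The helper names are hypothetical,
and rewriting both multiarch triplets to a common placeholder is an
assumption made so that paths such as lib/x86_64-linux-gnu and
lib/aarch64-linux-gnu line up.

```shell
# Hypothetical helper: map both multiarch triplets to one placeholder so that
# arch-specific library directories compare equal, then sort for comm(1).
normalise_listing() {
  sed -e 's/x86_64-linux-gnu/ARCH-linux-gnu/g' \
      -e 's/aarch64-linux-gnu/ARCH-linux-gnu/g' | LC_ALL=C sort -u
}

# Hypothetical helper. $1, $2: files holding `unsquashfs -l` output of the
# x86_64 and aarch64 snaps. Prints paths found only in the first file
# (no indent) or only in the second file (indented by one tab).
diff_snap_listings() {
  local a b
  a=$(mktemp) b=$(mktemp)
  normalise_listing <"$1" >"$a"
  normalise_listing <"$2" >"$b"
  LC_ALL=C comm -3 "$a" "$b"
  rm -f "$a" "$b"
}
```

Usage, with hypothetical file names:

$ unsquashfs -l lxd_amd64.snap > amd64.txt
$ unsquashfs -l lxd_arm64.snap > arm64.txt
$ diff_snap_listings amd64.txt arm64.txt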

[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-06 Thread Stéphane Graber
Right, nvidia-container-cli is specifically designed to use files from
the host (outside of the snap environment), because the files it loads
(through dlopen) cannot be bundled (CUDA, driver files, ...).

nvidia-container-cli has logic to effectively chroot before performing
any of the dlopen calls. The driver libraries on the host are then
expected to be built conservatively enough that they can be loaded even
by a slightly older userspace.

It's true that moving LXD to core22 would certainly solve the error
here, but it would trade it for another problem: those same libraries,
which nvidia-container-cli passes through to the container, would then
only work in 22.04 or newer containers.

On amd64 22.04, this all works, including passing through the driver
libraries and binaries from the host to a container as old as Ubuntu
18.04. So something weird happened with the equivalent arm64 build here.

[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-06 Thread Simon Fels
The LXD snap itself isn't the problem. It only includes the
nvidia-container-cli utility (see https://github.com/NVIDIA/libnvidia-container),
which works as expected but fails to map the driver binaries from the
host into the container environment because of the incompatible symbols.
This mechanism is what lets, e.g., bionic LXD containers work with the
host's NVIDIA driver binaries from jammy.

This works well on amd64 on Ubuntu 22.04, following the instructions
given above, with the same 510 driver from the archive. For whatever
reason, though, the arm64 binaries are different and depend on symbols
not available before 22.04.
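
To see which symbols are the problem, one can list the glibc symbol versions
a host library requires and compare the newest one against the glibc of the
environment loading it. This is a minimal sketch, assuming the missing
symbols are versioned glibc symbols (the thread doesn't name them) and that
objdump is available; the helper names are hypothetical. For reference,
Ubuntu 20.04 (focal, the base of the focal-based lxd snap mentioned in this
thread) ships glibc 2.31, and 22.04 ships glibc 2.35.

```shell
# Hypothetical helper: read `objdump -T` output on stdin and print the newest
# GLIBC_x.y version it references (GLIBC_PRIVATE is ignored).
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' | LC_ALL=C sort -u -V | tail -n 1
}

# Hypothetical helper: succeed if version $1 is newer than version $2,
# e.g. requires_newer_glibc 2.34 2.31 -> true.
requires_newer_glibc() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical helper: report each libnvidia-* library in a directory whose
# newest glibc requirement exceeds the given version (default 2.31, as on focal).
report_host_libs() {
  local dir=$1 base=${2:-2.31} lib ver
  for lib in "$dir"/libnvidia-*.so.*; do
    [ -e "$lib" ] || continue
    ver=$(objdump -T "$lib" 2>/dev/null | max_glibc_version)
    ver=${ver#GLIBC_}
    if [ -n "$ver" ] && requires_newer_glibc "$ver" "$base"; then
      echo "$lib needs GLIBC_$ver (> $base)"
    fi
  done
}
```

Usage on an arm64 host:

$ report_host_libs /usr/lib/aarch64-linux-gnu 2.31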

[Bug 1971978] Re: Driver binaries fail to load on arm64 through LXD

2022-05-06 Thread Dimitri John Ledkov
It would be worth checking with the LXD team whether the lxd snap builds
compatible NVIDIA container bits on arm64.

It looks like one may need an lxd snap on the core22 base for all of this
to work.

Alternatively, wait for us to provide the 510 NVIDIA drivers for arm64 on
focal, and use those with the focal-based lxd snap.
