Public bug reported:

Discovered on bionic, arm64 (Moonshot, verified on multiple swirlix
cartridges), 4.15.0-22-generic.

After deploying the nova-compute Juju charm, on subsequent reboots the
system freezes within a few seconds of completing boot and eventually
prints the following on the serial console (just these lines, no traces):

[  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
[  216.010292] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]

(From here on, "lock up" refers to that sequence: boot a kernel, it
completes boot to login prompt, then everything freezes a few seconds
later, then BUGs.)
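
For triaging captured serial console output, a minimal Python sketch
that pulls the CPU, duration and stuck process out of soft lockup lines
might look like the following.  The regex and the name
parse_soft_lockups are my own assumptions, based only on the message
format of the two lines quoted above, not on any kernel-provided tooling.

```python
import re

# Pattern assumed from the two console lines quoted in this report.
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(text):
    """Return one dict per soft lockup line found in console output."""
    events = []
    for m in SOFT_LOCKUP_RE.finditer(text):
        events.append({
            "timestamp": float(m.group("ts")),
            "cpu": int(m.group("cpu")),
            "stuck_seconds": int(m.group("secs")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        })
    return events


sample = """\
[  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
[  216.010292] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
"""
for event in parse_soft_lockups(sample):
    print(event)
```

Grouping the results by "comm" would make it easy to see how often the
stuck process is juju-log versus something else across many boots.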

The stuck process is usually, but not always, juju-log; sometimes it is
relation-ids or similar.  I was able to briefly observe that the unit
was in its startup config-changed hook.

I've separated out and tested nearly everything the charm does during
its startup config-changed hook (sets up bridging, writes some config
files, restarts libvirtd/nova-compute/etc) without being able to trigger
the bug, but I suspect proximity to boot is a factor.  If I disable
jujud-unit-nova-compute startup, boot, log in, then re-enable and start
it (by which time a minute or more has elapsed since boot finished), it
does not lock up.  Similarly, if I wrap the jujud startup in a
`strace -Ff -o /var/log/strace.log` (which slows it down massively), it
does not lock up.  Watched pot syndrome.

I've tried kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/.
I noticed that most of the recent arm64 mainline kernels had failed
builds and notified the kernel team channel; apw fixed the issue and
started some rebuilds.

What I've discovered (after many dead ends and a futile bisection) is
that mainline builds from before the rebuilds lock up, but the fixed
mainline builds initiated by apw DO NOT lock up.  For example,
4.16.3-041603.201804190730 locks up, but 4.16.6-041606.201806042022 does
not.  (4.16.4 and 4.16.5 appear never to have been rebuilt and don't
have arm64 debs; that range is what I tried to bisect, figuring a fix
must be in there.)

But when I compile any of these recent kernels myself, they lock up
when booted.  I used the same kernel configs and tried building on both
bionic and in a cosmic chroot, both natively on arm64 and cross-compiled
from amd64.  For example, 4.16.6-041606.201806042022 from k.u.c does not
lock up, but when I build it myself, it does.

To be clear, I've verified lock ups on the following kernels (all using
the kernel configs from their respective Ubuntu or k.u.c mainline
builds):

- 4.15.0-22-generic from bionic (both Ubuntu-provided and my own recompile)
- v4.16 (and all point releases)
- v4.17

As I write this, my compiled v4.10 DOES NOT appear to lock up.  I will
attempt to bisect at a macro level from 4.10..4.15 and dig deeper.
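
The macro-level bisection planned above amounts to a binary search over
an ordered list of releases for the first one that locks up.  A minimal
Python sketch of that logic follows.  The names first_bad and locks_up
are hypothetical; in practice the test is a manual build, boot and wait,
and git bisect does the bookkeeping.  It assumes the oldest release is
good, the newest is bad, and there is a single good-to-bad transition,
none of which is confirmed yet.  The pretend_bad set is a made-up
placeholder used only to exercise the code, not a result.

```python
def first_bad(versions, locks_up):
    """Binary-search an ordered list for the first version that locks up.

    Assumes versions[0] is good, versions[-1] is bad, and that every
    version after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if locks_up(versions[mid]):
            bad = mid
        else:
            good = mid
    return versions[bad]


releases = ["v4.10", "v4.11", "v4.12", "v4.13", "v4.14", "v4.15"]

# Placeholder for demonstration only: pretend v4.13 and later lock up.
pretend_bad = {"v4.13", "v4.14", "v4.15"}
print(first_bad(releases, lambda v: v in pretend_bad))
```

With six releases this needs at most three build-and-boot cycles, which
matters when each test involves a full kernel compile.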

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-22-generic 4.15.0-22.24
ProcVersionSignature: Ubuntu 4.15.0-22.24-generic 4.15.17
Uname: Linux 4.15.0-22-generic aarch64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Jun  2 04:22 seq
 crw-rw---- 1 root audio 116, 33 Jun  2 04:22 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: arm64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Fri Jun  8 00:13:05 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
PciMultimedia:
 
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB:
 
ProcKernelCmdLine: console=ttyS0,9600n8r ro
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-22-generic N/A
 linux-backports-modules-4.15.0-22-generic  N/A
 linux-firmware                             1.173.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: apport-bug arm64 bionic uec-images

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1775732

Title:
  arm64 soft lock crashes on nova-compute charm running

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1775732/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
