Just checking whether you've made progress with the bisect.  If it
would help, I can assist with the bisect and build test kernels for you.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1775732

Title:
  arm64 soft lock crashes on nova-compute charm running

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

Bug description:
  Discovered on bionic, arm64 (Moonshot, verified on multiple swirlix
  cartridges), 4.15.0-22-generic.

  After deploying the nova-compute Juju charm, on subsequent reboots,
  within a few seconds after complete boot, everything will freeze and
  eventually display on the serial console (just these, no traces):

  [  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
  [  216.010292] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]

  (From here on, "lock up" refers to that sequence: boot a kernel, it
  completes boot to login prompt, then everything freezes a few seconds
  later, then BUGs.)
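
  For comparing captures across boots, the watchdog lines can be pulled
  out of a saved serial console log mechanically.  The following is only
  a minimal sketch: the regular expression is modelled on the two
  messages quoted above, and the names parse_soft_lockups and
  count_by_process are hypothetical helpers, not part of any existing
  tool.  Whitespace is collapsed first so that a message wrapped across
  two lines still matches.

```python
import re
from collections import Counter

# Modelled on the message format quoted in this report, e.g.
# "[  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]"
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s+"
    r"\[(?P<comm>[^\]:]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(console_text):
    """Return one dict per soft lockup message found in console_text."""
    # Collapse all whitespace runs so line-wrapped messages are rejoined.
    flat = " ".join(console_text.split())
    events = []
    for match in SOFT_LOCKUP_RE.finditer(flat):
        events.append({
            "timestamp": float(match.group("ts")),
            "cpu": int(match.group("cpu")),
            "stuck_seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events


def count_by_process(events):
    """Count how often each process name was reported as stuck."""
    return Counter(event["comm"] for event in events)
```

  Counting by process name makes it easier to see how often the stuck
  task is juju-log compared with other hook tools.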

  The stuck process is usually, but not always, juju-log; sometimes it
  is relation-ids or similar.  I was able to see briefly that it was
  running in the charm's startup config-changed hook.

  I've separated out and tested nearly everything it does during its
  startup config-changed (sets up bridging, writes some config files,
  restarts libvirtd/nova-compute/etc) without being able to trigger the
  bug, but I suspect proximity to boot is a factor.  If I disable
  jujud-unit-nova-compute startup, boot, log in, re-enable and start it
  (by which time a minute or so has elapsed since boot finished), it
  will not lock up.  Similarly, if I wrap the jujud startup in a
  `strace -Ff -o /var/log/strace.log` (which slows it down massively),
  it will not lock up.  Watched pot syndrome.

  I've tried kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/
  (k.u.c).  I noticed most of the recent arm64 mainline kernels had
  failed builds, notified the kernel team channel, and apw fixed the
  issue and started some rebuilds.

  What I've discovered (after many dead ends and a futile bisection) is
  that mainline builds before the rebuilds lock up, but fixed mainline
  builds initiated by apw DO NOT lock up.  e.g.
  4.16.3-041603.201804190730 locks up, but 4.16.6-041606.201806042022
  does not lock up.  (4.16.4 and 4.16.5 appear to have never been
  rebuilt and don't have arm64 debs, and that period is what I tried to
  bisect after figuring a fix must be in there.)

  But when I try to compile any of these recent kernels myself, they
  lock up when booted.  Same kernel configs, tried on both bionic and in
  a cosmic chroot, tried both native arm64 compile and cross-compile
  from amd64. e.g. 4.16.6-041606.201806042022 from k.u.c does not lock
  up, but when I build it myself, it does.
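
  Since the self-built and k.u.c kernels are meant to share a config,
  one cheap sanity check is to diff the two config files option by
  option.  This is a minimal sketch, assuming both files use the usual
  "CONFIG_X=value" / "# CONFIG_X is not set" layout; the function names
  are hypothetical, and the option values in the usage note are made up
  for illustration.

```python
def parse_kernel_config(text):
    """Map option name to value; '# CONFIG_X is not set' is stored as 'n'."""
    not_set_suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(not_set_suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(not_set_suffix)]] = "n"
    return options


def diff_kernel_configs(text_a, text_b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option absent from one file is reported with None on that side.
    """
    a = parse_kernel_config(text_a)
    b = parse_kernel_config(text_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

  For example, diff_kernel_configs("CONFIG_HZ=250\n", "CONFIG_HZ=1000\n")
  reports CONFIG_HZ as differing; an empty result means the two configs
  set the same options to the same values.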

  To be clear, I've verified lockups on the following kernels (all
  using the kernel configs from their respective Ubuntu or k.u.c
  mainline builds):

  - 4.15.0-22-generic from bionic (both Ubuntu-provided and my own recompile)
  - v4.16 (and all point releases)
  - v4.17

  As I write this, my compiled v4.10 DOES NOT appear to lock up.  I will
  attempt to bisect at a macro level from 4.10..4.15 and dig deeper.
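
  The macro-level bisect amounts to a binary search over release tags
  between a known-good and a known-bad kernel.  The sketch below shows
  only that search logic, under the assumptions that the first version
  is good, the last is bad, and the bug persists once introduced.  The
  locks_up callback is hypothetical: in practice it stands for building
  and booting a kernel by hand and watching the console, and because the
  lockup is timing-sensitive a single clean boot may not prove a version
  good.  For commit-level work, git bisect does the same bookkeeping.

```python
def first_bad_version(versions, locks_up):
    """Binary search over versions ordered oldest to newest.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that every version after the first bad one is also bad.
    Returns (first_bad, tested), where tested lists the versions that
    had to be tried, in order.
    """
    good, bad = 0, len(versions) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(versions[mid])
        if locks_up(versions[mid]):
            bad = mid   # bug already present here: look earlier
        else:
            good = mid  # still fine here: look later
    return versions[bad], tested
```

  With the six tags v4.10 through v4.15, this needs at most three test
  boots to narrow the regression down to a single release.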

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-22-generic 4.15.0-22.24
  ProcVersionSignature: Ubuntu 4.15.0-22.24-generic 4.15.17
  Uname: Linux 4.15.0-22-generic aarch64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jun  2 04:22 seq
   crw-rw---- 1 root audio 116, 33 Jun  2 04:22 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.2
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Fri Jun  8 00:13:05 2018
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: console=ttyS0,9600n8r ro
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-22-generic N/A
   linux-backports-modules-4.15.0-22-generic  N/A
   linux-firmware                             1.173.1
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1775732/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
