apport information

** Attachment added: "HookError_generic.txt"
   
https://bugs.launchpad.net/bugs/1766543/+attachment/5168118/+files/HookError_generic.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1766543

Title:
  instance deletion takes a while and blocks nova-compute

Status in linux package in Ubuntu:
  Confirmed
Status in nova package in Ubuntu:
  New

Bug description:
  Hi,

  I have a cloud running xenial/mitaka (with 18.02 charms).

  Sometimes, an instance will take minutes to delete. I tracked down the
  time taken to be file deletion :

  Apr 23 07:23:00 hostname nova-compute[54255]: 2018-04-23 07:23:00.920
  54255 INFO nova.virt.libvirt.driver [req-
  35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db
  1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056
  -869f-084b38580b9a] Deleting instance files
  /srv/nova/instances/97731f51-63be-4056-869f-084b38580b9a_del

  Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.767
  54255 INFO nova.virt.libvirt.driver [req-
  35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db
  1cb457a8302543fea067e5f14b5241e7 - - -] [instance: 97731f51-63be-4056
  -869f-084b38580b9a] Deletion of /srv/nova/instances/97731f51-63be-4056
  -869f-084b38580b9a_del complete

  
  As you can see, 4 minutes and 33 seconds have elapsed between the 2 lines. 
nova-compute logs absolutely _nothing_ during this time. Periodic tasks are not 
run, etc... Generally, a deletion takes a few seconds top.

  The logs above are generally immediately followed by :

  Apr 23 07:27:33 hostname nova-compute[54255]: 2018-04-23 07:27:33.771
  54255 DEBUG oslo.messaging._drivers.impl_rabbit [req-
  35ccfe64-9280-4de6-ae88-045ca91bf90f bc0ab055427645aca4ed09266e85b1db
  1cb457a8302543fea067e5f14b5241e7 - - -] Received recoverable error
  from kombu: on_error /usr/lib/python2.7/dist-
  packages/oslo_messaging/_drivers/impl_rabbit.py:683

  (which is error: [Errno 104] Connection reset by peer)

  because nova-compute doesn't even maintain the rabbitmq connection (on
  the rabbitmq server I can see errors about "Missed heartbeats from
  client, timeout: 60s").

  So nova-compute appears to be "frozen" during several minutes. This
  can cause problems because events can be missed, etc...

  We have telegraf on this host, and there's little to no CPU, disk,
  network or memory activity at that time. Nothing relevant in kern.log
  either. And this is happening on 3 different architectures, so this is
  all very puzzling.

  Is nova-compute supposed to be totally stuck while deleting instance
  files ? Have you ever seen something similar ?

  I'm going to try to repro on queens.

  Thanks
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jun 23 14:23 seq
   crw-rw---- 1 root audio 116, 33 Jun 23 14:23 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.18
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  Package: nova (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: root=UUID=bf574d52-09d7-4c9d-beb8-ccc542d50654 ro 
console=tty0 console=ttyS1,115200 panic=30 raid=noautodetect
  ProcLoadAvg: 10.43 10.79 9.76 11/2237 15123
  ProcSwaps:
   Filename                             Type            Size    Used    Priority
   /swap.img                               file         8388544 0       -1
  ProcVersion: Linux version 4.4.0-128-generic (buildd@bos02-ppc64el-001) (gcc 
version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) ) #154-Ubuntu SMP 
Fri May 25 14:13:59 UTC 2018
  ProcVersionSignature: Ubuntu 4.4.0-128.154-generic 4.4.131
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-128-generic N/A
   linux-backports-modules-4.4.0-128-generic  N/A
   linux-firmware                             1.157.20
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial uec-images xenial uec-images xenial uec-images
  Uname: Linux 4.4.0-128-generic ppc64le
  UnreportableReason: The report belongs to a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: False
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_dscr: DSCR is 0
  cpu_freq:
   min: 1.201 GHz (cpu 48)
   max: 3.710 GHz (cpu 112)
   avg: 2.615 GHz
  cpu_runmode:
   Could not retrieve current diagnostics mode,
   No kernel interface to firmware
  cpu_smt: SMT is off

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1766543/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to