[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
Hi Team,

When kdumping on a 4.15.18 kernel in an Azure instance, we quite frequently
observe a stall on the kdump kernel: it gets blocked, and soon we see a stack
like the following:

  INFO: rcu_sched detected stalls on CPUs/tasks:
  1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

We tried porting this change, which seems to address the potential cause of
the above issue:
https://lore.kernel.org/lkml/20171031221849.12117-1-mikel...@exchange.microsoft.com/
but with it applied, a regular Azure instance hangs during boot. Please let me
know if I am missing any patches to solve this issue.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure-4.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure-4.15 package in Ubuntu:
  Fix Released
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux-azure-4.15 source package in Bionic:
  Fix Released

Bug description:
  [Impact]

  * When kdumping on trusty/4.15 in an Azure instance, we observe quite
    frequently a stall on the kdump kernel; it gets blocked, and soon we
    see a stack like the following:

  [   65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
  [   65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  [   65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  [   65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

  * By using the Azure serial console, we collected a sysrq-w when the
    issue happens:

  [  529.515013] sysrq: Show Blocked State
  [  529.517730] task            PC     stack pid father
  [  529.519006] kworker/u4:2    D 0    94    2 0x8000
  [  529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? check_preempt_wakeup+0x162/0x260
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __synchronize_srcu.part.14+0x67/0x80
  [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
  [  529.519006]  ? __switch_to_asm+0x41/0x70
  [  529.519006]  synchronize_srcu+0xd1/0xd6
  [  529.519006]  fsnotify_mark_destroy_workfn+0x6d/0xc0
  [  529.519006]  process_one_work+0x14e/0x390
  [  529.519006]  worker_thread+0x1cc/0x3d0
  [  529.519006]  kthread+0x105/0x140
  [  529.519006]  ? max_active_store+0x60/0x60
  [  529.519006]  ? kthread_bind+0x20/0x20
  [  529.519006]  ret_from_fork+0x35/0x40
  [  529.519006] udevadm         D 0    544   1 0x
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? try_to_wake_up+0x4a/0x460
  [  529.519006]  ? try_to_wake_up+0x4a/0x460
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __flush_work.isra.29+0x119/0x1b0
  [  529.519006]  ? destroy_worker+0x90/0x90
  [  529.519006]  flush_delayed_work+0x3f/0x50
  [  529.519006]  fsnotify_wait_marks_destroyed+0x15/0x20
  [  529.519006]  fsnotify_destroy_group+0x4e/0xc0
  [  529.519006]  inotify_release+0x1e/0x50
  [  529.519006]  __fput+0xea/0x220
  [  529.519006]  fput+0xe/0x10
  [  529.519006]  task_work_run+0x8c/0xb0
  [  529.519006]  exit_to_usermode_loop+0x70/0xa9
  [  529.519006]  do_syscall_64+0x1b5/0x1e0
  [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  [  529.519006] dhclient        D 0    573   572 0x
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? aa_profile_af_perm+0xb4/0xf0
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __wait_rcu_gp+0x123/0x150
  [  529.519006]  synchronize_sched+0x4e/0x60
  [  529.519006]  ? __call_rcu+0x2f0/0x2f0
  [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
  [  529.519006]  synchronize_net+0x1c/0x30
  [  529.519006]  __unregister_prot_hook+0xcd/0xf0
  [  529.519006]  packet_do_bind+0x1bd/0x250
  [  529.519006]  packet_bind+0x2f/0x50
  [  529.519006]  SYSC_bind+0xd8/0x110
  [  529.519006]  ? sock_alloc_file+0x91/0x130
  [  529.519006]  SyS_bind+0xe/0x10
  [  529.519006]  do_syscall_64+0x80/0x1e0
  [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6

  * Bisecting mainline kernels, we found that v4.17-rc1 didn't
    reproduce the issue, whereas v4.16 did. Then, a fine-grained git
    bisect led us to the fix: the following patch, when backported to a
    problematic version, fixes the issue:
    d8e462e19305 ("Drivers: hv: vmbus: Implement Direct Mode for stimer0")

  * In Azure/Hyper-V, before the aforementioned commit, timer
    interrupts were passed to the hypervisor through a vmbus message,
    the communication mechanism between Hyper-V guests and the
    hypervisor. With the patch, a check is made (through an MSR-like
    mechanism), and if the hypervisor supports it, a direct timer IRQ
    mechanism is put in place instead of the vmbus channel.

  * Our theory is that on the kdump kernel, especially due to its
    single-CPU nature, the vmbus-messaged timer IRQ could interfere
    with scheduling and create a deadlock condition, which is what we
    observe in the stack traces. Hence, we hereby propose to backport
    the aforementioned patch.
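The bisect described above searched for the commit that *fixed* the behavior (v4.16 broken, v4.17-rc1 fixed), so the usual good/bad labels are inverted. A rough sketch, where the test harness script name is hypothetical and not part of the bug report:

```shell
# Bisect for the commit that FIXED the hang. git bisect assumes "bad"
# comes after "good", so mark the fixed kernel (v4.17-rc1) as "bad"
# and the broken one (v4.16) as "good"; the first "bad" commit it
# reports is then the fixing commit.
git bisect start v4.17-rc1 v4.16

# At each step: build the kernel, boot it in the Azure instance,
# trigger a crash, and check whether the kdump kernel completes the
# dump. ./test-kdump.sh is a hypothetical harness that exits 0 when
# the kdump kernel hangs (i.e. the fix is absent).
git bisect run ./test-kdump.sh
```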
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Changed in: linux-azure-4.15 (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: linux-azure (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

** Changed in: linux-azure (Ubuntu)
       Status: In Progress => Fix Released
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
This bug was fixed in the package linux-azure-4.15 - 4.15.0-1123.136

---------------
linux-azure-4.15 (4.15.0-1123.136) bionic; urgency=medium

  * bionic/linux-azure-4.15: 4.15.0-1123.136 -proposed tracker (LP: #1939816)

  * VM enter into hung status after triggering a crash (LP: #1882623)
    - Drivers: hv: vmbus: Implement Direct Mode for stimer0

  [ Ubuntu: 4.15.0-156.163 ]

  * bionic/linux: 4.15.0-156.163 -proposed tracker (LP: #1940162)
  * linux (LP: #1940564)
    - SAUCE: Revert "scsi: core: Cap scsi_host cmd_per_lun at can_queue"
  * fails to launch linux L2 guests on AMD (LP: #1940134) // CVE-2021-3653
    - KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl
      (CVE-2021-3653)
  * fails to launch linux L2 guests on AMD (LP: #1940134)
    - SAUCE: Revert "UBUNTU: SAUCE: KVM: nSVM: avoid picking up unsupported
      bits from L2 in int_ctl"

  [ Ubuntu: 4.15.0-155.162 ]

  * bionic/linux: 4.15.0-155.162 -proposed tracker (LP: #1939833)
  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.08.16)
  * CVE-2021-3656
    - SAUCE: KVM: nSVM: always intercept VMLOAD/VMSAVE when nested
  * CVE-2021-3653
    - SAUCE: KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl
  * dev_forward_skb: do not scrub skb mark within the same name space
    (LP: #1935040)
    - dev_forward_skb: do not scrub skb mark within the same name space
  * 'ptrace trace' needed to readlink() /proc/*/ns/* files on older kernels
    (LP: #1890848)
    - apparmor: fix ptrace read check
  * Bionic update: upstream stable patchset 2021-08-03 (LP: #1938824)
    - ALSA: usb-audio: fix rate on Ozone Z90 USB headset
    - media: dvb-usb: fix wrong definition
    - Input: usbtouchscreen - fix control-request directions
    - net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
    - usb: gadget: eem: fix echo command packet response issue
    - USB: cdc-acm: blacklist Heimann USB Appset device
    - ntfs: fix validity check for file name attribute
    - iov_iter_fault_in_readable() should do nothing in xarray case
    - Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
    - ARM: dts: at91: sama5d4: fix pinctrl muxing
    - btrfs: send: fix invalid path for unlink operations after parent
      orphanization
    - btrfs: clear defrag status of a root if starting transaction fails
    - ext4: cleanup in-core orphan list if ext4_truncate() failed to get a
      transaction handle
    - ext4: fix kernel infoleak via ext4_extent_header
    - ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
    - ext4: remove check for zero nr_to_scan in ext4_es_scan()
    - ext4: fix avefreec in find_group_orlov
    - ext4: use ext4_grp_locked_error in mb_find_extent
    - can: gw: synchronize rcu operations before removing gw job entry
    - can: peak_pciefd: pucan_handle_status(): fix a potential starvation
      issue in TX path
    - SUNRPC: Fix the batch tasks count wraparound.
    - SUNRPC: Should wake up the privileged task firstly.
    - s390/cio: dont call css_wait_for_slow_path() inside a lock
    - rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
    - iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and
      PS_DATA as volatile, too
    - iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
    - iio: ltr501: ltr501_read_ps(): add missing endianness conversion
    - serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
    - serial_cs: Add Option International GSM-Ready 56K/ISDN modem
    - serial_cs: remove wrong GLOBETROTTER.cis entry
    - ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
    - ssb: sdio: Don't overwrite const buffer if block_write fails
    - rsi: Assign beacon rate settings to the correct rate_info descriptor
      field
    - seq_buf: Make trace_seq_putmem_hex() support data longer than 8
    - fuse: check connected before queueing on fpq->io
    - spi: Make of_register_spi_device also set the fwnode
    - spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
    - spi: spi-topcliff-pch: Fix potential double free in
      pch_spi_process_messages()
    - spi: omap-100k: Fix the length judgment problem
    - crypto: nx - add missing MODULE_DEVICE_TABLE
    - media: cpia2: fix memory leak in cpia2_usb_probe
    - media: cobalt: fix race condition in setting HPD
    - media: pvrusb2: fix warning in pvr2_i2c_core_done
    - crypto: qat - check return code of qat_hal_rd_rel_reg()
    - crypto: qat - remove unused macro in FW loader
    - media: em28xx: Fix possible memory leak of em28xx struct
    - media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
    - media: bt8xx: Fix a missing check bug in bt878_probe
    - media: st-hva: Fix potential NULL pointer dereferences
    - media: dvd_usb: memory leak in cinergyt2_fe_attach
    - mmc: via-sdmmc: add a check against NULL pointer dereference
    - crypto: shash - avoid comparing pointe
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
This bug is awaiting verification that the kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If the
problem is solved, change the tag 'verification-needed-bionic' to
'verification-done-bionic'. If the problem still exists, change the tag
'verification-needed-bionic' to 'verification-failed-bionic'. If verification
is not done by 5 working days from today, this fix will be dropped from the
source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic
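For reference, enabling -proposed to test such a kernel amounts to something like the following sketch on bionic; the exact pinning and package selection are assumptions on my part, and the EnableProposed wiki page linked above remains the authoritative procedure:

```shell
# Add the bionic-proposed pocket (sketch; verify against the wiki page).
cat <<'EOF' | sudo tee /etc/apt/sources.list.d/ubuntu-proposed.list
deb http://archive.ubuntu.com/ubuntu/ bionic-proposed restricted main multiverse universe
EOF

# Pin -proposed low so routine upgrades don't pull everything from it;
# only explicitly requested packages come from the pocket.
cat <<'EOF' | sudo tee /etc/apt/preferences.d/proposed-updates
Package: *
Pin: release a=bionic-proposed
Pin-Priority: 400
EOF

sudo apt-get update
# Install the candidate kernel from -proposed, reboot, re-run the kdump
# test, then set verification-done-bionic or verification-failed-bionic.
sudo apt-get install -t bionic-proposed linux-azure
```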
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Also affects: linux-azure-4.15 (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux-azure-4.15 (Ubuntu)
       Status: New => Fix Committed

** Changed in: linux-azure-4.15 (Ubuntu Bionic)
       Status: New => Fix Committed
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Changed in: linux-azure (Ubuntu Bionic)
       Status: In Progress => Fix Committed
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Description changed:

- Create VM on Azure using Canonical UbuntuServer 14.04.5-LTS latest using
- size Standard DS2 v2
+ [Impact]
- Install kexec-tools kdump-tools makedumpfile, configure boot kernel
- parameter crashkernel=512M
+ * When kdumping on trusty/4.15 in an Azure instance, we observe quite
+ frequently a stall on the kdump kernel, it gets blocked and soon we see
+ a stack like the following:
- Run sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools and
- sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec
+ [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
+ [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
+ [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
+ [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
- Reboot VM, make sure crash kernel memory reserved successfully
+ * By using the Azure serial console, we collected a sysrq-w when the issue happens:
+ [ 529.515013] sysrq: Show Blocked State
+ [ 529.517730] task            PC stack    pid father
+ [ 529.519006] kworker/u4:2    D    0    94      2 0x8000
+ [ 529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
+ [ 529.519006] Call Trace:
+ [ 529.519006]  __schedule+0x292/0x880
+ [ 529.519006]  schedule+0x36/0x80
+ [ 529.519006]  schedule_timeout+0x1d5/0x2f0
+ [ 529.519006]  ? check_preempt_wakeup+0x162/0x260
+ [ 529.519006]  wait_for_completion+0xa5/0x110
+ [ 529.519006]  ? wake_up_q+0x80/0x80
+ [ 529.519006]  __synchronize_srcu.part.14+0x67/0x80
+ [ 529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
+ [ 529.519006]  ? __switch_to_asm+0x41/0x70
+ [ 529.519006]  synchronize_srcu+0xd1/0xd6
+ [ 529.519006]  fsnotify_mark_destroy_workfn+0x6d/0xc0
+ [ 529.519006]  process_one_work+0x14e/0x390
+ [ 529.519006]  worker_thread+0x1cc/0x3d0
+ [ 529.519006]  kthread+0x105/0x140
+ [ 529.519006]  ? max_active_store+0x60/0x60
+ [ 529.519006]  ? kthread_bind+0x20/0x20
+ [ 529.519006]  ret_from_fork+0x35/0x40
+ [ 529.519006] udevadm         D    0   544      1 0x
+ [ 529.519006] Call Trace:
+ [ 529.519006]  __schedule+0x292/0x880
+ [ 529.519006]  schedule+0x36/0x80
+ [ 529.519006]  schedule_timeout+0x1d5/0x2f0
+ [ 529.519006]  ? try_to_wake_up+0x4a/0x460
+ [ 529.519006]  ? try_to_wake_up+0x4a/0x460
+ [ 529.519006]  wait_for_completion+0xa5/0x110
+ [ 529.519006]  ? wake_up_q+0x80/0x80
+ [ 529.519006]  __flush_work.isra.29+0x119/0x1b0
+ [ 529.519006]  ? destroy_worker+0x90/0x90
+ [ 529.519006]  flush_delayed_work+0x3f/0x50
+ [ 529.519006]  fsnotify_wait_marks_destroyed+0x15/0x20
+ [ 529.519006]  fsnotify_destroy_group+0x4e/0xc0
+ [ 529.519006]  inotify_release+0x1e/0x50
+ [ 529.519006]  __fput+0xea/0x220
+ [ 529.519006]  fput+0xe/0x10
+ [ 529.519006]  task_work_run+0x8c/0xb0
+ [ 529.519006]  exit_to_usermode_loop+0x70/0xa9
+ [ 529.519006]  do_syscall_64+0x1b5/0x1e0
+ [ 529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
+ [ 529.519006] dhclient        D    0   573    572 0x
+ [ 529.519006] Call Trace:
+ [ 529.519006]  __schedule+0x292/0x880
+ [ 529.519006]  schedule+0x36/0x80
+ [ 529.519006]  schedule_timeout+0x1d5/0x2f0
+ [ 529.519006]  ? aa_profile_af_perm+0xb4/0xf0
+ [ 529.519006]  wait_for_completion+0xa5/0x110
+ [ 529.519006]  ? wake_up_q+0x80/0x80
+ [ 529.519006]  __wait_rcu_gp+0x123/0x150
+ [ 529.519006]  synchronize_sched+0x4e/0x60
+ [ 529.519006]  ? __call_rcu+0x2f0/0x2f0
+ [ 529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
+ [ 529.519006]  synchronize_net+0x1c/0x30
+ [ 529.519006]  __unregister_prot_hook+0xcd/0xf0
+ [ 529.519006]  packet_do_bind+0x1bd/0x250
+ [ 529.519006]  packet_bind+0x2f/0x50
+ [ 529.519006]  SYSC_bind+0xd8/0x110
+ [ 529.519006]  ? sock_alloc_file+0x91/0x130
+ [ 529.519006]  SyS_bind+0xe/0x10
+ [ 529.519006]  do_syscall_64+0x80/0x1e0
+ [ 529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
- Trigger a crash by run echo 1 > /proc/sys/kernel/sysrq, then echo c >
- /proc/sysrq-trigger, VM will reboot automatically and after reboot,
- check crash dump is generated under /var/crash
+ * Bisecting mainline kernels, we found that v4.17-rc1 didn't reproduce
+ the issue, whereas v4.16 reproduced. Then, a fine-grained git bisect led
+ us to the fix - the following patch, when backported to a problematic
+ version, fixes the issue: d8e462e19305 ("Drivers: hv: vmbus: Implement
+ Direct Mode for stimer0")
- Install linux-azure kernel
+ * In Azure/Hyper-V, before the aforementioned commit, timer interrupts
+ were passed to the hypervisor through a vmbus message, a mechanism of
+ communication of hyper-v guests/hypervisor. With the patch, a check is
+ made (through MSR-like mechanism) and if the hypervisor supports, a
+ direct timer IRQ mechanism is put in-place instead of the vmbus channel.
- Enable private-ppa canonical-kernel-esm
+ * Our theory is that on
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Changed in: linux-azure (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: linux-azure (Ubuntu)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  In Progress

Bug description:
  Create VM on Azure using Canonical UbuntuServer 14.04.5-LTS latest
  using size Standard DS2 v2

  Install kexec-tools kdump-tools makedumpfile, configure boot kernel
  parameter crashkernel=512M

  Run sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools and
  sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec

  Reboot VM, make sure the crash kernel memory is reserved successfully

  Trigger a crash by running echo 1 > /proc/sys/kernel/sysrq, then
  echo c > /proc/sysrq-trigger; the VM will reboot automatically and,
  after reboot, check that a crash dump is generated under /var/crash

  Install linux-azure kernel

  Enable private-ppa canonical-kernel-esm

  Install kernel linux-azure, reboot VM, kernel version 4.15.0-1084-azure

  Trigger a crash by running echo 1 > /proc/sys/kernel/sysrq, then
  echo c > /proc/sysrq-trigger; the VM entered into a hung status.

  Attach whole serial console:

  [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
  [ 360.020026] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 360.024015] 1-...!: (10 GPs behind) idle=b34/0/0 softirq=1/1 fqs=1
  [ 360.024015] (detected by 0, t=15002 jiffies, g=717, c=716, q=6429)
  [ 360.024015] rcu_sched kthread starved for 15000 jiffies! g717 c716 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
  [ 420.048010] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 420.052006] 1-...!: (0 ticks this GP) idle=f94/0/0 softirq=1/1 fqs=0
  [ 420.052006] (detected by 0, t=15002 jiffies, g=718, c=717, q=6429)
  [ 420.052006] rcu_sched kthread starved for 15002 jiffies! g718 c717 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
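The reproduction steps quoted above can be collected into a single configuration sketch. The package and sed commands are taken verbatim from the report; the GRUB edit for crashkernel= is an assumption (the report only says to configure the boot parameter), and the /proc/iomem check is one common way to confirm the reservation. Run as root on the Azure VM; note the final step deliberately crashes the machine.

```shell
# Configuration sketch of the reproduction steps from the report.
# Destructive: installs packages, edits /etc/default files, and the
# final sysrq write intentionally crashes the kernel.
apt-get install -y kexec-tools kdump-tools makedumpfile

# Enable kdump and let kdump-tools (not kexec) load the crash kernel.
sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec

# Reserve memory for the crash kernel. Editing /etc/default/grub is an
# assumption here; the report only specifies crashkernel=512M.
# After adding crashkernel=512M to GRUB_CMDLINE_LINUX_DEFAULT:
#   update-grub && reboot

# After reboot, confirm the reservation actually took effect:
grep -i 'crash kernel' /proc/iomem

# Trigger the crash; on a good kernel the VM reboots and leaves a dump
# under /var/crash, on the affected kernels it hangs instead.
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
```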
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-azure (Ubuntu)
       Status: New => Confirmed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
Last time I did verification against the gallery image, the default kernel
version was 4.4.0-148-generic. I tried the kernels below and found that
linux-image-4.15.0-1042-azure is the first one where we hit this bug, so it
is not specific to the esm kernel.

 #  Kernel Version/Package                Result
 0  Default kernel 4.4.0-148-generic      => Good
 1  linux-image-4.15.0-1023-azure         => Good
 2  linux-image-4.15.0-1030-azure
 3  linux-image-4.15.0-1031-azure
 4  linux-image-4.15.0-1032-azure
 5  linux-image-4.15.0-1035-azure
 6  linux-image-4.15.0-1036-azure         => Good
 7  linux-image-4.15.0-1037-azure
 8  linux-image-4.15.0-1039-azure
 9  linux-image-4.15.0-1040-azure         => Good
10  linux-image-4.15.0-1041-azure         => Good
11  linux-image-4.15.0-1042-azure         => Bad
12  linux-image-4.15.0-1045-azure         => Bad

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions
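Given the per-version results in the table above, a small helper can classify an installed kernel against the first bad version. This is a hedged sketch: it only encodes the bisect boundary from this comment (4.15.0-1042-azure), assumes uname -r style version strings, and says nothing about later kernels that already carry the stimer0 fix.

```shell
# Hedged helper based on the bisect table above: classify a linux-azure
# kernel version string against the first bad kernel 4.15.0-1042-azure.
# The version-string format (e.g. "4.15.0-1042-azure") is an assumption.
classify_azure_kernel() {
    # Extract the ABI number from a 4.15 linux-azure version string.
    abi=$(printf '%s\n' "$1" | sed -n 's/^4\.15\.0-\([0-9][0-9]*\)-azure$/\1/p')
    if [ -z "$abi" ]; then
        echo "not a 4.15 linux-azure kernel, table does not apply"
    elif [ "$abi" -ge 1042 ]; then
        echo "at or after the first bad kernel (check for the stimer0 fix)"
    else
        echo "predates the first bad kernel"
    fi
}

classify_azure_kernel "4.15.0-1041-azure"   # → predates the first bad kernel
classify_azure_kernel "4.15.0-1045-azure"   # → at or after the first bad kernel (check for the stimer0 fix)
classify_azure_kernel "$(uname -r)"
```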