[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
Hi Team,

When kdumping on a 4.15.18 kernel in an Azure instance, we quite frequently
observe a stall on the kdump kernel: it gets blocked, and soon we see a stack
like the following:

  INFO: rcu_sched detected stalls on CPUs/tasks:
  1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

We tried porting this change, which seems to address the potential cause of
the above issue:
https://lore.kernel.org/lkml/20171031221849.12117-1-mikel...@exchange.microsoft.com/
but with it applied, a regular Azure instance hangs during boot. Please let me
know if I am missing any patches to solve this issue.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure-4.15 in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure-4.15 package in Ubuntu:
  Fix Released
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux-azure-4.15 source package in Bionic:
  Fix Released

Bug description:
  [Impact]

  * When kdumping on trusty/4.15 in an Azure instance, we observe quite
    frequently a stall on the kdump kernel; it gets blocked, and soon we
    see a stack like the following:

  [   65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
  [   65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  [   65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  [   65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

  * By using the Azure serial console, we collected a sysrq-w when the
    issue happens:

  [  529.515013] sysrq: Show Blocked State
  [  529.517730] task            PC     stack pid father
  [  529.519006] kworker/u4:2    D 0    94    2 0x8000
  [  529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? check_preempt_wakeup+0x162/0x260
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __synchronize_srcu.part.14+0x67/0x80
  [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
  [  529.519006]  ? __switch_to_asm+0x41/0x70
  [  529.519006]  synchronize_srcu+0xd1/0xd6
  [  529.519006]  fsnotify_mark_destroy_workfn+0x6d/0xc0
  [  529.519006]  process_one_work+0x14e/0x390
  [  529.519006]  worker_thread+0x1cc/0x3d0
  [  529.519006]  kthread+0x105/0x140
  [  529.519006]  ? max_active_store+0x60/0x60
  [  529.519006]  ? kthread_bind+0x20/0x20
  [  529.519006]  ret_from_fork+0x35/0x40
  [  529.519006] udevadm         D 0    544   1 0x
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? try_to_wake_up+0x4a/0x460
  [  529.519006]  ? try_to_wake_up+0x4a/0x460
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __flush_work.isra.29+0x119/0x1b0
  [  529.519006]  ? destroy_worker+0x90/0x90
  [  529.519006]  flush_delayed_work+0x3f/0x50
  [  529.519006]  fsnotify_wait_marks_destroyed+0x15/0x20
  [  529.519006]  fsnotify_destroy_group+0x4e/0xc0
  [  529.519006]  inotify_release+0x1e/0x50
  [  529.519006]  __fput+0xea/0x220
  [  529.519006]  fput+0xe/0x10
  [  529.519006]  task_work_run+0x8c/0xb0
  [  529.519006]  exit_to_usermode_loop+0x70/0xa9
  [  529.519006]  do_syscall_64+0x1b5/0x1e0
  [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
  [  529.519006] dhclient        D 0    573   572 0x
  [  529.519006] Call Trace:
  [  529.519006]  __schedule+0x292/0x880
  [  529.519006]  schedule+0x36/0x80
  [  529.519006]  schedule_timeout+0x1d5/0x2f0
  [  529.519006]  ? aa_profile_af_perm+0xb4/0xf0
  [  529.519006]  wait_for_completion+0xa5/0x110
  [  529.519006]  ? wake_up_q+0x80/0x80
  [  529.519006]  __wait_rcu_gp+0x123/0x150
  [  529.519006]  synchronize_sched+0x4e/0x60
  [  529.519006]  ? __call_rcu+0x2f0/0x2f0
  [  529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
  [  529.519006]  synchronize_net+0x1c/0x30
  [  529.519006]  __unregister_prot_hook+0xcd/0xf0
  [  529.519006]  packet_do_bind+0x1bd/0x250
  [  529.519006]  packet_bind+0x2f/0x50
  [  529.519006]  SYSC_bind+0xd8/0x110
  [  529.519006]  ? sock_alloc_file+0x91/0x130
  [  529.519006]  SyS_bind+0xe/0x10
  [  529.519006]  do_syscall_64+0x80/0x1e0
  [  529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6

  * Bisecting mainline kernels, we found that v4.17-rc1 didn't
    reproduce the issue, whereas v4.16 did. Then, a fine-grained git
    bisect led us to the fix: the following patch, when backported to a
    problematic version, fixes the issue:
    d8e462e19305 ("Drivers: hv: vmbus: Implement Direct Mode for stimer0")

  * In Azure/Hyper-V, before the aforementioned commit, timer
    interrupts were passed to the hypervisor through a vmbus message,
    the communication mechanism between Hyper-V guests and the
    hypervisor. With the patch, a check is made (through an MSR-like
    mechanism), and if the hypervisor supports it, a direct timer IRQ
    mechanism is put in place instead of the vmbus channel.

  * Our theory is that on the kdump kernel, especially due to its
    single-CPU nature, the vmbus-messaged timer IRQ could interfere
    with scheduling and create a deadlock condition, which is what we
    observe in the stack traces. Hence, we hereby propose to backport
    the aforementioned patch.
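The bisect described above searched for the commit that *fixed* the behavior (v4.16 broken, v4.17-rc1 fixed), so the usual good/bad labels are inverted. A rough sketch, where the test harness script name is hypothetical and not part of the bug report:

```shell
# Bisect for the commit that FIXED the hang. git bisect assumes "bad"
# comes after "good", so mark the fixed kernel (v4.17-rc1) as "bad"
# and the broken one (v4.16) as "good"; the first "bad" commit it
# reports is then the fixing commit.
git bisect start v4.17-rc1 v4.16

# At each step: build the kernel, boot it in the Azure instance,
# trigger a crash, and check whether the kdump kernel completes the
# dump. ./test-kdump.sh is a hypothetical harness that exits 0 when
# the kdump kernel hangs (i.e. the fix is absent).
git bisect run ./test-kdump.sh
```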
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Changed in: linux-azure-4.15 (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: linux-azure (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

** Changed in: linux-azure (Ubuntu)
       Status: In Progress => Fix Released
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
This bug was fixed in the package linux-azure-4.15 - 4.15.0-1123.136

---------------
linux-azure-4.15 (4.15.0-1123.136) bionic; urgency=medium

  * bionic/linux-azure-4.15: 4.15.0-1123.136 -proposed tracker (LP: #1939816)

  * VM enter into hung status after triggering a crash (LP: #1882623)
    - Drivers: hv: vmbus: Implement Direct Mode for stimer0

  [ Ubuntu: 4.15.0-156.163 ]

  * bionic/linux: 4.15.0-156.163 -proposed tracker (LP: #1940162)
  * linux (LP: #1940564)
    - SAUCE: Revert "scsi: core: Cap scsi_host cmd_per_lun at can_queue"
  * fails to launch linux L2 guests on AMD (LP: #1940134) // CVE-2021-3653
    - KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl
      (CVE-2021-3653)
  * fails to launch linux L2 guests on AMD (LP: #1940134)
    - SAUCE: Revert "UBUNTU: SAUCE: KVM: nSVM: avoid picking up unsupported
      bits from L2 in int_ctl"

  [ Ubuntu: 4.15.0-155.162 ]

  * bionic/linux: 4.15.0-155.162 -proposed tracker (LP: #1939833)
  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.08.16)
  * CVE-2021-3656
    - SAUCE: KVM: nSVM: always intercept VMLOAD/VMSAVE when nested
  * CVE-2021-3653
    - SAUCE: KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl
  * dev_forward_skb: do not scrub skb mark within the same name space
    (LP: #1935040)
    - dev_forward_skb: do not scrub skb mark within the same name space
  * 'ptrace trace' needed to readlink() /proc/*/ns/* files on older kernels
    (LP: #1890848)
    - apparmor: fix ptrace read check
  * Bionic update: upstream stable patchset 2021-08-03 (LP: #1938824)
    - ALSA: usb-audio: fix rate on Ozone Z90 USB headset
    - media: dvb-usb: fix wrong definition
    - Input: usbtouchscreen - fix control-request directions
    - net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
    - usb: gadget: eem: fix echo command packet response issue
    - USB: cdc-acm: blacklist Heimann USB Appset device
    - ntfs: fix validity check for file name attribute
    - iov_iter_fault_in_readable() should do nothing in xarray case
    - Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
    - ARM: dts: at91: sama5d4: fix pinctrl muxing
    - btrfs: send: fix invalid path for unlink operations after parent
      orphanization
    - btrfs: clear defrag status of a root if starting transaction fails
    - ext4: cleanup in-core orphan list if ext4_truncate() failed to get a
      transaction handle
    - ext4: fix kernel infoleak via ext4_extent_header
    - ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
    - ext4: remove check for zero nr_to_scan in ext4_es_scan()
    - ext4: fix avefreec in find_group_orlov
    - ext4: use ext4_grp_locked_error in mb_find_extent
    - can: gw: synchronize rcu operations before removing gw job entry
    - can: peak_pciefd: pucan_handle_status(): fix a potential starvation
      issue in TX path
    - SUNRPC: Fix the batch tasks count wraparound.
    - SUNRPC: Should wake up the privileged task firstly.
    - s390/cio: dont call css_wait_for_slow_path() inside a lock
    - rtc: stm32: Fix unbalanced clk_disable_unprepare() on probe error path
    - iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and
      PS_DATA as volatile, too
    - iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
    - iio: ltr501: ltr501_read_ps(): add missing endianness conversion
    - serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
    - serial_cs: Add Option International GSM-Ready 56K/ISDN modem
    - serial_cs: remove wrong GLOBETROTTER.cis entry
    - ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
    - ssb: sdio: Don't overwrite const buffer if block_write fails
    - rsi: Assign beacon rate settings to the correct rate_info descriptor
      field
    - seq_buf: Make trace_seq_putmem_hex() support data longer than 8
    - fuse: check connected before queueing on fpq->io
    - spi: Make of_register_spi_device also set the fwnode
    - spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
    - spi: spi-topcliff-pch: Fix potential double free in
      pch_spi_process_messages()
    - spi: omap-100k: Fix the length judgment problem
    - crypto: nx - add missing MODULE_DEVICE_TABLE
    - media: cpia2: fix memory leak in cpia2_usb_probe
    - media: cobalt: fix race condition in setting HPD
    - media: pvrusb2: fix warning in pvr2_i2c_core_done
    - crypto: qat - check return code of qat_hal_rd_rel_reg()
    - crypto: qat - remove unused macro in FW loader
    - media: em28xx: Fix possible memory leak of em28xx struct
    - media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
    - media: bt8xx: Fix a missing check bug in bt878_probe
    - media: st-hva: Fix potential NULL pointer dereferences
    - media: dvd_usb: memory leak in cinergyt2_fe_attach
    - mmc: via-sdmmc: add a check against NULL pointer dereference
    - crypto: shash - avoid comparing pointe
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
This bug is awaiting verification that the kernel in -proposed solves the
problem. Please test the kernel and update this bug with the results. If the
problem is solved, change the tag 'verification-needed-bionic' to
'verification-done-bionic'. If the problem still exists, change the tag
'verification-needed-bionic' to 'verification-failed-bionic'. If verification
is not done by 5 working days from today, this fix will be dropped from the
source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
enable and use -proposed. Thank you!

** Tags added: verification-needed-bionic
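For reference, enabling -proposed to test such a kernel amounts to something like the following sketch on bionic; the exact pinning and package selection are assumptions on my part, and the EnableProposed wiki page linked above remains the authoritative procedure:

```shell
# Add the bionic-proposed pocket (sketch; verify against the wiki page).
cat <<'EOF' | sudo tee /etc/apt/sources.list.d/ubuntu-proposed.list
deb http://archive.ubuntu.com/ubuntu/ bionic-proposed restricted main multiverse universe
EOF

# Pin -proposed low so routine upgrades don't pull everything from it;
# only explicitly requested packages come from the pocket.
cat <<'EOF' | sudo tee /etc/apt/preferences.d/proposed-updates
Package: *
Pin: release a=bionic-proposed
Pin-Priority: 400
EOF

sudo apt-get update
# Install the candidate kernel from -proposed, reboot, re-run the kdump
# test, then set verification-done-bionic or verification-failed-bionic.
sudo apt-get install -t bionic-proposed linux-azure
```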
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Also affects: linux-azure-4.15 (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux-azure-4.15 (Ubuntu)
       Status: New => Fix Committed

** Changed in: linux-azure-4.15 (Ubuntu Bionic)
       Status: New => Fix Committed
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Changed in: linux-azure (Ubuntu Bionic)
       Status: In Progress => Fix Committed
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Description changed:

- Create VM on Azure using Canonical UbuntuServer 14.04.5-LTS latest using
- size Standard DS2 v2
+ [Impact]
- Install kexec-tools kdump-tools makedumpfile, configure boot kernel
- parameter crashkernel=512M
+ * When kdumping on trusty/4.15 in an Azure instance, we observe quite
+ frequently a stall on the kdump kernel, it gets blocked and soon we see
+ a stack like the following:
- Run sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools and
- sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec
+ [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
+ [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
+ [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
+ [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
- Reboot VM, make sure crash kernel memory reserved successfully
+ * By using the Azure serial console, we collected a sysrq-w when the issue happens:
+ [ 529.515013] sysrq: Show Blocked State
+ [ 529.517730] task            PC stack    pid father
+ [ 529.519006] kworker/u4:2    D    0    94      2 0x8000
+ [ 529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
+ [ 529.519006] Call Trace:
+ [ 529.519006]  __schedule+0x292/0x880
+ [ 529.519006]  schedule+0x36/0x80
+ [ 529.519006]  schedule_timeout+0x1d5/0x2f0
+ [ 529.519006]  ? check_preempt_wakeup+0x162/0x260
+ [ 529.519006]  wait_for_completion+0xa5/0x110
+ [ 529.519006]  ? wake_up_q+0x80/0x80
+ [ 529.519006]  __synchronize_srcu.part.14+0x67/0x80
+ [ 529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
+ [ 529.519006]  ? __switch_to_asm+0x41/0x70
+ [ 529.519006]  synchronize_srcu+0xd1/0xd6
+ [ 529.519006]  fsnotify_mark_destroy_workfn+0x6d/0xc0
+ [ 529.519006]  process_one_work+0x14e/0x390
+ [ 529.519006]  worker_thread+0x1cc/0x3d0
+ [ 529.519006]  kthread+0x105/0x140
+ [ 529.519006]  ? max_active_store+0x60/0x60
+ [ 529.519006]  ? kthread_bind+0x20/0x20
+ [ 529.519006]  ret_from_fork+0x35/0x40
+ [ 529.519006] udevadm         D    0   544      1 0x
+ [ 529.519006] Call Trace:
+ [ 529.519006]  __schedule+0x292/0x880
+ [ 529.519006]  schedule+0x36/0x80
+ [ 529.519006]  schedule_timeout+0x1d5/0x2f0
+ [ 529.519006]  ? try_to_wake_up+0x4a/0x460
+ [ 529.519006]  ? try_to_wake_up+0x4a/0x460
+ [ 529.519006]  wait_for_completion+0xa5/0x110
+ [ 529.519006]  ? wake_up_q+0x80/0x80
+ [ 529.519006]  __flush_work.isra.29+0x119/0x1b0
+ [ 529.519006]  ? destroy_worker+0x90/0x90
+ [ 529.519006]  flush_delayed_work+0x3f/0x50
+ [ 529.519006]  fsnotify_wait_marks_destroyed+0x15/0x20
+ [ 529.519006]  fsnotify_destroy_group+0x4e/0xc0
+ [ 529.519006]  inotify_release+0x1e/0x50
+ [ 529.519006]  __fput+0xea/0x220
+ [ 529.519006]  fput+0xe/0x10
+ [ 529.519006]  task_work_run+0x8c/0xb0
+ [ 529.519006]  exit_to_usermode_loop+0x70/0xa9
+ [ 529.519006]  do_syscall_64+0x1b5/0x1e0
+ [ 529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
+ [ 529.519006] dhclient        D    0   573    572 0x
+ [ 529.519006] Call Trace:
+ [ 529.519006]  __schedule+0x292/0x880
+ [ 529.519006]  schedule+0x36/0x80
+ [ 529.519006]  schedule_timeout+0x1d5/0x2f0
+ [ 529.519006]  ? aa_profile_af_perm+0xb4/0xf0
+ [ 529.519006]  wait_for_completion+0xa5/0x110
+ [ 529.519006]  ? wake_up_q+0x80/0x80
+ [ 529.519006]  __wait_rcu_gp+0x123/0x150
+ [ 529.519006]  synchronize_sched+0x4e/0x60
+ [ 529.519006]  ? __call_rcu+0x2f0/0x2f0
+ [ 529.519006]  ? trace_raw_output_rcu_utilization+0x50/0x50
+ [ 529.519006]  synchronize_net+0x1c/0x30
+ [ 529.519006]  __unregister_prot_hook+0xcd/0xf0
+ [ 529.519006]  packet_do_bind+0x1bd/0x250
+ [ 529.519006]  packet_bind+0x2f/0x50
+ [ 529.519006]  SYSC_bind+0xd8/0x110
+ [ 529.519006]  ? sock_alloc_file+0x91/0x130
+ [ 529.519006]  SyS_bind+0xe/0x10
+ [ 529.519006]  do_syscall_64+0x80/0x1e0
+ [ 529.519006]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
- Trigger a crash by run echo 1 > /proc/sys/kernel/sysrq, then echo c >
- /proc/sysrq-trigger, VM will reboot automatically and after reboot,
- check crash dump is generated under /var/crash
+ * Bisecting mainline kernels, we found that v4.17-rc1 didn't reproduce
+ the issue, whereas v4.16 reproduced. Then, a fine-grained git bisect led
+ us to the fix - the following patch, when backported to a problematic
+ version, fixes the issue: d8e462e19305 ("Drivers: hv: vmbus: Implement
+ Direct Mode for stimer0")
- Install linux-azure kernel
+ * In Azure/Hyper-V, before the aforementioned commit, timer interrupts
+ were passed to the hypervisor through a vmbus message, a mechanism of
+ communication of hyper-v guests/hypervisor. With the patch, a check is
+ made (through MSR-like mechanism) and if the hypervisor supports, a
+ direct timer IRQ mechanism is put in-place instead of the vmbus channel.
- Enable private-ppa canonical-kernel-esm
+ * Our theory is that on
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
** Changed in: linux-azure (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: linux-azure (Ubuntu)
     Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  In Progress

Bug description:
  Create VM on Azure using Canonical UbuntuServer 14.04.5-LTS latest
  using size Standard DS2 v2

  Install kexec-tools kdump-tools makedumpfile, configure boot kernel
  parameter crashkernel=512M

  Run sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools and
  sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec

  Reboot VM, make sure the crash kernel memory is reserved successfully

  Trigger a crash by running echo 1 > /proc/sys/kernel/sysrq, then
  echo c > /proc/sysrq-trigger; the VM will reboot automatically and,
  after reboot, check that a crash dump is generated under /var/crash

  Install linux-azure kernel

  Enable private-ppa canonical-kernel-esm

  Install kernel linux-azure, reboot VM, kernel version 4.15.0-1084-azure

  Trigger a crash by running echo 1 > /proc/sys/kernel/sysrq, then
  echo c > /proc/sysrq-trigger; the VM entered into a hung status.

  Attach whole serial console:

  [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
  [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
  [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
  [ 360.020026] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 360.024015] 1-...!: (10 GPs behind) idle=b34/0/0 softirq=1/1 fqs=1
  [ 360.024015] (detected by 0, t=15002 jiffies, g=717, c=716, q=6429)
  [ 360.024015] rcu_sched kthread starved for 15000 jiffies! g717 c716 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
  [ 420.048010] INFO: rcu_sched detected stalls on CPUs/tasks:
  [ 420.052006] 1-...!: (0 ticks this GP) idle=f94/0/0 softirq=1/1 fqs=0
  [ 420.052006] (detected by 0, t=15002 jiffies, g=718, c=717, q=6429)
  [ 420.052006] rcu_sched kthread starved for 15002 jiffies! g718 c717 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
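The reproduction steps quoted above can be collected into a single configuration sketch. The package and sed commands are taken verbatim from the report; the GRUB edit for crashkernel= is an assumption (the report only says to configure the boot parameter), and the /proc/iomem check is one common way to confirm the reservation. Run as root on the Azure VM; note the final step deliberately crashes the machine.

```shell
# Configuration sketch of the reproduction steps from the report.
# Destructive: installs packages, edits /etc/default files, and the
# final sysrq write intentionally crashes the kernel.
apt-get install -y kexec-tools kdump-tools makedumpfile

# Enable kdump and let kdump-tools (not kexec) load the crash kernel.
sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
sed -i 's/LOAD_KEXEC=true/LOAD_KEXEC=false/g' /etc/default/kexec

# Reserve memory for the crash kernel. Editing /etc/default/grub is an
# assumption here; the report only specifies crashkernel=512M.
# After adding crashkernel=512M to GRUB_CMDLINE_LINUX_DEFAULT:
#   update-grub && reboot

# After reboot, confirm the reservation actually took effect:
grep -i 'crash kernel' /proc/iomem

# Trigger the crash; on a good kernel the VM reboots and leaves a dump
# under /var/crash, on the affected kernels it hangs instead.
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
```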
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-azure (Ubuntu)
       Status: New => Confirmed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions
[Kernel-packages] [Bug 1882623] Re: VM enter into hung status after triggering a crash
Last time I did verification against the gallery image, the default kernel
version was 4.4.0-148-generic. I tried the kernels below and found that
linux-image-4.15.0-1042-azure is the first one where we hit this bug, so it
is not specific to the esm kernel.

 #  Kernel Version/Package                Result
 0  Default kernel 4.4.0-148-generic      => Good
 1  linux-image-4.15.0-1023-azure         => Good
 2  linux-image-4.15.0-1030-azure
 3  linux-image-4.15.0-1031-azure
 4  linux-image-4.15.0-1032-azure
 5  linux-image-4.15.0-1035-azure
 6  linux-image-4.15.0-1036-azure         => Good
 7  linux-image-4.15.0-1037-azure
 8  linux-image-4.15.0-1039-azure
 9  linux-image-4.15.0-1040-azure         => Good
10  linux-image-4.15.0-1041-azure         => Good
11  linux-image-4.15.0-1042-azure         => Bad
12  linux-image-4.15.0-1045-azure         => Bad

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1882623

Title:
  VM enter into hung status after triggering a crash

Status in linux-azure package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1882623/+subscriptions
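Given the per-version results in the table above, a small helper can classify an installed kernel against the first bad version. This is a hedged sketch: it only encodes the bisect boundary from this comment (4.15.0-1042-azure), assumes uname -r style version strings, and says nothing about later kernels that already carry the stimer0 fix.

```shell
# Hedged helper based on the bisect table above: classify a linux-azure
# kernel version string against the first bad kernel 4.15.0-1042-azure.
# The version-string format (e.g. "4.15.0-1042-azure") is an assumption.
classify_azure_kernel() {
    # Extract the ABI number from a 4.15 linux-azure version string.
    abi=$(printf '%s\n' "$1" | sed -n 's/^4\.15\.0-\([0-9][0-9]*\)-azure$/\1/p')
    if [ -z "$abi" ]; then
        echo "not a 4.15 linux-azure kernel, table does not apply"
    elif [ "$abi" -ge 1042 ]; then
        echo "at or after the first bad kernel (check for the stimer0 fix)"
    else
        echo "predates the first bad kernel"
    fi
}

classify_azure_kernel "4.15.0-1041-azure"   # → predates the first bad kernel
classify_azure_kernel "4.15.0-1045-azure"   # → at or after the first bad kernel (check for the stimer0 fix)
classify_azure_kernel "$(uname -r)"
```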