Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures

2024-05-17 Thread Forest
Control: fixed -1 6.8.9-1

On Fri, 17 May 2024 12:15:55 +0200, Diederik de Haas wrote:

>Kernel 6.8.9 has recently been uploaded to Unstable which has that commit.
>Can you verify that it indeed fixes this bug?

Indeed, it seems to be fixed there. The failure usually shows up within one
or two boots, but I didn't see it in five reboots with kernel 6.8.9. This
matches what I found while bisecting for the past week.

Note that I have not examined the es8316 driver code or its relationship to
maple_tree. I don't know if the bug was in maple_tree and now fixed, or
still lurks within the driver but is now hidden as a result of the
maple_tree changes. In any case, I'm happy to report that I no longer see it
breaking the OS in this newer kernel.



Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures

2024-05-17 Thread Diederik de Haas
On Friday, 17 May 2024 03:36:35 CEST Forest wrote:
> A git bisect reveals it to be fixed by this commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f7a59018953910032231c0a019208c4b0a4a8bc3
> > maple_tree: make mas_erase() more robust
> > 
> > mas_erase() may not deal correctly with all maple states.  Make the
> > function more robust by ensuring the state is in one of the two acceptable
> > states.

Kernel 6.8.9 has recently been uploaded to Unstable which has that commit.
Can you verify that it indeed fixes this bug?



Bug#1069077: rockpro64: multiple kernel oops and frequent boot failures

2024-04-15 Thread Forest
Package: src:linux
Version: 6.7.9-2
Severity: important
X-Debbugs-Cc: fores...@sonic.net

Dear Maintainer,

On the RockPro64 single-board computer, the current Debian unstable kernel
causes a variety of failures that are not present in the bookworm kernel.
(This is an arm64 machine built on the Rockchip RK3399 SoC.)

The system is sometimes able to reach a state where sshd login works, allowing
me to run reportbug, but not always. Regardless of whether it gets that far,
dmesg often contains one or more stack traces, along with messages like these:

  kernel BUG at mm/slub.c:448!
  Internal error: Oops - BUG: f2000800 [#1] SMP

  WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0xa0/0xa8

  Unable to handle kernel paging request at virtual address 4daee1bbcd3980fb

I have noticed es8316 driver error messages preceding some of these stack
traces, though I'm not sure if that is always the case.

Sometimes the stack traces appear only once, during boot, and the system
appears to run normally after that. Other times they appear every few minutes,
and network services, clean shutdown, and even login at the serial console
fail. In one case, while I was trying to shut down, the serial console output
included a message mentioning a kernel panic.

Since the worst examples of failure prevent me from logging in, I am unable to
run reportbug to capture information about those cases.

Reverting to linux-image-6.1.0-20-arm64 solves the problem.


-- Package-specific info:
** Version:
Linux version 6.7.9-arm64 (debian-ker...@lists.debian.org) (aarch64-linux-gnu-gcc-13 (Debian 13.2.0-18) 13.2.0, GNU ld (GNU Binutils for Debian) 2.42) #1 SMP Debian 6.7.9-2 (2024-03-13)

** Command line:
root=/dev/mapper/ console=ttyS2,150n8 net.ifnames=0

** Tainted: DWC (1664)
 * kernel died recently, i.e. there was an OOPS or BUG
 * kernel issued warning
 * staging driver was loaded
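
For readers unfamiliar with the taint value above, here is a minimal sketch of
how the number 1664 maps to the letters DWC. It is not part of reportbug or
the kernel; the names TAINT_FLAGS, decode_taint and taint_letters are
hypothetical, and the table is a partial copy of the bit assignments described
in the kernel's "Tainted kernels" documentation (bits 0 through 13 only).

```python
# Partial taint-bit table (bits 0-13), per the kernel's tainted-kernels
# documentation. Names in this sketch are illustrative, not a real API.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on an out-of-specification system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "processor reported a machine check exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently, i.e. there was an OOPS or BUG"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for a platform firmware bug applied"),
    12: ("O", "externally built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(mask):
    """Return (letter, description) pairs for each known bit set in mask."""
    return [
        flag
        for bit, flag in sorted(TAINT_FLAGS.items())
        if mask & (1 << bit)
    ]


def taint_letters(mask):
    """Return the set flags as a letter string, lowest bit first."""
    return "".join(letter for letter, _ in decode_taint(mask))


if __name__ == "__main__":
    # 1664 = 128 + 512 + 1024, i.e. bits 7, 9 and 10.
    for letter, description in decode_taint(1664):
        print(f" * {letter}: {description}")
```

On a running system the same mask can be read from
/proc/sys/kernel/tainted and passed to decode_taint().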

** Kernel log:
[   56.250803]  driver_attach+0x2c/0x40
[   56.250809]  bus_add_driver+0x11c/0x238
[   56.250814]  driver_register+0x64/0x138
[   56.250821]  __platform_driver_register+0x30/0x48
[   56.252550]  graph_card_init+0x28/0xff8 [snd_soc_audio_graph_card]
[   56.252565]  do_one_initcall+0x60/0x298
[   56.252574]  do_init_module+0x60/0x218
[   56.252581]  load_module+0x22b4/0x23b8
[   56.252588]  __do_sys_init_module+0x230/0x290
[   56.252593]  __arm64_sys_init_module+0x24/0x38
[   56.252599]  invoke_syscall+0x78/0x100
[   56.252609]  el0_svc_common.constprop.0+0xc8/0xf0
[   56.252617]  do_el0_svc+0x24/0x38
[   56.252624]  el0_svc+0x3c/0x108
[   56.252633]  el0t_64_sync_handler+0x120/0x130
[   56.252639]  el0t_64_sync+0x190/0x198
[   56.256943] Code: 52800024 97fff9b4 a94563f7 17d0 (d421) 
[   56.256952] ---[ end trace  ]---
[   56.256957] note: (udev-worker)[554] exited with irqs disabled
[   56.257262] [ cut here ]
[   56.258816] WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.isra.0+0xa0/0xa8
[   56.259633] Modules linked in: snd_soc_audio_graph_card(+) 
snd_soc_simple_card snd_soc_rockchip_i2s evdev snd_soc_spdif_tx 
snd_soc_simple_card_utils snd_soc_es8316 snd_soc_hdmi_codec v4l2_vp9 
rockchip_rga v4l2_h264 videobuf2_dma_contig snd_soc_core v4l2_mem2mem 
sha512_arm64 videobuf2_dma_sg governor_simpleondemand snd_compress 
snd_pcm_dmaengine snd_pcm videobuf2_memops panfrost dw_wdt videobuf2_v4l2 
snd_timer ofpart gpu_sched snd drbg(+) leds_gpio pwm_fan drm_shmem_helper 
spi_nor videodev des_generic ansi_cprng dw_hdmi_i2s_audio dw_hdmi_cec rk_crypto 
ecdh_generic(+) rockchip_saradc gpio_ir_recv rfkill videobuf2_common mc 
crypto_engine ecc nvmem_rockchip_efuse soundcore libdes mtd rockchip_thermal 
coresight_cpu_debug industrialio_triggered_buffer sg kfifo_buf coresight_etm4x 
rockchip_dfi industrialio coresight cpufreq_dt loop efi_pstore configfs 
ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic dm_crypt 
dm_mod dax sd_mod t10_pi xhci_plat_hcd xhci_hcd crc64_rocksoft_generic 
crc64_rocksoft crc_t10dif
[   56.259856]  crct10dif_generic crc64 realtek ahci libahci libata 
rk808_regulator dwc3 scsi_mod udc_core scsi_common fusb302 tcpm ulpi typec 
crct10dif_ce crct10dif_common polyval_ce rockchipdrm polyval_generic dw_hdmi 
dwmac_rk fan53555 cec ghash_ce stmmac_platform rc_core stmmac gf128mul 
dw_mipi_dsi analogix_dp sha2_ce pcs_xpcs pwm_regulator sha256_arm64 
drm_display_helper phylink ohci_platform sha1_ce dwc3_of_simple of_mdio 
gpio_rockchip gpio_keys ohci_hcd ehci_platform drm_dma_helper fixed_phy 
sdhci_of_arasan ehci_hcd sdhci_pltfm drm_kms_helper cqhci dw_mmc_rockchip 
fwnode_mdio phy_rockchip_inno_usb2 phy_rockchip_emmc phy_rockchip_pcie 
phy_rockchip_typec usbcore io_domain pl330 pwm_rockchip spi_rockchip drm 
dw_mmc_pltfm sdhci libphy dw_mmc i2c_rk3x usb_common fixed aes_neon_bs 
aes_neon_blk aes_ce_blk aes_ce_cipher
[   56.274047] CPU: 2 PID: 0 Comm: