Control: tags -1 + moreinfo

Hello Arne,

On Tue, Jul 12, 2022 at 08:14:22AM +0200, Arne Nordmark wrote:
> 
> Package: src:linux
> Version: 5.10.127-1
> Severity: normal
> 
> Dear Maintainer,
> 
> The new kernel in Debian 11.4 seems unstable and crashes when serving NFS.
> On two different computers, these lockups happens within minutes, typically
> when a client runs firefox on an NFS-mounted home directory. Typically the
> servers lock up without any printout, but on one occasion, the following was
> logged:
> 
> jul 10 08:35:13 ano4 kernel: general protection fault, probably for
> non-canonical address 0x2f48514544455145: 0000 [#1] SMP PTI
> jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1244 Comm: nfsd Not tainted
> 5.10.0-16-amd64 #1 Debian 5.10.127-1
> jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System
> Product Name/P5Q DELUXE, BIOS 2201    05/21/2009
> jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
> jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48
> 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85
> c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89
> jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901fa3bc8 EFLAGS: 00010202
> jul 10 08:35:13 ano4 kernel: RAX: 00000000bab6aebe RBX: 0000000000000001
> RCX: 0000000000000004
> jul 10 08:35:13 ano4 kernel: RDX: 0000000000035a00 RSI: 0000000000000001
> RDI: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: RBP: ffffabe901fa3c20 R08: 0000000000000001
> R09: 0000000000000002
> jul 10 08:35:13 ano4 kernel: R10: 0000000000000002 R11: 0000000000000002
> R12: 0000000000000002
> jul 10 08:35:13 ano4 kernel: R13: 0000000045495141 R14: 00000000424d6757
> R15: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: FS:  0000000000000000(0000)
> GS:ffff939527d00000(0000) knlGS:0000000000000000
> jul 10 08:35:13 ano4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000
> CR4: 00000000000406e0
> jul 10 08:35:13 ano4 kernel: Call Trace:
> jul 10 08:35:13 ano4 kernel:  __fsnotify_parent+0xe7/0x2d0
> jul 10 08:35:13 ano4 kernel:  ? ext4_buffered_write_iter+0xce/0x160 [ext4]
> jul 10 08:35:13 ano4 kernel:  ? do_iter_readv_writev+0x152/0x1b0
> jul 10 08:35:13 ano4 kernel:  do_iter_write+0xc8/0x1b0
> jul 10 08:35:13 ano4 kernel:  nfsd_vfs_write+0x175/0x510 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd4_write+0x135/0x1b0 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd4_proc_compound+0x40d/0x680 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd_dispatch+0xd3/0x180 [nfsd]
> jul 10 08:35:13 ano4 kernel:  svc_process_common+0x3d4/0x6d0 [sunrpc]
> jul 10 08:35:13 ano4 kernel:  ? nfsd_svc+0x320/0x320 [nfsd]
> jul 10 08:35:13 ano4 kernel:  svc_process+0xb7/0xf0 [sunrpc]
> jul 10 08:35:13 ano4 kernel:  nfsd+0xe8/0x140 [nfsd]
> jul 10 08:35:13 ano4 kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
> jul 10 08:35:13 ano4 kernel:  kthread+0x11b/0x140
> jul 10 08:35:13 ano4 kernel:  ? __kthread_bind_mask+0x60/0x60
> jul 10 08:35:13 ano4 kernel:  ret_from_fork+0x22/0x30
> jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun
> cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace
> aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5
> sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacl
> e_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core
> snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa
> tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887
> tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner
> soundwire_generic_allocation snd_soc_core snd
> _compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core
> snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm
> ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw
> evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr
> watchdog sg acpi_
> cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct
> jul 10 08:35:13 ano4 kernel:  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev
> nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs
> ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid4
> 56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod
> hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st
> crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci
> firewire_core aic7xxx
>  crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801
> sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore
> scsi_mod usb_common floppy
> jul 10 08:35:13 ano4 kernel: ---[ end trace 159cb95f57d30ea4 ]---
> jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
> jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48
> 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85
> c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89
> jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901fa3bc8 EFLAGS: 00010202
> jul 10 08:35:13 ano4 kernel: RAX: 00000000bab6aebe RBX: 0000000000000001
> RCX: 0000000000000004
> jul 10 08:35:13 ano4 kernel: RDX: 0000000000035a00 RSI: 0000000000000001
> RDI: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: RBP: ffffabe901fa3c20 R08: 0000000000000001
> R09: 0000000000000002
> jul 10 08:35:13 ano4 kernel: R10: 0000000000000002 R11: 0000000000000002
> R12: 0000000000000002
> jul 10 08:35:13 ano4 kernel: R13: 0000000045495141 R14: 00000000424d6757
> R15: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: FS:  0000000000000000(0000)
> GS:ffff939527d00000(0000) knlGS:0000000000000000
> jul 10 08:35:13 ano4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000
> CR4: 00000000000406e0
> jul 10 08:35:13 ano4 kernel: list_del corruption. next->prev should be
> ffff939408b1d6a0, but was 4141514142455142
> jul 10 08:35:13 ano4 kernel: ------------[ cut here ]------------
> jul 10 08:35:13 ano4 kernel: kernel BUG at lib/list_debug.c:54!
> jul 10 08:35:13 ano4 kernel: invalid opcode: 0000 [#2] SMP PTI
> jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1242 Comm: nfsd Tainted: G D
> 5.10.0-16-amd64 #1 Debian 5.10.127-1
> jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System
> Product Name/P5Q DELUXE, BIOS 2201    05/21/2009
> jul 10 08:35:13 ano4 kernel: RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
> jul 10 08:35:13 ano4 kernel: Code: c7 c7 b8 1e d2 8e e8 1a 14 ff ff 0f 0b 48
> 89 fe 48 c7 c7 48 1f d2 8e e8 09 14 ff ff 0f 0b 48 c7 c7 f8 1f d2 8e e8 fb
> 13 ff ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 b8 1f d2 8e e8 e7 13 ff ff 0f 0b
> jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901f93cf8 EFLAGS: 00010246
> jul 10 08:35:13 ano4 kernel: RAX: 0000000000000054 RBX: ffff93940f10b800
> RCX: 0000000000000000
> jul 10 08:35:13 ano4 kernel: RDX: 0000000000000000 RSI: ffff939527d1ca00
> RDI: ffff939527d1ca00
> jul 10 08:35:13 ano4 kernel: RBP: ffff939408b1d690 R08: 0000000000000000
> R09: ffffabe901f93b20
> jul 10 08:35:13 ano4 kernel: R10: ffffabe901f93b18 R11: ffffffff8f2cb448
> R12: ffff939408b1d6b0
> jul 10 08:35:13 ano4 kernel: R13: ffff939408b1d6a0 R14: dead000000000100
> R15: 0000000000000000
> jul 10 08:35:13 ano4 kernel: FS:  0000000000000000(0000)
> GS:ffff939527d00000(0000) knlGS:0000000000000000
> jul 10 08:35:13 ano4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000
> CR4: 00000000000406e0
> jul 10 08:35:13 ano4 kernel: Call Trace:
> jul 10 08:35:13 ano4 kernel:  fsnotify_detach_mark+0x44/0x90
> jul 10 08:35:13 ano4 kernel:  fsnotify_destroy_mark+0x1f/0x40
> jul 10 08:35:13 ano4 kernel:  nfsd_file_free+0xb7/0xe0 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd_file_close_inode_sync+0xfb/0x150 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd_unlink+0x244/0x250 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd4_remove+0x4c/0x130 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd4_proc_compound+0x40d/0x680 [nfsd]
> jul 10 08:35:13 ano4 kernel:  nfsd_dispatch+0xd3/0x180 [nfsd]
> jul 10 08:35:13 ano4 kernel:  svc_process_common+0x3d4/0x6d0 [sunrpc]
> jul 10 08:35:13 ano4 kernel:  ? nfsd_svc+0x320/0x320 [nfsd]
> jul 10 08:35:13 ano4 kernel:  svc_process+0xb7/0xf0 [sunrpc]
> jul 10 08:35:13 ano4 kernel:  nfsd+0xe8/0x140 [nfsd]
> jul 10 08:35:13 ano4 kernel:  ? nfsd_destroy+0x60/0x60 [nfsd]
> jul 10 08:35:13 ano4 kernel:  kthread+0x11b/0x140
> jul 10 08:35:13 ano4 kernel:  ? __kthread_bind_mask+0x60/0x60
> jul 10 08:35:13 ano4 kernel:  ret_from_fork+0x22/0x30
> jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun
> cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace
> aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5
> sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacl
> e_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core
> snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa
> tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887
> tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner
> soundwire_generic_allocation snd_soc_core snd
> _compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core
> snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm
> ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw
> evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr
> watchdog sg acpi_
> cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct
> jul 10 08:35:13 ano4 kernel:  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev
> nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs
> ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid4
> 56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod
> hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st
> crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci
> firewire_core aic7xxx
>  crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801
> sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore
> scsi_mod usb_common floppy
> jul 10 08:35:13 ano4 kernel: ---[ end trace 159cb95f57d30ea5 ]---
> jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570
> jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48
> 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85
> c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89
> jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901fa3bc8 EFLAGS: 00010202
> jul 10 08:35:13 ano4 kernel: RAX: 00000000bab6aebe RBX: 0000000000000001
> RCX: 0000000000000004
> jul 10 08:35:13 ano4 kernel: RDX: 0000000000035a00 RSI: 0000000000000001
> RDI: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: RBP: ffffabe901fa3c20 R08: 0000000000000001
> R09: 0000000000000002
> jul 10 08:35:13 ano4 kernel: R10: 0000000000000002 R11: 0000000000000002
> R12: 0000000000000002
> jul 10 08:35:13 ano4 kernel: R13: 0000000045495141 R14: 00000000424d6757
> R15: 2f48514544455145
> jul 10 08:35:13 ano4 kernel: FS:  0000000000000000(0000)
> GS:ffff939527d00000(0000) knlGS:0000000000000000
> jul 10 08:35:13 ano4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000
> CR4: 00000000000406e0
> jul 10 08:35:21 ano4 kernel: general protection fault, probably for
> non-canonical address 0xb1c8a36300fbcf32: 0000 [#3] SMP PTI
> jul 10 08:35:21 ano4 kernel: CPU: 1 PID: 1239 Comm: nfsd Tainted: G D
> 5.10.0-16-amd64 #1 Debian 5.10.127-1
> jul 10 08:35:21 ano4 kernel: Hardware name: System manufacturer System
> Product Name/P5Q DELUXE, BIOS 2201    05/21/2009
> jul 10 08:35:21 ano4 kernel: RIP: 0010:kmem_cache_alloc+0x89/0x1f0
> jul 10 08:35:21 ano4 kernel: Code: 1e 18 72 49 8b 00 49 83 78 10 00 48 89 04
> 24 0f 84 42 01 00 00 48 85 c0 0f 84 39 01 00 00 41 8b 4c 24 28 49 8b 3c 24
> 48 01 c1 <48> 8b 19 48 89 ce 49 33 9c 24 b8 00 00 00 48 8d 4a 01 48 0f ce 48
> jul 10 08:35:21 ano4 kernel: RSP: 0018:ffffabe900f3fd50 EFLAGS: 00010282
> jul 10 08:35:21 ano4 kernel: RAX: b1c8a36300fbcee2 RBX: ffff939403b58070
> RCX: b1c8a36300fbcf32
> 
> After reverting to boot the servers on kernel linux-image-5.10.0-15-amd64
> 5.10.120-1 (but still using linux-image-5.10.0-16-amd64 on the clients) the
> servers are stable again.
> 
> From client mount output: type nfs4 
> (rw,nosuid,nodev,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=krb5p,local_lock=none

As you seem to be able to reproduce the issue reliably, would it be
possible (on a non-production instance) to bisect the problem?
In addition to the bisect, on a test instance where the issue is
reproducible, can you run a self-compiled upstream 5.10.130 to see
if the problem is still present?
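
In case it helps, here is a rough sketch of how such a bisect could be
set up. It assumes that the Debian versions 5.10.120-1 and 5.10.127-1
correspond to the upstream tags v5.10.120 and v5.10.127, that the
linux-stable tree on git.kernel.org is used, and that the running
kernel's config is a suitable starting point. The function and variable
names are only illustrative, so please adapt as needed.

```shell
#!/bin/sh
# Sketch only: the tags and paths below are assumptions, adjust as needed.
GOOD_TAG=v5.10.120   # assumed upstream base of Debian 5.10.120-1 (stable)
BAD_TAG=v5.10.127    # assumed upstream base of Debian 5.10.127-1 (crashes)
STABLE_URL=https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

# Clone the stable tree and mark the known good and bad endpoints.
bisect_setup() {
    git clone "$STABLE_URL" linux-stable &&
    cd linux-stable &&
    git bisect start &&
    git bisect bad "$BAD_TAG" &&
    git bisect good "$GOOD_TAG"
}

# Build the commit git has currently checked out as .deb packages,
# starting from the configuration of the running kernel.
bisect_build() {
    cp "/boot/config-$(uname -r)" .config &&
    # Debian configs reference a certificate file that is absent from an
    # upstream tree; clearing the option may be needed for the build.
    scripts/config --set-str SYSTEM_TRUSTED_KEYS "" &&
    make olddefconfig &&
    make -j"$(nproc)" bindeb-pkg
}
```

After installing and booting each build on the test server, try to
trigger the crash from a client, then run "git bisect good" or
"git bisect bad" in the tree and rebuild. Repeating this until git
reports the first bad commit, and attaching the output of
"git bisect log", would be very helpful.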

Regards,
Salvatore
