Bug#1022068: linux: kernel NULL pointer dereference in nouveau driver on Thinkpad W541

2022-11-27 Thread Mathieu Parent (Debian)
On Fri, 21 Oct 2022 17:21:01 +0200 Diederik de Haas wrote:
> On woensdag 19 oktober 2022 18:47:06 CEST Ansgar wrote:
> > After upgrading to linux 6.0.2-1 I see the following message during boot:
> >
> > ...
> >
> > | [3.858820] BUG: kernel NULL pointer dereference, address: 0020
> > | [3.858838] #PF: supervisor read access in kernel mode
> >
> > I only use the integrated Intel graphics, the Nvidia card is unused.
> >
> > There was no null pointer dereference with the previous kernel
> > (5.19.11-1 (2022-09-24)).
>
> Can you verify if the issue is also present on 6.0~rc7-1~exp1?
> I expect it is, but it's better to know than to assume.
>
> There have been quite a few commits under 'drivers/gpu/drm/nouveau' in kernel
> 6.0, and in 6.0.3 there have been several NPE fixes, although they didn't
> appear directly related to your issue.
> They could be, but it could also be that there are more such bugs.

I think I have the same problem as Ansgar, and I can still reproduce it
with 6.1~rc5-1~exp1:

[2.347693] nouveau :01:00.0: enabling device (0006 -> 0007)
[2.347973] Console: switching to colour dummy device 80x25
[2.348099] nouveau :01:00.0: NVIDIA GK107 (0e7120a2)
[2.363961] nouveau :01:00.0: bios: version 80.07.59.00.0c
[2.414698] usb 1-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
[2.414702] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[2.415145] hub 1-1:1.0: USB hub found
[2.415319] hub 1-1:1.0: 6 ports detected
[2.430667] usb 2-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
[2.430671] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[2.430999] hub 2-1:1.0: USB hub found
[2.431042] hub 2-1:1.0: 8 ports detected
[2.433009] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[2.433620] ata1.00: ACPI cmd 00/00:00:00:00:00:a0(NOP) rejected by device (Stat=0x51 Err=0x04)
[2.433674] ata1.00: ATA-8: SAMSUNG SSD PM830 2.5" 7mm 512GB, CXM03D1Q, max UDMA/133
[2.433737] ata1.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 32), AA
[2.434038] ata1.00: ACPI cmd 00/00:00:00:00:00:a0(NOP) rejected by device (Stat=0x51 Err=0x04)
[2.434122] ata1.00: configured for UDMA/133
[2.435359] scsi 0:0:0:0: Direct-Access ATA  SAMSUNG SSD PM83 3D1Q PQ: 0 ANSI: 5
[2.638562] i915 :00:02.0: [drm] VT-d active for gfx access
[2.638566] i915 :00:02.0: vgaarb: deactivate vga console
[2.638598] i915 :00:02.0: [drm] Transparent Hugepage support is recommended for optimal performance when IOMMU is enabled!
[2.638601] i915 :00:02.0: [drm] DMAR active, disabling use of stolen memory
[2.646770] nouveau :01:00.0: fb: 2048 MiB GDDR5
[2.683850] [drm] Initialized i915 1.6.0 20201103 for :00:02.0 on minor 1
[2.684119] ACPI: video: [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
[2.684347] ACPI: video: Video Device [PEGP] (multi-head: yes  rom: yes  post: no)
[2.684507] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:38/LNXVIDEO:00/input/input6
[2.685055] ACPI: video: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[2.685247] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:01/input/input7
[2.702363] usb 1-1.5: new high-speed USB device number 3 using ehci-pci
[2.718352] usb 2-1.5: new full-speed USB device number 3 using ehci-pci
[2.721400] vga_switcheroo: enabled
[2.721415] nouveau :01:00.0: DRM: VRAM: 2048 MiB
[2.721417] nouveau :01:00.0: DRM: GART: 1048576 MiB
[2.721419] nouveau :01:00.0: DRM: TMDS table version 2.0
[2.721421] nouveau :01:00.0: DRM: DCB version 4.0
[2.721422] nouveau :01:00.0: DRM: DCB outp 00: 08800fd6 0f420020
[2.721424] nouveau :01:00.0: DRM: DCB outp 01: 08000f92 00020020
[2.721425] nouveau :01:00.0: DRM: DCB conn 00: 1046
[2.723196] nouveau :01:00.0: DRM: MM: using COPY for buffer copies
[2.725211] BUG: kernel NULL pointer dereference, address: 0020
[2.725213] #PF: supervisor read access in kernel mode
[2.725214] #PF: error_code(0x) - not-present page
[2.725215] PGD 0 P4D 0
[2.725217] Oops:  [#1] PREEMPT_RT SMP PTI
[2.725219] CPU: 3 PID: 203 Comm: systemd-udevd Not tainted 6.1.0-0-rt-amd64 #1  Debian 6.1~rc5-1~exp1
[2.725221] Hardware name: Dell Inc. XPS L521X/029M77, BIOS A13 12/07/2012
[2.725222] RIP: 0010:nvif_object_mthd+0xba/0x200 [nouveau]
[2.725298] Code: e0 e5 41 8d 56 20 49 8b 44 24 08 83 fa 17 0f 86 35 01 00 00 4c 39 e0 0f 84 ea 00 00 00 4c 89 63 10 31 c9 48 89 de c6 43 06 ff <48> 8b 78 20 48 8b 40 38 48 8b 40 28 e8 15 d4 1f e6 48 8b 3c 24 4c
[2.725299] RSP: 0018:b45a8054b708 EFLAGS: 00010246
[2.725301] RAX:  RBX: b45a8054b710 RCX: 
[2.725302] RDX: 0028 RSI: b45a8054b710 RDI: b45a8054b738
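
Whether this is really the same crash as in the original report can be checked by comparing the kernel-mode RIP lines of both logs. The following is only a minimal sketch: the helper name `faulting_location` and the regular expression are illustrative assumptions, not part of any kernel tooling, and the two input strings are copied from the logs in this thread.

```python
import re

# Matches the kernel-mode RIP line of an oops, for example:
#   RIP: 0010:nvif_object_mthd+0xba/0x200 [nouveau]
# The user-space line ("RIP: 0033:0x...") is deliberately not matched.
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def faulting_location(log_text):
    """Return (symbol, offset, module) for the first kernel RIP line, or None."""
    match = RIP_RE.search(log_text)
    if match is None:
        return None
    return match["symbol"], int(match["offset"], 16), match["module"]


# RIP lines copied from the two reports in this thread.
report_6_0_2 = "| [3.858898] RIP: 0010:nvif_object_mthd+0xba/0x200 [nouveau]"
report_6_1_rc5 = "[2.725222] RIP: 0010:nvif_object_mthd+0xba/0x200 [nouveau]"

same_location = faulting_location(report_6_0_2) == faulting_location(report_6_1_rc5)
print(faulting_location(report_6_1_rc5), same_location)
```

Both logs fault at the same symbol and offset inside the nouveau module, which supports treating them as the same bug even though the hardware differs.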

Bug#1022068: linux: kernel NULL pointer dereference in nouveau driver on Thinkpad W541

2022-10-21 Thread Diederik de Haas
On woensdag 19 oktober 2022 18:47:06 CEST Ansgar wrote:
> After upgrading to linux 6.0.2-1 I see the following message during boot:
> 
> ...
>
> | [3.858820] BUG: kernel NULL pointer dereference, address: 0020
> | [3.858838] #PF: supervisor read access in kernel mode
> 
> I only use the integrated Intel graphics, the Nvidia card is unused.
> 
> There was no null pointer dereference with the previous kernel
> (5.19.11-1 (2022-09-24)).

Can you verify if the issue is also present on 6.0~rc7-1~exp1?
I expect it is, but it's better to know than to assume.

There have been quite a few commits under 'drivers/gpu/drm/nouveau' in kernel
6.0, and in 6.0.3 there have been several NPE fixes, although they didn't appear
directly related to your issue.
They could be, but it could also be that there are more such bugs.



Bug#1022068: linux: kernel NULL pointer dereference in nouveau driver on Thinkpad W541

2022-10-21 Thread Ansgar
On Wed, 2022-10-19 at 18:47 +0200, Ansgar wrote:
> After upgrading to linux 6.0.2-1 I see the following message during
> boot:
[...]
> Besides the null pointer dereference above, suspend to RAM also no
> longer works properly after the upgrade. I have not investigated that
> further so far.

At least this part is easy: after blacklisting the nouveau driver to
avoid the warning about the NULL pointer dereference, suspend works
again.

Ansgar
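
The message above does not say how the driver was blacklisted. A common approach is a `blacklist nouveau` directive in a file under /etc/modprobe.d/. The sketch below checks a configuration text for such a directive. The function name `is_blacklisted`, the example file name, and the file contents are all assumptions made for illustration.

```python
def is_blacklisted(module, config_text):
    """Return True if config_text contains a 'blacklist <module>' directive."""
    # modprobe treats '-' and '_' in module names as interchangeable.
    wanted = module.replace("-", "_")
    for raw_line in config_text.splitlines():
        fields = raw_line.split()
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "blacklist" and len(fields) == 2:
            if fields[1].replace("-", "_") == wanted:
                return True
    return False


# Hypothetical contents of a file such as /etc/modprobe.d/blacklist-nouveau.conf
example_conf = """\
# keep the driver for the unused Nvidia GPU from loading automatically
blacklist nouveau
"""

print(is_blacklisted("nouveau", example_conf))
print(is_blacklisted("i915", example_conf))
```

This only inspects text that is passed in. It does not read the running system's configuration or tell whether the module is currently loaded.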



Bug#1022068: linux: kernel NULL pointer dereference in nouveau driver on Thinkpad W541

2022-10-19 Thread Ansgar
Source: linux
Version: 6.0.2-1
Severity: important

After upgrading to linux 6.0.2-1 I see the following message during boot:

+---
| [3.723631] i915 :00:02.0: [drm] fb0: i915drmfb frame buffer device
| [...]
| [3.855523] vga_switcheroo: enabled
| [3.855536] nouveau :01:00.0: DRM: VRAM: 2048 MiB
| [3.855537] nouveau :01:00.0: DRM: GART: 1048576 MiB
| [3.855539] nouveau :01:00.0: DRM: TMDS table version 2.0
| [3.855541] nouveau :01:00.0: DRM: DCB version 4.0
| [3.855542] nouveau :01:00.0: DRM: DCB outp 00: 08800fc6 0f420010
| [3.855544] nouveau :01:00.0: DRM: DCB outp 01: 08000f82 00020010
| [3.855545] nouveau :01:00.0: DRM: DCB conn 00: 0146
| [3.857230] nouveau :01:00.0: DRM: MM: using COPY for buffer copies
| [3.858820] BUG: kernel NULL pointer dereference, address: 0020
| [3.858838] #PF: supervisor read access in kernel mode
| [3.858847] #PF: error_code(0x) - not-present page
| [3.858856] PGD 0 P4D 0 
| [3.858864] Oops:  [#1] PREEMPT SMP PTI
| [3.858872] CPU: 1 PID: 427 Comm: systemd-udevd Not tainted 6.0.0-1-amd64 #1  Debian 6.0.2-1
| [3.858886] Hardware name: LENOVO 20EGS1FD00/20EGS1FD00, BIOS GNET88WW (2.36 ) 05/30/2018
| [3.858898] RIP: 0010:nvif_object_mthd+0xba/0x200 [nouveau]
| [3.858982] Code: 72 ce 41 8d 56 20 49 8b 44 24 08 83 fa 17 0f 86 35 01 00 00 4c 39 e0 0f 84 ea 00 00 00 4c 89 63 10 31 c9 48 89 de c6 43 06 ff <48> 8b 78 20 48 8b 40 38 48 8b 40 28 e8 d5 e3 95 ce 48 8b 3c 24 4c
| [3.859008] RSP: 0018:a8e7409bb718 EFLAGS: 00010246
| [3.859018] RAX:  RBX: a8e7409bb720 RCX: 
| [3.859030] RDX: 0028 RSI: a8e7409bb720 RDI: a8e7409bb748
| [3.859042] RBP:  R08: a8e7409bb968 R09: 0008
| [3.859053] R10: 95661041f9c0 R11: a8e740e3 R12: 9565ca2114f8
| [3.859065] R13: a8e7409bb720 R14: 0008 R15: a8e7409bb740
| [3.859076] FS:  7fc0a2a6e8c0() GS:956d1e24() knlGS:
| [3.859090] CS:  0010 DS:  ES:  CR0: 80050033
| [3.859100] CR2: 0020 CR3: 000100f74001 CR4: 001706e0
| [3.859112] Call Trace:
| [3.859120]  
| [3.859128]  nvif_conn_hpd_status+0x35/0xe0 [nouveau]
| [3.859209]  nouveau_dp_detect+0x2d0/0x410 [nouveau]
| [3.859302]  nouveau_connector_detect+0x9b/0x550 [nouveau]
| [3.859395]  drm_helper_probe_detect+0x84/0xb0 [drm_kms_helper]
| [3.859421]  drm_helper_probe_single_connector_modes+0x361/0x510 [drm_kms_helper]
| [3.859444]  drm_client_modeset_probe+0x224/0x1490 [drm]
| [3.859487]  ? nouveau_cli_init+0x3ea/0x490 [nouveau]
| [3.859582]  ? __pm_runtime_suspend+0x6a/0x100
| [3.859593]  __drm_fb_helper_initial_config_and_unlock+0x44/0x510 [drm_kms_helper]
| [3.859618]  ? drm_client_init+0x133/0x160 [drm]
| [3.859653]  nouveau_fbcon_init+0x14a/0x1c0 [nouveau]
| [3.859736]  nouveau_drm_device_init+0x1ec/0x7a0 [nouveau]
| [3.859819]  ? pci_update_current_state+0x6e/0xa0
| [3.859831]  nouveau_drm_probe+0x128/0x1f0 [nouveau]
| [3.859913]  ? _raw_spin_unlock_irqrestore+0x23/0x40
| [3.859925]  local_pci_probe+0x41/0x80
| [3.859935]  pci_device_probe+0xc3/0x230
| [3.859946]  really_probe+0xde/0x380
| [3.859955]  ? pm_runtime_barrier+0x50/0x90
| [3.859963]  __driver_probe_device+0x78/0x170
| [3.859972]  driver_probe_device+0x1f/0x90
| [3.859981]  __driver_attach+0xd1/0x1d0
| [3.859990]  ? __device_attach_driver+0x110/0x110
| [3.86]  bus_for_each_dev+0x87/0xd0
| [3.860011]  bus_add_driver+0x1ae/0x200
| [3.860019]  driver_register+0x89/0xe0
| [3.860028]  ? 0xc0731000
| [3.860035]  do_one_initcall+0x59/0x220
| [3.860047]  do_init_module+0x4a/0x200
| [3.860057]  __do_sys_finit_module+0xac/0x120
| [3.860067]  do_syscall_64+0x3a/0xc0
| [3.860077]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
| [3.860088] RIP: 0033:0x7fc0a3177859
| [3.860096] Code: 08 44 89 e0 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 87 05 0f 00 f7 d8 64 89 01 48
| [3.860121] RSP: 002b:7ffdb9440778 EFLAGS: 0246 ORIG_RAX: 0139
| [3.860133] RAX: ffda RBX: 55f6ea1a8cf0 RCX: 7fc0a3177859
| [3.860144] RDX:  RSI: 7fc0a3327efd RDI: 0015
| [3.860155] RBP: 7fc0a3327efd R08:  R09: 55f6ea1af1a0
| [3.860167] R10: 0015 R11: 0246 R12: 0002
| [3.860178] R13:  R14: 55f6ea1df350 R15: 55f6e964fcc1
| [3.860190]  
| [3.860196] Modules linked in: raid1 md_mod i915 nouveau(+) sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft
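
When reading a call trace like the one above, entries prefixed with `?` are addresses the unwinder found on the stack but could not confirm as part of the active call chain, so they are usually skipped when working out how the faulting function was reached. The following is a rough sketch of that filtering. The name `reliable_frames` and the regular expression are illustrative assumptions, and the sample lines are copied from the trace in this report, with email line wrapping already undone.

```python
import re

# One call-trace entry: optional "? " marker, symbol+offset/size, optional [module].
FRAME_RE = re.compile(
    r"\]\s+(?P<guess>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def reliable_frames(trace_lines):
    """Return (symbol, module) for entries that are not marked with '?'.

    module is None for symbols that are not printed with a [module] suffix.
    """
    frames = []
    for line in trace_lines:
        match = FRAME_RE.search(line)
        if match and not match["guess"]:
            frames.append((match["symbol"], match["module"]))
    return frames


# A few entries copied from the call trace in this report.
trace = [
    "| [3.859128]  nvif_conn_hpd_status+0x35/0xe0 [nouveau]",
    "| [3.859209]  nouveau_dp_detect+0x2d0/0x410 [nouveau]",
    "| [3.859302]  nouveau_connector_detect+0x9b/0x550 [nouveau]",
    "| [3.859395]  drm_helper_probe_detect+0x84/0xb0 [drm_kms_helper]",
    "| [3.859487]  ? nouveau_cli_init+0x3ea/0x490 [nouveau]",
    "| [3.859925]  local_pci_probe+0x41/0x80",
]

for symbol, module in reliable_frames(trace):
    print(symbol, module)
```

For these sample lines, the filtered list starts with the display-detection path (`nvif_conn_hpd_status`, `nouveau_dp_detect`, `nouveau_connector_detect`), which ran while the driver was being probed.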