Oops (NULL pointer dereference) in radeon_fence_ref in 3.14.63
On 08.03.2016 11:54, Nicolai Hähnle wrote:
> On 05.03.2016 16:24, Christian König wrote:
>> just an educated guess, but I think the problem is simply that kernel
>> 3.14 doesn't yet contain the code so that radeon_fence_get() can safely
>> be called with a NULL pointer.
>>
>> So the backport of Nicolai's patch needs an extra check for the case
>> when the fence is NULL.
>
> Oops indeed. Only the ref call should need the guard; the unref has
> always had a NULL pointer test as far as I can see.
>
> Lutz, could you please test whether the attached patch on top of 3.14.63
> fixes the problem?

Nicolai, if you haven't already, please send this patch to
stable at vger.kernel.org with an explicit explanation of which stable
branches need it.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
Oops (NULL pointer dereference) in radeon_fence_ref in 3.14.63
Nicolai Hähnle wrote:
> Lutz, could you please test whether the attached patch on top of 3.14.63
> fixes the problem?

I'd tend to say it does. I have been running the patched 3.14.63 since I
got the patch and so far no oops has occurred, whereas without the patch
it took less than 2 hours until it oopsed.

So:
Tested-by: Lutz Euler

Thanks for providing the patch so quickly!

Regards,

Lutz
Oops (NULL pointer dereference) in radeon_fence_ref in 3.14.63
On 08.03.2016 at 03:54, Nicolai Hähnle wrote:
> Hi,
>
> On 05.03.2016 16:24, Christian König wrote:
>> just an educated guess, but I think the problem is simply that kernel
>> 3.14 doesn't yet contain the code so that radeon_fence_get() can safely
>> be called with a NULL pointer.
>>
>> So the backport of Nicolai's patch needs an extra check for the case
>> when the fence is NULL.
>
> Oops indeed. Only the ref call should need the guard; the unref has
> always had a NULL pointer test as far as I can see.
>
> Lutz, could you please test whether the attached patch on top of
> 3.14.63 fixes the problem?

Patch is Reviewed-by: Christian König.

Regards,
Christian.

>
> Thanks,
> Nicolai
Oops (NULL pointer dereference) in radeon_fence_ref in 3.14.63
Hi,

On 05.03.2016 16:24, Christian König wrote:
> just an educated guess, but I think the problem is simply that kernel
> 3.14 doesn't yet contain the code so that radeon_fence_get() can safely
> be called with a NULL pointer.
>
> So the backport of Nicolai's patch needs an extra check for the case
> when the fence is NULL.

Oops indeed. Only the ref call should need the guard; the unref has
always had a NULL pointer test as far as I can see.

Lutz, could you please test whether the attached patch on top of 3.14.63
fixes the problem?

Thanks,
Nicolai
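The asymmetry described in this message (an unref that tolerates NULL next to a ref that does not) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct demo_fence` and every `demo_*` function are hypothetical names invented for illustration, the refcount is a plain C11 atomic rather than a kernel kref, and the code is not the patch attached to the mail.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted fence object. */
struct demo_fence {
    atomic_int refcount;
};

/* Allocate a fence holding one reference; returns NULL on allocation failure. */
struct demo_fence *demo_fence_create(void)
{
    struct demo_fence *fence = malloc(sizeof(*fence));

    if (fence)
        atomic_init(&fence->refcount, 1);
    return fence;
}

/* Unguarded ref: dereferences its argument, so NULL must never reach it. */
struct demo_fence *demo_fence_ref(struct demo_fence *fence)
{
    atomic_fetch_add(&fence->refcount, 1);
    return fence;
}

/* Unref with a NULL test: clears the caller's pointer, frees on last put. */
void demo_fence_unref(struct demo_fence **fence)
{
    struct demo_fence *tmp = *fence;

    *fence = NULL;
    if (tmp && atomic_fetch_sub(&tmp->refcount, 1) == 1)
        free(tmp);
}

/*
 * Caller-side guard: take a reference only for slots that hold a fence.
 * Empty slots stay NULL in `held`. Returns the number of references taken.
 */
size_t demo_collect_refs(struct demo_fence **slots,
                         struct demo_fence **held, size_t n)
{
    size_t taken = 0;

    for (size_t i = 0; i < n; i++) {
        held[i] = slots[i] ? demo_fence_ref(slots[i]) : NULL;
        if (held[i])
            taken++;
    }
    return taken;
}
```

Because `demo_fence_unref()` already checks for NULL, the release loop over `held` needs no guard of its own; only the acquire side does.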
Oops (NULL pointer dereference) in radeon_fence_ref in 3.14.63
Hi Lutz,

just an educated guess, but I think the problem is simply that kernel
3.14 doesn't yet contain the code so that radeon_fence_get() can safely
be called with a NULL pointer.

So the backport of Nicolai's patch needs an extra check for the case
when the fence is NULL.

Regards,
Christian.
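The other place such a check can live is inside the ref helper itself, which is the behaviour this message says 3.14 lacks. A minimal sketch, assuming a hypothetical `struct sketch_fence` with a C11 atomic counter; neither the type nor `sketch_fence_ref_safe()` is taken from the radeon driver:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-counted object; not the radeon fence layout. */
struct sketch_fence {
    atomic_int refcount;
};

/*
 * NULL-tolerant ref: passes NULL through untouched and otherwise bumps
 * the counter, so callers may hand in empty slots without checking.
 */
struct sketch_fence *sketch_fence_ref_safe(struct sketch_fence *fence)
{
    if (fence)
        atomic_fetch_add(&fence->refcount, 1);
    return fence;
}
```

With a helper like this the caller needs no guard; without it, each call site that may see NULL has to check before calling.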
Oops (NULL pointer dereference) in radeon_fence_ref in 3.14.63
Hi,

after upgrading from kernel 3.14.62 to 3.14.63, while surfing, the
screen suddenly got black and the mouse cursor froze. I had to reset
the machine and found an oops followed by repeated messages
"BUG: scheduling while atomic: Xorg/3757/0x0002" in the logs.
I have copied the oops and the first of these messages below.

This was repeatable: After the reboot, when the browser restored its
tabs, the oops occurred again. I then rebooted into 3.14.62 and the
problem didn't occur again.

Just guessing: Of the commits regarding radeon between these two
kernel versions might the following be involved

Nicolai Hähnle (1):
      drm/radeon: hold reference to fences in radeon_sa_bo_new

as it mentions fences and the stack trace starts with radeon_sa_bo_new?

Thanks and Regards,

Lutz

From lspci -v:

05:00.0 VGA compatible controller: ATI Technologies Inc NI Caicos [AMD RADEON HD 6450] (prog-if 00 [VGA controller])
	Subsystem: PC Partner Limited Device e164
	Flags: bus master, fast devsel, latency 0, IRQ 53
	Memory at d000 (64-bit, prefetchable) [size=256M]
	Memory at fe9e (64-bit, non-prefetchable) [size=128K]
	I/O ports at e000 [size=256]
	Expansion ROM at fe9c [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
	Capabilities: [150] Advanced Error Reporting
	Kernel driver in use: radeon
	Kernel modules: radeon

X.Org version: 1.10.4

decodecode of the oops:

Mar 5 15:04:58 lutz kernel: [ 6995.216776] Code: c7 c6 d8 3e 36 a0 31 c0 45 31 e4 e8 7b 70 15 e1 eb b5 66 0f 1f 84 00 00 00 00 00 55 48 89 f8 ba 01 00 00 00 48 89 e5 48 83 ec 10 <f0> 0f c1 57 08 ff c2 ff ca 7e 02 c9 c3 80 3d 10 ae 10 00 01 74
All code

   0:	c7 c6 d8 3e 36 a0    	mov    $0xa0363ed8,%esi
   6:	31 c0                	xor    %eax,%eax
   8:	45 31 e4             	xor    %r12d,%r12d
   b:	e8 7b 70 15 e1       	callq  0xe115708b
  10:	eb b5                	jmp    0xffc7
  12:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
  19:	00 00
  1b:	55                   	push   %rbp
  1c:	48 89 f8             	mov    %rdi,%rax
  1f:	ba 01 00 00 00       	mov    $0x1,%edx
  24:	48 89 e5             	mov    %rsp,%rbp
  27:	48 83 ec 10          	sub    $0x10,%rsp
  2b:*	f0 0f c1 57 08       	lock xadd %edx,0x8(%rdi)		<-- trapping instruction
  30:	ff c2                	inc    %edx
  32:	ff ca                	dec    %edx
  34:	7e 02                	jle    0x38
  36:	c9                   	leaveq
  37:	c3                   	retq
  38:	80 3d 10 ae 10 00 01 	cmpb   $0x1,0x10ae10(%rip)        # 0x10ae4f
  3f:	74                   	.byte 0x74

Code starting with the faulting instruction
===
   0:	f0 0f c1 57 08       	lock xadd %edx,0x8(%rdi)
   5:	ff c2                	inc    %edx
   7:	ff ca                	dec    %edx
   9:	7e 02                	jle    0xd
   b:	c9                   	leaveq
   c:	c3                   	retq
   d:	80 3d 10 ae 10 00 01 	cmpb   $0x1,0x10ae10(%rip)        # 0x10ae24
  14:	74                   	.byte 0x74

Mar 5 15:04:58 lutz kernel: [ 6995.192330] BUG: unable to handle kernel NULL pointer dereference at 0008
Mar 5 15:04:58 lutz kernel: [ 6995.192375] IP: [] radeon_fence_ref+0x10/0x50 [radeon]
Mar 5 15:04:58 lutz kernel: [ 6995.192441] PGD 22a86a067 PUD 22d8e8067 PMD 0
Mar 5 15:04:58 lutz kernel: [ 6995.192463] Oops: 0002 [#1] PREEMPT SMP
Mar 5 15:04:58 lutz kernel: [ 6995.192484] Modules linked in: binfmt_misc parport_pc ppdev snd_hda_codec_hdmi snd_opl3_synth snd_seq_midi_emul snd_hda_intel snd_hda_codec snd_es1938 gameport snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_pcm snd_seq_oss snd_opl3_lib snd_hwdep snd_mpu401_uart snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq psmouse fbcon tileblit edac_core font serio_raw i2c_piix4 bitblit softcursor radeon snd_seq_device snd_timer hwmon_vid ttm drm_kms_helper asus_atk0110 drm i2c_algo_bit snd soundcore lp parport usbhid btrfs raid6_pq zlib_deflate xor r8169 mii xhci_hcd libcrc32c
Mar 5 15:04:58 lutz kernel: [ 6995.192741] CPU: 5 PID: 3757 Comm: Xorg Not tainted 3.14.63 #1
Mar 5 15:04:58 lutz kernel: [ 6995.192765] Hardware name: System manufacturer System Product Name/M4A87TD/USB3, BIOS 1202 02/17/2011
Mar 5 15:04:58 lutz kernel: [ 6995.192803] task: 88022c1d5f80 ti: 8800c42a6000 task.ti: 8800c42a6000
Mar 5 15:04:58 lutz kernel: [ 6995.192833] RIP: 0010:[] [] radeon_fence_ref+0x10/0x50 [radeon]
Mar 5 15:04:58 lutz kernel: [ 6995.192881] RSP: 0018:8800c42a7a68 EFLAGS: 00010282
Mar 5 15:04:58 lutz kernel: [ 6995.192903] RAX:
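The trapping instruction, `lock xadd %edx,0x8(%rdi)`, is an atomic add on the 32-bit word 8 bytes past the pointer in %rdi, and the reported fault address ends in 8. That is consistent with %rdi being NULL and the counter sitting at offset 8 in the object. The arithmetic can be sketched as below, where `struct layout_fence` and its field names are hypothetical and chosen only so that the counter lands at offset 8; they are not the real radeon fence layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one 8-byte field followed by a 32-bit counter. */
struct layout_fence {
    uint64_t owner;     /* occupies offsets 0..7 */
    int32_t  refcount;  /* starts at offset 8 */
};

/* Address the CPU would touch when accessing `refcount` through `base`. */
uintptr_t refcount_address(uintptr_t base)
{
    return base + offsetof(struct layout_fence, refcount);
}
```

Calling `refcount_address(0)` models the NULL case: the result is the field offset itself, which is why a NULL object pointer typically shows up in an oops as a small non-zero fault address rather than as address 0.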