Re: [Intel-gfx] kernel 5.5.4: BUG: kernel NULL pointer dereference, address: 000000000000000

2020-02-18 Thread Chris Wilson
Quoting Hillf Danton (2020-02-17 02:30:13)
> 
> On Sun, 16 Feb 2020 22:17:59 +0100 Toralf Foerster wrote:
> >
> > This is similar to the behaviour before; the BUG occurs after a few 
> > minutes/hours.
> > This time it produced:
> > 
> > 
> > Feb 16 22:09:01 t44 CROND[8918]: (root) CMD (/usr/lib/sa/sa1 30 2 -S XALL)
> > Feb 16 22:10:01 t44 CROND[8980]: (root) CMD (/usr/lib/sa/sa1 30 2 -S XALL)
> > Feb 16 22:10:37 t44 kernel: BUG: kernel NULL pointer dereference, address: 
> > 
> > Feb 16 22:10:37 t44 kernel: #PF: supervisor instruction fetch in kernel mode
> > Feb 16 22:10:37 t44 kernel: #PF: error_code(0x0010) - not-present page
> > Feb 16 22:10:37 t44 kernel: PGD 0 P4D 0 
> > Feb 16 22:10:37 t44 kernel: Oops: 0010 [#1] SMP PTI
> > Feb 16 22:10:37 t44 kernel: CPU: 1 PID: 3403 Comm: X Tainted: G 
> >T 5.5.4 #3
> > Feb 16 22:10:37 t44 kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, 
> > BIOS GJET92WW (2.42 ) 03/03/2017
> > Feb 16 22:10:37 t44 kernel: RIP: 0010:0x0
> > Feb 16 22:10:37 t44 kernel: Code: Bad RIP value.
> > Feb 16 22:10:37 t44 kernel: RSP: 0018:ad37009eba20 EFLAGS: 00010087
> > Feb 16 22:10:37 t44 kernel: RAX:  RBX:  
> > RCX: 000e68b0
> > Feb 16 22:10:37 t44 kernel: RDX:  RSI: 8b35598cba88 
> > RDI: 8b362d9146c0
> > Feb 16 22:10:37 t44 kernel: RBP: 8b362d9146c0 R08:  
> > R09: 8b35598cbe00
> > Feb 16 22:10:37 t44 kernel: R10: 0002 R11: 0005 
> > R12: ad37009eba28
> > Feb 16 22:10:37 t44 kernel: R13:  R14: 8b36a40fa200 
> > R15: 8b369bf99600
> > Feb 16 22:10:37 t44 kernel: FS:  7f2b751398c0() 
> > GS:8b36b268() knlGS:
> > Feb 16 22:10:37 t44 kernel: CS:  0010 DS:  ES:  CR0: 
> > 80050033
> > Feb 16 22:10:37 t44 kernel: CR2: ffd6 CR3: 000323292001 
> > CR4: 001606e0
> > Feb 16 22:10:37 t44 kernel: Call Trace:
> > Feb 16 22:10:37 t44 kernel:  dma_fence_signal_locked+0x85/0xc0
> > Feb 16 22:10:37 t44 kernel:  dma_fence_signal+0x1f/0x40
> > Feb 16 22:10:37 t44 kernel:  i915_request_retire+0x9a/0x290 [i915]
> > Feb 16 22:10:37 t44 kernel:  i915_request_create+0x3f/0xc0 [i915]
> > Feb 16 22:10:37 t44 kernel:  i915_gem_do_execbuffer+0x973/0x17d0 [i915]
> > Feb 16 22:10:37 t44 kernel:  i915_gem_execbuffer2_ioctl+0xe9/0x3a0 [i915]
> > Feb 16 22:10:37 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
> > Feb 16 22:10:37 t44 kernel:  drm_ioctl_kernel+0xae/0x100 [drm]
> > Feb 16 22:10:37 t44 kernel:  drm_ioctl+0x223/0x400 [drm]
> > Feb 16 22:10:37 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
> > Feb 16 22:10:37 t44 kernel:  do_vfs_ioctl+0x4d4/0x760
> > Feb 16 22:10:37 t44 kernel:  ksys_ioctl+0x5b/0x90
> > Feb 16 22:10:37 t44 kernel:  __x64_sys_ioctl+0x15/0x20
> > Feb 16 22:10:37 t44 kernel:  do_syscall_64+0x46/0x100
> > Feb 16 22:10:37 t44 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Feb 16 22:10:37 t44 kernel: RIP: 0033:0x7f2b75372137
> > Feb 16 22:10:37 t44 kernel: Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 
> > c4 18 c3 e8 2d d4 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 
> > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 ed 0c 00 f7 d8 64 89 01 48
> > Feb 16 22:10:37 t44 kernel: RSP: 002b:7ffebe2b4c38 EFLAGS: 0246 
> > ORIG_RAX: 0010
> > Feb 16 22:10:37 t44 kernel: RAX: ffda RBX: 7ffebe2b4c80 
> > RCX: 7f2b75372137
> > Feb 16 22:10:37 t44 kernel: RDX: 7ffebe2b4c80 RSI: 40406469 
> > RDI: 000d
> > Feb 16 22:10:37 t44 kernel: RBP: 40406469 R08: 561477eb8670 
> > R09: 0202
> > Feb 16 22:10:37 t44 kernel: R10:  R11: 0246 
> > R12: 561477e7b0b0
> > Feb 16 22:10:37 t44 kernel: R13: 000d R14: 7f2b74b51c48 
> > R15: 
> > Feb 16 22:10:37 t44 kernel: Modules linked in: af_packet bridge stp llc 
> > ip6table_filter ip6_tables xt_MASQUERADE iptable_nat nf_nat nf_log_ipv4 
> > nf_log_common xt_LOG xt_limit xt_recent xt_conntrack nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables uvcvideo 
> > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common 
> > btusb btrtl btbcm btintel bluetooth ecdh_generic ecc rmi_smbus rmi_core 
> > mousedev x86_pkg_temp_thermal coretemp kvm_intel kvm i915 irqbypass 
> > intel_gtt i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea input_leds 
> > snd_hda_codec_realtek snd_hda_codec_generic cfbimgblt sysfillrect sysimgblt 
> > fb_sys_fops cfbcopyarea wmi_bmof snd_hda_intel snd_intel_dspcfg drm 
> > snd_hda_codec tpm_tis psmouse aesni_intel snd_hda_core glue_helper 
> > crypto_simd iwlmvm cryptd snd_pcm thinkpad_acpi ledtrig_audio tpm_tis_core 
> > iwlwifi pcspkr drm_panel_orientation_quirks ehci_pci atkbd e1000e i2c_i801 
> > ehci_hcd tpm thermal snd_timer ac snd soundcore battery rng_core agpga

Re: [Intel-gfx] kernel 5.5.4: BUG: kernel NULL pointer dereference, address: 000000000000000

2020-02-18 Thread Hillf Danton


On 2020-02-15 16:20 UTC Toralf Foerster wrote:
> Since 5.5.1 I have been experiencing hangs under a hardened Gentoo Linux + KDE; 
> neither mouse nor keyboard works anymore, and powering off is the only 
> option.
> The syslog shows:
> 
> 
> Feb 15 12:56:31 t44 kernel: BUG: kernel NULL pointer dereference, address: 
> 
> Feb 15 12:56:31 t44 kernel: #PF: supervisor instruction fetch in kernel mode
> Feb 15 12:56:31 t44 kernel: #PF: error_code(0x0010) - not-present page
> Feb 15 12:56:31 t44 kernel: PGD 0 P4D 0 
> Feb 15 12:56:31 t44 kernel: Oops: 0010 [#1] SMP PTI
> Feb 15 12:56:31 t44 kernel: CPU: 0 PID: 3401 Comm: X Tainted: G   
>  T 5.5.4 #2
> Feb 15 12:56:31 t44 kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS 
> GJET92WW (2.42 ) 03/03/2017
> Feb 15 12:56:31 t44 kernel: RIP: 0010:0x0
> Feb 15 12:56:31 t44 kernel: Code: Bad RIP value.
> Feb 15 12:56:31 t44 kernel: RSP: 0018:9d8780917a40 EFLAGS: 00010087
> Feb 15 12:56:31 t44 kernel: RAX:  RBX:  RCX: 
> 000919dd
> Feb 15 12:56:31 t44 kernel: RDX:  RSI: 8b13d4024b08 RDI: 
> 8b149d88a400
> Feb 15 12:56:31 t44 kernel: RBP: 8b149d88a400 R08:  R09: 
> 8b13d4024100
> Feb 15 12:56:31 t44 kernel: R10: 0002 R11: 0005 R12: 
> 9d8780917a48
> Feb 15 12:56:31 t44 kernel: R13:  R14: 8b14aa17ae00 R15: 
> 8b14a39a02c0
> Feb 15 12:56:31 t44 kernel: FS:  7f8c162148c0() 
> GS:8b14b260() knlGS:
> Feb 15 12:56:31 t44 kernel: CS:  0010 DS:  ES:  CR0: 80050033
> Feb 15 12:56:31 t44 kernel: CR2: ffd6 CR3: 000323998005 CR4: 
> 001606f0
> Feb 15 12:56:31 t44 kernel: Call Trace:
> Feb 15 12:56:31 t44 kernel:  dma_fence_signal_locked+0x85/0xc0
> Feb 15 12:56:31 t44 kernel:  i915_request_retire+0x259/0x2a0 [i915]
> Feb 15 12:56:31 t44 kernel:  i915_request_create+0x3f/0xc0 [i915]
> Feb 15 12:56:31 t44 kernel:  i915_gem_do_execbuffer+0x973/0x17d0 [i915]
> Feb 15 12:56:31 t44 kernel:  i915_gem_execbuffer2_ioctl+0xe9/0x3a0 [i915]
> Feb 15 12:56:31 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
> Feb 15 12:56:31 t44 kernel:  drm_ioctl_kernel+0xae/0x100 [drm]
> Feb 15 12:56:31 t44 kernel:  drm_ioctl+0x223/0x400 [drm]
> Feb 15 12:56:31 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
> Feb 15 12:56:31 t44 kernel:  do_vfs_ioctl+0x4d4/0x760
> Feb 15 12:56:31 t44 kernel:  ksys_ioctl+0x5b/0x90
> Feb 15 12:56:31 t44 kernel:  __x64_sys_ioctl+0x15/0x20
> Feb 15 12:56:31 t44 kernel:  do_syscall_64+0x46/0x100
> Feb 15 12:56:31 t44 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Feb 15 12:56:31 t44 kernel: RIP: 0033:0x7f8c1644d137
> Feb 15 12:56:31 t44 kernel: Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 
> c4 18 c3 e8 2d d4 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 
> 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 ed 0c 00 f7 d8 64 89 01 48
> Feb 15 12:56:31 t44 kernel: RSP: 002b:7ffc2e8fabc8 EFLAGS: 0246 
> ORIG_RAX: 0010
> Feb 15 12:56:31 t44 kernel: RAX: ffda RBX: 7ffc2e8fac10 RCX: 
> 7f8c1644d137
> Feb 15 12:56:31 t44 kernel: RDX: 7ffc2e8fac10 RSI: 40406469 RDI: 
> 000d
> Feb 15 12:56:31 t44 kernel: RBP: 40406469 R08: 561136d07680 R09: 
> 0202
> Feb 15 12:56:31 t44 kernel: R10:  R11: 0246 R12: 
> 561136cca130
> Feb 15 12:56:31 t44 kernel: R13: 000d R14: 7f8c15c2cc48 R15: 
> 
> Feb 15 12:56:31 t44 kernel: Modules linked in: af_packet bridge stp llc 
> ip6table_filter ip6_tables xt_MASQUERADE iptable_nat nf_nat nf_log_ipv4 
> nf_log_common xt_LOG xt_limit xt_recent xt_conntrack nf_conntrack 
> nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables uvcvideo 
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common 
> btusb btrtl btbcm btintel bluetooth ecdh_generic ecc rmi_smbus rmi_core 
> mousedev x86_pkg_temp_thermal coretemp i915 kvm_intel kvm irqbypass intel_gtt 
> snd_hda_codec_realtek snd_hda_codec_generic i2c_algo_bit input_leds 
> drm_kms_helper snd_hda_intel wmi_bmof snd_intel_dspcfg cfbfillrect iwlmvm 
> psmouse syscopyarea cfbimgblt aesni_intel glue_helper crypto_simd pcspkr 
> snd_hda_codec atkbd sysfillrect cryptd ehci_pci iwlwifi ehci_hcd sysimgblt 
> fb_sys_fops e1000e cfbcopyarea thinkpad_acpi snd_hda_core i2c_i801 drm 
> snd_pcm ac battery ledtrig_audio tpm_tis tpm_tis_core 
> drm_panel_orientation_quirks snd_timer tpm rng_core agpgart snd i2c_core wmi 
> soundcore thermal evdev
> Feb 15 12:56:31 t44 kernel: CR2: 
> Feb 15 12:56:31 t44 kernel: ---[ end trace 0efcb8355216bb62 ]---
> Feb 15 12:56:31 t44 kernel: RIP: 0010:0x0
> Feb 15 12:56:31 t44 kernel: Code: Bad RIP value.
> Feb 15 12:56:31 t44 kernel: RSP: 0018:9d8780917a40 EFLAGS: 00010087
> Feb 15 12:56:31 t44 kernel: RAX:  

Re: [Intel-gfx] kernel 5.5.4: BUG: kernel NULL pointer dereference, address: 000000000000000

2020-02-18 Thread Hillf Danton


On Sun, 16 Feb 2020 22:17:59 +0100 Toralf Foerster wrote:
>
> This is similar to the behaviour before; the BUG occurs after a few 
> minutes/hours.
> This time it produced:
> 
> 
> Feb 16 22:09:01 t44 CROND[8918]: (root) CMD (/usr/lib/sa/sa1 30 2 -S XALL)
> Feb 16 22:10:01 t44 CROND[8980]: (root) CMD (/usr/lib/sa/sa1 30 2 -S XALL)
> Feb 16 22:10:37 t44 kernel: BUG: kernel NULL pointer dereference, address: 
> 
> Feb 16 22:10:37 t44 kernel: #PF: supervisor instruction fetch in kernel mode
> Feb 16 22:10:37 t44 kernel: #PF: error_code(0x0010) - not-present page
> Feb 16 22:10:37 t44 kernel: PGD 0 P4D 0 
> Feb 16 22:10:37 t44 kernel: Oops: 0010 [#1] SMP PTI
> Feb 16 22:10:37 t44 kernel: CPU: 1 PID: 3403 Comm: X Tainted: G   
>  T 5.5.4 #3
> Feb 16 22:10:37 t44 kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS 
> GJET92WW (2.42 ) 03/03/2017
> Feb 16 22:10:37 t44 kernel: RIP: 0010:0x0
> Feb 16 22:10:37 t44 kernel: Code: Bad RIP value.
> Feb 16 22:10:37 t44 kernel: RSP: 0018:ad37009eba20 EFLAGS: 00010087
> Feb 16 22:10:37 t44 kernel: RAX:  RBX:  RCX: 
> 000e68b0
> Feb 16 22:10:37 t44 kernel: RDX:  RSI: 8b35598cba88 RDI: 
> 8b362d9146c0
> Feb 16 22:10:37 t44 kernel: RBP: 8b362d9146c0 R08:  R09: 
> 8b35598cbe00
> Feb 16 22:10:37 t44 kernel: R10: 0002 R11: 0005 R12: 
> ad37009eba28
> Feb 16 22:10:37 t44 kernel: R13:  R14: 8b36a40fa200 R15: 
> 8b369bf99600
> Feb 16 22:10:37 t44 kernel: FS:  7f2b751398c0() 
> GS:8b36b268() knlGS:
> Feb 16 22:10:37 t44 kernel: CS:  0010 DS:  ES:  CR0: 80050033
> Feb 16 22:10:37 t44 kernel: CR2: ffd6 CR3: 000323292001 CR4: 
> 001606e0
> Feb 16 22:10:37 t44 kernel: Call Trace:
> Feb 16 22:10:37 t44 kernel:  dma_fence_signal_locked+0x85/0xc0
> Feb 16 22:10:37 t44 kernel:  dma_fence_signal+0x1f/0x40
> Feb 16 22:10:37 t44 kernel:  i915_request_retire+0x9a/0x290 [i915]
> Feb 16 22:10:37 t44 kernel:  i915_request_create+0x3f/0xc0 [i915]
> Feb 16 22:10:37 t44 kernel:  i915_gem_do_execbuffer+0x973/0x17d0 [i915]
> Feb 16 22:10:37 t44 kernel:  i915_gem_execbuffer2_ioctl+0xe9/0x3a0 [i915]
> Feb 16 22:10:37 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
> Feb 16 22:10:37 t44 kernel:  drm_ioctl_kernel+0xae/0x100 [drm]
> Feb 16 22:10:37 t44 kernel:  drm_ioctl+0x223/0x400 [drm]
> Feb 16 22:10:37 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
> Feb 16 22:10:37 t44 kernel:  do_vfs_ioctl+0x4d4/0x760
> Feb 16 22:10:37 t44 kernel:  ksys_ioctl+0x5b/0x90
> Feb 16 22:10:37 t44 kernel:  __x64_sys_ioctl+0x15/0x20
> Feb 16 22:10:37 t44 kernel:  do_syscall_64+0x46/0x100
> Feb 16 22:10:37 t44 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Feb 16 22:10:37 t44 kernel: RIP: 0033:0x7f2b75372137
> Feb 16 22:10:37 t44 kernel: Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 
> c4 18 c3 e8 2d d4 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 
> 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 ed 0c 00 f7 d8 64 89 01 48
> Feb 16 22:10:37 t44 kernel: RSP: 002b:7ffebe2b4c38 EFLAGS: 0246 
> ORIG_RAX: 0010
> Feb 16 22:10:37 t44 kernel: RAX: ffda RBX: 7ffebe2b4c80 RCX: 
> 7f2b75372137
> Feb 16 22:10:37 t44 kernel: RDX: 7ffebe2b4c80 RSI: 40406469 RDI: 
> 000d
> Feb 16 22:10:37 t44 kernel: RBP: 40406469 R08: 561477eb8670 R09: 
> 0202
> Feb 16 22:10:37 t44 kernel: R10:  R11: 0246 R12: 
> 561477e7b0b0
> Feb 16 22:10:37 t44 kernel: R13: 000d R14: 7f2b74b51c48 R15: 
> 
> Feb 16 22:10:37 t44 kernel: Modules linked in: af_packet bridge stp llc 
> ip6table_filter ip6_tables xt_MASQUERADE iptable_nat nf_nat nf_log_ipv4 
> nf_log_common xt_LOG xt_limit xt_recent xt_conntrack nf_conntrack 
> nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables uvcvideo 
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common 
> btusb btrtl btbcm btintel bluetooth ecdh_generic ecc rmi_smbus rmi_core 
> mousedev x86_pkg_temp_thermal coretemp kvm_intel kvm i915 irqbypass intel_gtt 
> i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea input_leds 
> snd_hda_codec_realtek snd_hda_codec_generic cfbimgblt sysfillrect sysimgblt 
> fb_sys_fops cfbcopyarea wmi_bmof snd_hda_intel snd_intel_dspcfg drm 
> snd_hda_codec tpm_tis psmouse aesni_intel snd_hda_core glue_helper 
> crypto_simd iwlmvm cryptd snd_pcm thinkpad_acpi ledtrig_audio tpm_tis_core 
> iwlwifi pcspkr drm_panel_orientation_quirks ehci_pci atkbd e1000e i2c_i801 
> ehci_hcd tpm thermal snd_timer ac snd soundcore battery rng_core agpgart 
> i2c_core wmi evdev
> Feb 16 22:10:37 t44 kernel: CR2: 
> Feb 16 22:10:37 t44 kernel: ---[ end trace 7df1d4246cb74d36 ]---
> Feb 16 22:10:37 t44 kernel: RIP: 0010:0x0
> Feb 16 22:10:37 t44 kernel:

Re: [Intel-gfx] kernel 5.5.4: BUG: kernel NULL pointer dereference, address: 000000000000000

2020-02-18 Thread Hillf Danton


On Sun, 16 Feb 2020 11:33:02 +0100 Toralf Foerster wrote:
> On 2/16/20 4:26 AM, Hillf Danton wrote:
> > Looks like a stray lock accounts for the above NULL dereference.
> >
> Hi, with the patch applied on top of 5.5.4 the internal display now breaks
> even in the boot phase.

My bad.

Then try signaling the fence before taking the request's lock, since the
signaling path takes the fence-specific lock itself, if we're heading in
the right direction.

--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -252,10 +252,10 @@ bool i915_request_retire(struct i915_req
 	 */
 	remove_from_engine(rq);
 
+	dma_fence_signal(&rq->fence);
+
 	spin_lock_irq(&rq->lock);
 	i915_request_mark_complete(rq);
-	if (!i915_request_signaled(rq))
-		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
 		i915_request_cancel_breadcrumb(rq);
 	if (i915_request_has_waitboost(rq)) {

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] kernel 5.5.4: BUG: kernel NULL pointer dereference, address: 000000000000000

2020-02-16 Thread Toralf Förster
On 2/16/20 3:55 PM, Hillf Danton wrote:
> 
> On Sun, 16 Feb 2020 11:33:02 +0100 Toralf Foerster wrote:
>> On 2/16/20 4:26 AM, Hillf Danton wrote:
>>> Looks like a stray lock accounts for the above NULL dereference.
>>>
>> Hi, with the patch applied on top of 5.5.4 the internal display now breaks
>> even in the boot phase.
> 
> My bad.
> 
> Then try signaling the fence before taking the request's lock, since the
> signaling path takes the fence-specific lock itself, if we're heading in
> the right direction.
> 
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -252,10 +252,10 @@ bool i915_request_retire(struct i915_req
>  	 */
>  	remove_from_engine(rq);
>  
> +	dma_fence_signal(&rq->fence);
> +
>  	spin_lock_irq(&rq->lock);
>  	i915_request_mark_complete(rq);
> -	if (!i915_request_signaled(rq))
> -		dma_fence_signal_locked(&rq->fence);
>  	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
>  		i915_request_cancel_breadcrumb(rq);
>  	if (i915_request_has_waitboost(rq)) {
> 

This is similar to the behaviour before; the BUG occurs after a few 
minutes/hours.
This time it produced:


Feb 16 22:09:01 t44 CROND[8918]: (root) CMD (/usr/lib/sa/sa1 30 2 -S XALL)
Feb 16 22:10:01 t44 CROND[8980]: (root) CMD (/usr/lib/sa/sa1 30 2 -S XALL)
Feb 16 22:10:37 t44 kernel: BUG: kernel NULL pointer dereference, address: 

Feb 16 22:10:37 t44 kernel: #PF: supervisor instruction fetch in kernel mode
Feb 16 22:10:37 t44 kernel: #PF: error_code(0x0010) - not-present page
Feb 16 22:10:37 t44 kernel: PGD 0 P4D 0 
Feb 16 22:10:37 t44 kernel: Oops: 0010 [#1] SMP PTI
Feb 16 22:10:37 t44 kernel: CPU: 1 PID: 3403 Comm: X Tainted: G
T 5.5.4 #3
Feb 16 22:10:37 t44 kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS 
GJET92WW (2.42 ) 03/03/2017
Feb 16 22:10:37 t44 kernel: RIP: 0010:0x0
Feb 16 22:10:37 t44 kernel: Code: Bad RIP value.
Feb 16 22:10:37 t44 kernel: RSP: 0018:ad37009eba20 EFLAGS: 00010087
Feb 16 22:10:37 t44 kernel: RAX:  RBX:  RCX: 
000e68b0
Feb 16 22:10:37 t44 kernel: RDX:  RSI: 8b35598cba88 RDI: 
8b362d9146c0
Feb 16 22:10:37 t44 kernel: RBP: 8b362d9146c0 R08:  R09: 
8b35598cbe00
Feb 16 22:10:37 t44 kernel: R10: 0002 R11: 0005 R12: 
ad37009eba28
Feb 16 22:10:37 t44 kernel: R13:  R14: 8b36a40fa200 R15: 
8b369bf99600
Feb 16 22:10:37 t44 kernel: FS:  7f2b751398c0() 
GS:8b36b268() knlGS:
Feb 16 22:10:37 t44 kernel: CS:  0010 DS:  ES:  CR0: 80050033
Feb 16 22:10:37 t44 kernel: CR2: ffd6 CR3: 000323292001 CR4: 
001606e0
Feb 16 22:10:37 t44 kernel: Call Trace:
Feb 16 22:10:37 t44 kernel:  dma_fence_signal_locked+0x85/0xc0
Feb 16 22:10:37 t44 kernel:  dma_fence_signal+0x1f/0x40
Feb 16 22:10:37 t44 kernel:  i915_request_retire+0x9a/0x290 [i915]
Feb 16 22:10:37 t44 kernel:  i915_request_create+0x3f/0xc0 [i915]
Feb 16 22:10:37 t44 kernel:  i915_gem_do_execbuffer+0x973/0x17d0 [i915]
Feb 16 22:10:37 t44 kernel:  i915_gem_execbuffer2_ioctl+0xe9/0x3a0 [i915]
Feb 16 22:10:37 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
Feb 16 22:10:37 t44 kernel:  drm_ioctl_kernel+0xae/0x100 [drm]
Feb 16 22:10:37 t44 kernel:  drm_ioctl+0x223/0x400 [drm]
Feb 16 22:10:37 t44 kernel:  ? i915_gem_execbuffer_ioctl+0x2c0/0x2c0 [i915]
Feb 16 22:10:37 t44 kernel:  do_vfs_ioctl+0x4d4/0x760
Feb 16 22:10:37 t44 kernel:  ksys_ioctl+0x5b/0x90
Feb 16 22:10:37 t44 kernel:  __x64_sys_ioctl+0x15/0x20
Feb 16 22:10:37 t44 kernel:  do_syscall_64+0x46/0x100
Feb 16 22:10:37 t44 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 16 22:10:37 t44 kernel: RIP: 0033:0x7f2b75372137
Feb 16 22:10:37 t44 kernel: Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 
18 c3 e8 2d d4 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 19 ed 0c 00 f7 d8 64 89 01 48
Feb 16 22:10:37 t44 kernel: RSP: 002b:7ffebe2b4c38 EFLAGS: 0246 
ORIG_RAX: 0010
Feb 16 22:10:37 t44 kernel: RAX: ffda RBX: 7ffebe2b4c80 RCX: 
7f2b75372137
Feb 16 22:10:37 t44 kernel: RDX: 7ffebe2b4c80 RSI: 40406469 RDI: 
000d
Feb 16 22:10:37 t44 kernel: RBP: 40406469 R08: 561477eb8670 R09: 
0202
Feb 16 22:10:37 t44 kernel: R10:  R11: 0246 R12: 
561477e7b0b0
Feb 16 22:10:37 t44 kernel: R13: 000d R14: 7f2b74b51c48 R15: 

Feb 16 22:10:37 t44 kernel: Modules linked in: af_packet bridge stp llc 
ip6table_filter ip6_tables xt_MASQUERADE iptable_nat nf_nat nf_log_ipv4 
nf_log_common xt_LOG xt_limit xt_recent xt_conntrack nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables uvcvideo 
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common 
btusb btrtl btbcm btintel bluetoot

Re: [Intel-gfx] kernel 5.5.4: BUG: kernel NULL pointer dereference, address: 000000000000000

2020-02-16 Thread Toralf Förster
On 2/16/20 4:26 AM, Hillf Danton wrote:
> Looks like a stray lock accounts for the above NULL dereference.
Hi, with the patch applied on top of 5.5.4 the internal display now breaks even 
in the boot phase.
I get just a black screen after a few seconds, and nothing in the logs except:

Feb 16 11:21:57 t44 kernel: elogind-daemon[1431]: Removed session c15.
Feb 16 11:21:57 t44 start-stop-daemon[6462]: Will stop PID 1431
Feb 16 11:21:57 t44 start-stop-daemon[6462]: Sending signal 15 to PID 1431
Feb 16 11:21:57 t44 kernel: elogind-daemon[1431]: Received signal 15 [TERM]
Feb 16 11:21:57 t44 kernel: elogind-daemon[1431]: segfault at 56264c00 ip 
7fddfcf76882 sp 7ffc98c721b0 error 4 in 
libc-2.29.so[7fddfcf0c000+15a000]
Feb 16 11:21:57 t44 kernel: Code: a8 02 75 4c 48 8b 15 05 e5 13 00 64 48 83 3a 
00 0f 84 f2 00 00 00 48 8d 3d 2b f2 13 00 a8 04 74 0c 48 89 f0 48 25 00 00 00 
fc <48> 8b 38 48 8b 44 24 18 64 48 33 04 25 28 00
00 00 0f 85 f8 00 00
Feb 16 11:21:57 t44 start-stop-daemon[6549]: Will stop /usr/sbin/dnsmasq
Feb 16 11:21:57 t44 start-stop-daemon[6549]: Will stop PID 2764
Feb 16 11:21:57 t44 start-stop-daemon[6549]: Sending signal 15 to PID 2764


>
> Btw, please send plain-text messages.

Ick, I do send plain text to LKML, right?

>
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -254,8 +254,7 @@ bool i915_request_retire(struct i915_req
>  
>  	spin_lock_irq(&rq->lock);
>  	i915_request_mark_complete(rq);
> -	if (!i915_request_signaled(rq))
> -		dma_fence_signal_locked(&rq->fence);
> +	dma_fence_signal(&rq->fence);
>  	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
>  		i915_request_cancel_breadcrumb(rq);
>  	if (i915_request_has_waitboost(rq)) {
>


--
Toralf