Re: imxdrm issue on SABRE Lite
Hi Philipp

On Mon, Feb 20, 2017 at 3:42 PM, Philipp Zabel wrote:
> Hi Dan,
>
> On Sat, 2017-02-11 at 21:09 +, Dan MacDonald wrote:
>> Hi Philipp
>>
>> I'm having trouble getting xf86-video-armada working properly on a
>> Element 14 / Embest SABRE Lite board running Arch Linux with kernel
>> 4.9.8. I have been in touch with RMK and he's confident the crash
>> below is caused by a kernel issue relating to imxdrm and a null
>> pointer dereference.
>
> I can't reproduce this on v4.9.8 with a Nitrogen6X with 1280x800 LVDS
> panel and 1920x1080 HDMI monitor using modetest from libdrm. Does this
> only happen with X?

I have read that there are supposedly differences between the Element 14 SL and the Boundary Devices board of the same name, but I have yet to find out exactly what they are, as the specs are the same on paper. That said, all of the U-Boot and OS images for the BD SL that I've tried have worked without issue, including the GLES demos on the BD Ubuntu 16.04 image.

I've never used Wayland. The console works fine with my 4K display, and I have been able to get X running at 720x480 under Arch with armada, but at no other resolution.

> Do you have framebuffer console emulation enabled?

Not sure. I will need to check whether that's enabled by default under ALARM; I've not manually enabled it.

> Does this still happen if you disable the LVDS panel in the device tree?

Russell sent me a different dtb file which seemed to fix some of my kernel problems. Russell will be able to explain that better than me.

>
>> The first indication of X/armada not being happy is that it only starts
>> in 720x480 when connected to a 4K HDMI display, and when I try
>> running:
>>
>> xrandr --output HDMI1 --mode 1920x1080
>>
>> X crashes.
>> >> Here's the serial console output for X loading and then crashing after >> I try running that xrandr command: >> >> Arch Linux 4.9.8-1-ARCH (ttymxc1) >> >> alarm login: [ 19.197477] fec 2188000.ethernet eth0: Link is Up - >> 1Gbps/Full - flow control rx/tx >> [ 19.205165] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >> [ 33.08] vgaarb: this pci device is not a vga device >> [ 33.203520] vgaarb: this pci device is not a vga device >> [ 33.447051] imx-ipuv3 240.ipu: DC stop timeout after 50 ms > > Does the "gpu: ipu-v3: remove IRQ dance on DC channel disable" patch > (https://patchwork.kernel.org/patch/9417683) do anything to improve the > situation, besides removing the above warning? After trying Russells dtb, he asked me to try booting with: setenv bootargs cma=256M Then X failed to start with the error: [ 57.705889] imx-drm display-subsystem: failed to allocate buffer with size 33177600 Do you think that patch could fix such an issue? Thanks ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
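A side note on the allocation failure quoted above: 33177600 bytes is exactly the size of one 3840x2160 buffer at 4 bytes per pixel, which fits the 4K display mentioned earlier. The 4 bytes per pixel figure is an assumption (the log does not name the pixel format), and the helper names below are hypothetical. A minimal sketch of the arithmetic:

```c
#include <stdint.h>

/* Bytes needed for one linear framebuffer.
 * Assumes no stride padding; bytes_per_pixel = 4 is a guess at 32 bpp. */
static uint64_t fb_bytes(uint32_t width, uint32_t height,
                         uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* How many buffers of buf_bytes fit in a CMA pool of cma_mib MiB.
 * Ignores fragmentation and every other user of the pool. */
static uint64_t buffers_in_cma(uint64_t cma_mib, uint64_t buf_bytes)
{
    return (cma_mib * 1024 * 1024) / buf_bytes;
}
```

Calling fb_bytes(3840, 2160, 4) gives 33177600, and buffers_in_cma(256, 33177600) gives 8. On paper a 256 MiB pool would hold several such buffers, so the failure may have more to do with contiguity or other CMA users than with raw pool size. That is a guess, not something the log proves.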
Re: imxdrm issue on SABRE Lite
Hi Dan,

On Sat, 2017-02-11 at 21:09 +, Dan MacDonald wrote:
> Hi Philipp
>
> I'm having trouble getting xf86-video-armada working properly on a
> Element 14 / Embest SABRE Lite board running Arch Linux with kernel
> 4.9.8. I have been in touch with RMK and he's confident the crash
> below is caused by a kernel issue relating to imxdrm and a null
> pointer dereference.

I can't reproduce this on v4.9.8 with a Nitrogen6X with 1280x800 LVDS panel and 1920x1080 HDMI monitor using modetest from libdrm. Does this only happen with X?

Do you have framebuffer console emulation enabled?

Does this still happen if you disable the LVDS panel in the device tree?

> The first indication of X/armada not being happy is that it only starts
> in 720x480 when connected to a 4K HDMI display, and when I try
> running:
>
> xrandr --output HDMI1 --mode 1920x1080
>
> X crashes.
>
> Here's the serial console output for X loading and then crashing after
> I try running that xrandr command:
>
> Arch Linux 4.9.8-1-ARCH (ttymxc1)
>
> alarm login: [ 19.197477] fec 2188000.ethernet eth0: Link is Up -
> 1Gbps/Full - flow control rx/tx
> [ 19.205165] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 33.08] vgaarb: this pci device is not a vga device
> [ 33.203520] vgaarb: this pci device is not a vga device
> [ 33.447051] imx-ipuv3 240.ipu: DC stop timeout after 50 ms

Does the "gpu: ipu-v3: remove IRQ dance on DC channel disable" patch (https://patchwork.kernel.org/patch/9417683) do anything to improve the situation, besides removing the above warning?

> [ 33.707042] [ cut here ] > [ 33.711682] WARNING: CPU: 1 PID: 316 at > drivers/gpu/drm/drm_atomic_helper.c:1140 > drm_atomic_helper_wait_for_vblanks+0x274/0x278 > [ 33.723187] [CRTC:24] vblank wait timed out > [ 33.727395] Modules linked in: snd_soc_sgtl5000 snd_soc_fsl_ssi > snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma snd_soc_fsl_asrc > snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio snd_pcm_dmaengine > caam_jr imx_ipuv3_crtc snd_ac97_codec coda v4l2_mem2mem > videobuf2_dma_contig ac97_bus imx_ipu_v3 snd_soc_imx_audmux snd_pcm > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core > dw_hdmi_imx caam imx2_wdt ofpart spi_imx evdev dw_hdmi etnaviv imx_ldb > pwm_imx snd_timer parallel_display uio_pdrv_genirq uio imxdrm > sch_fq_codel ip_tables x_tables > [ 33.776237] CPU: 1 PID: 316 Comm: Xorg Not tainted 4.9.8-1-ARCH #1 > [ 33.782419] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [ 33.788965] [] (unwind_backtrace) from [] > (show_stack+0x10/0x14) > [ 33.796719] [] (show_stack) from [] > (dump_stack+0x88/0x9c) > [ 33.803951] [] (dump_stack) from [] (__warn+0xe8/0x100) > [ 33.810919] [] (__warn) from [] > (warn_slowpath_fmt+0x48/0x6c) > [ 33.818410] [] (warn_slowpath_fmt) from [] > (drm_atomic_helper_wait_for_vblanks+0x274/0x278) > [ 33.828516] [] (drm_atomic_helper_wait_for_vblanks) from > [] (imx_drm_atomic_commit_tail+0x48/0x58 [imxdrm]) > [ 33.840004] [] (imx_drm_atomic_commit_tail [imxdrm]) from > [] (commit_tail+0x40/0x5c) > [ 33.849489] [] (commit_tail) from [] > (drm_atomic_helper_commit+0x94/0xd8) > [ 33.858019] [] (drm_atomic_helper_commit) from > [] (drm_atomic_helper_set_config+0x78/0x9c) > [ 33.868028] [] (drm_atomic_helper_set_config) from > [] (drm_mode_set_config_internal+0x58/0xdc) > [ 33.878382] [] (drm_mode_set_config_internal) from > [] (drm_mode_setcrtc+0x4a4/0x550) > [ 33.887867] [] (drm_mode_setcrtc) from [] > (drm_ioctl+0x214/0x44c) > [ 33.895704] [] (drm_ioctl) from [] > (do_vfs_ioctl+0xac/0x980) > [ 
33.903104] [] (do_vfs_ioctl) from [] > (SyS_ioctl+0x6c/0x7c) > [ 33.910422] [] (SyS_ioctl) from [] > (ret_fast_syscall+0x0/0x3c) > [ 33.918023] ---[ end trace e8a77aa320be7e55 ]--- > [ 43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* > [CRTC:24:crtc-0] flip_done timed out > [ 55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* > [CRTC:24:crtc-0] flip_done timed out > [ 56.958264] alloc_contig_range: [4d0b7, 4d0b8) PFNs busy > [ 90.896615] usb 1-1.2.3: new low-speed USB device number 6 using ci_hdrc > [ 91.050796] usb 1-1.2.3: New USB device found, idVendor=0458, > idProduct=003a > [ 91.057888] usb 1-1.2.3: New USB device strings: Mfr=1, Product=2, > SerialNumber=0 > [ 91.065382] usb 1-1.2.3: Product: Optical Mouse > [ 91.069944] usb 1-1.2.3: Manufacturer: Genius > [ 91.082071] input: Genius Optical Mouse as > /devices/soc0/soc/210.aips-bus/2184200.usb/ci_hdrc.1/usb1/1-1/1-1.2/1-1.2.3/1-1.2.3:1.0/0003:0458:003A.0003/input/input3 > [ 91.098116] hid-generic 0003:0458:003A.0003: input,hidraw2: USB HID > v1.11 Mouse [Genius Optical Mouse] on usb-ci_hdrc.1-1.2.3/input0 > [ 91.162048] mousedev: PS/2 mouse device common for all mice > [ 214.765689] imx-ipuv3 240.ipu: DC stop timeout after 50 ms > [ 214.825688] Unable to handle kernel NULL pointer dereference at > virtual address
Re: imxdrm issue on SABRE Lite
Hi all

Has there been any progress toward getting imxdrm working with the SABRE Lite and similar boards? I'm presuming that none of you own such a board and that this won't be fixed in time for 4.10, right?

Thanks

On Mon, Feb 13, 2017 at 9:38 AM, Russell King - ARM Linux wrote:
> On Mon, Feb 13, 2017 at 08:55:53AM +, Chris Wilson wrote:
>> On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote:
>> > On Sun, Feb 12, 2017 at 12:15:46AM +, Russell King - ARM Linux wrote:
>> > > On Sat, Feb 11, 2017 at 09:09:34PM +, Dan MacDonald wrote:
>> > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c
>> > > b/drivers/gpu/drm/drm_atomic_helper.c
>> > > index 21f992605541..46668d071d6a 100644
>> > > --- a/drivers/gpu/drm/drm_atomic_helper.c
>> > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
>> > > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state
>> > > *state)
>> > >      else
>> > >          drm_atomic_helper_commit_tail(state);
>> > >
>> > > -    drm_atomic_helper_commit_cleanup_done(state);
>> > > -
>> > > -    drm_atomic_state_free(state);
>> > > +    if (drm_atomic_helper_commit_cleanup_done(state) == 0)
>> > > +        drm_atomic_state_free(state);
>> >
>> > Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe
>> > that already fixes the issue?
>>
>> I'm not confident it will, as there is not an independent ref on the
>> state for the phases, and so a forced timeout still leaves a dangling
>> pointer. The above chunk goes the opposite way and leaks the state to
>> avoid the invalid deref, what we need is a ref around its existence on
>> the dependency queue if that is outside the lifetime of the commit.
>
> I said as much in my email - unfortunately, Thierry cut all that context.
> > Right now, we oops the kernel, which causes: > > (a) the death of the calling process > (b) leaking of all memory associated with the modeset > > What I'm proposing for the -stable kernels is to _improve_ the situation > by eliminating part of the problem, so it's possible to get a better > idea of which bit went wrong and which outputs have failed. > > Fixing it properly is likely to be very invasive, since you'll need to > add reference counting to the drm_crtc_commit structure, a pointer > to that in the drm_pending_event structure, and ensure that the > reference count gets incremented at the appropriate time. Incrementing > the reference count in drm_atomic_helper_setup_commit() certainly isn't > the right place, that would be at the sites which queue the event, but > they are scattered amongst all the atomic modeset drivers. > > For reference, here's my complete patch I posted yesterday: > > drivers/gpu/drm/drm_atomic_helper.c | 15 +-- > include/drm/drm_atomic_helper.h | 2 +- > 2 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c > b/drivers/gpu/drm/drm_atomic_helper.c > index 21f992605541..46668d071d6a 100644 > --- a/drivers/gpu/drm/drm_atomic_helper.c > +++ b/drivers/gpu/drm/drm_atomic_helper.c > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state) > else > drm_atomic_helper_commit_tail(state); > > - drm_atomic_helper_commit_cleanup_done(state); > - > - drm_atomic_state_free(state); > + if (drm_atomic_helper_commit_cleanup_done(state) == 0) > + drm_atomic_state_free(state); > } > > static void commit_work(struct work_struct *work) > @@ -1591,12 +1590,12 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done); > * This is part of the atomic helper support for nonblocking commits, see > * drm_atomic_helper_setup_commit() for an overview. 
> */ > -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) > +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) > { > struct drm_crtc *crtc; > struct drm_crtc_state *crtc_state; > struct drm_crtc_commit *commit; > - int i; > + int i, failed = 0; > long ret; > > for_each_crtc_in_state(state, crtc, crtc_state, i) { > @@ -1621,15 +1620,19 @@ void drm_atomic_helper_commit_cleanup_done(struct > drm_atomic_state *state) > * not hold a reference of its own. */ > ret = wait_for_completion_timeout(>flip_done, > 10*HZ); > - if (ret == 0) > - DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", > + if (ret == 0) { > + DRM_ERROR("[CRTC:%d:%s] flip_done timed out, memory > leaked\n", > crtc->base.id, crtc->name); > + failed = -ETIMEDOUT; > + } > > spin_lock(>commit_lock); > del_commit: > list_del(>commit_entry); > spin_unlock(>commit_lock); > } > + > + return failed; > } >
Re: imxdrm issue on SABRE Lite
On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote:
> On Sun, Feb 12, 2017 at 12:15:46AM +, Russell King - ARM Linux wrote:
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c
> > b/drivers/gpu/drm/drm_atomic_helper.c
> > index 21f992605541..46668d071d6a 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state
> > *state)
> >      else
> >          drm_atomic_helper_commit_tail(state);
> >
> > -    drm_atomic_helper_commit_cleanup_done(state);
> > -
> > -    drm_atomic_state_free(state);
> > +    if (drm_atomic_helper_commit_cleanup_done(state) == 0)
> > +        drm_atomic_state_free(state);
>
> Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe
> that already fixes the issue?

No. It's not the atomic state that's referenced, it's only a completion within the drm_crtc_commit structure, which is completely separate from the atomic state. Moreover, the event code has no knowledge of commits, so it can't "put" a reference count on it. See:

void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e)
{
    assert_spin_locked(&dev->event_lock);

    if (e->completion) {
        /* ->completion might disappear as soon as it signalled. */
        complete_all(e->completion);
        e->completion = NULL;
    }

vs the setup of the event done in drm_atomic_helper_setup_commit():

    if (!crtc_state->event) {
        commit->event = kzalloc(sizeof(*commit->event),
                                GFP_KERNEL);
        if (!commit->event)
            return -ENOMEM;

        crtc_state->event = commit->event;
    }

    crtc_state->event->base.completion = &commit->flip_done;

"commit" gets freed before drm_send_event_locked() is called (hence the timeout message) and when drm_send_event_locked() is eventually called via drm_vblank_off(), this causes a use-after-free bug.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
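To make the lifetime problem above concrete, here is a small single-threaded userspace model of what it could look like if a queued event held its own reference on the commit. This is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical and is not kernel API, and it says nothing about where the real DRM code would need to take or drop such a reference.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; hypothetical names, not the kernel structures. */
struct model_commit {
    int refcount;    /* number of holders of this commit */
    bool flip_done;  /* stands in for the flip_done completion */
};

static int model_live_commits; /* commits allocated and not yet freed */

static struct model_commit *model_commit_new(void)
{
    struct model_commit *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    c->refcount = 1; /* reference held by the committer */
    model_live_commits++;
    return c;
}

static void model_commit_get(struct model_commit *c)
{
    c->refcount++;
}

static void model_commit_put(struct model_commit *c)
{
    if (--c->refcount == 0) {
        model_live_commits--;
        free(c);
    }
}

/* A queued event that keeps its commit alive until it is sent. */
struct model_event {
    struct model_commit *commit;
};

static void model_event_arm(struct model_event *e, struct model_commit *c)
{
    model_commit_get(c); /* the queue takes its own reference */
    e->commit = c;
}

static void model_event_send(struct model_event *e)
{
    if (!e->commit)
        return;
    e->commit->flip_done = true; /* valid: the event's reference pins it */
    model_commit_put(e->commit);
    e->commit = NULL;
}
```

In this model the committer can time out and call model_commit_put() first; the memory is only released once model_event_send() drops the last reference, so the late completion never touches freed memory.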
Re: imxdrm issue on SABRE Lite
On Mon, Feb 13, 2017 at 08:55:53AM +, Chris Wilson wrote: > On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote: > > On Sun, Feb 12, 2017 at 12:15:46AM +, Russell King - ARM Linux wrote: > > > On Sat, Feb 11, 2017 at 09:09:34PM +, Dan MacDonald wrote: > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c > > > b/drivers/gpu/drm/drm_atomic_helper.c > > > index 21f992605541..46668d071d6a 100644 > > > --- a/drivers/gpu/drm/drm_atomic_helper.c > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c > > > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state > > > *state) > > > else > > > drm_atomic_helper_commit_tail(state); > > > > > > - drm_atomic_helper_commit_cleanup_done(state); > > > - > > > - drm_atomic_state_free(state); > > > + if (drm_atomic_helper_commit_cleanup_done(state) == 0) > > > + drm_atomic_state_free(state); > > > > Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe > > that already fixes the issue? > > I'm not confident it will, as there is not an independent ref on the > state for the phases, and so a forced timeout still leaves a dangling > pointer. The above chunk goes the opposite way and leaks the state to > avoid the invalid deref, what we need is a ref around its existence on > the dependency queue if that is outside the lifetime of the commit. I said as much in my email - unfortunately, Thierry cut all that context. Right now, we oops the kernel, which causes: (a) the death of the calling process (b) leaking of all memory associated with the modeset What I'm proposing for the -stable kernels is to _improve_ the situation by eliminating part of the problem, so it's possible to get a better idea of which bit went wrong and which outputs have failed. 
Fixing it properly is likely to be very invasive, since you'll need to add reference counting to the drm_crtc_commit structure, a pointer to that in the drm_pending_event structure, and ensure that the reference count gets incremented at the appropriate time. Incrementing the reference count in drm_atomic_helper_setup_commit() certainly isn't the right place, that would be at the sites which queue the event, but they are scattered amongst all the atomic modeset drivers. For reference, here's my complete patch I posted yesterday: drivers/gpu/drm/drm_atomic_helper.c | 15 +-- include/drm/drm_atomic_helper.h | 2 +- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 21f992605541..46668d071d6a 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state *state) else drm_atomic_helper_commit_tail(state); - drm_atomic_helper_commit_cleanup_done(state); - - drm_atomic_state_free(state); + if (drm_atomic_helper_commit_cleanup_done(state) == 0) + drm_atomic_state_free(state); } static void commit_work(struct work_struct *work) @@ -1591,12 +1590,12 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_hw_done); * This is part of the atomic helper support for nonblocking commits, see * drm_atomic_helper_setup_commit() for an overview. */ -void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) +int drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) { struct drm_crtc *crtc; struct drm_crtc_state *crtc_state; struct drm_crtc_commit *commit; - int i; + int i, failed = 0; long ret; for_each_crtc_in_state(state, crtc, crtc_state, i) { @@ -1621,15 +1620,19 @@ void drm_atomic_helper_commit_cleanup_done(struct drm_atomic_state *state) * not hold a reference of its own. 
*/ ret = wait_for_completion_timeout(>flip_done, 10*HZ); - if (ret == 0) - DRM_ERROR("[CRTC:%d:%s] flip_done timed out\n", + if (ret == 0) { + DRM_ERROR("[CRTC:%d:%s] flip_done timed out, memory leaked\n", crtc->base.id, crtc->name); + failed = -ETIMEDOUT; + } spin_lock(>commit_lock); del_commit: list_del(>commit_entry); spin_unlock(>commit_lock); } + + return failed; } EXPORT_SYMBOL(drm_atomic_helper_commit_cleanup_done); diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h index 7ff92b09fd9c..ee3d642c1feb 100644 --- a/include/drm/drm_atomic_helper.h +++ b/include/drm/drm_atomic_helper.h @@ -88,7 +88,7 @@ int drm_atomic_helper_setup_commit(struct drm_atomic_state *state, bool nonblock); void drm_atomic_helper_wait_for_dependencies(struct drm_atomic_state *state); void
Re: imxdrm issue on SABRE Lite
On Mon, Feb 13, 2017 at 09:05:33AM +0100, Thierry Reding wrote:
> On Sun, Feb 12, 2017 at 12:15:46AM +, Russell King - ARM Linux wrote:
> > On Sat, Feb 11, 2017 at 09:09:34PM +, Dan MacDonald wrote:
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c
> > b/drivers/gpu/drm/drm_atomic_helper.c
> > index 21f992605541..46668d071d6a 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1201,9 +1201,8 @@ static void commit_tail(struct drm_atomic_state
> > *state)
> >      else
> >          drm_atomic_helper_commit_tail(state);
> >
> > -    drm_atomic_helper_commit_cleanup_done(state);
> > -
> > -    drm_atomic_state_free(state);
> > +    if (drm_atomic_helper_commit_cleanup_done(state) == 0)
> > +        drm_atomic_state_free(state);
>
> Chris (Cc'ed) added reference counting to atomic state for v4.10, maybe
> that already fixes the issue?

I'm not confident it will, as there is not an independent ref on the state for the phases, and so a forced timeout still leaves a dangling pointer. The above chunk goes the opposite way and leaks the state to avoid the invalid deref, what we need is a ref around its existence on the dependency queue if that is outside the lifetime of the commit.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Re: imxdrm issue on SABRE Lite
On Sun, Feb 12, 2017 at 12:15:46AM +, Russell King - ARM Linux wrote: > On Sat, Feb 11, 2017 at 09:09:34PM +, Dan MacDonald wrote: > > [ 43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* > > [CRTC:24:crtc-0] flip_done timed out > > [ 55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* > > [CRTC:24:crtc-0] flip_done timed out > > This seems to lay the foundation for the kernel to Oops itself later. > The problem seems to be this: > > drm_atomic_helper_commit(state->dev, state, false) > - drm_atomic_helper_setup_commit(state, false) > - foreach crtc in state > - commit->event = kzalloc() > - crtc_state->event = commit->event > - crtc_state->event->base.completion = >flip_done > ... > - commit_tail(state) > - funcs->atomic_commit_tail(state) > ... > - drm_atomic_helper_commit_planes(dev, state, > DRM_PLANE_COMMIT_ACTIVE_ONLY | > > DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODES$ - foreach active crtc in state > - funcs->atomic_begin(crtc, old_crtc_state) > - ipu_crtc_atomic_begin() > - drm_crtc_vblank_on() > - if crtc->state->event > - drm_crtc_arm_vblank_event(crtc, crtc->state->event) > - crtc->state->event = NULL > > At this point, the "commit->flip_done" completion is queued with the > event onto the vblank list. > ... > - drm_atomic_helper_commit_cleanup_done(state) > - foreach crtc in state > - try_wait_for_completion(>hw_done) > - wait_for_completion_timeout(>flip_done, 10sec) > > This is where we get the timeout message. > > - drm_atomic_state_free(state) > > This "clears" the commit state (calling drm_crtc_commit_put() on it) > which has the effect of kfree()'ing the structure containing the > flip_done, but which is still on the vblank list. > > The next time we try to set a mode, the result is that a call to > drm_crtc_vblank_off() causes all queued events to be sent, including > the now kfree()'d flip_done completion, resulting in the reported > kernel oops. 
> > It seems others are also suffering similar issues when the flip_done > completion times out with other drivers: > > https://lkml.org/lkml/2016/12/1/171 > https://bugs.freedesktop.org/show_bug.cgi?id=96781 > https://lists.opensuse.org/opensuse-bugs/2016-10/msg03011.html > https://patchwork.kernel.org/patch/9280223/ (which is me...) > > This is likely the same, although the timeout line was not captured: > https://bugzilla.redhat.com/show_bug.cgi?id=1415180 > https://bodhi.fedoraproject.org/updates/kernel-4.8.7-200.fc24 > > So, can we please avoid killing the kernel when the hardware doesn't > quite behave as we want it to? > > Right now, when we oops the kernel, we're leaking all the memory > associated with the atomic modeset, so if we stop oopsing the kernel > but still leak the memory, surely that would be an improvement? > Maybe something like the untested patch at the bottom of this mail? > > It would give the opportunity to poke about on a failed system to > work out what happened and maybe why the hardware misbehaved. > > The real answer is for the hardware to behave, but we can't always > have our cake. > > Note - I can't reproduce Dan's problem here on 4.10-rc7 as I suspect > it needs multiple CRTCs/outputs running in the IPU to trigger it, > which Sabre lite has. I'll try enabling the (disconnected) LVDS > output tomorrow (I have Fabio's LVDS patch knocking about), but I > suspect those with a deeper knowledge of the IPU need to investigate > what's going on. 
> > > [ 214.765689] imx-ipuv3 240.ipu: DC stop timeout after 50 ms > > [ 214.825688] Unable to handle kernel NULL pointer dereference at > > virtual address > > [ 214.833783] pgd = ed1b8000 > > [ 214.836491] [] *pgd=4c974831 > > [ 214.840084] Internal error: Oops: 17 [#1] SMP ARM > > [ 214.844789] Modules linked in: mousedev snd_soc_sgtl5000 > > snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma > > snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio > > snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda > > v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3 > > snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops > > videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx > > evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display > > uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables > > [ 214.894338] CPU: 2 PID: 316 Comm: Xorg Tainted: GW > > 4.9.8-1-ARCH #1 > > [ 214.901735] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > [ 214.908264] task: ed2c4d00 task.stack: ed2a6000 > > [ 214.912803] PC is at __wake_up_common+0x1c/0x80 > > [ 214.917337] LR is at __wake_up_locked+0x14/0x1c > > [ 214.921871] pc : []lr : []psr: a0070093 > > [ 214.921871] sp : ed2a7c68 ip : fp : c0fa2a70 > > [
imxdrm issue on SABRE Lite
Hi Phillipp I'm having trouble getting xf86-video-armada working properly on a Element 14 / Embest SABRE Lite board running Arch Linux with kernel 4.9.8. I have been in touch with RMK and he's confident the crash below is caused by a kernel issue relating to imxdrm and a null pointer dereference. The first indiction of X/armada not being happy it that it only starts in 720x480 when connected to a 4K HDMI display, and when I try running: xrandr --output HDMI1 --mode 1920x1080 X crashes. Here's the serial console output for X loading and then crashing after I try running that xrandr command: Arch Linux 4.9.8-1-ARCH (ttymxc1) alarm login: [ 19.197477] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 19.205165] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 33.08] vgaarb: this pci device is not a vga device [ 33.203520] vgaarb: this pci device is not a vga device [ 33.447051] imx-ipuv3 240.ipu: DC stop timeout after 50 ms [ 33.707042] [ cut here ] [ 33.711682] WARNING: CPU: 1 PID: 316 at drivers/gpu/drm/drm_atomic_helper.c:1140 drm_atomic_helper_wait_for_vblanks+0x274/0x278 [ 33.723187] [CRTC:24] vblank wait timed out [ 33.727395] Modules linked in: snd_soc_sgtl5000 snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3 snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables [ 33.776237] CPU: 1 PID: 316 Comm: Xorg Not tainted 4.9.8-1-ARCH #1 [ 33.782419] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 33.788965] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [ 33.796719] [] (show_stack) from [] (dump_stack+0x88/0x9c) [ 33.803951] 
[] (dump_stack) from [] (__warn+0xe8/0x100) [ 33.810919] [] (__warn) from [] (warn_slowpath_fmt+0x48/0x6c) [ 33.818410] [] (warn_slowpath_fmt) from [] (drm_atomic_helper_wait_for_vblanks+0x274/0x278) [ 33.828516] [] (drm_atomic_helper_wait_for_vblanks) from [] (imx_drm_atomic_commit_tail+0x48/0x58 [imxdrm]) [ 33.840004] [] (imx_drm_atomic_commit_tail [imxdrm]) from [] (commit_tail+0x40/0x5c) [ 33.849489] [] (commit_tail) from [] (drm_atomic_helper_commit+0x94/0xd8) [ 33.858019] [] (drm_atomic_helper_commit) from [] (drm_atomic_helper_set_config+0x78/0x9c) [ 33.868028] [] (drm_atomic_helper_set_config) from [] (drm_mode_set_config_internal+0x58/0xdc) [ 33.878382] [] (drm_mode_set_config_internal) from [] (drm_mode_setcrtc+0x4a4/0x550) [ 33.887867] [] (drm_mode_setcrtc) from [] (drm_ioctl+0x214/0x44c) [ 33.895704] [] (drm_ioctl) from [] (do_vfs_ioctl+0xac/0x980) [ 33.903104] [] (do_vfs_ioctl) from [] (SyS_ioctl+0x6c/0x7c) [ 33.910422] [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x3c) [ 33.918023] ---[ end trace e8a77aa320be7e55 ]--- [ 43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* [CRTC:24:crtc-0] flip_done timed out [ 55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* [CRTC:24:crtc-0] flip_done timed out [ 56.958264] alloc_contig_range: [4d0b7, 4d0b8) PFNs busy [ 90.896615] usb 1-1.2.3: new low-speed USB device number 6 using ci_hdrc [ 91.050796] usb 1-1.2.3: New USB device found, idVendor=0458, idProduct=003a [ 91.057888] usb 1-1.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 91.065382] usb 1-1.2.3: Product: Optical Mouse [ 91.069944] usb 1-1.2.3: Manufacturer: Genius [ 91.082071] input: Genius Optical Mouse as /devices/soc0/soc/210.aips-bus/2184200.usb/ci_hdrc.1/usb1/1-1/1-1.2/1-1.2.3/1-1.2.3:1.0/0003:0458:003A.0003/input/input3 [ 91.098116] hid-generic 0003:0458:003A.0003: input,hidraw2: USB HID v1.11 Mouse [Genius Optical Mouse] on usb-ci_hdrc.1-1.2.3/input0 [ 91.162048] mousedev: PS/2 mouse device common for all 
mice [ 214.765689] imx-ipuv3 240.ipu: DC stop timeout after 50 ms [ 214.825688] Unable to handle kernel NULL pointer dereference at virtual address [ 214.833783] pgd = ed1b8000 [ 214.836491] [] *pgd=4c974831 [ 214.840084] Internal error: Oops: 17 [#1] SMP ARM [ 214.844789] Modules linked in: mousedev snd_soc_sgtl5000 snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3 snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables [ 214.894338] CPU: 2 PID: 316
Re: imxdrm issue on SABRE Lite
On Sat, Feb 11, 2017 at 09:09:34PM +, Dan MacDonald wrote:
> [ 43.997066] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR*
> [CRTC:24:crtc-0] flip_done timed out
> [ 55.517063] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR*
> [CRTC:24:crtc-0] flip_done timed out

This seems to lay the foundation for the kernel to Oops itself later. The problem seems to be this:

drm_atomic_helper_commit(state->dev, state, false)
 - drm_atomic_helper_setup_commit(state, false)
   - foreach crtc in state
     - commit->event = kzalloc()
     - crtc_state->event = commit->event
     - crtc_state->event->base.completion = &commit->flip_done
 ...
 - commit_tail(state)
   - funcs->atomic_commit_tail(state)
     ...
     - drm_atomic_helper_commit_planes(dev, state,
                        DRM_PLANE_COMMIT_ACTIVE_ONLY |
                        DRM_PLANE_COMMIT_NO_DISABLE_AFTER_MODES$
       - foreach active crtc in state
         - funcs->atomic_begin(crtc, old_crtc_state)
           - ipu_crtc_atomic_begin()
             - drm_crtc_vblank_on()
             - if crtc->state->event
               - drm_crtc_arm_vblank_event(crtc, crtc->state->event)
               - crtc->state->event = NULL

At this point, the "commit->flip_done" completion is queued with the event onto the vblank list.

     ...
   - drm_atomic_helper_commit_cleanup_done(state)
     - foreach crtc in state
       - try_wait_for_completion(&commit->hw_done)
       - wait_for_completion_timeout(&commit->flip_done, 10sec)

This is where we get the timeout message.

   - drm_atomic_state_free(state)

This "clears" the commit state (calling drm_crtc_commit_put() on it) which has the effect of kfree()'ing the structure containing the flip_done, but which is still on the vblank list.

The next time we try to set a mode, the result is that a call to drm_crtc_vblank_off() causes all queued events to be sent, including the now kfree()'d flip_done completion, resulting in the reported kernel oops.

It seems others are also suffering similar issues when the flip_done completion times out with other drivers:

https://lkml.org/lkml/2016/12/1/171
https://bugs.freedesktop.org/show_bug.cgi?id=96781
https://lists.opensuse.org/opensuse-bugs/2016-10/msg03011.html
https://patchwork.kernel.org/patch/9280223/ (which is me...)

This is likely the same, although the timeout line was not captured:

https://bugzilla.redhat.com/show_bug.cgi?id=1415180
https://bodhi.fedoraproject.org/updates/kernel-4.8.7-200.fc24

So, can we please avoid killing the kernel when the hardware doesn't quite behave as we want it to?

Right now, when we oops the kernel, we're leaking all the memory associated with the atomic modeset, so if we stop oopsing the kernel but still leak the memory, surely that would be an improvement? Maybe something like the untested patch at the bottom of this mail?

It would give the opportunity to poke about on a failed system to work out what happened and maybe why the hardware misbehaved.

The real answer is for the hardware to behave, but we can't always have our cake.

Note - I can't reproduce Dan's problem here on 4.10-rc7 as I suspect it needs multiple CRTCs/outputs running in the IPU to trigger it, which Sabre lite has. I'll try enabling the (disconnected) LVDS output tomorrow (I have Fabio's LVDS patch knocking about), but I suspect those with a deeper knowledge of the IPU need to investigate what's going on.

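The "leak instead of free" idea above can be modelled in a few lines of userspace C. This is a sketch of the policy only, assuming a cleanup step that reports a timeout through its return value, as the patch in this thread does with -ETIMEDOUT; the model_ names are hypothetical and none of this is the kernel code itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; hypothetical names. */
struct model_state {
    bool flip_done; /* did the hardware signal the flip in time? */
};

static int model_freed;  /* states released normally */
static int model_leaked; /* states deliberately left allocated */

static struct model_state *model_state_new(bool flip_done)
{
    struct model_state *s = malloc(sizeof(*s));

    if (s)
        s->flip_done = flip_done;
    return s;
}

/* Returns 0 when the flip completed, -ETIMEDOUT otherwise. */
static int model_cleanup_done(const struct model_state *s)
{
    return s->flip_done ? 0 : -ETIMEDOUT;
}

static void model_commit_tail(struct model_state *s)
{
    if (model_cleanup_done(s) == 0) {
        free(s);
        model_freed++;
    } else {
        /* Something may still point at s, so leak it
         * rather than risk a later use-after-free. */
        model_leaked++;
    }
}
```

A state whose flip completed is freed as before; one that timed out stays allocated, which trades a bounded memory leak for a system that is still alive and can be inspected.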
> [ 214.765689] imx-ipuv3 240.ipu: DC stop timeout after 50 ms > [ 214.825688] Unable to handle kernel NULL pointer dereference at > virtual address > [ 214.833783] pgd = ed1b8000 > [ 214.836491] [] *pgd=4c974831 > [ 214.840084] Internal error: Oops: 17 [#1] SMP ARM > [ 214.844789] Modules linked in: mousedev snd_soc_sgtl5000 > snd_soc_fsl_ssi snd_soc_imx_sgtl5000 imx_pcm_fiq imx_pcm_dma > snd_soc_fsl_asrc snd_soc_fsl_asoc_card snd_soc_core dw_hdmi_ahb_audio > snd_pcm_dmaengine caam_jr imx_ipuv3_crtc snd_ac97_codec coda > v4l2_mem2mem videobuf2_dma_contig ac97_bus imx_ipu_v3 > snd_soc_imx_audmux snd_pcm videobuf2_vmalloc videobuf2_memops > videobuf2_v4l2 videobuf2_core dw_hdmi_imx caam imx2_wdt ofpart spi_imx > evdev dw_hdmi etnaviv imx_ldb pwm_imx snd_timer parallel_display > uio_pdrv_genirq uio imxdrm sch_fq_codel ip_tables x_tables > [ 214.894338] CPU: 2 PID: 316 Comm: Xorg Tainted: GW > 4.9.8-1-ARCH #1 > [ 214.901735] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [ 214.908264] task: ed2c4d00 task.stack: ed2a6000 > [ 214.912803] PC is at __wake_up_common+0x1c/0x80 > [ 214.917337] LR is at __wake_up_locked+0x14/0x1c > [ 214.921871] pc : []lr : []psr: a0070093 > [ 214.921871] sp : ed2a7c68 ip : fp : c0fa2a70 > [ 214.933348] r10: c0f37384 r9 : 0003 r8 : > [ 214.938574] r7 : r6 : edbf3410 r5 : edbf3408 r4 : edbf340c > [ 214.945101] r3 : r2 : r1 : r0 : edbf340c > [ 214.951630] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM > Segment