The MSM DRM driver is currently broken in multiple ways with respect to probe deferral. Not only does the driver currently fail to probe again after a late deferral, but due to a related use-after-free bug this also triggers NULL-pointer dereferences.
These bugs are not new but have become critical with the release of
5.19 where probe is deferred in case the aux-bus EP panel driver has not
yet been loaded.

The underlying problem is lifetime issues due to careless use of
device-managed resources.

Specifically, device-managed resources allocated post component bind
must be tied to the lifetime of the aggregate DRM device or they will
not necessarily be released when binding of the aggregate device is
deferred.

The following call chain and pseudo code serve as an illustration of
the problem:

 - platform_probe(pdev1)
   - dp_display_probe()
     - component_add()

 - platform_probe(pdev2)                                // last component
   - dp_display_probe()                                 // d0
     - component_add()
       - try_to_bring_up_aggregate_device()
         - devres_open_group(adev->parent)              // d1

         - msm_drm_bind()
           - msm_drm_init()
             - component_bind_all()
               - for_each_component()
                 - component_bind()
                   - devres_open_group(&pdev->dev)      // d2
                   - dp_display_bind()
                     - devm_kzalloc(&pdev->dev)         // a1, OK
                   - devres_close_group(&pdev->dev)     // d3

             - dpu_kms_hw_init()
               - for_each_panel()
                 - msm_dp_modeset_init()
                   - dp_display_request_irq()
                     - devm_request_irq(&pdev->dev)     // a2, BUG
                   - if (pdev == pdev2 && condition)
                     - return -EPROBE_DEFER;

             - if (error)
               - component_unbind_all()
                 - for_each_component()
                   - component_unbind()
                     - dp_display_unbind()
                     - devres_release_group(&pdev->dev) // d4, only a1 is freed

         - if (error)
           - devres_release_group(adev->parent)         // d5

The device-managed allocation a2 is buggy as its lifetime is tied to the
component platform device and will not be released when the aggregate
device bind fails (e.g. due to a probe deferral).

When pdev2 is later probed again, the attempt to allocate the IRQ a
second time will fail for pdev1 (which is still bound to its platform
driver).

This series fixes the lifetime issues by tying the lifetime of a2 (and
similar allocations) to the lifetime of the aggregate device so that a2
is released at d5.
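For readers who want to see the failure mode in isolation, the
following is a minimal userspace sketch of the a2 lifetime problem. It
is a toy model and not kernel code: struct toy_dev, toy_devm_request_irq(),
toy_devres_release(), toy_bind() and toy_reset() are hypothetical
stand-ins invented for this illustration, and the model assumes that a
failed probe of pdev2 releases pdev2's own managed resources while pdev1
stays bound and keeps its resources.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_EPROBE_DEFER 517	/* mirrors the kernel's EPROBE_DEFER value */
#define TOY_EBUSY 16		/* mirrors EBUSY */
#define NUM_COMP 2		/* pdev1 and pdev2, one IRQ line each */

/* Toy device: only records which IRQ lines are on its release list. */
struct toy_dev {
	bool holds_irq[NUM_COMP];
};

static bool irq_claimed[NUM_COMP];
static struct toy_dev adev, pdev[NUM_COMP];	/* aggregate + components */

/* Stand-in for devm_request_irq(): claim 'irq' and tie it to 'owner'. */
static int toy_devm_request_irq(struct toy_dev *owner, int irq)
{
	if (irq_claimed[irq])
		return -TOY_EBUSY;
	irq_claimed[irq] = true;
	owner->holds_irq[irq] = true;
	return 0;
}

/* Stand-in for a devres release: free everything tied to 'owner'. */
static void toy_devres_release(struct toy_dev *owner)
{
	for (int irq = 0; irq < NUM_COMP; irq++) {
		if (owner->holds_irq[irq]) {
			owner->holds_irq[irq] = false;
			irq_claimed[irq] = false;
		}
	}
}

/* Return the model to its initial state (nothing claimed). */
static void toy_reset(void)
{
	memset(irq_claimed, 0, sizeof(irq_claimed));
	memset(&adev, 0, sizeof(adev));
	memset(pdev, 0, sizeof(pdev));
}

/*
 * One bind attempt of the aggregate device, triggered by probing the
 * last component. With tie_to_aggregate == false, a2 is tied to the
 * component device (the bug); with true, to the aggregate device.
 */
static int toy_bind(bool tie_to_aggregate, bool defer)
{
	int ret = 0;

	for (int i = 0; i < NUM_COMP && !ret; i++) {
		struct toy_dev *owner = tie_to_aggregate ? &adev : &pdev[i];

		ret = toy_devm_request_irq(owner, i);		/* a2 */
		if (!ret && i == NUM_COMP - 1 && defer)
			ret = -TOY_EPROBE_DEFER;
	}

	if (ret) {
		toy_devres_release(&adev);			/* d5 */
		/* Assumption: the failed probe of pdev2 frees its devres. */
		toy_devres_release(&pdev[NUM_COMP - 1]);
	}

	return ret;
}
```

In this model, calling toy_bind(false, true) followed by
toy_bind(false, false) makes the second attempt fail with -TOY_EBUSY,
because the IRQ tied to pdev1 was never released. After toy_reset(),
the same sequence with tie_to_aggregate set to true succeeds on the
second attempt, since everything was released at d5.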

In some cases, such as for the DP IRQ, the above situation can also be
avoided by moving the allocation in question to the platform driver
probe (d0) or component bind (between d2 and d3). But as doing so is not
a general fix, this can be done later as a cleanup/optimisation.

Johan


Changes in v2
 - use a custom devres action instead of amending the AUX bus interface
   (Doug)
 - split sanity check fixes and cleanups per bridge type (Dmitry)
 - add another Fixes tag for the missing bridge counter reset (Dmitry)


Johan Hovold (10):
  drm/msm: fix use-after-free on probe deferral
  drm/msm/dp: fix memory corruption with too many bridges
  drm/msm/dsi: fix memory corruption with too many bridges
  drm/msm/hdmi: fix memory corruption with too many bridges
  drm/msm/dp: fix IRQ lifetime
  drm/msm/dp: fix aux-bus EP lifetime
  drm/msm/dp: fix bridge lifetime
  drm/msm/hdmi: fix IRQ lifetime
  drm/msm/dp: drop modeset sanity checks
  drm/msm/dsi: drop modeset sanity checks

 drivers/gpu/drm/msm/dp/dp_display.c | 26 +++++++++++++++++++-------
 drivers/gpu/drm/msm/dp/dp_parser.c  |  6 +++---
 drivers/gpu/drm/msm/dp/dp_parser.h  |  5 +++--
 drivers/gpu/drm/msm/dsi/dsi.c       |  9 +++++----
 drivers/gpu/drm/msm/hdmi/hdmi.c     |  7 ++++++-
 drivers/gpu/drm/msm/msm_drv.c       |  1 +
 6 files changed, 37 insertions(+), 17 deletions(-)

-- 
2.35.1