On Tue, Jan 24, 2023 at 09:09:02AM +0100, Johan Hovold wrote:
> On Mon, Jan 23, 2023 at 09:17:49AM -0800, Bjorn Andersson wrote:
> > On Mon, Jan 23, 2023 at 05:01:45PM +0100, Johan Hovold wrote:
> > > On Tue, Jan 17, 2023 at 09:04:39AM +0100, Johan Hovold wrote:
> > > > On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> 
> > > > > > Perhaps we have shuffled other things around to avoid this bug?
> > > > > > Either way, let's put this on hold until further proof that it's
> > > > > > still reproducible.
> > > > 
> > > > As I've mentioned off list, I haven't hit the apparent race I reported
> > > > here:
> > > > 
> > > >         
> > > > https://lore.kernel.org/all/y1efjh11b5uqz...@hovoldconsulting.com/
> > > > 
> > > > since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
> > > > > could very well be that something has changed that fixes (or hides) the
> > > > issue since.
> > > 
> > > For unrelated reasons, I tried enabling async probing, and apart from
> > > apparently causing the panel driver to probe defer indefinitely, I also
> > > again hit the WARN_ON() I had added to catch this:
> > > 
> > > [   13.593235] WARNING: CPU: 0 PID: 125 at 
> > > drivers/gpu/drm/drm_probe_helper.c:664 
> > > drm_kms_helper_hotplug_event+0x48/0x7
> > > 0 [drm_kms_helper]
> 
> > > So the bug still appears to be there (and the MSM DRM driver is fragile
> > > and broken, but we knew that).
> > > 
> > 
> > But the ordering between mode_config.funcs = !NULL and
> > drm_kms_helper_poll_init() in msm_drm_init() seems pretty clear.
> > 
> > And my testing shows that drm_kms_helper_poll_init() is the cause for
> > getting bridge->hpd_cb != NULL.
> > 
> > So the ordering seems legit, unless there's something else causing the
> > assignment of bridge->hpd_cb to happen earlier in this scenario.
> 
> I'm not saying that this patch is correct (indeed it doesn't seem to
> be), but only that the bug I reported still appears to be present in
> 6.2.
> 
> Now that I actually looked at this again, I realise that the reason that
> I haven't seen it with 6.2 is more likely due to the fact that I'm now
> making sure to load the panel driver before the drm driver to avoid that
> unnecessary probe deferral.
> 
> With async probing, I get the probe deferral again, and boom, I hit the
> same old NULL deref.
> 
> I see there's a call to drm_kms_helper_poll_fini() in msm_drm_uninit()
> which should stop the polling, but perhaps there's still some corner
> case due to the unexpected probe (or rather component bind) deferral
> which we're hitting.

I guess the drm_kms_helper_poll_fini() bit is irrelevant here as the
call comes from the pmic_glink_altmode_worker() and
drm_bridge_hpd_notify().

Perhaps the pmic_glink altmode driver simply isn't notified that the
drm device is gone again due to the late "probe" deferral or similar?

Johan
