On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote: > Readding Daniel, which somehow got dropped from the cc. > > On 01/25/2016 03:53 PM, Ville Syrjälä wrote: > >On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: > >> > >> > >>On 01/25/2016 02:23 PM, Ville Syrjälä wrote: > >>>On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: > >>>> > >>>> > >>>>On 01/25/2016 05:15 AM, Michel Dänzer wrote: > >>>>>On 23.01.2016 00:18, Ville Syrjälä wrote: > >>>>>>On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: > >>>>>>> > >>>>>>>[ Trimming KDE folks from Cc ] > >>>>>>> > >>>>>>>On 21.01.2016 19:09, Daniel Vetter wrote: > >>>>>>>>On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: > >>>>>>>>>On 21.01.2016 16:58, Daniel Vetter wrote: > >>>>>>>>>> > >>>>>>>>>>Can you please point me at the vblank on/off jump bug please? > >>>>>>>>> > >>>>>>>>>AFAIR I originally reported it in response to > >>>>>>>>>http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html > >>>>>>>>>, but I can't find that in the archives, so maybe that was just on > >>>>>>>>>IRC. > >>>>>>>>>See > >>>>>>>>>http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html > >>>>>>>>>. Basically, I ran into the bug fixed by your patch because the > >>>>>>>>>counter > >>>>>>>>>jumped forward on every DPMS off, so it hit the 32-bit boundary after > >>>>>>>>>just a few days. > >>>>>>>> > >>>>>>>>Ok, so just uncovered the overflow bug. > >>>>>>> > >>>>>>>Not sure what you mean by "just", but to be clear: The > >>>>>>>drm_vblank_on/off > >>>>>>>counter jumping bug (similar to the bug this thread is about), which > >>>>>>>exposed the overflow bug, is still alive and kicking in 4.5. It seems > >>>>>>>to happen when turning off the CRTC: > >>>>>>> > >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: > >>>>>>>current=218104694, diff=0, hw=916 hw_last=916 > >>>>>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 > >>>>>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 > >>>>>>>p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] > >>>>>>>[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 > >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: > >>>>>>>current=218104694, diff=16776301, hw=1 hw_last=916 > >>>>>> > >>>>>>Not sure what bug we're talking about here, but here the hw counter > >>>>>>clearly jumps backwards. > >>>>>> > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 1: > >>>>>>>current=0, diff=0, hw=0 hw_last=0 > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 2: > >>>>>>>current=0, diff=0, hw=0 hw_last=0 > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 3 > >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 3: > >>>>>>>current=0, diff=0, hw=0 hw_last=0 > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1 > >>>>>>>[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ > >>>>>>>7304.317140 -> 7304.317140 [e 0 us, 0 rep] > >>>>>>>[drm:radeon_get_vblank_counter_kms] Query failed! stat 1 > >>>>>>>[drm:drm_update_vblank_count] updating vblank count on crtc 0: > >>>>>>>current=234880995, diff=16777215, hw=0 hw_last=1 > >>>>>> > >>>>>>Same here. > >>>>> > >>>>>At least one of the jumps is expected, because this is around turning > >>>>>off the CRTC for DPMS off. Don't know yet why there are two jumps back > >>>>>though. > >>>>> > >>>>> > >>>>>>These things just don't happen on i915 because drm_vblank_off() and > >>>>>>drm_vblank_on() are always called around the times when the hw counter > >>>>>>might get reset. Or at least that's how it should be. > >>>>> > >>>>>Which is of course the idea of Daniel's patch (which is what I'm getting > >>>>>the above with) or Mario's patch as well, but clearly something's still > >>>>>wrong. It's certainly possible that it's something in the driver, but > >>>>>since calling drm_vblank_pre/post_modeset from the same places seems to > >>>>>work fine (ignoring the regression discussed in this thread)... Do > >>>>>drm_vblank_on/off require something else to handle this correctly? > >>>>> > >>>>> > >>>> > >>>>I suspect it is because vblank_disable_and_save calls > >>>>drm_update_vblank_count() unconditionally, even if vblank irqs are > >>>>already off. > >>>> > >>>>So on a manual display disable -> reenable you get something like > >>>> > >>>>At disable: > >>>> > >>>>Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> > >>>>vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes > >>>>final count. > >>>> > >>>>Then the crtc is shut down and its hw counter resets to zero. > >>>> > >>>>At reenable: > >>>> > >>>>Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> > >>>>atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> > >>>>drm_vblank_off -> vblank_disable_and_save -> A pointless > >>>>drm_update_vblank_count() while the hw counter is already reset to zero > >>>>--> Unwanted counter jump. > >>>> > >>>> > >>>>The problem doesn't happen on a pure modeset to a different video > >>>>resolution/refresh rate, as then we only have one call into > >>>>atombios_crtc_dpms(DPMS_OFF). > >>>> > >>>>I think the fix is to fix vblank_disable_and_save() to only call > >>>>drm_update_vblank_count() if vblank irqs get actually disabled, not on > >>>>no-op calls. I will try that now. > >>> > >>>It does that on purpose. Otherwise the vblank counter would appear to > >>>have stalled while the interrupt was off. > >>> > >> > >>Ok, that's what the comments there say, although i don't see atm. why > >>that perceived stall would be a big problem. I checked all callers of > >>vblank_disable_and_save(). They are all careful to not call that > >>function if vblanks are already disabled. The only exception is > >>drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms > >>drivers which have resetting hw counters or other problematic behaviour > >>during modesets etc. then this will break. E.g., calling the vblank > >>timestamping stuff is also not safe/well-defined during modesets when > >>the timestamping constants are not (yet) updated to reflect the new mode > >>timing of the modeset in progress. > > > >The idea is to maintain the appearance that the counter ticks all the > >time as long as the crtc is active. While that may not be really > >required in case if no one is currently interested in the vblank > >counter, I think it's a nice thing to have just to make the behaviour > >of the counter consistent. > > > >As far as calling drm_vblank_off() after the hw counter got reset, well, > >that not correct. It should be called before the reset. > > What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. The > first call to DMPS_OFF will call drm_vblank_off() and really disable > vblank-irqs if they were running, updating the counts/ts a last time. But > then the dpms off will reset the hw counter to zero. When one reenables the > display, a second call to DPMS_OFF will now call drm_vblank_off again when > it apparently shouldn't. > > I just tested this patch, which fixes the counter jumps on radeon-kms with > my or Daniel's drm_vblank_off patches to radeon:
This might be due to the legacy helpers, which just love to redundantly disable stuff that's off already. The problem I see with no-oping these out is that for atomic drivers (which really should get this right) this might paper over bugs: E.g. when you forget to call _off() when disabling the crtc, then calling _on() twice in a row is indeed a serious bug. Similar when you forget to call _on() and have multiple _off() calls in a row. So not sure what to do here. -Daniel > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c > index 607f493..d739d93 100644 > --- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev, unsigned > int pipe) > spin_lock_irqsave(&dev->event_lock, irqflags); > > spin_lock(&dev->vbl_lock); > - vblank_disable_and_save(dev, pipe); > + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, > vblank->enabled); > + > + if (vblank->enabled) > + vblank_disable_and_save(dev, pipe); > wake_up(&vblank->queue); > > /* > @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned > int pipe) > return; > > spin_lock_irqsave(&dev->vbl_lock, irqflags); > + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe, > vblank->enabled); > + > /* Drop our private "prevent drm_vblank_get" refcount */ > if (vblank->inmodeset) { > atomic_dec(&vblank->refcount); > > > > Another, maybe better, approach might be to no-op redundant calls to > drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to > drm_vblank_on() iff !vblank->inmodeset. > > -mario > > > > > >> > >>-mario > >> > >> > >>>> > >>>>Otherwise kms drivers would have to be careful to never call > >>>>drm_vblank_off multiple times before calling drm_vblank_on, but the help > >>>>text to drm_vblank_on() claims that unbalanced calls to these functions > >>>>are perfectly fine. > >>>> > >>>>-mario > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch