[PATCH] drm/vc4: Implement precise vblank timestamping.

2016-07-07 Thread Eric Anholt
Mario Kleiner  writes:

> diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
> index c82d468..c75166e 100644
> --- a/drivers/gpu/drm/vc4/vc4_crtc.c
> +++ b/drivers/gpu/drm/vc4/vc4_crtc.c

> +int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id,
> +			    unsigned int flags, int *vpos, int *hpos,
> +			    ktime_t *stime, ktime_t *etime,
> +			    const struct drm_display_mode *mode)
> +{
...
> +	/* This is the offset we need for translating hvs -> pv scanout pos. */
> +	/* XXX Find proper formula from hw docs instead of guesstimating? */
> +	fifo_lines = 2048 * 7 / mode->crtc_hdisplay;

You got the math really close here!

The COB is laid out as:
4 * 512-pixel, 4 byte per pixel SRAMs
4 * 4672-pixel, 3 byte per pixel SRAMs

The first 4 get allocated for the transposer (fifo 2) for writeback to
memory (which we don't support yet).  Display FIFO 1 (HDMI) gets 1920 *
7 + 16 pixels after that.  Display FIFO 0 gets the rest.  You can see
the current values in the DISPBASE registers (which we should probably
be initializing at boot, if we ever want to support powering on without
the firmware!)  DISPBASE has base address (in units of pixels) in the
low 16 bits, and the last included pixel in the top 16.
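As an illustration of the DISPBASE layout described above, here is a
minimal userspace C sketch that decodes a register value and derives the
FIFO depth in scanlines. It assumes only what is stated here (base
address in the low 16 bits, last included pixel in the top 16, both in
units of pixels). The function names are hypothetical and are not the
vc4 driver's register macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the real driver would use its own register macros. */

/* Base address of the FIFO, in pixels: low 16 bits of DISPBASE. */
static uint32_t dispbase_base(uint32_t val)
{
	return val & 0xffff;
}

/* Last pixel included in the FIFO: top 16 bits of DISPBASE. */
static uint32_t dispbase_top(uint32_t val)
{
	return (val >> 16) & 0xffff;
}

/* FIFO size in pixels; "top" is inclusive, hence the + 1. */
static uint32_t dispbase_fifo_pixels(uint32_t val)
{
	return dispbase_top(val) - dispbase_base(val) + 1;
}

/* Number of whole scanlines of width hdisplay that fit in the FIFO. */
static int fifo_lines_for_mode(uint32_t val, int hdisplay)
{
	assert(hdisplay > 0);
	return (int)(dispbase_fifo_pixels(val) / (uint32_t)hdisplay);
}
```

For example, a hypothetical value with base 2048 and last pixel 15503
describes 15503 - 2048 + 1 = 13456 = 1920 * 7 + 16 pixels, the HDMI
allocation mentioned above. That gives 7 lines at 1920 pixels wide,
matching the patch's 2048 * 7 / 1920 = 7, but only 10 lines at 1280
pixels wide, where the guess yields 11.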

Want to respin using reads of the regs?  Reading them once at
initialization of the HVS should be fine.

[PATCH] drm/vc4: Implement precise vblank timestamping.

2016-06-23 Thread Mario Kleiner
Precise vblank timestamping is implemented via the
usual scanout position based method. On VC4 the
pixelvalves (PV) do not have a scanout position
register. Only the hardware video scaler (HVS) has a
similar register, which describes which scanline of
the output is currently being composited and stored in
the HVS fifo for later consumption by the PV.

This causes a problem in that the HVS runs at a much
faster clock (system clock / audio gate) than the PV,
which runs at the video mode dot clock. Unless the
fifo between HVS and PV is full, the HVS will progress
faster in its observable read line position than the
video scan rate, so the HVS position reading can't be
directly translated into a scanout position for
timestamp correction.

Additionally, when the PV is in vblank it doesn't consume
from the fifo, so the fifo fills up very quickly and then
the HVS stops compositing until the PV enters active scanout
and starts consuming scanlines from the fifo again, making
new space for the HVS to composite.

Therefore a simple translation of HVS read position into
elapsed time since (or to) the start of active scanout does
not work, but for the most interesting cases we can still
get useful and sufficiently accurate results:

1. The PV enters active scanout of a new frame with the
   fifo of the HVS completely full, and the HVS can very
   quickly refill any fifo line which the PV consumes and
   thereby frees up during active scanout. Therefore the
   PV and HVS effectively work in lock-step during active
   scanout, with the fifo never having more than 1 scanline
   freed up by the PV before it gets refilled. The PV's
   real scanout position therefore trails the HVS
   compositing position by the fifo size,
   scanoutpos = hvspos - fifosize, so precise timestamping
   works while in active scanout, except for the last few
   scanlines of the frame, when the HVS reaches end of
   frame, stops compositing, and the PV catches up and
   drains the fifo. This special case should only
   introduce minor errors, though.

2. If we are in vblank, then we can only guess something
   reasonable. If called from the vblank irq, we assume the
   irq is usually dispatched with minimal delay, so we can
   use the timestamp taken at entry into the vblank irq
   handler as a baseline and add a full vblank duration
   to get the guessed start of active scanout. As irq
   dispatch is usually low latency, this works with
   relatively low jitter and gives good results.

   If we aren't called from the vblank irq, then we could
   be anywhere within the vblank interval, so we return a
   neutral result, simply the current system timestamp,
   and hope for the best.
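The two cases above can be sketched in plain C. This is a minimal
userspace illustration of the arithmetic only, not the patch's kernel
code: the names est_mode, est_scanout_line and est_active_start_ns are
hypothetical, timestamps are plain nanosecond integers rather than
ktime_t, and the vblank duration is assumed to be
(vtotal - vdisplay) scanlines measured from the irq-entry timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mode timing; not the driver's real structures. */
struct est_mode {
	int vdisplay;    /* active scanlines */
	int vtotal;      /* total scanlines including vblank */
	int64_t line_ns; /* duration of one scanline in nanoseconds */
};

/*
 * Case 1 (active scanout): the PV trails the HVS by the fifo depth,
 * so scanoutpos = hvspos - fifo_lines. Near the end of the frame,
 * when the HVS has stopped and the PV drains the fifo, this is
 * slightly off.
 */
static int est_scanout_line(int hvs_line, int fifo_lines)
{
	assert(fifo_lines >= 0);
	return hvs_line - fifo_lines;
}

/*
 * Case 2 (vblank): the HVS is stalled on a full fifo, so its line
 * counter says nothing about the PV. From the vblank irq, treat the
 * irq-entry timestamp as the start of vblank and add one full vblank
 * duration to estimate the start of active scanout. Otherwise the
 * position within vblank is unknown, so return "now" as a neutral
 * guess.
 */
static int64_t est_active_start_ns(bool in_vblank_irq, int64_t t_vblank_ns,
				   int64_t now_ns, const struct est_mode *m)
{
	int64_t vblank_lines = m->vtotal - m->vdisplay;

	if (in_vblank_irq)
		return t_vblank_ns + vblank_lines * m->line_ns;
	return now_ns;
}
```

With a 1080-line mode, vtotal 1125 and a hypothetical 14815 ns
scanline, the irq path adds 45 * 14815 = 666675 ns (about 0.67 ms) to
the irq-entry timestamp, while a read of HVS line 507 with a 7-line
fifo maps to PV scanline 500.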

Measurement shows the generated timestamps to be rather precise,
and never off by more than 1 vblank duration in the worst case.

Limitations: Doesn't work well yet for interlaced video modes,
so it is disabled in interlaced mode for now.

Signed-off-by: Mario Kleiner 
---
 drivers/gpu/drm/vc4/vc4_crtc.c | 143 +
 drivers/gpu/drm/vc4/vc4_drv.c  |   2 +
 drivers/gpu/drm/vc4/vc4_drv.h  |   7 ++
 drivers/gpu/drm/vc4/vc4_regs.h |   4 ++
 4 files changed, 156 insertions(+)

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index c82d468..c75166e 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -46,6 +46,9 @@ struct vc4_crtc {
 	const struct vc4_crtc_data *data;
 	void __iomem *regs;
 
+	/* Timestamp at start of vblank irq - unaffected by lock delays. */
+	ktime_t t_vblank;
+
 	/* Which HVS channel we're using for our CRTC. */
 	int channel;
 
@@ -146,6 +149,145 @@ int vc4_crtc_debugfs_regs(struct seq_file *m, void *unused)
 }
 #endif

+int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id,
+			    unsigned int flags, int *vpos, int *hpos,
+			    ktime_t *stime, ktime_t *etime,
+			    const struct drm_display_mode *mode)
+{
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	struct vc4_crtc *vc4_crtc = vc4->crtc[crtc_id];
+	u32 val;
+	int fifo_lines;
+	int vblank_lines;
+	int ret = 0;
+
+	/*
+	 * XXX Doesn't work well in interlaced mode yet, partially due
+	 * to problems in vc4 kms or drm core interlaced mode handling,
+	 * so disable for now in interlaced mode.
+	 */
+	if (mode->flags & DRM_MODE_FLAG_INTERLACE)
+		return ret;
+
+	/* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+
+	/* Get optional system timestamp before query. */
+	if (stime)
+		*stime = ktime_get();
+
+	/*
+	 * Read vertical scanline which is currently composed for our
+	 * pixelvalve by the HVS, and also the scaler status.
+	 */
+	val = HVS_READ(SCALER_DISPSTATX(vc4_crtc->channel));
+
+