Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

Mario Kleiner Sun, 26 Dec 2010 15:58:50 -0800

On Dec 26, 2010, at 3:53 PM, Andrew Lutomirski wrote:

On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
<mario.klei...@tuebingen.mpg.de> wrote:
There's a new drm module parameter for selecting the timeout: echo 50 >
/sys/module/drm/parameters/vblankoffdelay
would set the timeout to 50 msecs. A setting of zero will disable the timer,
so vblank irq's would stay on all the time.
The default setting is still 5000 msecs as before, but reducing this to 100
msecs wouldn't be a real problem imho. At least i didn't observe any
miscounting during extensive testing with 100 msecs.
The patches in drm-next fix a couple of races that i observed on intel and
radeon during testing and a few that i didn't see but that i could imagine
happening. It tries to make sure that the saved final count at vblank irq
disable of the software vblank_count and the gpu counter are consistent - no
off by one errors. They also try to detect and filter out spurious vblank
interrupts at vblank enable time, e.g., on the radeon.
There's still one possible race in the disable path which i will try to fix:
We don't know when exactly the hardware counter increments wrt. the
processing of the vblank interrupt - it could increment a few
(dozen/hundred) microseconds before or after the irq handler runs, so if you
happen to query the hardware counter while the gpu is inside the vblank you
can't be sure if you picked up the old count or the new count for that
vblank.
That's disgusting.  Does this affect many GPUs?  (I can't imagine why
any sensible implementation wouldn't guarantee that the counter
increments just before the IRQ.)

;-). I don't know, but at least on the tested R500 and R600 class Radeon's, this was the case, so i assume it's at least this way on many radeon gpu's (probably all avivo parts?) out there. We don't have any evergreen gpu's yet in our lab so i don't know how the more recent parts behave. Also it doesn't matter

I guess it's also a matter of definition when a new video frame starts? Leading edge / trailing edge of vblank? Start of vsync? Something else?

This only matters during vblank disable. For that reason it's not such a
good idea to disable vblank irq's from within the vblank irq handler. I
tried that and it didn't work well --> When doing it from within irq you
basically maximize the chance of hitting the time window when the race can
happen. Delaying within the irq handler by a millisecond would fix that, but
that's not what we want.
Having the disable in a function triggered by a timer like now is the most
simple solution i could come up with. There we can burn a few dozen
microseconds if necessary to remove this race.
Maybe I'm missing something, but other than the race above (which
seems like it shouldn't exist on sane hardware), the new code seems
more complicated than necessary.

I don't think it's more complicated than necessary for what it tries to achieve, but of course i'm a bit biased. It also started off more simple and grew a bit when i found new issues with the tested gpu's.

The aim is to fix a couple of real races and to make vblank counts and timestamps as trustworthy and oml_sync_control spec-conformant (see http://www.opengl.org/registry/specs/OML/glx_sync_control.txt) as possible. It only consumes a few additional bytes of memory (approx. 40 bytes) per crtc, doesn't use excessive time inside the irq handler and tries to avoid taking locks that are shared between irq and non-irq context to avoid delays in irq execution, also if used with a kernel with the preempt_rt patch applied (important for my use case and other hard realtime apps). It's pretty self-contained and because most of it is driver-independent it can handle similar issues on different gpu's and kms drivers without the need for us to check each single gpu + driver combo if it has such issues or not.

1. There's the hardware vblank counter race, and it's there on lots of existing hardware, regardless if this is sane or not, so it needs to be handled.

2. There are gpu's firing spurious vblank irq's as soon as you enable irq's -- you need to filter those out, otherwise counts and timestamps will be wrong for at least one refresh cycle after vblank irq enable. You don't know when these redundant ones get delivered or if they get delivered at all because a real vblank irq enable gets triggered by some userspace calls, not locked to the video refresh cycle and because the enable code itself holds spin_lock_irqsave locks and may or may not (depending on number of cores and irq routing) prevent delivery of the vblank irq's concurrent to its own execution -> a possible race between the drm_handle_vblank() routine and the drm_update_vblank_count() routine each time you call drm_vblank_get().

3. There are gpu's that sometimes, but not at other times, fire the irq too early (1-3 scanlines observed on radeon's) so you get your irq before the associated vblank interval and need to do at least all timestamping as if you are already in that vblank. This may be dependent on both video mode and on the dispatch delay of the vblank irq handler (e.g., due to some other unrelated code holding off irq's, e.g., by holding spin_lock_irqsave() locks).

4. In the old code it could happen that the vblank counter gets incremented after it was "disabled", e.g., because vblank irq's were turned off in the gpu, but there was still a vblank irq pending (e.g., due to some spin_lock_irqsave on the relevant core), incrementing the counter after it was supposedly frozen. With the new oml_sync_control timestamping in place, such off-by-ones would be worse as they would also corrupt timestamps.

There were some more issues which i can't remember off the top of my head which get handled by the current code (Note to myself: Take more notes!).


Why not do nothing at all on vblank disable and, on enable, do this:

Call get_vblank_counter and declare the return value to be correct.

Because declaring it to be correct isn't the same as it being correct. Also the code needs to handle wraparound of the hardware counter and for that it needs correct start values from the vblank disable routine, which is why the disable routine needs to work as race-free as possible. The vblank count for a frame also needs to be consistent with the vblank timestamp for that frame at all times, otherwise the oml_sync_control extension becomes too unreliable to be trustworthy and useful for serious applications.
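
To illustrate why the start value saved at disable time matters, here is a minimal sketch of wraparound-aware counting. It is not the drm code: the helper names and the max_count parameter (e.g. 0xffffff for a 24-bit counter) are assumptions made up for this example, and it assumes the hardware counter wrapped at most once while irq's were off.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: vblanks elapsed between two samples of a hardware
 * frame counter that counts 0..max_count and then wraps to 0.
 * Assumes at most one wrap happened between the two samples. */
static uint32_t vblank_delta(uint32_t last_hw, uint32_t cur_hw,
                             uint32_t max_count)
{
    assert(last_hw <= max_count && cur_hw <= max_count);

    if (cur_hw >= last_hw)
        return cur_hw - last_hw;

    /* Counter wrapped: steps up to max_count, one step to 0, then cur_hw. */
    return (max_count - last_hw) + cur_hw + 1;
}

/* Hypothetical helper: software count to use on vblank enable, given the
 * (software, hardware) pair that was saved consistently at disable time. */
static uint32_t vblank_count_on_enable(uint32_t sw_at_disable,
                                       uint32_t hw_at_disable,
                                       uint32_t hw_at_enable,
                                       uint32_t max_count)
{
    return sw_at_disable +
           vblank_delta(hw_at_disable, hw_at_enable, max_count);
}
```

If hw_at_disable was sampled one vblank off relative to sw_at_disable, every count after re-enable inherits that error, which is the off-by-one the disable path tries to avoid.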

On each vblank IRQ (or maybe just the first one after enable), read
the counter again and if it matches the cached value, then assume we
raced on enable (i.e. we enabled right at the beginning of the vblank
interval, and this interrupt is redundant).  In that case, do nothing.
 Otherwise increment the cached value by one.

On hardware with the silly race where get_vblank_counter in the IRQ
can return a number one too low, use the scanout pos to fix it (i.e.
if the scanout position is at the end of the frame, assume we hit that
race and increment by 1).

See the other races above. Iirc i tried something similar already and they made it fail/unreliable. The current patch uses the vblank timestamp instead of the hardware vblank counter to detect and filter redundant irq's, because with the timestamping patches at least on intel and ati gpu's (and hopefully nouveau/nvidia and others asap) these are well defined and accurate (to define the end of a vblank interval) so they can serve as a reference point. The tbd fix for race condition #1 will also use scanout position as a fixed reference.
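
A rough sketch of the timestamp-based filtering idea, under stated assumptions: the struct, the function name and the 1 msec threshold are invented for illustration and are not taken from the patch. The point is only that an irq whose vblank timestamp is (almost) identical to the one already recorded belongs to a vblank that was already counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold (assumption): timestamps closer together than
 * this are treated as describing the same vblank interval. */
#define REDUNDANT_IRQ_THRESH_NS 1000000LL

/* Hypothetical per-crtc state: software count plus the timestamp of the
 * vblank that the count refers to. */
struct vblank_state {
    uint32_t count;
    int64_t  last_ts_ns;
};

/* Returns true if the irq was counted, false if it was filtered out as
 * redundant. ts_ns is the computed timestamp of the current vblank. */
static bool handle_vblank_irq(struct vblank_state *s, int64_t ts_ns)
{
    int64_t diff = ts_ns - s->last_ts_ns;

    if (diff < 0)
        diff = -diff;

    /* Same vblank as already recorded (e.g. by the enable path). */
    if (diff <= REDUNDANT_IRQ_THRESH_NS)
        return false;

    s->count++;
    s->last_ts_ns = ts_ns;
    return true;
}
```

Count and timestamp are only ever updated together here, so they stay consistent with each other whether or not the spurious irq gets delivered.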

The sample client code (below) for scheduling accurately timed bufferswaps needs precise and trustworthy return values from glXGetSyncValuesOML() for it to work. If vblank's are disabled at time of invocation of glXGetSyncValuesOML() then that function will trigger the real vblank irq enable path and use its return values for swap scheduling - i.e. unless it is called within or close to a vblank, it uses the vblank count and timestamp computed in drm_update_vblank_count(), usually before the vblank irq had a chance to run. For that reason it is important for my kind of applications that it really delivers the right counts and timestamps especially in the enable path.

This is the context in case you're interested why i'm so protective of the current implementation: The toolkit i'm developing is probably one of the currently most demanding clients of the new dri2 swap & sync bits and it is used for neuroscience research. Many of the experiments there require very precise visual timing and often sub-millisecond accurate timestamping. Too many unhandled races in the wrong places could really spoil the work of the scientists that are my users. As long as we have a disable timeout of 5 seconds, vblank irq's probably won't disable at all during most experiment sessions and even if they do, the frequency of possible screwups is probably small enough to be annoying but manageable with statistics (detecting/discarding outliers in experimental results etc.). At a disable timeout of around 50 msecs, the error rate would be unbearable. That's why i would like to make sure that the implementation can handle at least the already known quirks of a large number of the gpu's out there if we go down to 50 msecs. But i assume races in that code would affect the quality of "normal" media players as well, once we choose such low timeouts.


Thanks and belated happy x-mas,
-mario

This means that it would be safe to disable vblanks from any context
at all, because it has no effect other than turning off the interrupt.

--Andy
There could be other races that i couldn't think of and that i didn't see
during my testing with my 2 radeons and 1 intel gpu. Therefore i think we
should keep vblank irq's enabled for at least 2 or maybe 3 refresh cycles if
they aren't used by anyone. Apps that want to schedule swaps very precisely
and the ddx/drm/kms-driver itself do multiple queries in quick succession
for a typical swapbuffers call, i.e., drm_vblank_get() -> query ->
drm_vblank_put(), so on an otherwise idle graphics system the refcount will
toggle between zero and one multiple times during a swap, usually within a
few milliseconds. If the timeout is big enough so that irq's don't get
disabled within such a sequence of toggles, even if there's a bit of
scheduling delay for the x-server or client, then a client will see at least
consistent vblank count during a swap, even if there are still some races
hiding somewhere that we don't handle properly. That should be good enough,
and paranoid clients can always increase the timeout value or set it to
infinite.
E.g., my toolkit schedules a swap for a specific system time like this:
1. glXGetSyncValuesOML(... &base_msc, &base_ust);
2. calculate target_msc based on user provided swap deadline t and
(base_msc, base_ust) as a baseline.
3. glXSwapBuffersMscOML(...., target_msc,...);
4. glXWaitForSbcOML() or use Intel_swap_events for retrieving the true msc
and ust of swap completion.
=> Doesn't matter if there would be an off-by-one error in vblank counting
due to an unknown race, as long as it doesn't happen between 1. and 4. As
long as there aren't any client/x-server scheduling delays between step 1
and 3 of more than /sys/module/drm/parameters/vblankoffdelay msecs, nothing
can go wrong even if there are race conditions left in that area.
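
Step 2 above could look roughly like the following sketch. The function name and parameters are hypothetical, not the toolkit's actual code, and it assumes ust values and the refresh period are both in microseconds (the refresh period could be derived e.g. from glXGetMscRateOML()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for step 2: pick the msc of the first vblank at or
 * after the requested deadline, extrapolating from the (base_msc,
 * base_ust) pair returned by glXGetSyncValuesOML().
 * Assumption: all time values are in microseconds. */
static int64_t target_msc_for_deadline(int64_t base_msc, int64_t base_ust,
                                       int64_t deadline_ust,
                                       int64_t refresh_period_us)
{
    int64_t delta_us, frames;

    assert(refresh_period_us > 0);

    /* Deadline already passed: ask for the earliest possible swap. */
    delta_us = deadline_ust - base_ust;
    if (delta_us < 0)
        delta_us = 0;

    /* Round up to whole refresh cycles. */
    frames = (delta_us + refresh_period_us - 1) / refresh_period_us;

    return base_msc + frames;
}
```

The result is what would be passed as target_msc in step 3. An off-by-one in base_msc relative to base_ust shifts the swap by a whole refresh cycle, which is why the pair has to be consistent.
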
=> 50-100 msecs as new default would be probably good enough and at the same
time prevent the "blinking cursor keeps vblank irq's on all the time"
problem.
I didn't reduce the timeout in the current patches because the filtering for
race-conditions and other gpu funkyness relies on somewhat precise vblank
timestamps and the timestamping hooks aren't yet implemented in the nouveau
kms. Maybe i manage to get this working over christmas. Patches to nouveau
would be simple, i just don't know the mmio register addresses for crtc
scanout position on nvidia gpu's.

-mario


*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.klei...@tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
