Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

Mario Kleiner Wed, 22 Dec 2010 13:07:13 -0800

On Dec 22, 2010, at 6:23 PM, Jesse Barnes wrote:

On Wed, 22 Dec 2010 05:36:13 +0100
Mario Kleiner <mario.klei...@tuebingen.mpg.de> wrote:

----------------------------------------------------------------------


Message: 1
Date: Mon, 20 Dec 2010 19:23:40 -0800
From: Keith Packard <kei...@keithp.com>
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andy Lutomirski <l...@mit.edu>, Jesse Barnes

<jbar...@virtuousgeek.org>, Chris Wilson <ch...@chris-wilson.co.uk>,

        David Airlie <airl...@linux.ie>
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: <yunfwtrrepv....@aiko.keithp.com>
Content-Type: text/plain; charset="us-ascii"

On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski <l...@mit.edu>
wrote:

Enabling and disabling the vblank interrupt (on devices that
support it) is cheap.  So disable it quickly after each
interrupt.


So, the concern (and reason for the original design) was that you
might
lose count of vblank interrupts as vblanks are counted differently
while

off than while on. If vblank interrupts get enabled near theinterrupt

time, is it possible that you'll end up off by one somehow?

Leaving them enabled for 'a while' was supposed to reduce the
impact of
this potential error.

--
keith.pack...@intel.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/
20101220/105a9fb6/attachment-0001.pgp>

------------------------------

Message: 2
Date: Mon, 20 Dec 2010 22:34:12 -0500
From: Andrew Lutomirski <l...@mit.edu>
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Keith Packard <kei...@keithp.com>
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID:
        <aanlktik-1zni1dds6i+1aroenx6p=9jqaaia6t5ug...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard <kei...@keithp.com>
wrote:

On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski <l...@mit.edu>
wrote:

Enabling and disabling the vblank interrupt (on devices that
support it) is cheap. ?So disable it quickly after each
interrupt.


So, the concern (and reason for the original design) was that you
might
lose count of vblank interrupts as vblanks are counted differently
while
off than while on. If vblank interrupts get enabled near the
interrupt
time, is it possible that you'll end up off by one somehow?


So the race is:

1. Vblank happens.

2. vblank_get runs, reads hardware counter, and enables theinterrupt

(and maybe not quite in that order)
3. Interrupt fires and increments the counter.  Now the counter is
one too high.

What if we read the hardware counter from the IRQ the first timethatit fires after being enabled? That way, if the hardware andsoftware

counters match (mod 2^23 or whatever the magic number is), we don't
increment the counter.


Leaving them enabled for 'a while' was supposed to reduce the
impact of
this potential error.


Fair enough,

But five seconds is a *long* time, and anything short enough thattheinterrupt actually gets turned off in normal use risks the samerace.


--Andy


------------------------------

Message: 3
Date: Mon, 20 Dec 2010 19:47:11 -0800
From: Keith Packard <kei...@keithp.com>
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andrew Lutomirski <l...@mit.edu>
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: <yunbp4frdmo....@aiko.keithp.com>
Content-Type: text/plain; charset="us-ascii"

On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski
<l...@mit.edu> wrote:

But five seconds is a *long* time, and anything short enoughthat theinterrupt actually gets turned off in normal use risks the samerace.


Right, so eliminating any race seems like the basic requirement. Can
that be done?

--
keith.pack...@intel.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/
20101220/5ca3b674/attachment-0001.pgp>

------------------------------

Message: 4
Date: Mon, 20 Dec 2010 22:55:46 -0500
From: Andrew Lutomirski <l...@mit.edu>
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Keith Packard <kei...@keithp.com>
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID:
        <aanlktimk6rlkr8lt76cr8nclw_3kax6dqa+=id9_g...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard <kei...@keithp.com>
wrote:

On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski
<l...@mit.edu> wrote:

But five seconds is a *long* time, and anything short enough that
the
interrupt actually gets turned off in normal use risks the same
race.

Right, so eliminating any race seems like the basic requirement.Can

that be done?


I'll give it a shot.

Do you know if there's an existing tool to call drm_wait_vblank from
userspace for testing?  I know approximately nothing about libdrm or
any userspace graphics stuff whatsoever.

--Andy


--
keith.pack...@intel.com



------------------------------

Message: 5
Date: Mon, 20 Dec 2010 20:03:53 -0800
From: Jesse Barnes <jbar...@virtuousgeek.org>
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andrew Lutomirski <l...@mit.edu>
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: <20101220200353.12479...@jbarnes-desktop>
Content-Type: text/plain; charset=US-ASCII

On Mon, 20 Dec 2010 22:55:46 -0500
Andrew Lutomirski <l...@mit.edu> wrote:

On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard
<kei...@keithp.com> wrote:
On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski
<l...@mit.edu> wrote:
But five seconds is a *long* time, and anything short enough
that the
interrupt actually gets turned off in normal use risks the same
race.
Right, so eliminating any race seems like the basicrequirement. Can
that be done?
I'll give it a shot.
Do you know if there's an existing tool to call drm_wait_vblankfromuserspace for testing? I know approximately nothing aboutlibdrm or
any userspace graphics stuff whatsoever.

Yeah, libdrm has a test program called vbltest; it should do whatyou

need.


Not so fast please! After a lot of review by Jesse, Dave and Chris
just merged a set of my patches into the drm-next (and the intel and
radeon kms drivers) to implement precise timestamping of vblank's and
pageflip completion and vblank counting for DRI2 and the

OML_sync_control extension. It also fixes (hopefully) almost allrace-

conditions (that i could find or think of) related to vblank irq on/
off, a few of them surprising and due to "funny" behaviour of some
gpu's when you enable/disable vblanks (e.g., radeon's spontaneously
firing a vblank irq in the middle of a scanout cycle when vblank
irq's get enabled, or firing the irq sometimes shortly before a
vblank instead of in the vblank).

There's one tiny race left in the vblank off path, which i wanted to
address during the next weeks. Also i need to implement support for
nouveau. After that we could simply reduce the vblank off timeout to

something small like 50 msecs. Or use Andrew's heuristic on top ofthis.


In any case, please check against the drm-next branch. I think your
patches touch/conflict with most of the areas in drm that are
modified in drm-next. At least my users need a very high level of
precision and robustness in vblank counting and timestamping for
neuro-science applications and similar stuff.


To preserve all of that, the easiest way to reduce the amount of time
the vblank remains enabled would probably be to reduce the vblank off
timeout.  Just make it 100ms or so since that will make sure it stays
on for a few frames at least.

Mario, for cases where you have very intermittent waits, I think you
added a method to disable the timer altogether?  That combined with a
reduced timeout seems like it could make everyone happy...

--
Jesse Barnes, Intel Open Source Technology Center

There's a new drm module parameter for selecting the timeout: echo 50> /sys/module/drm/parameters/vblankoffdelaywould set the timeout to 50 msecs. A setting of zero will disable thetimer, so vblank irq's would stay on all the time.

The default setting is still 5000 msecs as before, but reducing thisto 100 msecs wouldn't be a real problem imho. At least i didn'tobserve any miscounting during extensive testing with 100 msecs.

The patches in drm-next fix a couple of races that i observed onintel and radeon during testing and a few that i didn't see but thati could imagine happening. It tries to make sure that the saved finalcount at vblank irq disable of the software vblank_count and the gpucounter are consistent - no off by one errors. They also try todetect and filter out spurious vblank interrupts at vblank enabletime, e.g., on the radeon.

There's still one possible race in the disable path which i will tryto fix: We don't know when exactly the hardware counter incrementswrt. the processing of the vblank interrupt - it could increment afew (dozen/hundred) microseconds before or after the irq handlerruns, so if you happen to query the hardware counter while the gpu isinside the vblank you can't be sure if you picked up the old count orthe new count for that vblank.

This only matters during vblank disable. For that reason it's notsuch a good idea to disable vblank irq's from within the vblank irqhandler. I tried that and it didn't work well --> When doing it fromwithin irq you basically maximize the chance of hitting the timewindow when the race can happen. Delaying within the irq handler by amillisecond would fix that, but that's not what we want.

Having the disable in a function triggered by a timer like now is themost simple solution i could come up with. There we can burn a fewdozen microseconds if neccessary to remove this race.

There could be other races that i couldn't think of and that i didn'tsee during my testing with my 2 radeons and 1 intel gpu. Therefore ithink we should keep vblank irq's enabled for at least 2 or maybe 3refresh cycles if they aren't used by anyone. Apps that want toschedule swaps very precisely and the ddx/drm/kms-driver itself domultiple queries in quick succession for a typical swapbuffers call,i.e., drm_vblank_get() -> query -> drm_vblank_put(), so on anotherwise idle graphics system the refcount will toggle between zeroand one multiple times during a swap, usually within a fewmilliseconds. If the timeout is big enough so that irq's don't getdisabled within such a sequence of toggles, even if there's a bit ofscheduling delay for the x-server or client, then a client will seeat least consistent vblank count during a swap, even if there arestill some races hiding somewhere that we don't handle properly. Thatshould be good enough, and paranoid clients can always increase thetimeout value or set it to infinite.


E.g., my toolkit schedules a swap for a specific system time like this:

1. glXGetSyncValuesOML(... &base_msc, &base_ust);

2. calculate target_msc based on user provided swap deadline t and(base_msc, base_ust) as a baseline.

3. glXSwapBuffersMscOML(...., target_msc,...);

4. glXWaitForSbcOML() or use Intel_swap_events for retrieving thetrue msc and ust of swap completion.

=> Doesn't matter if there would be an off-by-one error in vblankcounting due to an unknown race, as long as it doesn't happen between1. and 4. As long as there aren't any client/x-server schedulingdelays between step 1 and 3 of more than /sys/module/drm/parameters/vblankoffdelay msecs, nothing can go wrong even if there are raceconditions left in that area.

=> 50-100 msecs as new default would be probably good enough and atthe same time prevent the "blinking cursor keeps vblank irq's on allthe time" problem.

I didn't reduce the timeout in the current patches because thefiltering for race-conditions and other gpu funkyness relies onsomewhat precise vblank timestamps and the timestamping hooks aren'tyet implemented in the nouveau kms. Maybe i manage to get thisworking over christmas. Patches to nouveau would be simple, i justdon't know the mmio register addresses for crtc scanout position onnvidia gpu's.


-mario


*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.klei...@tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

Reply via email to