On Wed, Sep 23, 2009 at 10:52 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:

> On Tuesday 22 September 2009 23:25:09, Pauli Nieminen wrote:
> > Too bad the GPU reset already prevents this use case, while it doesn't
> > protect the user from a possible attack causing multiple GPU resets in a
> > row. So a long rendering operation blocking the GPU is more like a
> > scheduler or Mesa bug: the rendering isn't split into small enough parts
> > that we can schedule something else in between for the user interface. Is
> > it possible to schedule something else on the GPU while only part of the
> > GPU runs the slow, long-running shader? If not, then it looks like a big
> > limitation in the hardware design.
>
> I would hope the hardware people thought of this on newer GPUs, but at
> least I haven't seen anything to support context switching in the docs
> released by AMD.
>
> As for the rest, I agree that it's a problem. It is actually roughly the
> same problem as when the system goes into a swapping loop of death, except
> it may actually be easier to identify the culprit. After all, by simply
> checking which fences have already been written back by the GPU, we should
> be able to determine which client caused the currently executing command
> stream.
>
>
Now I remember some talk that the WDDM driver model requires preemptive
scheduling support from the driver, so maybe r600+ cards support preemptive
scheduling, at least in some form.

> That probably does require adding some more tracking, but perhaps it can be
> integrated into the existing fence mechanisms.
>
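The fence-based attribution described above could look roughly like the following. This is only a minimal sketch in Python, not driver code: the names `FenceTracker`, `submit`, `retire` and `culprit` are hypothetical, and it assumes a single ring that executes command streams in submission order, with fence sequence numbers that never wrap.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Submission:
    client: str      # hypothetical client identifier (e.g. a DRM file handle)
    fence_seq: int   # fence sequence number emitted after this command stream


class FenceTracker:
    """Remembers which client owns each not-yet-signaled fence."""

    def __init__(self):
        self._pending = deque()  # submissions in ring order, oldest first
        self._next_seq = 1

    def submit(self, client):
        """Record a command stream and return the fence that follows it."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending.append(Submission(client, seq))
        return seq

    def retire(self, last_signaled_seq):
        """Drop submissions whose fence the GPU has already written back."""
        while self._pending and self._pending[0].fence_seq <= last_signaled_seq:
            self._pending.popleft()

    def culprit(self, last_signaled_seq):
        """Owner of the oldest unsignaled fence, i.e. the stream now executing."""
        self.retire(last_signaled_seq)
        return self._pending[0].client if self._pending else None
```

On a hang, the reset path would read the last fence value written back by the GPU and call `culprit()` with it. The extra tracking is just the per-submission client field stored alongside the fence.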
> The second part would be to punish applications that have caused GPU hangs.
> Frankly, killing them seems like a bad idea; it seems better to
> de-prioritize them and force them to wait before sending new command
> buffers.
>
The problem here is that each GPU hang lasts over 500 ms before the GPU is
reset. The policy might be to first lower the priority and then, if the hangs
continue, start killing the application. I think a GPU hang is more like a
memory access violation in a normal application, so it should cause a crash.

The application's rendering will be broken after the reset anyway, because
some of its rendering operations failed and the image would be corrupted.
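The "lower the priority first, then kill" idea could be sketched as below. This is an illustration only: the threshold `MAX_HANGS_BEFORE_KILL`, the `ClientState` fields and the `on_gpu_hang` function are all hypothetical, and how a real driver would lower priority or deliver a fatal signal is left out.

```python
from dataclasses import dataclass

# Hypothetical threshold: number of hangs after which the client is killed.
MAX_HANGS_BEFORE_KILL = 3


@dataclass
class ClientState:
    hangs: int = 0        # GPU hangs attributed to this client so far
    penalty: int = 0      # higher value = scheduled later (assumed convention)
    killed: bool = False


def on_gpu_hang(state):
    """Escalate the response each time a hang is attributed to the client."""
    state.hangs += 1
    if state.hangs >= MAX_HANGS_BEFORE_KILL:
        state.killed = True   # stands in for sending a fatal signal
        return "kill"
    state.penalty += 1        # first offences only lower the priority
    return "deprioritize"
```

With the threshold set to 1 this degenerates to killing on the first hang, which matches treating a hang like a memory access violation.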


> Another major worry is that we should somehow make sure that the X server -
> or alternative future display servers - will not become victims of this
> regime. After all, if the X server services an indirect rendering GLX
> client, it could also be hoodwinked by this client into submitting
> too-long-running command streams.
>
> If the DRM clients get appropriate feedback when they caused a GPU reset,
> the X server could potentially use this information to punish GLX clients
> accordingly.
>
> cu,
> Nicolai
>

So we would need a secure communication link with the X server so that it
could cooperate after a GPU hang and penalize the broken application in
whatever way is thought to be correct. In my opinion, sending a fatal signal
is the best option, but if all the problems with keeping a broken application
running can be fixed somehow, then it could do something else.
--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel