Re: preventing GPU reset DoS
On Wed, Sep 23, 2009 at 10:52 PM, Nicolai Hähnle wrote:

> On Tuesday 22 September 2009 23:25:09, Pauli Nieminen wrote:
> > Too bad GPU reset already stops this use case, while it doesn't protect the user from a possible attack causing multiple GPU resets in a row. So this long rendering operation blocking the GPU is more like a scheduler or Mesa bug: it doesn't split rendering into small enough parts that we can schedule something else in between for the user interface. Is it possible to schedule something else on the GPU while only part of the GPU runs the slow, long-running shader? If not, then it looks like a big limitation in the hardware design.
>
> I would hope the hardware people thought of this on newer GPUs, but at least I haven't seen anything to support context switching in the docs released by AMD.
>
> As for the rest, I agree that it's a problem. It is actually roughly the same problem as when the system goes into a swapping loop of death, except it may actually be easier to identify the culprit. After all, by simply checking which fences have already been written back by the GPU, we should be able to determine which client caused the currently executing command stream.

Now I remember some talk that the WDDM driver model requires preemptive scheduling from the driver, so maybe r600+ cards have preemptive scheduling support, at least in some form.

> That probably does require adding some more tracking, but perhaps it can be integrated into the existing fence mechanisms.
>
> The second part would be to punish applications that have caused GPU hangs. Frankly, killing them seems like a bad idea; it seems better to de-prioritize them and force them to wait before sending new command buffers.

The problem here is that each GPU hang will last over 500 ms before the GPU is reset. It might be something like first lowering the priority and then, if the hangs continue, starting to kill the application.

I think that a GPU hang is more like a memory access violation in a normal application, so it should cause a crash. The application's rendering will be broken after the reset anyway, because some of the rendering operations failed and the image would be corrupted.

> Another major worry is that we should somehow make sure that the X server - or alternative future display servers - will not become victims of this regime. After all, if the X server services an indirect rendering GLX client, it could also be hoodwinked by this client into submitting too-long-running command streams.
>
> If the DRM clients get appropriate feedback when they caused a GPU reset, the X server could potentially use this information to punish GLX clients accordingly.
>
> cu,
> Nicolai

So we would need a secure communication link with the X server so it could cooperate after a GPU hang and penalize the broken application in the way that is thought to be correct. In my opinion sending a fatal signal is the best option, but if all the problems in keeping a broken application running can be fixed somehow, then it could do something else.

--
Come build with us! The BlackBerry® Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9-12, 2009. Register now! http://p.sf.net/sfu/devconf
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
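A minimal sketch of the escalation discussed in the message above (lower the priority first, kill only if the hangs continue), assuming a per-client hang counter. `HangPolicy`, `Action`, `kill_after` and `base_delay_s` are hypothetical names for illustration and are not part of any DRM interface; the 0.5 s default merely echoes the "over 500 ms per hang" figure mentioned in the thread.

```python
from collections import defaultdict
from enum import Enum


class Action(Enum):
    DEPRIORITIZE = "deprioritize"  # keep the client alive, make it wait
    KILL = "kill"                  # send a fatal signal


class HangPolicy:
    """Escalating penalty per client: forced waits first, then termination."""

    def __init__(self, kill_after: int = 3, base_delay_s: float = 0.5):
        self.kill_after = kill_after      # hangs tolerated before killing (assumed value)
        self.base_delay_s = base_delay_s  # first forced wait in seconds (assumed value)
        self.hangs = defaultdict(int)     # client id -> number of hangs caused so far

    def on_hang(self, client_id: int):
        """Record one hang and return (action, forced_wait_in_seconds)."""
        self.hangs[client_id] += 1
        count = self.hangs[client_id]
        if count >= self.kill_after:
            return Action.KILL, 0.0
        # The forced wait doubles with every hang the client has caused.
        return Action.DEPRIORITIZE, self.base_delay_s * 2 ** (count - 1)
```

Counting per client is the weak point the original proposal already names: a forking attacker gets a fresh counter each time, so in practice the key would probably have to be the user account rather than the process.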
Re: preventing GPU reset DoS
On Tuesday 22 September 2009 23:25:09, Pauli Nieminen wrote:

> Too bad GPU reset already stops this use case, while it doesn't protect the user from a possible attack causing multiple GPU resets in a row. So this long rendering operation blocking the GPU is more like a scheduler or Mesa bug: it doesn't split rendering into small enough parts that we can schedule something else in between for the user interface. Is it possible to schedule something else on the GPU while only part of the GPU runs the slow, long-running shader? If not, then it looks like a big limitation in the hardware design.

I would hope the hardware people thought of this on newer GPUs, but at least I haven't seen anything to support context switching in the docs released by AMD.

As for the rest, I agree that it's a problem. It is actually roughly the same problem as when the system goes into a swapping loop of death, except it may actually be easier to identify the culprit. After all, by simply checking which fences have already been written back by the GPU, we should be able to determine which client caused the currently executing command stream.

That probably does require adding some more tracking, but perhaps it can be integrated into the existing fence mechanisms.

The second part would be to punish applications that have caused GPU hangs. Frankly, killing them seems like a bad idea; it seems better to de-prioritize them and force them to wait before sending new command buffers.

Another major worry is that we should somehow make sure that the X server - or alternative future display servers - will not become victims of this regime. After all, if the X server services an indirect rendering GLX client, it could also be hoodwinked by this client into submitting too-long-running command streams.

If the DRM clients get appropriate feedback when they caused a GPU reset, the X server could potentially use this information to punish GLX clients accordingly.

cu,
Nicolai
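The fence-based attribution described in this message could be sketched roughly as follows. This assumes a single ring that executes command streams in submission order and a monotonically increasing fence sequence number (wraparound is ignored); `Submission`, `find_culprit` and `last_signaled` are hypothetical names for illustration, not existing DRM structures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Submission:
    """Bookkeeping for one queued command stream (hypothetical record)."""
    fence_seq: int  # sequence number the GPU writes back when the stream completes
    client_id: int  # DRM client that submitted the stream


def find_culprit(submissions: Sequence[Submission],
                 last_signaled: int) -> Optional[Submission]:
    """Return the oldest submission whose fence has not been written back.

    With in-order execution, that submission is the one the GPU was running
    when it stopped making progress. Returns None if nothing is pending.
    """
    pending = [s for s in submissions if s.fence_seq > last_signaled]
    return min(pending, key=lambda s: s.fence_seq, default=None)
```

The extra tracking mentioned above is the `client_id` stored next to each fence. The X server concern applies here directly: for indirect GLX rendering the blamed client would be the X server itself, so it would need the feedback to pass the blame on to the GLX client.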
Re: preventing GPU reset DoS
On Wednesday, 23.09.2009, 06:10 +1000, Dave Airlie wrote:

> I'm just wondering what other use case we'd need anything more aggressive for?

Hi,

the computer rooms at my university came to mind: there are students who want to do their homework for the computer graphics lectures. If it *is* possible to lock a machine for a longer time, they could lock up machine after machine by accident, and the computer room guidelines say that it's forbidden to turn off or reset computers, and the administrators aren't around all the time. This could also be done intentionally to annoy other users...

Another scenario could be an Internet cafe or so.

I don't care whether I can or cannot DoS myself on my own machine, but I see a potential problem in multi-user environments. Maybe having the possibility to adjust protection at run-time would be a "nice to have" feature.

Kind regards
- Fuddl
Re: preventing GPU reset DoS
On Tue, Sep 22, 2009 at 11:51 PM, Corbin Simpson wrote:

> On 09/22/2009 01:19 PM, Nicolai Hähnle wrote:
> > I'm pretty confident that you can write a perfectly legal OpenGL application that creates commands that take *minutes* to run on decent graphics cards. Just produce a huge number of screen-sized primitives and use a very long fragment program that samples from a huge, non-mipmapped texture in an entirely non-locally-coherent way - and that doesn't even take GLSL loops into account!
>
> I don't have any examples with me, but you could, as an admittedly contrived possibility, take several hundred thousand verts, submit them in immediate mode without VBOs, multitexture from all eight texture samplers using 256x256x256 3D textures, use 4 4096x4096 render targets, and 16x FSAA multisample the whole shebang. Should (barely) fit on a 512MB r500, and take at least half a minute to execute, probably more.
>
> And as you said, GLSL can be used to write *very* long-running shaders.
>
> Really, there's no solution to this that won't also lock out legitimate uses, I fear.
>
> ~ C.

Too bad GPU reset already stops this use case, while it doesn't protect the user from a possible attack causing multiple GPU resets in a row. So this long rendering operation blocking the GPU is more like a scheduler or Mesa bug: it doesn't split rendering into small enough parts that we can schedule something else in between for the user interface. Is it possible to schedule something else on the GPU while only part of the GPU runs the slow, long-running shader? If not, then it looks like a big limitation in the hardware design.

I can also see possible attacks that use a GPU hang as a way to disable computers for local use. And if this only requires access to a normal user account, it is a real problem and should be protected against. It is possible that someone would write a virus that targets the Linux desktop and tries to cause harm to users. This kind of virus in a corporate network, causing many computers to fall into an unusable state, would cost quite a lot. Even though an admin could clean out the virus over ssh, it still costs a lot of money when a computer is unusable even for a short time.

Do I have to come up with more scenarios for why GPU reset would need to protect local access to the computer? I think it is just bad if a normal user can make local access unusable for an extended period of time.

Of course this protection has to be configurable at runtime or boot time. And I posted the idea here before even exploring how it would be practically possible, in order to get more thoughts on what should be taken into account.
Re: preventing GPU reset DoS
On 09/22/2009 01:19 PM, Nicolai Hähnle wrote:

> I'm pretty confident that you can write a perfectly legal OpenGL application that creates commands that take *minutes* to run on decent graphics cards. Just produce a huge number of screen-sized primitives and use a very long fragment program that samples from a huge, non-mipmapped texture in an entirely non-locally-coherent way - and that doesn't even take GLSL loops into account!

I don't have any examples with me, but you could, as an admittedly contrived possibility, take several hundred thousand verts, submit them in immediate mode without VBOs, multitexture from all eight texture samplers using 256x256x256 3D textures, use 4 4096x4096 render targets, and 16x FSAA multisample the whole shebang. Should (barely) fit on a 512MB r500, and take at least half a minute to execute, probably more.

And as you said, GLSL can be used to write *very* long-running shaders.

Really, there's no solution to this that won't also lock out legitimate uses, I fear.

~ C.
Re: preventing GPU reset DoS
On Tuesday 22 September 2009 21:13:47, Pauli Nieminen wrote:

> Hi!
>
> I have been thinking about GPU reset as a possible DoS attack from user space. The problem is that the display doesn't work at all anymore if an attacker chooses to run an application that constantly causes GPU hangs. It would of course be ideal to have the CS checker not let in any problematic combinations of commands. But in practice we can't assume that everything is safe with all hardware, so we need to take some actions to prevent possible problems.

I'm pretty confident that you can write a perfectly legal OpenGL application that creates commands that take *minutes* to run on decent graphics cards. Just produce a huge number of screen-sized primitives and use a very long fragment program that samples from a huge, non-mipmapped texture in an entirely non-locally-coherent way - and that doesn't even take GLSL loops into account!

So this is really not so much about safeguarding against illegal hardware commands, but about how to deal with the fact that we can't do pre-emptive scheduling on the GPU. In the end, the symptoms are the same, but it might change how you think about the problem.

> So the first defense would be terminating the application that sent the command stream that caused the GPU hang. But an attacker could easily bypass this protection by forking new processes all the time.
>
> So we need a stronger defense if the same user account is causing multiple hangs in a short time frame. I would think temporarily denying new DRI access would let the user regain control of the system and take action to stop the problematic program from running.

I have a feeling that this needs a solution that cooperates across the whole stack ...

cu,
Nicolai
Re: preventing GPU reset DoS
On Wed, Sep 23, 2009 at 5:13 AM, Pauli Nieminen wrote:

> Hi!
>
> I have been thinking about GPU reset as a possible DoS attack from user space. The problem is that the display doesn't work at all anymore if an attacker chooses to run an application that constantly causes GPU hangs. It would of course be ideal to have the CS checker not let in any problematic combinations of commands. But in practice we can't assume that everything is safe with all hardware, so we need to take some actions to prevent possible problems.
>
> So the first defense would be terminating the application that sent the command stream that caused the GPU hang. But an attacker could easily bypass this protection by forking new processes all the time.
>
> So we need a stronger defense if the same user account is causing multiple hangs in a short time frame. I would think temporarily denying new DRI access would let the user regain control of the system and take action to stop the problematic program from running.

It depends on what sort of system you are talking about. On a normal desktop/laptop type system, the DoS is either going to be caused by the user sitting at it, in which case there is the power button, or by an app running on it, possibly X, in which case killing it will solve the issue.

In a multi-user GPGPU environment, if someone starts a DoS, it should only lock up the GPU, not the CPU, unless they start a CPU DoS as well, so in that case an admin can always ssh in and kill the DoS user's account etc.

I'm just wondering what other use case we'd need anything more aggressive for?

Dave.
Re: preventing GPU reset DoS
On Tue, 2009-09-22 at 12:13 -0700, Pauli Nieminen wrote:

> Hi!
>
> I have been thinking about GPU reset as a possible DoS attack from user space. The problem is that the display doesn't work at all anymore if an attacker chooses to run an application that constantly causes GPU hangs. It would of course be ideal to have the CS checker not let in any problematic combinations of commands. But in practice we can't assume that everything is safe with all hardware, so we need to take some actions to prevent possible problems.
>
> So the first defense would be terminating the application that sent the command stream that caused the GPU hang. But an attacker could easily bypass this protection by forking new processes all the time.
>
> So we need a stronger defense if the same user account is causing multiple hangs in a short time frame. I would think temporarily denying new DRI access would let the user regain control of the system and take action to stop the problematic program from running.

OK, but you'd want to be able to turn it off for developers -- you've just described my normal workflow...

Keith
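A minimal sketch of the per-user defense proposed in the quoted message, with the off switch Keith asks for. It assumes hangs are counted per user account over a sliding time window, so forking new processes does not reset the count. `UserHangThrottle` and all of its parameters and default values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class UserHangThrottle:
    """Temporarily deny new DRI access to a user who causes repeated GPU hangs."""

    def __init__(self, max_hangs=3, window_s=60.0, deny_s=30.0, enabled=True):
        self.max_hangs = max_hangs  # hangs within the window that trigger a denial
        self.window_s = window_s    # length of the sliding window in seconds
        self.deny_s = deny_s        # how long new DRI access is refused
        self.enabled = enabled      # runtime switch, e.g. off on developer machines
        self._hangs = defaultdict(deque)  # uid -> timestamps of recent hangs
        self._denied_until = {}           # uid -> time at which access is allowed again

    def record_hang(self, uid, now):
        """Note that a command stream submitted by `uid` hung the GPU at time `now`."""
        if not self.enabled:
            return
        recent = self._hangs[uid]
        recent.append(now)
        # Forget hangs that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_s:
            recent.popleft()
        if len(recent) >= self.max_hangs:
            self._denied_until[uid] = now + self.deny_s
            recent.clear()

    def may_open(self, uid, now):
        """Return True if `uid` may open a new DRI connection at time `now`."""
        if not self.enabled:
            return True
        return now >= self._denied_until.get(uid, 0.0)
```

Timestamps are passed in explicitly so the policy can be exercised without a real clock. Creating the object with `enabled=False` makes both methods no-ops, which corresponds to switching the protection off for driver development.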