Re: [PyCUDA] Dealing with driver timeouts in long running kernels

Cyrus Omar Tue, 16 Nov 2010 18:34:22 -0800

On Tue, Nov 16, 2010 at 20:25, Dan Goodman <dg.pyc...@thesamovar.net> wrote:


> A final option that I thought of would be to check for a launch timeout
> failure after each kernel launch, and if it happens, divide my problem size
> by two and try again, repeating until I don't get any launch failures. The
> trouble with this approach is that I'll get multiple failures and screen
> flashes before it settles down to a value that works, wasting a little bit
> of time but more importantly being quite alarming. It also doesn't feel very
> elegant... ;-)


This is risky, as per the TDR webpage you linked to:

> Minor changes were made in Windows Vista SP1 to improve the user experience
> in cases of frequent and rapidly occurring GPU hangs. Repetitive GPU hangs
> indicate that the graphics hardware has not recovered successfully. In these
> instances, the system must be shut down and restarted to fully reset the
> graphics hardware. If the operating system detects that six or more GPU
> hangs and subsequent recoveries occur within 1 minute, then the following
> GPU hang is treated as a system bug check.
>
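
To put a rough number on that, here is a small sketch that counts how many
launches would time out while halving down to a size that fits. The function
names and the idea of a known "largest safe size" are made up for
illustration (in practice you would not know that size up front), and it
assumes all the failures land inside the same minute.

```python
# Threshold from the quoted TDR text: after six or more hangs and recoveries
# within one minute, the following hang is treated as a system bug check.
HANGS_BEFORE_BUG_CHECK = 6


def failed_launches(start_size, largest_safe_size):
    """Count launches that time out while halving down to a size that fits."""
    failures = 0
    size = start_size
    while size > largest_safe_size:
        failures += 1      # this launch would hit the watchdog
        size //= 2         # retry with half the problem size
    return failures


def would_bug_check(start_size, largest_safe_size):
    """True if halving would produce a hang after six recoveries.

    Assumes (pessimistically) that every failure falls within one minute.
    """
    return failed_launches(start_size, largest_safe_size) > HANGS_BEFORE_BUG_CHECK
```

For example, starting at 2**21 items when only 2**14 fit means seven failed
launches, so the seventh could take the machine down rather than just flash
the screen.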
The best option seems to be to disable TDR through the registry while the
application is running, and to tell the user that you are doing so and what
it means.
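
A minimal sketch of the registry side, using Python's standard winreg
module. The key path and the TdrLevel / TdrDelay value names are the ones
Microsoft documents for TDR; the helper function names are my own. Writing
needs administrator rights, and I believe a restart is needed before a
change takes effect, so check that before relying on toggling it per run.

```python
import sys

# Registry location and value names documented by Microsoft for TDR.
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_LEVEL = "TdrLevel"   # 0 = detection off, 3 = detect and recover (default)
TDR_DELAY = "TdrDelay"   # seconds before a timeout is declared (default 2)


def read_tdr_value(name):
    """Return the DWORD stored under TDR_KEY, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        # Value not present: Windows falls back to its built-in default.
        return None


def write_tdr_value(name, value):
    """Set a DWORD under TDR_KEY. Requires administrator rights."""
    if sys.platform != "win32":
        raise OSError("TDR registry settings only exist on Windows")
    import winreg
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, value)
```

The idea would be to remember read_tdr_value(TDR_LEVEL) first, call
write_tdr_value(TDR_LEVEL, 0) after warning the user, and put the old value
back afterwards. Raising TdrDelay instead of turning detection off is a
gentler variant of the same thing.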

