Scott,
How difficult would it be for the test harness to rerun a failed test when the return code has specific values, instead of erroring out? I am thinking of GPUs in particular, but the idea is general. If the GPU doesn't have the resources available, the test errors out and crashes the entire job in the pipeline, which then has to be retried from the GUI, wasting everyone's time. In theory it seems like it should be pretty straightforward, though of course unforeseen issues can make it difficult: just check the program's error code and, if it is one of certain values, run the program again, or wait a few seconds and then run it again.

Barry

Issues are still broken, hence I am asking here.
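
A minimal Python sketch of the retry idea described above, not the actual test harness code. The function name `run_with_retry`, the `RETRYABLE_CODES` set, the attempt limit, and the delay are all assumptions made for illustration. The exit code 75 is only a placeholder (it is `EX_TEMPFAIL` in `sysexits.h`); the codes a GPU test really returns when resources are unavailable would have to be filled in.

```python
import subprocess
import sys
import time

# Placeholder set of exit codes meaning "try again later" (assumption).
# 75 is EX_TEMPFAIL from sysexits.h; real GPU failure codes would go here.
RETRYABLE_CODES = {75}


def run_with_retry(cmd, retryable=RETRYABLE_CODES, max_attempts=3, delay=5.0):
    """Run cmd, rerunning it while its exit code is in `retryable`.

    Returns (exit_code, attempts_used). Any exit code outside `retryable`,
    including 0, is returned immediately without a rerun.
    """
    for attempt in range(1, max_attempts + 1):
        code = subprocess.run(cmd).returncode
        if code not in retryable or attempt == max_attempts:
            return code, attempt
        # Wait a few seconds so the resource has a chance to free up.
        time.sleep(delay)


if __name__ == "__main__":
    # Trivial demonstration with a command that always succeeds.
    code, attempts = run_with_retry([sys.executable, "-c", "pass"])
    print(f"exit code {code} after {attempts} attempt(s)")
```

Capping the number of attempts keeps a test that always fails from looping forever, and returning the last exit code lets the harness report a persistent failure as it does today.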