A thought occurred to me this morning...

We have a known problem with the timeout on OSCAR tests, particularly those submitted to Torque: the job may sit in the queue and not even start within the timeout. We've talked before about revising the code so that the timeout starts only when the job actually *starts* (as opposed to when it is enqueued).

A quick hack to make this work is simply to have tests that submit to Torque touch a specific file (on NFS) as the first thing that they do. The timeout can start when the script on the head node sees this file.

Yes, it's a hack, and yes, it's not going to be 100% accurate (at a minimum, it's subject to NFS timing jitter). But it's more accurate than what we have now, and it's probably easy to do.
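
To make the idea concrete, here is a rough sketch in Python of what both sides might look like. It is not the existing OSCAR test code: the function names, the marker path, and the is_finished callback are all made up for illustration, and it assumes the marker file lives on a filesystem that both the compute node and the head node can see.

```python
import os
import time
from pathlib import Path


def mark_started(marker_path):
    """Job side: call this first thing in the test so the head node can see it."""
    Path(marker_path).touch()


def wait_with_start_marker(marker_path, is_finished, run_timeout, poll_interval=1.0):
    """Head-node side: poll until the job finishes or overruns its timeout.

    The timeout clock only begins once marker_path is seen, so time spent
    waiting in the queue does not count against the test.
    Returns "finished" or "timeout".
    """
    started_at = None  # None means the job has not been seen to start yet
    while True:
        if is_finished():
            return "finished"
        if started_at is None and os.path.exists(marker_path):
            # First sighting of the marker: start the timeout clock now.
            # On NFS this can lag the real start time (attribute caching).
            started_at = time.monotonic()
        if started_at is not None and time.monotonic() - started_at > run_timeout:
            return "timeout"
        time.sleep(poll_interval)
```

Note that as written, a job that never leaves the queue would be waited on forever; a real version would probably want a second, much longer limit on queue time. The head-node script would also need to remove any stale marker file before submitting the job.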

Just an idea... (did anyone commit to fixing this code? Dave? Bernard?)

--
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} http://www.lam-mpi.org/



_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel