Satish Balay <ba...@mcs.anl.gov> writes:

> On Sat, 21 Oct 2017, Lisandro Dalcin wrote:
>
>> Satish set TIMEOUT to more than 2 hours. If a test ever fails because
>> of a deadlock, the build worker will be stuck for 2 hours. Of course,
>> we will likely notice, but still...
>
> As mentioned, timeout doesn't really work with valgrind builds [perhaps
> also with openmpi builds and with any mpi impl that doesn't kill child
> processes when the mpiexec proc is killed etc.] - so a short timeout is
> just printing incorrect-verbose messages [i.e. a kill message is
> printed - but the job isn't getting killed]. A long/infinite timeout is
> just the representation of current runtime behavior.
>
> In the future - if all tests are converted to test harness - a few
> long jobs won't be a big issue wrt throughput. [as multiple jobs get
> run simultaneously]

Depends if you're using the machine for other things. I think having
that long-running job would tend to oversubscribe MPI and slow down
throughput. In any case, we probably shouldn't run this 3D convergence
study under any testing system, even without Valgrind.

> Sure, it's best to not have long-running test jobs - if possible.
>
> Satish
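
For reference, the timeout problem described above (the launcher gets killed, but its child processes keep running) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the PETSc test scripts do: the helper name `run_with_group_timeout` is hypothetical, it is POSIX-only, and it assumes the launcher's children stay in the launcher's process group. An MPI implementation that starts ranks through a separate daemon or in a new session would not be covered.

```python
import os
import signal
import subprocess


def run_with_group_timeout(cmd, timeout):
    """Run cmd in its own session; on timeout, kill its whole process group.

    Hypothetical helper for illustration. Returns (returncode, timed_out).
    """
    # start_new_session=True makes the child call setsid(), so the child
    # becomes the leader of a new process group whose id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        # Signal every process in the group, not only the launcher, so
        # children that the launcher did not clean up are killed as well.
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait(), True
```

Something like `run_with_group_timeout(["mpiexec", "-n", "2", "./ex1"], 600)` (the executable name is just a placeholder) would then report a timeout only after the group has actually been signalled, so the "killed" message matches what happened, at least when the assumption about process groups holds.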