I am rebooting the box and kicking out all the jobs until we figure this
out.
Thanks!
Alex
On 2/8/2018 7:27 AM, Szilárd Páll wrote:
BTW, timeouts can be caused by contention from a stupid number of ranks/tMPI
threads hammering a single GPU (especially with two threads per core under
HyperThreading), but I'm not sure the tests are ever executed with such a
huge rank count.
--
Szilárd
On Thu, Feb 8, 2018 at 2:40 PM, Mark Abraham <mark.j.abra...@gmail.com> wrote:
Hi,
On Thu, Feb 8, 2018 at 2:15 PM Alex <nedoma...@gmail.com> wrote:
Mark and Peter,
Thanks for commenting. I was told that all CUDA tests passed, but I will
double-check how many of those were actually run. Also, we never
rebooted the box after the CUDA install, and finally, we had a bunch of
GROMACS (2016.4) jobs running, because we didn't want to interrupt a
postdoc's work... All of those were with -nb cpu, though. Could those
factors have affected our regression tests?
Can't say. You observed timeouts, which could be consistent with drivers or
runtimes getting stuck. However, the other mdrun processes may have set
thread affinity by default, and any process that does that will interfere
with how effectively any others run, such as the tests. Sharing a node is
difficult to do well, and doing anything else on a node running GROMACS
is asking for trouble unless you have manually kept the tasks
apart. Just don't.
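To check whether other jobs on the node have pinned threads to particular cores before running the tests, a minimal Python sketch (Linux only) can compare each thread's CPU affinity mask, read with os.sched_getaffinity(), against the full set of CPUs. The helper name pinned_processes and the default "gmx" name filter are illustrative assumptions, not part of GROMACS:

```python
import os


def pinned_processes(name_filter="gmx"):
    """Return [(pid, name, cpus), ...] for processes whose /proc/<pid>/comm
    contains name_filter and that have at least one thread restricted to a
    strict subset of the CPUs. `cpus` is the sorted union of the restricted
    threads' affinity masks.

    Hypothetical helper, Linux-only sketch. It assumes CPUs are numbered
    0..os.cpu_count()-1; with offline CPUs or a container-restricted CPU set
    it may over-report.
    """
    all_cpus = set(range(os.cpu_count() or 1))
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            if name_filter not in name:
                continue
            tids = os.listdir(f"/proc/{pid}/task")
        except OSError:
            continue  # process exited or cannot be inspected
        pinned = set()
        for tid in tids:
            try:
                mask = os.sched_getaffinity(int(tid))
            except OSError:
                continue  # thread exited between listing and querying
            if mask < all_cpus:  # strict subset: this thread is pinned
                pinned |= mask
        if pinned:
            found.append((pid, name, sorted(pinned)))
    return found


if __name__ == "__main__":
    for pid, name, cpus in pinned_processes():
        print(f"{pid:>7} {name:<16} pinned threads use CPUs {cpus}")
```

Any process it lists is holding cores that the test binaries may also try to use. An empty result does not prove the node is idle, only that nothing matching the filter has pinned threads.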
Mark
It will really suck if these are hardware-related...
Thanks,
Alex
On 2/8/2018 3:03 AM, Mark Abraham wrote:
Hi,
Or leftovers of old drivers that now mismatch the new install. That has
caused timeouts for us.
Mark
On Thu, Feb 8, 2018 at 10:55 AM Peter Kroon <p.c.kr...@rug.nl> wrote:
Hi,
With changing failures like this, I would start to suspect the hardware
as well. Mark's suggestion of looking at simpler test programs than GMX
is a good one :)
Peter
On 08-02-18 09:10, Mark Abraham wrote:
Hi,
That suggests that your new CUDA installation is incomplete in a different
way. Do its samples or test programs run?
Mark
On Thu, Feb 8, 2018 at 1:20 AM Alex <nedoma...@gmail.com> wrote:
Update: we seem to have had a hiccup with an orphan CUDA install, and that
was causing issues. After wiping everything off and rebuilding, the errors
from the initial post disappeared. However, two tests failed during the
regression tests:
95% tests passed, 2 tests failed out of 39
Label Time Summary:
GTest = 170.83 sec (33 tests)
IntegrationTest = 125.00 sec (3 tests)
MpiTest = 4.90 sec (3 tests)
UnitTest = 45.83 sec (30 tests)
Total Test time (real) = 1225.65 sec
The following tests FAILED:
9 - GpuUtilsUnitTests (Timeout)
32 - MdrunTests (Timeout)
Errors while running CTest
CMakeFiles/run-ctest-nophys.dir/build.make:57: recipe for target 'CMakeFiles/run-ctest-nophys' failed
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
CMakeFiles/Makefile2:1160: recipe for target 'CMakeFiles/run-ctest-nophys.dir/all' failed
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
CMakeFiles/Makefile2:971: recipe for target 'CMakeFiles/check.dir/rule' failed
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
Makefile:546: recipe for target 'check' failed
make: *** [check] Error 2
Any ideas? I can post the complete log, if needed.
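To post only the relevant part of the log, or to rerun just the tests that timed out, a small Python sketch can pull the test names out of the "The following tests FAILED:" block and build a ctest command line using its -R (test-name regex) and --output-on-failure options. The helper names failed_tests and rerun_command are hypothetical, and the printed command is meant to be run from the build directory:

```python
import re
import shlex
import sys

# Matches summary lines such as "9 - GpuUtilsUnitTests (Timeout)"
FAILED_LINE = re.compile(r"^\s*\d+\s*-\s*(\S+)\s+\((.+)\)\s*$")


def failed_tests(log_text):
    """Return [(test_name, reason), ...] from the block that follows
    'The following tests FAILED:' in a CTest log."""
    failures, in_block = [], False
    for line in log_text.splitlines():
        if "The following tests FAILED:" in line:
            in_block = True
            continue
        if in_block:
            m = FAILED_LINE.match(line)
            if not m:
                break  # first non-matching line ends the block
            failures.append((m.group(1), m.group(2)))
    return failures


def rerun_command(failures):
    """Build a ctest argv that reruns only the named tests, showing their output."""
    names = "|".join(re.escape(name) for name, _ in failures)
    return ["ctest", "-R", f"^({names})$", "--output-on-failure"]


if __name__ == "__main__":
    failures = failed_tests(sys.stdin.read())
    if not failures:
        print("no 'The following tests FAILED:' block found")
    else:
        for name, reason in failures:
            print(f"{name}: {reason}")
        print(" ".join(shlex.quote(arg) for arg in rerun_command(failures)))
```

Piping the saved make check output above into this script would print ctest -R '^(GpuUtilsUnitTests|MdrunTests)$' --output-on-failure. It does not change the tests' timeouts; it only makes the output of the hung tests easier to see.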
Thank you,
Alex
--
Gromacs Users mailing list
* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
or
send a mail to gmx-users-requ...@gromacs.org.