We're running Maui 3.6.2p11 with Torque 2.1.6 (I realize these may not be the most current versions, but we don't switch that often).
This combination has been running fine for several months. However, in the last 3 weeks, every Friday, Maui has been constantly reporting messages like the following:

    cannot get node info: Unknown Job Id
    cannot get node info: NULL
    cannot get node info: Execution server rejected request MSG=connection to mom timed out
    cannot get node info: Execution server rejected request MSG=cannot send job to mom, state=PRERUN
    job '1545086' cannot be started: (rc: 15031 errmsg: 'Premature end of message')

followed, of course, by attempts to re-initialize the PBS interface.

These would seem to imply some kind of networking issue, although Maui is communicating with Torque on the same machine. When these messages occur, Maui doesn't successfully re-initialize the PBS interface - we always have to kill it and then restart it manually. Often, we're doing this every 3 minutes or so between 6am and 9am (like we are now).

Any ideas about how to troubleshoot this accurately (and quickly)?

Michael Durket