We have run into a problem several times now that has
severely cut into production.  When a job sits in a
queue for too long (12 hours seems long enough but it
may happen quicker; we aren't sure) something gets
hosed in torque and/or maui (I'm guessing the latter).
Jobs in some queues will no longer run, even jobs in
higher priority queues.  Other queues continue running
jobs just fine.

It doesn't seem to matter which queue the long waiting
job is in.

We have routing queues for a handful of the queues.
Two of these are typically hosed when this happens,
the other is not.

If we bump the priority on the queue with the job
that's waited "too long" above those of queues
with running jobs so that the offending job starts,
things go back to normal.  Until the next time.

Any ideas?  Any config info that would help?

Thanks,
Miles
--
Miles O'Neal
Manager, NSA
[EMAIL PROTECTED]
30° 18' 39N, 97° 55' 1W