Hi Nate,
I kept getting the errors until I restarted Galaxy completely. I could
still submit jobs to the HPC queue from other programs. I'm not very
familiar with Python, but if you have pointers on where to start to
solve this, I might be able to contribute. If it were possible to
restart just the handlers, for example, that might be enough.
best regards,
Geert
On 07/12/2012 07:05 PM, Nate Coraor wrote:
On Jul 12, 2012, at 3:03 AM, Geert Vandeweyer wrote:
Hi,
Today I ran into a cluster error on our local instance using the latest
galaxy-dist and TORQUE/PBS with the python-pbs binding.
Under heavy load on the Galaxy process, it appears that the handler processes
failed to contact the pbs_server, although the pbs_server was still up and
running. After that, many of the following statements kept appearing in the
handler.log file:
galaxy.jobs.runners.pbs DEBUG 2012-07-11 17:39:06,649
(11647/12788.pbs_master_address) Skipping state check because PBS server
connection failed
After restarting the Galaxy process (run.sh), everything worked again, with no
changes to the pbs_server.
Would it be possible to set up some checks for this failure? Like:
- contact system admin
- restart galaxy
- automatically retry job submission after a while, so as not to crash workflows.
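The retry idea in the last item could be sketched roughly as follows. This is only an illustration, not code from Galaxy's PBS runner or the python-pbs binding: `submit_with_retry`, `SubmissionError`, the `submit` callable, and the delay values are all hypothetical names and numbers chosen for the example.

```python
import time


class SubmissionError(Exception):
    """Hypothetical error raised when the scheduler cannot be reached."""


def submit_with_retry(submit, attempts=5, initial_delay=1.0, backoff=2.0,
                      sleep=time.sleep):
    """Call submit() until it succeeds or the attempts run out.

    submit is any callable that returns a job id or raises SubmissionError.
    The wait between attempts grows by the backoff factor each time.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return submit()
        except SubmissionError:
            if attempt == attempts:
                # Out of attempts: re-raise so the caller can fail the job
                # or notify an administrator.
                raise
            sleep(delay)
            delay *= backoff
```

A real handler would wrap its actual submission call in the `submit` callable and would probably use much longer delays, so that a briefly unreachable server does not cause the whole workflow to fail.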
Hi Geert,
It'd be useful to retry submission rather than fail. I doubt we'll get to it
soon, but would welcome any submissions that did this. Is restarting Galaxy
absolutely necessary, or will job submission begin to succeed again after load
goes down?
--nate
best regards,
Geert Vandeweyer
--
Geert Vandeweyer, Ph.D.
Department of Medical Genetics
University of Antwerp
Prins Boudewijnlaan 43
2650 Edegem
Belgium
Tel: +32 (0)3 275 97 56
E-mail: [email protected]
http://ua.ac.be/cognitivegenetics
http://www.linkedin.com/pub/geert-vandeweyer/26/457/726
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/