Re: [gt-user] Condor-g problems

Charles Bacon Mon, 26 Nov 2007 07:10:23 -0800

On Nov 26, 2007, at 5:59 AM, scott fletcher (BITS) wrote:

Problem 1
=========

...

At this point even if we revert to submitting jobs directly to Condor we
get the same message, the only thing that seems to fix it is a reboot.

I don't have any ideas about this one, and I suspect you'll have better luck with it in a Condor forum. I am surprised, however, because I know that in their architecture they offload interactions with things like Globus into a separate daemon called the GAHP. The only thing I can think to suggest is to check whether a GAHP is up and running at the time you experience this problem and, if so, to try killing it.
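
As a rough way to do that check, here is a minimal Python sketch that scans `ps` output for anything that looks like a GAHP. The assumption that the process's command line contains the string "gahp" is mine, as are the helper names `find_gahp_processes` and `list_running_gahps`; verify the actual process name on your installation before killing anything.

```python
import subprocess


def find_gahp_processes(ps_output, needle="gahp"):
    """Return (pid, command) pairs from `ps` output whose command mentions `needle`.

    Assumption: the GAHP's command line contains the substring "gahp".
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and anything that is not "<pid> <command>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, command = int(parts[0]), parts[1]
        if needle in command.lower():
            matches.append((pid, command))
    return matches


def list_running_gahps():
    """Run `ps` on the local machine and return any GAHP-like processes."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return find_gahp_processes(result.stdout)
```

Calling `list_running_gahps()` on the submit node while the problem is occurring gives the PIDs to try killing (for example with `kill <pid>`), which is less drastic than a reboot.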

Problem 2
=========
When we submit a job to the master node it gets there and runs as you
would expect and then exits, however on the submission node the job
appears idle until about a minute after the job has actually finished
(on short jobs lasting 10 secs, we have not really tried any long ones
yet), it then shows status as running (which takes several times the job
actually took to run) and then exits.

This has to do with the architecture of GRAM2. It polls for job completion, and does so at a one-minute interval. Condor-G meddles with it to try to improve things, which I believe is the poll_fast output you're seeing. It sounds like the poll_fast isn't speeding things up, and you're instead getting the default one-minute interval polling. If you set up the more recent implementation of GRAM, which also works with Condor, you will get near-instantaneous notification of job completion.
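
To illustrate why a 10-second job can look idle for about a minute, here is a small Python sketch of fixed-interval polling. It is a simplified model, not GRAM2's actual code: it assumes the first poll happens one full interval after submission, and the function name `detection_time` is made up for the example.

```python
import math


def detection_time(job_duration_s, poll_interval_s):
    """Seconds after submission at which a fixed-interval poller first sees the job as done.

    Simplifying assumption: polls happen at t = interval, 2 * interval, ...
    """
    polls_needed = max(1, math.ceil(job_duration_s / poll_interval_s))
    return polls_needed * poll_interval_s


# Compare how long the job ran with when a 60-second poller would notice.
for duration in (10, 59, 61):
    print(f"job ran {duration}s, completion noticed at {detection_time(duration, 60)}s")
```

Under this model the 10-second job is not noticed until the 60-second mark, and a job that runs just past one interval is not noticed until the second poll.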



Charles
