Thanks Charles, we have really been able to move this along with your
help. Everything seems to be working as far as we can see now. We just
wondered if it is normal for the GridFTP wrapper shell script to show
as running in condor_q for about 30 minutes even after the job has
finished? It seems to work OK, and if you run another job while it is
still running it seems to reuse it, but we just wondered if that is
expected behaviour.

Thanks,

Scott

On Nov 26, 2007, at 10:28 AM, scott fletcher (BITS) wrote:

> Thanks for the info Charles,
>
>> If you set up the more recent implementation of GRAM, which also 
>> works with Condor, you will get near-instantaneous notification of 
>> job completion.
>
> Do you mean just specifying a different GRAM (e.g. gt4) in the job 
> submission file, or is there some extra setup that is required?

You'd have to set up the GRAM4 server.  That boils down to setting up a
backend database for RFT, standing up a GridFTP server for
stage-in/out, and starting the webservices container.  That's sections
2.4-2.7 in the quickstart:
http://www.globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html#q-gridftp

Then, yes, you just change your gt2 to a gt4 in the condor submit
script, and change your contact from the gatekeeper on port 2119 to
https://hostname.whatever:8443/wsrf/services/ManagedJobFactoryService
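
As a rough sketch (not taken from your actual setup), the relevant
lines of the submit file might end up looking something like the
following.  The jobmanager name, the trailing scheduler type, and the
file names are assumptions for illustration, and the exact
grid_resource syntax depends on your Condor version, so check it
against the Condor manual:

```
universe      = grid

# Old GRAM2 contact: the gatekeeper on port 2119.
# "jobmanager-condor" is an assumed jobmanager name.
# grid_resource = gt2 hostname.whatever:2119/jobmanager-condor

# New GRAM4 contact: the webservices container on port 8443.
# The trailing "Condor" (assumed) names the backend scheduler.
grid_resource = gt4 https://hostname.whatever:8443/wsrf/services/ManagedJobFactoryService Condor

# Hypothetical job files, for illustration only.
executable    = my_job.sh
output        = my_job.out
log           = my_job.log
queue
```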

If your install of GT was already set up for submission to Condor, the
webservices Condor stuff should all be set up already too.


Charles

>
> Thanks,
>
> Scott
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
> Behalf Of Charles Bacon
> Sent: 26 November 2007 15:10
> To: scott fletcher (BITS)
> Cc: [email protected]
> Subject: Re: [gt-user] Condor-g problems
>
> On Nov 26, 2007, at 5:59 AM, scott fletcher (BITS) wrote:
>
>> Problem 1
>> =========
> ...
>>
>> At this point even if we revert to submitting jobs directly to Condor
>> we get the same message, the only thing that seems to fix it is a 
>> reboot.
>
> I don't have an idea about this one, and I suspect you'll have better 
> luck with it in a Condor forum.  I am surprised, however, because I 
> know that in their architecture there's a separate daemon called the 
> GAHP that they use to offload their interactions with things like 
> Globus.  The only thing I can think to suggest is to check whether a 
> GAHP is up and running at the time you experience this problem and 
> try killing it.
>
>> Problem 2
>> =========
>> When we submit a job to the master node it gets there and runs as you
>> would expect and then exits, however on the submission node the job 
>> appears idle until about a minute after the job has actually finished
>> (on short jobs lasting 10 secs, we have not really tried any long 
>> ones yet), it then shows status as running (which takes several times
>> the job actually took to run) and then exits.
>
> This has to do with the architecture of GRAM2.  It polls for job 
> completion, and does so at a one-minute interval.  Condor-G meddles 
> with it to try to improve things, which I believe is the poll_fast 
> output you're seeing.  It sounds like the poll_fast isn't speeding 
> things up, and you're instead getting the default one-minute interval 
> polling.  If you set up the more recent implementation of GRAM, which 
> also works with Condor, you will get near-instantaneous notification 
> of job completion.
>
>
> Charles
>
>
> --
> Disclaimer: This e-mail and any attachments are confidential and 
> intended solely for the use of the recipient(s) to whom they are 
> addressed. If you have received it in error, please destroy all copies
> and inform the sender. This email and any attachments are believed to 
> be free from viruses but BBSRC accepts no liability in connection 
> therewith.
>

