Re: [gt-user] gram job submission takes too long

Silviu Popescu Thu, 06 Mar 2008 11:54:25 -0800

You were right, it was a name resolution problem, I think. I am using a hosts
file for name resolution for the moment, for testing purposes, and I forgot to
populate it.

Thanks a lot,
Silviu

Joseph Bester <[EMAIL PROTECTED]> wrote:

On Mar 6, 2008, at 11:44 AM, Silviu Popescu wrote:

> Hi,
>
> I've installed GT 4.0.6 containers on several hosts (15) running
> Fedora 8, and there is strange behavior on 2 of the hosts when
> submitting GRAM jobs.
> The only problem is that it takes too long to return the job status.
> E.g., a /bin/date job takes 2 minutes (and 4 minutes with staging) on
> the 2 hosts, while on the other hosts it takes about 2 seconds. GridFTP
> works fine.
> Submitting between the 2 hosts works fine (as does submitting from
> those hosts to other hosts).
>
> [EMAIL PROTECTED] time globusrun-ws -submit -F  
> https://141.85.1.217:8443/wsrf/services/ManagedJobFactoryService 
>  -c /bin/date
> Submitting job...Done.
> Job ID: uuid:0116ac5e-eb9b-11dc-a8db-0018f39fc34f
> Termination time: 03/07/2008 16:33 GMT
> --------------------here I wait long time------------------------
> Current job state: Done
> Destroying job...Done.
>
> real    2m0.403s
>
> [EMAIL PROTECTED] time globusrun-ws -submit -F  
> https://141.85.1.208:8443/wsrf/services/ManagedJobFactoryService 
>  -c /bin/date
> Submitting job...Done.
> Job ID: uuid:2b7b5fea-eb9a-11dc-bb86-0018f39fc34f
> Termination time: 03/07/2008 16:27 GMT
> Current job state: Active
> Current job state: CleanUp
> Current job state: Done
> Destroying job...Done.
>
> real    0m1.639s
>
> I think it may be some configuration error because I was configuring  
> the 2 hosts at the same time.
>
> If anybody has any idea, please help me; it might be something stupid
> that I can't figure out.
>
> Thanks,
>
>
> Silviu Popescu

What's likely happening is that the container is failing to deliver
notifications to your globusrun-ws client. As a fallback, globusrun-ws
polls the container for job status every minute, so that the client can
still recover when it does not receive the notifications. That is why
job submissions seem slow: you are seeing poll-based client behavior
instead of event-driven service behavior.

There are a few possible causes for notification failures (off the top  
of my head):
- globusrun-ws is using host authorization on the notification  
consumer and the container is connecting with an IP address that  
doesn't resolve to the name that is in the container's certificate
- globusrun-ws is using a host name or address in the EPR of its
notification consumer which the container cannot contact (for example,
it is an internal IP address or the client is behind a firewall)
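
Both cases come down to what each side's resolver returns, so a quick
check is worth doing before changing any configuration. Here is a
minimal sketch, assuming a Linux host with getent available. The
helper name lookup and the variables CONTAINER_ADDR and CLIENT_NAME
are made up for illustration, and the localhost defaults are
placeholders to replace with your real container address and client
host name.

```shell
#!/bin/sh
# lookup is a hypothetical helper, not a Globus tool. It prints what the
# system resolver (including /etc/hosts) returns for a name or address.
lookup() {
    getent hosts "$1" || echo "no entry for $1"
}

# Placeholder defaults; override them with your own values.
CONTAINER_ADDR=${CONTAINER_ADDR:-127.0.0.1}
CLIENT_NAME=${CLIENT_NAME:-localhost}

# First case, run on the client host: the container's address should map
# back to the host name that appears in the container's certificate.
lookup "$CONTAINER_ADDR"

# Second case, run on the container host: the client's name should
# resolve to an address the container can actually reach.
lookup "$CLIENT_NAME"
```

If either lookup prints "no entry", a missing hosts file entry or DNS
record is a likely suspect.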

If it's the first case, you should be able to work around it by
explicitly setting the expected subject name to that of the
container's certificate with the -subject command-line option. You can
find the container's certificate subject by running
globusrun-ws -submit -self -F .... and using the name of the
remote entity in the error message as the argument to the -subject
command-line option.
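
As a rough sketch of those two steps, using the factory URL from the
original post: the subject DN below is a made-up placeholder, to be
replaced with whatever name the error message from the first command
reports. The commands are only run if globusrun-ws is installed.

```shell
#!/bin/sh
# Factory URL taken from the original post.
FACTORY=https://141.85.1.217:8443/wsrf/services/ManagedJobFactoryService

# Hypothetical DN for illustration only; use the remote entity name
# reported in the error message from step 1.
CONTAINER_SUBJECT="/O=Grid/OU=Example/CN=host/container.example.org"

if command -v globusrun-ws >/dev/null 2>&1; then
    # Step 1: submit with self authorization. This is expected to fail,
    # and the error message should name the remote entity.
    globusrun-ws -submit -self -F "$FACTORY" -c /bin/date || true

    # Step 2: resubmit, telling the client which subject to expect.
    globusrun-ws -submit -subject "$CONTAINER_SUBJECT" -F "$FACTORY" -c /bin/date
fi
```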

If it's the second case, you should be able to work around it by setting
GLOBUS_HOSTNAME to an externally visible IP address for the host and,
if needed, setting GLOBUS_TCP_PORT_RANGE to work through a
firewall. (See http://dev.globus.org/wiki/FirewallHowTo)
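
For example, on the client host, something along these lines before
submitting. The address and port range are placeholders I picked for
illustration (192.0.2.10 is a documentation address), so substitute
the client's real external address and whatever range your firewall
allows inbound.

```shell
#!/bin/sh
# Placeholder values; replace them with your own.
export GLOBUS_HOSTNAME=192.0.2.10            # externally visible address of the client host
export GLOBUS_TCP_PORT_RANGE=40000,41000     # min,max ports opened in the firewall

# Resubmit the test job from the original post, if the client is installed.
if command -v globusrun-ws >/dev/null 2>&1; then
    globusrun-ws -submit -F https://141.85.1.217:8443/wsrf/services/ManagedJobFactoryService -c /bin/date
fi
```

If notifications get through, the job states (Active, CleanUp, Done)
should appear within a couple of seconds instead of after the
one-minute polls.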

Joe
