Re: [gt-user] gram job submission takes too long

Joseph Bester Thu, 06 Mar 2008 09:26:02 -0800

On Mar 6, 2008, at 11:44 AM, Silviu Popescu wrote:

Hi,
I've installed gt4.0.6 containers on several hosts (15) on Fedora 8 and there is a strange behavior on 2 of the hosts when submitting gram jobs.
The only problem is it takes too long to return the job status.
e.g.: a /bin/date takes 2 minutes (and 4 minutes with staging) on the 2 hosts, while on the other hosts it takes about 2 seconds. GridFtp works fine. When submitting between the 2 hosts it works fine (or from those hosts to other hosts).
[EMAIL PROTECTED] time globusrun-ws -submit -F https://141.85.1.217:8443/wsrf/services/ManagedJobFactoryService -c /bin/date
Submitting job...Done.
Job ID: uuid:0116ac5e-eb9b-11dc-a8db-0018f39fc34f
Termination time: 03/07/2008 16:33 GMT
--------------------here I wait long time------------------------
Current job state: Done
Destroying job...Done.

real    2m0.403s
[EMAIL PROTECTED] time globusrun-ws -submit -F https://141.85.1.208:8443/wsrf/services/ManagedJobFactoryService -c /bin/date
Submitting job...Done.
Job ID: uuid:2b7b5fea-eb9a-11dc-bb86-0018f39fc34f
Termination time: 03/07/2008 16:27 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.

real    0m1.639s
I think it may be some configuration error because I was configuring the 2 hosts at the same time.
If anybody has any idea, please help me; it might be a stupid thing that I can't figure out.
Thanks,


Silviu Popescu

What's likely happening is that the container is failing to get notifications to your client globusrun-ws program. As a fallback behavior, the globusrun-ws program polls the container for job status every minute, so that the client can normally recover from not receiving the notifications (which is why it seems like job submissions are slow: it's poll-based client behavior instead of event-driven service behavior).
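
As a rough illustration of why the polling fallback makes a short job look slow, here is a toy model in Python. It is not globusrun-ws code: the function name and the assumption that polls happen at fixed multiples of the interval are mine, and the 60-second default just reflects the "every minute" figure above.

```python
import math


def seen_at(event_time_s: float, poll_interval_s: float = 60.0) -> float:
    """Return the time of the first poll at or after event_time_s.

    Toy model (an assumption, not the real client logic): polls happen at
    poll_interval_s, 2 * poll_interval_s, and so on after submission.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    # At least one poll is always needed before any state is observed.
    polls_needed = max(1, math.ceil(event_time_s / poll_interval_s))
    return polls_needed * poll_interval_s


if __name__ == "__main__":
    # A job that finishes about 1.6 s after submission is only noticed
    # when the first poll fires.
    print(seen_at(1.6))
```

In this model a state change is never seen before the next poll, so even a job that finishes in under two seconds cannot be reported before the first minute is up. The real client may need more than one poll cycle before it reports Done, which the model does not try to capture.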

There are a few possible causes for notification failures (off the top of my head):
- globusrun-ws is using host authorization on the notification consumer and the container is connecting with an IP address that doesn't resolve to the name that is in the container's certificate
- globusrun-ws is using a host name or address in the EPR of its notification consumer which the container cannot contact. (Has an internal IP address or is behind a firewall)
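
To check the first cause from the client machine, something like the following Python sketch could help. It only tests whether a reverse DNS lookup of the container's IP address yields the host name you expect to find in the container's certificate; it does not reproduce whatever matching rules globusrun-ws applies. The function names and the example host name are hypothetical.

```python
import socket


def names_for_ip(ip, resolver=socket.gethostbyaddr):
    """Return the lowercased host names a reverse lookup gives for ip."""
    try:
        primary, aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        # No reverse mapping at all is itself a likely cause of trouble.
        return set()
    return {name.lower().rstrip(".") for name in [primary, *aliases]}


def reverse_lookup_matches(ip, cert_hostname, resolver=socket.gethostbyaddr):
    """True if the reverse lookup of ip includes cert_hostname."""
    return cert_hostname.lower().rstrip(".") in names_for_ip(ip, resolver)


if __name__ == "__main__":
    # "grid1.example.org" is a placeholder, not a name from this thread.
    print(reverse_lookup_matches("141.85.1.217", "grid1.example.org"))
```

If this prints False for the two slow hosts and True for the others (with the right names filled in), the host authorization explanation becomes more likely.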

If it's the first case, you should be able to work around it by explicitly setting the expected subject name to be that of the container's certificate with the -subject command-line option. You can figure out what the container's cert subject is by using globusrun-ws -submit -self -F .... and use the name of the remote entity in the error message as the argument to the -subject command-line option.
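
A minimal Python sketch of assembling that command line, in case you want to script it across hosts. The helper name and the subject string are hypothetical placeholders; substitute the name reported as the remote entity in the error message. Only options already mentioned in this thread (-submit, -subject, -F, -c) are used.

```python
import shlex


def submit_command(factory_url, subject, executable="/bin/date"):
    """Build a globusrun-ws argument list with an explicit expected subject."""
    return [
        "globusrun-ws", "-submit",
        "-subject", subject,   # expected subject of the container's certificate
        "-F", factory_url,
        "-c", executable,      # -c goes last: what follows it is the job command
    ]


if __name__ == "__main__":
    cmd = submit_command(
        "https://141.85.1.217:8443/wsrf/services/ManagedJobFactoryService",
        "/O=Grid/OU=Example/CN=host/grid1.example.org",  # placeholder subject
    )
    # Print a copy-pastable shell command instead of running it.
    print(shlex.join(cmd))
```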

If it's the second case, you should be able to work around it by setting GLOBUS_HOSTNAME to an externally visible IP address for the host and maybe setting GLOBUS_TCP_PORT_RANGE as needed to work through a firewall. (See http://dev.globus.org/wiki/FirewallHowTo)
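
For the second case, a small Python sketch of preparing the client environment before launching globusrun-ws. The helper name, the address 203.0.113.10 and the port range are made-up examples. The "min,max" form for GLOBUS_TCP_PORT_RANGE is how I recall the variable being written, so check it against the firewall page above before relying on it.

```python
import os


def client_env(external_host, port_range=None, base=None):
    """Return a copy of the environment with the Globus client overrides set."""
    env = dict(os.environ if base is None else base)
    # Address the container should use to reach the notification consumer.
    env["GLOBUS_HOSTNAME"] = external_host
    if port_range is not None:
        low, high = port_range
        if not (0 < low <= high <= 65535):
            raise ValueError("port_range must satisfy 0 < low <= high <= 65535")
        # Assumed format: "min,max" (verify against the Globus firewall docs).
        env["GLOBUS_TCP_PORT_RANGE"] = f"{low},{high}"
    return env


if __name__ == "__main__":
    env = client_env("203.0.113.10", (40000, 40100))
    print(env["GLOBUS_HOSTNAME"], env["GLOBUS_TCP_PORT_RANGE"])
    # To run the client with these settings, pass env to subprocess.run:
    # subprocess.run(["globusrun-ws", "-submit", "-F", "...", "-c", "/bin/date"], env=env)
```

The chosen ports also have to be open on the firewall in front of the client for the container's notifications to get through.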

Joe
