You were right, I think it was a name resolution problem. I am using the hosts file for name resolution for the moment, for testing purposes, and I forgot to populate it.
Thanks a lot,
Silviu

Joseph Bester <[EMAIL PROTECTED]> wrote:
On Mar 6, 2008, at 11:44 AM, Silviu Popescu wrote:

> Hi,
>
> I've installed gt4.0.6 containers on several hosts ( 15 ) on Fedora
> 8 and there is a strange behavior on 2 of the hosts when submitting
> gram jobs.
> The only problem is it takes too long to return the job status.
> e.g: a /bin/date takes 2 minutes (and 4 minutes with staging) on
> the 2 hosts while on the other hosts takes about 2 seconds. GridFtp
> works fine
> When submitting between the 2 hosts it works fine ( or from those
> hosts to other hosts ).
>
> [EMAIL PROTECTED] time globusrun-ws -submit -F
> https://141.85.1.217:8443/wsrf/services/ManagedJobFactoryService
> -c /bin/date
> Submitting job...Done.
> Job ID: uuid:0116ac5e-eb9b-11dc-a8db-0018f39fc34f
> Termination time: 03/07/2008 16:33 GMT
> --------------------here I wait long time------------------------
> Current job state: Done
> Destroying job...Done.
>
> real 2m0.403s
>
> [EMAIL PROTECTED] time globusrun-ws -submit -F
> https://141.85.1.208:8443/wsrf/services/ManagedJobFactoryService
> -c /bin/date
> Submitting job...Done.
> Job ID: uuid:2b7b5fea-eb9a-11dc-bb86-0018f39fc34f
> Termination time: 03/07/2008 16:27 GMT
> Current job state: Active
> Current job state: CleanUp
> Current job state: Done
> Destroying job...Done.
>
> real 0m1.639s
>
> I think it may be some configuration error because I was configuring
> the 2 hosts at the same time.
>
> If anybody has any idea please help me it might be a stupid thing
> that I can't figure it out.
>
> Thanks,
>
>
> Silviu Popescu

What's likely happening is that the container is failing to deliver notifications to your client globusrun-ws program.
As a fallback, the globusrun-ws program polls the container for job status every minute, so the client can normally recover when it does not receive the notifications. That is why job submissions seem slow: you are seeing poll-based client behavior instead of event-driven service behavior.

There are a few possible causes for notification failures (off the top of my head):

- globusrun-ws is using host authorization on the notification consumer, and the container is connecting from an IP address that doesn't resolve to the name in the container's certificate.
- globusrun-ws is using a host name or address in the EPR of its notification consumer that the container cannot contact (it has an internal IP address or is behind a firewall).

If it's the first case, you should be able to work around it by explicitly setting the expected subject name to that of the container's certificate with the -subject command-line option. You can find out the container's certificate subject by running globusrun-ws -submit -self -F .... and using the name of the remote entity in the error message as the argument to the -subject command-line option.

If it's the second case, you should be able to work around it by setting GLOBUS_HOSTNAME to an externally visible IP address for the host, and maybe setting GLOBUS_TCP_PORT_RANGE as needed to work through a firewall. (See http://dev.globus.org/wiki/FirewallHowTo)

Joe
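
A rough way to check for the name-resolution cause discussed in this thread is to compare forward and reverse lookups for a host name on each machine. The sketch below is Python and is not part of the Globus Toolkit; the function name check_name_resolution and its forward/reverse parameters are made up for illustration. It only asks the system resolver (which typically consults the hosts file), so it says nothing about certificates or firewalls.

```python
import socket


def check_name_resolution(host, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems found for `host`; an empty list means the
    forward and reverse lookups agree.

    `forward` and `reverse` default to the system resolver and can be
    replaced with stand-ins (hypothetical hook, used here for testing).
    `host` is expected to be a name, not an IP address.
    """
    # Forward lookup: name -> IPv4 address.
    try:
        ip = forward(host)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return ["forward lookup of %r failed: %s" % (host, exc)]

    # Reverse lookup: address -> (primary name, aliases, addresses).
    try:
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return ["reverse lookup of %s failed: %s" % (ip, exc)]

    # The name we started from should be the primary name or an alias.
    known = {name.lower()} | {alias.lower() for alias in aliases}
    if host.lower() not in known:
        return ["%r resolves to %s, which maps back to %s"
                % (host, ip, sorted(known))]
    return []
```

Calling check_name_resolution with the host name of the client machine on the container host, and the other way around, should return an empty list on both sides; anything else points at a missing or inconsistent hosts-file entry.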
