Art,

As I understand it, your application runs a single process under the "fork"
job manager.  So you are referring to the latency of running a single
simple process, not the latency of submitting to a batch system.

I now remember that last September, Thomas Brüsemeister pointed out to
us a work-around for a similar problem, at least regarding file transfers.
It was to add the following 'iptables' rule:

iptables -A OUTPUT -p tcp --syn --dport 113 -j REJECT --reject-with tcp-reset

(Port 113 is the ident/auth service.  If outgoing ident lookups are
silently dropped by a remote firewall, each one hangs until a TCP timeout
expires; rejecting them locally with a TCP reset makes them fail
immediately instead.)

We implemented this on many of our systems at AIP and observed a big
improvement in some kinds of latency.  Now I see that on some of them
the setting has been lost (after system upgrades, etc.).
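Since the rule tends to disappear after upgrades, here is a minimal sketch
of how one might check for it and re-apply it (the -C check option needs
iptables >= 1.4.11 and root; the persistence step is distro-specific and
only suggested in the comments):

```shell
#!/bin/sh
# Re-apply the ident-reject rule if it is missing.
RULE="-p tcp --syn --dport 113 -j REJECT --reject-with tcp-reset"

# iptables -C returns nonzero if the rule is not present in the chain.
if ! iptables -C OUTPUT $RULE 2>/dev/null; then
    iptables -A OUTPUT $RULE
fi

# To survive reboots, save the ruleset with your distribution's mechanism,
# e.g. on RHEL-like systems:
#     service iptables save
# or generically:
#     iptables-save > /etc/iptables.rules
```

One could drop such a script into a boot-time hook so the rule is restored
automatically after system upgrades.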

Would this improve things for your application?

Cheers!


On 20.07.08, Arthur Carlson wrote:
> In the thread "Globus not for real-time application?", a number of users 
> discuss whether it is realistic or not to get latencies below 1 second. 
> Sounds like paradise. I am seeing latencies of up to a minute!
> 
> My workstation, gavosrv1.mpe.mpg.de, not the newest anymore, has GTK 
> 4.0.5 installed. When I use globusrun-ws to go from this machine back to 
> itself, ... but just look:
> 
>    [EMAIL PROTECTED] ~]$ time globusrun-ws -submit -s -F gavosrv1 -c /bin/true
>    Delegating user credentials...Done.
>    Submitting job...Done.
>    Job ID: uuid:52f0f962-54e1-11dd-a56f-0007e914d571
>    Termination time: 07/19/2008 15:51 GMT
>    Current job state: Active
>    Current job state: CleanUp-Hold
>    Current job state: CleanUp
>    Current job state: Done
>    Destroying job...Done.
>    Cleaning up any delegated credentials...Done.
> 
>    real    0m24.327s
>    user    0m1.242s
>    sys     0m0.113s
> 
> Note that "user" and "sys" times are reasonable. Almost all of this time 
> passes between "CleanUp" and "Done". It can't just be checking 
> credentials because gsissh is done in a jiffy:
> 
>    [EMAIL PROTECTED] ~]$ time gsissh -p 2222 gavosrv1
>    /bin/true                      
> 
>    real    0m0.649s
>    user    0m0.134s
>    sys     0m0.020s
> 
> Maybe that is already enough for someone to see where the problem lies. 
> I can also point out that all (at least many) of the machines in our 
> grid (AstroGrid-D) seem to be affected, but to varying degrees. Here is 
> a little matrix of tests:
> 
> from gavosrv1.mpe.mpg.de to gavosrv1.mpe.mpg.de: 0m27.235s
> from gavosrv1.mpe.mpg.de to titan.ari.uni-heidelberg.de: 0m14.324s
> from gavosrv1.mpe.mpg.de to udo-gt03.grid.tu-dortmund.de: 0m8.823s
> 
> from titan to gavosrv1.mpe.mpg.de: 0m57.208s
> from titan to titan.ari.uni-heidelberg.de: 0m16.875s
> from titan to udo-gt03.grid.tu-dortmund.de: 0m27.225s
> 
> from udo-gt03 to gavosrv1.mpe.mpg.de: 1m5.221s
> from udo-gt03 to titan.ari.uni-heidelberg.de: 0m12.905s
> from udo-gt03 to udo-gt03.grid.tu-dortmund.de: 0m6.952s
> 
> Please tell me I am doing something really stupid. For production of my 
> application even a minute of latency is not a big deal, but it's a pain 
> during development and debugging. Right now I am using gsissh instead of 
> globusrun-ws just to work around this.
> 
> Thanks for the lift,
> Art Carlson
> AstroGrid-D Project
> Max-Planck-Institute für extraterrestrische Physik, Garching, Germany
> 

-- 
| -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
| Steve White                                             +49(331)7499-202
| e-Science / AstroGrid-D                                   Zi. 35  Bg. 20
| -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
| Astrophysikalisches Institut Potsdam (AIP)
| An der Sternwarte 16, D-14482 Potsdam
|
| Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
|
| Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
| -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
