Art,
As I understand it, your application runs a single process in the "fork"
job manager. So you are referring to the latency of running a single
simple process, not to the latency of submitting to a batch system.
I now remember that last September, Thomas Brüsemeister pointed out to
us a work-around for a similar problem, at least regarding file
transfers.
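It was to add the following iptables rule, which makes outgoing connection
attempts to TCP port 113 (ident/auth) fail immediately with a TCP reset
instead of possibly hanging until a timeout:
iptables -A OUTPUT -p tcp --syn --dport 113 -j REJECT --reject-with tcp-reset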
We implemented this on many of our systems at AIP and observed a big
improvement in some kinds of latency. Now I see that on some of them
the setting has been lost (after system upgrades, etc.).
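To check whether a given machine still has the rule, the most direct way is to list the OUTPUT chain (iptables -L OUTPUT -n). As a behavioural check, here is a minimal Python 3 sketch. It assumes that what matters is how quickly an outgoing connect to port 113 gives up. `probe_tcp_connect` is a hypothetical helper, not part of Globus or iptables. It times one connect attempt. With the rule in place, a connect from that machine to any remote port 113 should come back "refused" almost instantly. A multi-second "timeout" suggests the rule is missing. A fast refusal can also come from the remote host itself, so it is suggestive rather than conclusive.

```python
import socket
import time


def probe_tcp_connect(host, port=113, timeout=10.0):
    """Attempt one TCP connect; return (outcome, elapsed_seconds).

    outcome is "open" (connect succeeded), "refused" (a reset was received,
    e.g. from a local REJECT --reject-with tcp-reset rule or from the remote
    host), or "timeout" (no answer within `timeout` seconds).
    Other errors, such as an unresolvable host, are raised unchanged.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "open"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - start
```

For example, running `probe_tcp_connect("titan.ari.uni-heidelberg.de")` on gavosrv1 should report "refused" well under a second if the rule is active there.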
Would this improve things for your application?
Cheers!
On 20.07.08, Arthur Carlson wrote:
In the thread "Globus not for real-time application?", a number of users
discuss whether it is realistic to get latencies below 1 second.
Sounds like paradise: I am seeing latencies of up to a minute!
My workstation, gavosrv1.mpe.mpg.de, not the newest anymore, has Globus
Toolkit 4.0.5 installed. When I use globusrun-ws to go from this machine
back to itself... well, just look:
[EMAIL PROTECTED] ~]$ time globusrun-ws -submit -s -F gavosrv1 -c /bin/true
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:52f0f962-54e1-11dd-a56f-0007e914d571
Termination time: 07/19/2008 15:51 GMT
Current job state: Active
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
real 0m24.327s
user 0m1.242s
sys 0m0.113s
Note that the "user" and "sys" times are reasonable. Almost all of the
time passes between "CleanUp" and "Done". It can't just be credential
checking, because gsissh finishes in a jiffy:
[EMAIL PROTECTED] ~]$ time gsissh -p 2222 gavosrv1 /bin/true
real 0m0.649s
user 0m0.134s
sys 0m0.020s
Maybe that is already enough for someone to see where the problem lies.
I can also point out that many, if not all, of the machines in our
grid (AstroGrid-D) seem to be affected, to varying degrees. Here is
a small matrix of test timings ("real" time):
from \ to    gavosrv1     titan        udo-gt03
gavosrv1     0m27.235s    0m14.324s    0m8.823s
titan        0m57.208s    0m16.875s    0m27.225s
udo-gt03     1m5.221s     0m12.905s    0m6.952s
(gavosrv1 = gavosrv1.mpe.mpg.de, titan = titan.ari.uni-heidelberg.de,
udo-gt03 = udo-gt03.grid.tu-dortmund.de)
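Single runs vary quite a bit, so one way to rebuild a row of the matrix with less noise is to time several runs per target and take the median. The Python 3 sketch below only wraps the globusrun-ws command line used above. The helper names are hypothetical. It assumes globusrun-ws is on PATH and a valid proxy credential is already in place.

```python
import statistics
import subprocess
import time


def globusrun_true_argv(host):
    """Command line from the tests above: submit /bin/true via globusrun-ws."""
    return ["globusrun-ws", "-submit", "-s", "-F", host, "-c", "/bin/true"]


def time_command(argv):
    """Run argv to completion, discarding its output.

    Returns (exit_code, wall_clock_seconds), i.e. roughly the "real" time.
    """
    start = time.monotonic()
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode, time.monotonic() - start


def latency_row(targets, make_argv=globusrun_true_argv, runs=3):
    """Median wall-clock seconds from this machine to each target host.

    Raises RuntimeError if any run exits non-zero, since its timing
    would not be comparable.
    """
    row = {}
    for host in targets:
        samples = []
        for _ in range(runs):
            code, secs = time_command(make_argv(host))
            if code != 0:
                raise RuntimeError(f"{make_argv(host)!r} exited with {code}")
            samples.append(secs)
        row[host] = statistics.median(samples)
    return row
```

Running `latency_row(["gavosrv1.mpe.mpg.de", "titan.ari.uni-heidelberg.de", "udo-gt03.grid.tu-dortmund.de"])` on each machine gives one row per source host. Passing `make_argv=lambda h: ["gsissh", "-p", "2222", h, "/bin/true"]` times the gsissh comparison the same way.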
Please tell me I am doing something really stupid. In production use of my
application even a minute of latency is not a big deal, but it's a pain
during development and debugging. Right now I am using gsissh instead of
globusrun-ws just to work around this.
Thanks for the lift,
Art Carlson
AstroGrid-D Project
Max-Planck-Institut für extraterrestrische Physik, Garching, Germany
--
| - - - - - - - - - - - - - - - - - - - - - - - - -
| Steve White                  +49(331)7499-202
| e-Science / AstroGrid-D      Zi. 35 Bg. 20
| - - - - - - - - - - - - - - - - - - - - - - - - -
| Astrophysikalisches Institut Potsdam (AIP)
| An der Sternwarte 16, D-14482 Potsdam
|
| Board: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
|
| Foundation under private law, Brandenburg register of foundations:
| III/7-71-026
| - - - - - - - - - - - - - - - - - - - - - - - - -