Art,

As I understand it, your application runs a single process under the "fork" job
manager. So you are referring to the latency of running a single simple
process, rather than the latency of submission to a batch system.
I now remember that last September, Thomas Brüsemeister pointed out to us a
work-around for a similar problem, at least regarding file transfers. It was
to add the following iptables rule:

    iptables -A OUTPUT -p tcp --syn --dport 113 -j REJECT --reject-with tcp-reset

We implemented this on many of our systems at AIP and observed a big
improvement in some kinds of latency. Now I see that on some of them the
setting has been lost (after system upgrades, etc.). Would this improve things
for your application?

Cheers!

On 20.07.08, Arthur Carlson wrote:
> In the thread "Globus not for real-time application?", a number of users
> discuss whether it is realistic or not to get latencies below 1 second.
> Sounds like paradise. I am seeing latencies of up to a minute!
>
> My workstation, gavosrv1.mpe.mpg.de, not the newest anymore, has GTK
> 4.0.5 installed. When I use globusrun-ws to go from this machine back to
> itself, ... but just look:
>
> [EMAIL PROTECTED] ~]$ time globusrun-ws -submit -s -F gavosrv1 -c /bin/true
> Delegating user credentials...Done.
> Submitting job...Done.
> Job ID: uuid:52f0f962-54e1-11dd-a56f-0007e914d571
> Termination time: 07/19/2008 15:51 GMT
> Current job state: Active
> Current job state: CleanUp-Hold
> Current job state: CleanUp
> Current job state: Done
> Destroying job...Done.
> Cleaning up any delegated credentials...Done.
>
> real    0m24.327s
> user    0m1.242s
> sys     0m0.113s
>
> Note that the "user" and "sys" times are reasonable. Almost all of this
> time passes between "CleanUp" and "Done". It can't just be checking
> credentials, because gsissh is done in a jiffy:
>
> [EMAIL PROTECTED] ~]$ time gsissh -p 2222 gavosrv1 /bin/true
>
> real    0m0.649s
> user    0m0.134s
> sys     0m0.020s
>
> Maybe that is already enough for someone to see where the problem lies.
> I can also point out that all (at least many) of the machines in our
> grid (AstroGrid-D) seem to be affected, but to varying degrees.
> Here is a little matrix of tests:
>
> from gavosrv1.mpe.mpg.de to gavosrv1.mpe.mpg.de:          0m27.235s
> from gavosrv1.mpe.mpg.de to titan.ari.uni-heidelberg.de:  0m14.324s
> from gavosrv1.mpe.mpg.de to udo-gt03.grid.tu-dortmund.de: 0m8.823s
>
> from titan to gavosrv1.mpe.mpg.de:           0m57.208s
> from titan to titan.ari.uni-heidelberg.de:   0m16.875s
> from titan to udo-gt03.grid.tu-dortmund.de:  0m27.225s
>
> from udo-gt03 to gavosrv1.mpe.mpg.de:           1m5.221s
> from udo-gt03 to titan.ari.uni-heidelberg.de:   0m12.905s
> from udo-gt03 to udo-gt03.grid.tu-dortmund.de:  0m6.952s
>
> Please tell me I am doing something really stupid. For production of my
> application even a minute of latency is not a big deal, but it's a pain
> during development and debugging. Right now I am using gsissh instead of
> globusrun-ws just to work around this.
>
> Thanks for the lift,
> Art Carlson
> AstroGrid-D Project
> Max-Planck-Institut für extraterrestrische Physik, Garching, Germany

--
| - - - - - - - - - - - - - - - - - - - - - - - - -
| Steve White                       +49(331)7499-202
| e-Science / AstroGrid-D           Zi. 35 Bg. 20
| - - - - - - - - - - - - - - - - - - - - - - - - -
| Astrophysikalisches Institut Potsdam (AIP)
| An der Sternwarte 16, D-14482 Potsdam
|
| Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
|
| Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
| - - - - - - - - - - - - - - - - - - - - - - - - -
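P.S. For anyone curious why the tcp-reset reject helps, here is a small
timing sketch of my own (an illustration, not anything from Globus itself):
a port that answers a SYN with an RST makes the caller fail in milliseconds,
whereas a silently DROPped SYN leaves the caller retrying until its own
timeout expires -- which matches the tens of seconds of dead time in the
traces above. The sketch simulates the fast (RST) case by connecting to a
localhost port assumed to have no listener; port 9 ("discard") is my
assumption of a normally-closed port.

```python
import socket
import time

# Illustration (my sketch, not part of Globus): a REJECTed ident lookup on
# port 113 gets an immediate TCP RST and fails in milliseconds, while a
# silently DROPped one hangs until the client's timeout expires. Here the
# fast (RST) case is simulated by connecting to a localhost port that is
# assumed to have no listener (port 9, "discard", normally closed).
start = time.time()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(30)   # a DROPped SYN could burn this entire budget
try:
    sock.connect(("127.0.0.1", 9))
except OSError:
    pass              # "connection refused" arrives almost instantly
finally:
    sock.close()
print(f"RST-style failure took {time.time() - start:.3f} s")
```

On a host where port 113 is DROPped instead of REJECTed, the same connect
would sit in SYN retries for the full timeout; that slow path is exactly
what the iptables rule above is meant to short-circuit.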
