Re: [gt-user] excessive latency

Ioan Raicu Tue, 22 Jul 2008 06:37:27 -0700

Let me see if I understand this right. Your setup is such that you are running a task farming grid, where each compute resource (i.e. 1 CPU, or 1 node) has GRAM installed and waiting to receive work? Or do you have a gateway node that has GRAM configured and is waiting for work, which then gets passed down to another LRM (e.g. BOINC) to dispatch out to the remote CPUs? So, there are 2 paradigms here: 1-to-1 GRAM submission, and 1-to-many GRAM submission. The 1-to-1 GRAM submission is what I was referring to below, when I said that it's OK to have 1~60 sec latencies if your jobs are hours long each. Note that GRAM parallelizes quite well, so if your submission client is multi-threaded, you should be able to get around 1 job/sec throughput (which translates to about 1 sec amortized latency).

The 1-to-many GRAM submission is the trickier one. Instead of running GRAM on each remote CPU (i.e. as a server), in Falkon we decided to make the remote CPUs clients, which communicated back to a GT4 instance to collect work. This also avoided us having to run a full GT4 on each remote CPU.
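To make the amortization point for 1-to-1 submission concrete, here is a minimal Python sketch. It does not call GRAM or any Globus API: submit_job is a hypothetical stand-in that just sleeps for a fixed per-submission latency, and the latency value, thread counts, and function names are all assumptions chosen for illustration. It also ignores any throughput limit on the service side, so it only shows the client-side effect of overlapping submissions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_job(job_id, latency_s):
    """Hypothetical stand-in for one blocking job submission.

    A real client would contact the GRAM service here; this sketch only
    sleeps to simulate the per-submission latency.
    """
    time.sleep(latency_s)
    return job_id


def run_submissions(n_jobs, n_threads, latency_s=0.05):
    """Submit n_jobs using n_threads workers.

    Returns (list of job ids in submission order, amortized seconds per job).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves input order, so results line up with job ids.
        results = list(pool.map(lambda j: submit_job(j, latency_s), range(n_jobs)))
    elapsed = time.perf_counter() - start
    return results, elapsed / n_jobs


if __name__ == "__main__":
    # Latency is scaled down from the 1~60 sec range so the demo runs quickly.
    for threads in (1, 10):
        _, per_job = run_submissions(n_jobs=20, n_threads=threads)
        print(f"{threads:2d} thread(s): {per_job:.3f} s amortized per job")
```

With one thread each job pays the full latency; with several threads the submissions overlap, so the amortized cost per job drops even though no individual submission gets faster.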

Can you give us more details about your deployment, such as network topology, how many CPUs, the LRM used (e.g. BOINC, PBS, Condor), whether the client submitting to GRAM is multi-threaded, the number of jobs injected into the system over a period of time, min/average/max job run times, and how much control you have over the various pieces (which ones you are able to change if you need to)? Better understanding your deployment will help us point you to a solution that is right for you!


Ioan

Alexander Beck-Ratzka wrote:

On Sunday, 20 July 2008 18:19:57 Ioan Raicu wrote:
Hi,
You are forgetting that in real Grid deployments, the majority of the
wait time will be in queue wait times in batch schedulers.  For example,
in some logs I looked at from 2005 from SDSC, I recall seeing queue wait
times of 6 hours on average over a 1 year period.  So, having some extra
latency on the order of 1~60 seconds is not a big deal when your average
job lengths are hours, or more.
This might be right for your use case. However, there are also other use cases in the grid world. We are running [EMAIL PROTECTED] as a task farming application on the resources of D-Grid, and we consume about 100000 CPU hours per day. So it is really a production application. Because we are submitting hundreds of jobs, the latency cannot be neglected, and it would be really helpful to reduce it to below 1 second. If you look at the network traffic caused by globusrun-ws -submit, you can see that there are a lot of communication cycles (I think there are 9) between the submitting host and the execution host. Is this really necessary? SOAP only requires one...
So please note: there is no single "real Grid deployment" of the kind you mentioned. I think this problem will become even more bothersome if a scheduler such as GridWay comes into play.
Cheers

Alexander


--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
