Let me see if I understand this correctly. Are you running a task-farming grid in which each compute resource (e.g. 1 CPU, or 1 node) has GRAM installed and is waiting to receive work? Or do you have a gateway node with GRAM configured and waiting for work, which then passes it down to another LRM (e.g. BOINC) to dispatch to the remote CPUs? So there are 2 paradigms here: 1-to-1 GRAM submission and 1-to-many GRAM submission.

The 1-to-1 GRAM submission is what I was referring to below when I said that it's OK to have 1~60 sec latencies if your jobs are hours long each. Note that GRAM parallelizes quite well, so if your submission client is multi-threaded, you should be able to get around 1 job/sec throughput (which translates to about 1 sec amortized latency per job).

The 1-to-many GRAM submission is the trickier one. Instead of running GRAM on each remote CPU (i.e. as a server), in Falkon we decided to make the remote CPUs clients, which communicate back to a GT4 instance to collect work. This also saved us from having to run a full GT4 on each remote CPU.
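
To illustrate the amortization point for a multi-threaded submission client, here is a minimal Python sketch. It does not call GRAM at all: submit_job is a hypothetical stand-in that just sleeps for a fixed per-submission latency, and amortized_latency, the thread counts, and the latency values are likewise assumptions chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_job(job_id, latency_s):
    """Hypothetical stand-in for one submission: it only waits latency_s."""
    time.sleep(latency_s)
    return job_id


def amortized_latency(n_jobs, n_threads, latency_s):
    """Submit n_jobs using n_threads workers.

    Returns (jobs completed, wall-clock seconds per job).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        done = list(pool.map(lambda j: submit_job(j, latency_s), range(n_jobs)))
    elapsed = time.perf_counter() - start
    return len(done), elapsed / n_jobs


if __name__ == "__main__":
    # Same simulated per-submission latency, different client concurrency.
    for threads in (1, 10):
        n, per_job = amortized_latency(20, threads, 0.05)
        print(f"{threads} threads: {n} jobs, {per_job:.3f} s amortized per job")
```

Each individual submission still takes the full latency; only the wall-clock time per job drops as more submissions are in flight at once, which is what I mean by amortized latency.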

Can you give us more details about your deployment, such as the network topology, how many CPUs, which LRM is used (e.g. BOINC, PBS, Condor), whether the client submitting to GRAM is multi-threaded, the number of jobs injected into the system over a period of time, min/average/max job run times, and how much control you have over the various pieces (which ones you can change if you need to)? Understanding your deployment better will help us point you to a solution that is right for you!

Ioan

Alexander Beck-Ratzka wrote:
On Sunday, 20 July 2008 18:19:57 Ioan Raicu wrote:
Hi,
You are forgetting that in real Grid deployments, the majority of the
wait time will be in queue wait times in batch schedulers.  For example,
in some logs I looked at from 2005 from SDSC, I recall seeing queue wait
times of 6 hours on average over a 1 year period.  So, having some extra
latency on the order of 1~60 seconds is not a big deal when your average
job lengths are hours, or more.

This might be right for your use case. However, there are also other use cases in the grid world. We are running [EMAIL PROTECTED] as a task farming application on the resources of D-Grid, and we consume about 100000 CPU hours per day. So it really is a production application. Because we are submitting hundreds of jobs, the latency cannot be neglected, and it would be really helpful to reduce it to below 1 second. If you look at the network traffic caused by globusrun-ws -submit, you can see that there are a lot of communication cycles (I think there are 9) between the submitting host and the execution host. Is this really necessary? SOAP only requires one...

So please note: there is no single "real Grid deployment" of the kind you describe. I think this problem will become even more bothersome once a scheduler such as GridWay comes into play.

Cheers

Alexander


--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================