Hi Alexander:

A few comments on this topic:

a) We have put a fair bit of time into streamlining GRAM4 job submission. We'll keep working on it, and it will surely improve somewhat further in the future, but it will always be somewhat expensive because we are using SOAP, performing authentication/authorization, etc. So "lots of few-second jobs" is not a use case that we support.

b) That said, I note that the *throughput* that GRAM4 achieves is better than the *latency*. See http://www.globus.org/alliance/publications/papers/TG07-GRAM-comparison-final.pdf for a somewhat dated discussion of performance issues--note the results in Table 2 and Table 3. These are old data--GRAM4 has improved since then, but observe that back in 2007, we saw that when streaming multiple jobs, per-job cost was a few seconds.

c) If the use case in question is "lots of few second jobs" then the approach that we recommend is to use multi-level scheduling, e.g., via Falkon, MyCluster, etc.

d) If the use case is "lots of many-second jobs" and the concern is the rate at which your submitting client can send jobs to the many CPUs of D-Grid, then we should consult with the GRAM team to see whether there is some alternative way of implementing your submitting client to increase throughput.
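
To make the trade-off between (a) and (c) concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: the 3-second per-submission cost is a stand-in for "a few seconds", the 0.01-second dispatch cost is a made-up placeholder rather than a measurement of Falkon or MyCluster, and the function names are hypothetical. It counts aggregate overhead only, not wall-clock time or queue wait.

```python
def direct_overhead(n_tasks, submit_cost_s):
    """Aggregate overhead when every task is its own GRAM submission."""
    return n_tasks * submit_cost_s


def multilevel_overhead(n_tasks, n_pilots, submit_cost_s, dispatch_cost_s):
    """Aggregate overhead with multi-level scheduling.

    Only the pilot (container) jobs pay the full submission cost; the
    individual tasks are handed out by a lightweight dispatcher.
    """
    return n_pilots * submit_cost_s + n_tasks * dispatch_cost_s


if __name__ == "__main__":
    # All numbers below are assumptions chosen for illustration.
    n_tasks = 10_000          # short tasks to run
    n_pilots = 100            # pilot jobs submitted through GRAM
    submit_cost_s = 3.0       # assumed per-submission cost ("a few seconds")
    dispatch_cost_s = 0.01    # assumed per-task dispatch cost (placeholder)

    direct = direct_overhead(n_tasks, submit_cost_s)
    multi = multilevel_overhead(n_tasks, n_pilots, submit_cost_s, dispatch_cost_s)
    print(f"direct submission overhead: {direct:.0f} s")
    print(f"multi-level overhead:       {multi:.0f} s")
```

Under these assumed numbers the submission cost is paid 100 times instead of 10,000 times, which is the whole point of (c). The real ratio depends entirely on the actual per-submission and per-dispatch costs at your site.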

Regards -- Ian.



On Jul 22, 2008, at 4:24 AM, Alexander Beck-Ratzka wrote:

On Sunday, 20 July 2008 18:19:57 Ioan Raicu wrote:
Hi,
You are forgetting that in real Grid deployments, the majority of the wait time will be in queue wait times in batch schedulers. For example, in some logs I looked at from 2005 from SDSC, I recall seeing queue wait times of 6 hours on average over a 1-year period. So, having some extra latency on the order of 1~60 seconds is not a big deal when your average job lengths are hours, or more.

This might be right for your use case. However, there are other use cases in the grid world. We are running [EMAIL PROTECTED] as a task-farming application on the resources of D-Grid, and we consume about 100000 CPU hours per day, so it is really a production application. Because we are submitting hundreds of jobs, the latency cannot be neglected, and it would be really helpful to reduce it to below 1 second. If you look at the network traffic caused by globusrun-ws -submit, you can see that there are a lot of communication cycles (I think there are 9) between the submitting host and the execution host. Is this really necessary? SOAP only requires one...

So please note: there is no single "real Grid deployment" of the kind you describe. I think this problem will become even more bothersome once a scheduler such as GridWay comes into play.

Cheers

Alexander

