Hi Alexander:
A few comments on this topic:
a) We have put a fair bit of time into streamlining GRAM4 job
submission. We'll keep working on it, and it will surely improve
somewhat further, but it will always be somewhat expensive because
we are using SOAP, performing authentication/authorization, etc. So
the use case that we support is not "lots of few-second jobs."
b) That said, note that the *throughput* GRAM4 achieves is better
than its *latency* would suggest. See
http://www.globus.org/alliance/publications/papers/TG07-GRAM-comparison-final.pdf
for a somewhat dated discussion of performance issues; note in
particular the results in Table 2 and Table 3. These are old data
(GRAM4 has improved since then), but observe that back in 2007, when
streaming multiple jobs, we already saw a per-job cost of a few
seconds.
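To make the latency/throughput distinction concrete, here is a minimal Python sketch of an idealised model (an assumption, not GRAM4's actual behaviour): each submission takes `latency_s` seconds end to end, the client keeps up to `in_flight` submissions outstanding, and the service is assumed never to saturate. The function names and the 10-second latency used below are hypothetical illustrations, not figures from the paper.

```python
import math


def makespan(n_jobs: int, latency_s: float, in_flight: int) -> float:
    """Wall-clock time to push n_jobs through a client that keeps at most
    in_flight submissions outstanding, each taking latency_s seconds.
    Idealised: the server is assumed to absorb all concurrent requests."""
    waves = math.ceil(n_jobs / in_flight)  # number of back-to-back batches
    return waves * latency_s


def per_job_cost(n_jobs: int, latency_s: float, in_flight: int) -> float:
    """Effective (amortised) cost per job when submissions are streamed."""
    return makespan(n_jobs, latency_s, in_flight) / n_jobs


if __name__ == "__main__":
    # Hypothetical 10 s latency: serial vs. five submissions in flight.
    print(per_job_cost(100, 10.0, 1))
    print(per_job_cost(100, 10.0, 5))
```

Under these assumptions the latency of each job stays at 10 s, while the amortised per-job cost drops to 2 s with five submissions in flight. In practice the gain is capped by how many concurrent requests the service can actually handle.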
c) If the use case in question is "lots of few-second jobs," then the
approach that we recommend is to use multi-level scheduling, e.g., via
Falkon, MyCluster, etc.
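The idea behind multi-level scheduling can be sketched as a generic pilot-job (task-farm) pattern: one batch job pays the submission and queue-wait cost once, and a lightweight worker pool inside that allocation then pulls many short tasks from a local queue. This is a minimal Python illustration of the pattern only; `run_pilot` and `pilot_worker` are hypothetical names and do not reflect the actual interfaces of Falkon or MyCluster.

```python
import queue
import threading
from typing import Callable, List


def pilot_worker(tasks: queue.Queue, results: List[int], lock: threading.Lock) -> None:
    """Pull tasks from the local queue until it is empty.
    The queue is fully populated before workers start, so Empty means done."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        value = task()  # run one short task; no grid submission involved
        with lock:
            results.append(value)


def run_pilot(task_list: List[Callable[[], int]], n_workers: int) -> List[int]:
    """Run every task inside a single 'allocation' using n_workers threads.
    Hypothetical stand-in for workers started by one batch job."""
    tasks: queue.Queue = queue.Queue()
    for task in task_list:
        tasks.put(task)
    results: List[int] = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=pilot_worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    short_tasks = [lambda i=i: i * i for i in range(100)]
    out = run_pilot(short_tasks, n_workers=4)
    print(len(out), sum(out))
```

Results arrive in completion order, not submission order. In a real deployment the workers would be processes on the allocated nodes rather than threads in one interpreter.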
d) If the use case is "lots of many-second jobs" and the concern is
the rate at which your submitting client can send jobs to the many
CPUs of D-Grid, then we should consult with the GRAM team to see
whether there is some alternative way of implementing your submitting
client to increase throughput.
Regards -- Ian.
On Jul 22, 2008, at 4:24 AM, Alexander Beck-Ratzka wrote:
On Sunday, 20 July 2008 18:19:57, Ioan Raicu wrote:
Hi,
You are forgetting that in real Grid deployments, the majority of the
wait time will be spent in batch-scheduler queues. For example, in
some logs from SDSC that I looked at from 2005, I recall seeing queue
wait times of 6 hours on average over a 1-year period. So some extra
latency on the order of 1 to 60 seconds is not a big deal when your
average job lengths are hours or more.
This might be right for your use case. However, there are also other
use cases in the grid world. We are running [EMAIL PROTECTED] as a
task-farming application on the resources of D-Grid, and we consume
about 100,000 CPU hours per day. So it is really a production
application. Because we are submitting hundreds of jobs, the latency
cannot be neglected, and it would be really helpful to reduce it to
below 1 second. If you look at the network traffic caused by
globusrun-ws -submit, you can see that there are a lot of
communication cycles (I think there are 9) between the submitting and
the execution host. Is this really necessary? SOAP only requires
one...
So please note: there is no single "real Grid deployment" in the
sense you've described. I think this problem will become even more
bothersome if a scheduler such as Gridway comes into the game.
Cheers
Alexander
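Alexander's point about communication cycles can be put as back-of-envelope arithmetic: if a submission requires `round_trips` sequential request/response exchanges, each costing one network round-trip time, the latency floor grows linearly with that count. A minimal Python sketch; the function names and the 50 ms RTT are hypothetical, and the count of 9 is Alexander's estimate, not a measurement. Server-side work such as authentication/authorization is lumped into an assumed `server_time_s` term.

```python
def submit_latency(round_trips: int, rtt_s: float, server_time_s: float = 0.0) -> float:
    """Lower bound on one submission's latency: sequential network round
    trips plus (assumed) server-side processing time."""
    return round_trips * rtt_s + server_time_s


def serial_batch_time(n_jobs: int, round_trips: int, rtt_s: float,
                      server_time_s: float = 0.0) -> float:
    """Time to submit n_jobs one after another, with no overlap."""
    return n_jobs * submit_latency(round_trips, rtt_s, server_time_s)


if __name__ == "__main__":
    rtt = 0.05  # hypothetical 50 ms round-trip time
    print("9 round trips:", submit_latency(9, rtt))
    print("1 round trip: ", submit_latency(1, rtt))
    print("100 jobs, serial, 9 round trips:", serial_batch_time(100, 9, rtt))
```

This only bounds the network component; it says nothing about whether the extra exchanges are avoidable, which is the question Alexander raises.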