Hi Denny,
With the multi-level scheduling approach (what Stu called "pilot" jobs)
that Falkon uses (which builds on top of GT4, and makes extensive use of
web services), you can get single-task (aka job) latencies of
100-500 ms depending on the security settings used, but submission
parallelizes quite well, so if you submit many short tasks, the
amortized latency is on the order of 1-10 ms. We have run workloads with 100ms
task execution times on 100s of CPUs with extremely good efficiency
(90%+). By the time you hit 1 second tasks, we can get 95% utilization
(on 100s of CPUs), and with several second tasks, we can get 99%+
utilization. Our original paper on Falkon has a nice figure that shows
efficiency as a function of number of CPUs and task lengths
(http://people.cs.uchicago.edu/~iraicu/publications/2007_SC07_Falkon.pdf,
Figure 6). Also, Falkon's web page with all related papers, mailing
lists, and source can be found at
http://dev.globus.org/wiki/Incubator/Falkon.
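As a back-of-envelope check on those figures, efficiency is roughly task
time divided by task time plus the amortized per-task dispatch overhead.
A minimal sketch (the overhead values below are assumed for illustration,
not measurements from the paper):

```python
def efficiency(task_s: float, overhead_s: float) -> float:
    """Fraction of wall-clock time spent on useful work when every
    task pays a fixed (amortized) dispatch overhead."""
    return task_s / (task_s + overhead_s)

# Assumed amortized overhead of ~10 ms per task (illustrative only).
for task_s in (0.1, 1.0, 5.0):
    print(f"{task_s:4.1f} s tasks -> efficiency "
          f"{efficiency(task_s, 0.010):.1%}")
```

With ~10 ms of amortized overhead, 100 ms tasks land a little above 90%
and multi-second tasks approach 99%+, consistent in shape with Figure 6
of the SC07 paper.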
If you have any other questions, let me know.
Ioan
Stuart Martin wrote:
Hi Denny,
For a simple /bin/date job without delegation, staging, or cleanup,
submitted to Fork, our performance measurements for 4.0.7 were ~1.5
seconds. So you are close to our results. The difference could be
the testing hosts. Another possibility is that the first job
submitted to a container incurs some service activation costs. So
subsequent jobs should perform better. Was the job below the first
one submitted to the container?
Authentication is costly, and the GRAM service also maintains the job
info/state in a file on disk. And then there is the execution of the
application itself. When profiling, we have not seen any obvious
bottlenecks. So, I think ~1.5 seconds is the cost of the GRAM service.
I'm not sure if this fits your scenario, but for a client that is
managing 1K/10K/100K <1 second execution jobs, methods have been
implemented to submit a "pilot" job through gram. The pilot job
starts up under the user account on the remote compute resource and
connects back to the client. The client then sends jobs directly to
the pilot service (not through GRAM); GRAM is used only to bootstrap
this service on the remote compute resource. Condor-G does this through
glide-ins. Falkon is another implementation that has proven to scale
very well and has some impressive results. More can be read here:
http://dev.globus.org/wiki/Incubator/Falkon
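The pattern above can be sketched as a toy model in Python (hypothetical
structure; the real glide-in/Falkon protocols involve network channels,
security, and scheduling that are elided here). The point is that the
expensive bootstrap happens once, and subsequent tasks flow to the pilot
over a cheap direct channel:

```python
import queue
import threading

def pilot_worker(tasks: queue.Queue, results: list) -> None:
    """The 'pilot': launched once (in reality via a single GRAM job),
    then executes many tasks with no further GRAM submissions."""
    while True:
        task = tasks.get()
        if task is None:          # sentinel: client is done
            break
        results.append(task())    # run the task directly

tasks: queue.Queue = queue.Queue()
results: list = []
pilot = threading.Thread(target=pilot_worker, args=(tasks, results))
pilot.start()                     # one expensive bootstrap...

for i in range(5):
    tasks.put(lambda i=i: i * i)  # ...then many cheap task dispatches
tasks.put(None)
pilot.join()
print(results)                    # [0, 1, 4, 9, 16]
```

A thread and an in-process queue stand in for the remote pilot and its
connection back to the client; the per-task cost is a queue operation
rather than a full GRAM submission, which is where the amortization
comes from.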
Cheers,
-Stu
On Jul 10, 2008, at 1:50 AM, <[EMAIL PROTECTED]> wrote:
Hi all,
I found that it costs 3-4 s on average for Globus to execute a simple
job, and a little longer when there are data stage-in and stage-out.
As in the example below, the real elapsed time is 0m2.510s, but the
user CPU time is just 0m0.430s. What do you think the extra time is
spent on: Globus authentication? Network communication?
My other question is: does this mean Globus is not suitable for
real-time applications (less than 1 s response time)?
Example:
-bash-3.00$ time globusrun-ws -submit -c /bin/true
Submitting job...Done.
Job ID: uuid:a877ba4c-4e47-11dd-9443-224466880045
Termination time: 07/11/2008 06:15 GMT
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
real 0m2.510s
user 0m0.430s
sys 0m0.030s
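For what it's worth, the transcript above already bounds where the time
goes: only user + sys is spent in the client process itself, and the
remainder is waiting on authentication, network round trips, and
service-side work. A quick back-of-envelope (values copied from the
`time` output above):

```python
# Values from the `time` output above, in seconds.
real = 2.510   # wall-clock time for the whole submission
user = 0.430   # client-side CPU time in user mode
sys_ = 0.030   # client-side CPU time in kernel mode

# Everything not accounted for by client CPU time is spent outside
# the client: auth handshakes, network, and the GRAM service.
overhead = real - (user + sys_)
print(f"time outside the client process: {overhead:.3f} s")
```

So roughly 2 of the 2.5 seconds are not client CPU time at all, which
matches the authentication/service-cost explanation in the replies.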
Regards,
Denny