Hi,

demingyin wrote:
Hi all,

Thank you all for your reply.
I'm using the default GRAM-fork.
My idea of using the Grid to achieve real-time results originates from Google's
search service. It is said that every search request is processed by about 1,000
machines located in Google's data centres, and yet the result is usually
returned within one second.

My situation here is that the average cost of 3-4s was measured on one Grid
node (a supercomputer, based on an IGTF-approved X.509 Certificate Authority)
of Grid Australia. It costs 2-3s on my PC (Globus 4.0.5 binary under Debian 3.1
r0a "Sarge"). I think the difference may come from the communication cost.

But the point here is that it seems to cost at least 2s.
This is when you use GRAM4 with fork, but fork executes the search request on the local machine that runs GRAM. In practice, GRAM would interface with other lower-level LRMs, such as Condor, PBS, SGE, etc., which means that the latency increases further. On an idle local cluster that runs GRAM4 and PBS, we see latencies of 10 to 60 seconds to execute a no-op program (i.e. sleep 0). This gets compounded even more when the cluster is busy and jobs have to wait in the LRM's queue. I have seen traces from various clusters that show job queue times in the range of 7+ hours. All this makes the use of Grids difficult for applications that require real-time, low-latency interactions.
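To put a number on the baseline yourself, you can measure the raw fork+exec cost of a no-op on the submit host; everything GRAM and the LRM add sits on top of this floor. This is just a rough local sketch (it assumes GNU date with %N nanosecond support), not a Grid measurement:

```shell
# Measure the bare fork+exec cost of a no-op, averaged over many runs.
# Assumes GNU date (%N); GRAM/LRM overheads are additive on top of this.
n=100
start=$(date +%s%N)
i=0
while [ "$i" -lt "$n" ]; do
  /bin/true
  i=$((i + 1))
done
end=$(date +%s%N)
mean_us=$(( (end - start) / n / 1000 ))
echo "mean local fork+exec cost: ${mean_us} us"
```

On typical hardware this comes out in the hundreds of microseconds to low milliseconds, which shows how much of the 2-4 seconds you measured is spent in the grid layers rather than in running the job itself.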
The results of the Falkon-like light-weight multi-level scheduling approach
are really good. But my question is, given that:
        the authentication cost still exists (I can't change the security
solution), and
        the application execution time is to some extent fixed,
can Falkon dramatically reduce the scheduling time, so that submitting through
GRAM with the light-weight scheduler completes within 1 second including the
authentication cost and application execution time?
That is exactly what it can do for you! You incur the higher cost once, when you acquire the initial set of resources via GRAM and your favorite LRM. Once Falkon is started and managing your resources, any single request with a work payload of a few hundred milliseconds should complete end to end in less than a second. The actual overheads will vary with the security mechanism you use and the CPU speed of the machines, but in an idle system they should stay under 1 second in all cases. As the load and concurrency increase, you might see the overheads increase as well. If you need to support some QoS (e.g. requests handled in less than 1 second), then you might need to implement some way to reject requests once there are too many concurrent ones.
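To see why the one-time cost amortizes away, here is a back-of-the-envelope sketch; the millisecond figures are illustrative assumptions, not Falkon measurements:

```shell
# Illustrative amortization arithmetic only; all figures are assumptions.
startup_ms=30000   # assumed one-time cost: acquire nodes via GRAM + LRM
dispatch_ms=50     # assumed per-task dispatch overhead once Falkon is up
work_ms=300        # per-task payload of a few hundred milliseconds
tasks=1000
per_task_ms=$(( (startup_ms + tasks * (dispatch_ms + work_ms)) / tasks ))
echo "amortized per-task time: ${per_task_ms} ms"
```

With these numbers, 1,000 tasks see an amortized 380 ms each, well under your 1-second target, even though the initial resource acquisition took 30 seconds; the more tasks you run, the less the startup cost matters.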
I have also tested Condor as the local scheduler, but it seems to be built
for high throughput rather than high efficiency with medium-scale data volumes.
It probably still cannot give you the sub-1-second latency that you are looking for.
Does someone know some other light-weight Grid middleware which can do the
security and scheduling jobs?
Falkon and Condor glide-ins are the only generic methods I know of that let you do multi-level scheduling. There might be other solutions out there, but they are usually tightly coupled to a specific application.

Cheers,
Ioan
Regards,
Denny (Deming Yin)


-----Original Message-----
From: Stuart Martin [mailto:[EMAIL PROTECTED]
Sent: Friday, 11 July 2008 12:51 AM
To: <[EMAIL PROTECTED]>
Cc: Stuart Martin; [email protected]
Subject: Re: [gt-user] Globus not for real-time application?

Hi Denny,

For a simple /bin/date job without delegation, staging, cleanup,
submitted to Fork, our performance measurements for 4.0.7 were ~1.5
seconds.  So you are close to our results.  The difference could be
the testing hosts.  Another possibility is that the first job
submitted to a container incurs some service activation costs.  So
subsequent jobs should perform better.  Was the below job the first
one submitted to the container?

Authentication is costly, but also the gram service maintains the job
info/state in a file on disk.  And then there is the execution of the
application.  When profiling, we have not seen any obvious
bottlenecks.  So, I think 1.5 seconds is the cost of the gram service.

I'm not sure if this fits your scenario, but for a client that is
managing 1K/10K/100K <1 second execution jobs, methods have been
implemented to submit a "pilot" job through gram.  The pilot job
starts up under the user account on the remote compute resource and
connects back to the client.  The client then sends jobs directly to
the pilot service (not through gram).  gram is used to bootstrap this
service on the remote compute resource.  Condor-G does this through
glide-ins.   Falkon is another implementation that has proven to scale
very well and has some impressive results.  More can be read here:
http://dev.globus.org/wiki/Incubator/Falkon
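The pilot-job pattern above can be sketched with a toy local example: a long-lived worker (the "pilot") is started once and then executes tasks sent directly to it, so no per-task GRAM submission is needed. This is only an illustration; a local FIFO stands in for the network channel that a real pilot (a Condor glide-in or Falkon executor) opens back to the client, and GRAM's only role would be starting the pilot on the remote resource:

```shell
# Toy pilot-job sketch. The FIFO is a stand-in for the pilot's network
# channel; in a real deployment GRAM bootstraps the pilot remotely.
results="$(mktemp)"
fifo="$(mktemp -u)"
mkfifo "$fifo"

# The "pilot": started once, then runs submitted tasks until told to quit.
(
  while read -r task; do
    [ "$task" = "QUIT" ] && break
    sh -c "$task"
  done < "$fifo"
) &
pilot_pid=$!

# The "client": sends tasks directly to the pilot, bypassing GRAM per task.
{
  echo "echo task-1-done >> $results"
  echo "echo task-2-done >> $results"
  echo "QUIT"
} > "$fifo"

wait "$pilot_pid"
done_list="$(cat "$results")"
echo "$done_list"
rm -f "$fifo" "$results"
```

Each task here costs only a line written to the channel plus the task's own runtime, which is why multi-level scheduling can get per-task overhead well under a second once the pilot is in place.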

Cheers,
-Stu

On Jul 10, 2008, at 1:50 AM, <[EMAIL PROTECTED]> wrote:

Hi all,

I found that it costs 3-4s on average for Globus to execute a simple
job, and a little longer when there is data stage-in and stage-out.
As in the example below, the real elapsed time is 0m2.510s, but the
user CPU time is just 0m0.430s. What do you think the extra time is
used for: Globus authentication? Network communication?

My other question is, does this mean Globus is not suitable for real-
time applications (less than 1s response time)?

Example:
-bash-3.00$ time globusrun-ws -submit -c /bin/true
Submitting job...Done.
Job ID: uuid:a877ba4c-4e47-11dd-9443-224466880045
Termination time: 07/11/2008 06:15 GMT
Current job state: CleanUp
Current job state: Done
Destroying job...Done.

real    0m2.510s
user    0m0.430s
sys     0m0.030s

Regards,
Denny





--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================

