On May 27, 2008, at 12:20 AM, Yuriy wrote:

We have 10 node cluster with 2 quad-core processors per node, and when number of jobs is greater then 160


Why are you treating Globus job submissions like an extended batch queue?

There are a number of things you can do to cut down on the number of open connections that are tied up just waiting for batch slots to open on your resource. Note that 10 nodes with 2 quad-core processors each gives you 80 cores, so at 160 jobs you have oversubscribed your simultaneous processing capacity with Globus-managed jobs by two to one. This does not make any sense: half of your connections are simply waiting for resources to become available.

Here are some options:

(1) Use an external scheduler such as Condor-G to throttle the number of submissions to your remote resource to a reasonable value.

(2) Alternatively, use a pilot-job or glide-in job submission scenario: send a single job to the remote grid resource and interact with it locally to handle your local submissions.

(3) Combine either or both of the above with an adaptive workflow management tool (e.g. Pegasus, Kepler, GridWay, etc.). The resulting combination can be tuned to adapt to changes in availability among multiple remote resources, allowing you to move jobs to where resources are available.
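
To make the throttling idea in option (1) concrete, here is a minimal Python sketch. It is not Condor-G and does not call any Globus API: `fake_submit` is a hypothetical stand-in for whatever actually submits a job and waits for it, and `run_throttled` is a name I made up for illustration. The only point is that the client caps the number of jobs in flight at the number of cores (80 here), so no connection sits open just waiting for a batch slot.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, submit, max_live):
    """Call submit(job) for every job with at most max_live in flight.

    Returns (results in submission order, peak number of live jobs).
    """
    lock = threading.Lock()
    live = 0
    peak = 0

    def worker(job):
        nonlocal live, peak
        with lock:
            live += 1
            peak = max(peak, live)
        try:
            return submit(job)
        finally:
            with lock:
                live -= 1

    # The pool size is the throttle: jobs beyond max_live wait locally,
    # on the client, without holding a connection to the remote resource.
    with ThreadPoolExecutor(max_workers=max_live) as pool:
        results = list(pool.map(worker, jobs))
    return results, peak


def fake_submit(job_id):
    """Hypothetical placeholder for a real grid submission."""
    time.sleep(0.01)  # pretend the job runs for a while
    return job_id


CORES = 10 * 2 * 4  # 10 nodes x 2 processors x 4 cores = 80
results, peak = run_throttled(range(160), fake_submit, max_live=CORES)
print(f"ran {len(results)} jobs, peak live submissions: {peak}")
```

All 160 jobs still run, but never more than 80 are live at once. A real scheduler does the same thing with far more care about failures and retries.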

From my point of view, this is one of the most common mistakes in handling multiple grid jobs: expecting multiple grid job submissions to behave as they would when submitted to a simple local queue. The key point to understand, I believe, is that a grid job is "live" while it is communicating with your resource, so you should use one of the above strategies to minimize the number of live connections.
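
The pilot-job approach reduces live connections in a different way, and a toy Python model may help show how. Everything here is an assumption for illustration: `pilot`, `run_task`, and the in-memory queue stand in for a real glide-in system, which would pull work over the network. Each pilot represents one grid submission that, once running, keeps pulling tasks from a locally managed queue until none are left.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id):
    """Hypothetical placeholder for the real work done on a remote core."""
    time.sleep(0.001)
    return task_id


def pilot(task_queue):
    """One long-lived pilot: drain tasks until the queue is empty."""
    done = []
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return done
        done.append(run_task(task))


N_TASKS = 160   # user-level jobs
N_PILOTS = 80   # grid submissions, one per core

tasks = queue.Queue()
for n in range(N_TASKS):
    tasks.put(n)

# Each pilot is a single grid job; the 160 tasks never become grid jobs.
with ThreadPoolExecutor(max_workers=N_PILOTS) as pool:
    per_pilot = list(pool.map(pilot, [tasks] * N_PILOTS))

completed = sorted(t for done in per_pilot for t in done)
print(f"{N_PILOTS} grid submissions completed {len(completed)} tasks")
```

The number of grid submissions is fixed by the number of pilots, no matter how many tasks are queued locally.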

P.S.: I have privately encouraged the Globus team to consider integrating pilot-job or glide-in capabilities more closely into the core software, to give users easier hooks for this and to minimize the need for them to reinvent this type of workflow control. There could be other ideas out there for handling this as well. Anyone want to chime in?

Hope this helps

Alan Sill, Ph.D
TIGRE Senior Scientist, High Performance Computing Center
Adjunct Professor of Physics
TTU

====================================================================
:  Alan Sill, Texas Tech University  Office: Admin 233, MS 4-1167  :
:  e-mail: [EMAIL PROTECTED]   ph. 806-742-4350  fax 806-742-4358  :
====================================================================

