Hi,

demingyin wrote:
Hi all,

Jan & Martin & Tino, thanks for your help. I tried MPICH and also read the
instructions of 'Submitting Condor Jobs to Globus Toolkit 4' at
https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4. But I'm still
quite confused.

What I really want to do is similar to what is described at the beginning of
'Chapter 4 Execution Management' in 'GT4_Primer_0.6.pdf', that is:
So you want to:
   * Make a program available as a network service (with size varying)
   * Dispatch
   * Run an executable on a remote computer.
   * Run a parallel program across multiple distributed computers.
   * Run a set of loosely coupled tasks
   * Steer a computation (?)
These tasks all fall within the purview of execution management.

I'm just going to try out the idea of distributed computing, no need of
parallel computing. For example, the server sends out some commands to the
grid nodes, and after the grid nodes execute the commands, results are
collected back to the server.
If your command executions are relatively long (minutes for a small cluster, hours for a large cluster), then interacting directly with GRAM/PBS/Condor/SGE is a good approach; interfacing to GRAM4 is a good way to make your system work on most Grids, as they usually have GRAM4 deployed as a front end to the LRMs running at each site.

If your command executions are too short to get good utilization of your resources (due to the high per-job overhead of production LRMs), or you are finding queue times to be too long, then a glide-in approach might work better, where you allocate some resources at a coarse granularity but then manage them yourself and dispatch fine-grained tasks to each processor. Condor has support for glide-ins; one project that comes to mind is MyCluster (http://www.tacc.utexas.edu/services/userguides/mycluster/). Another project that supports glide-ins is Falkon (http://dev.globus.org/wiki/Incubator/Falkon), a Globus Incubator project, which also supports the efficient dispatch and execution of short tasks. Falkon supports the following things from your list above:

   * Dispatch
   * Run an executable on a remote computer.
   * Run a set of loosely coupled tasks
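The per-job overhead argument above can be made concrete with a quick back-of-the-envelope calculation (the 5-second per-job overhead below is a purely hypothetical figure; real LRM overheads vary by site and scheduler):

```shell
# Back-of-the-envelope: utilization = runtime / (runtime + per-job overhead).
# The 5-second overhead is a hypothetical figure for illustration only.
overhead=5
for runtime in 1 60 3600; do
    awk -v r="$runtime" -v o="$overhead" \
        'BEGIN { printf "%4ds tasks: %.1f%% utilization\n", r, 100*r/(r+o) }'
done
# → 1s tasks reach only 16.7% utilization; 3600s tasks reach 99.9%
```

With 1-second tasks most of the resource time is lost to scheduling overhead, which is exactly the regime where a glide-in or Falkon-style dispatcher pays off.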

Falkon supports a "black box" execution model typically found in batch-scheduled systems. There are projects out there that allow a black-box application to be converted to a network or web service by defining the inputs and outputs of the application. This takes care of "Make a program available as a network service (with size varying)".

To support "Steer a computation (?)", you'd likely need a workflow system. A few that come to mind are Swift, Taverna, etc.

Falkon currently does not support (it's possible that Condor glide-ins do, but I am not sure):

   * Run a parallel program across multiple distributed computers.


Hope all this helps!

Cheers,
Ioan


Have I made myself clear? I hope for your guidance and instructions.
Regards,
Denny(Deming Yin)
The grid really makes me confused and exhausted... :)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Jan Ploski
Sent: Thursday, 8 May 2008 5:44 PM
To: demingyin
Cc: [email protected]
Subject: Re: [gt-user] some problem of WS GRAM

[EMAIL PROTECTED] wrote on 05/08/2008 08:19:04 AM:

Hi all,

These days I'm trying to use WS GRAM to submit some jobs. But I'm still not quite understanding the mechanism of WS GRAM. I can now submit a dummy job on my Grid node, such as:
'globusrun-ws -submit -c /bin/touch touched_it'
or 'globusrun-ws -submit -S -f a.rsl'.
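
(For reference, a.rsl could be a simple GT4 job description document along these lines — a hedged sketch: the element names follow the GT4 WS GRAM job description schema as I recall it, and the output paths are illustrative:)

```xml
<job>
    <executable>/bin/touch</executable>
    <argument>touched_it</argument>
    <stdout>${GLOBUS_USER_HOME}/touch.stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/touch.stderr</stderr>
</job>
```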

But for example, if I want to sum the integers from 1 to 2n, and in order to speed up the process, I want to sum 1 to n on Grid Node 1 and n+1 to 2n on Grid Node 2, how could I do that? Maybe first I should write a web service following the online documentation 'Submitting a job in Java using WS GRAM', but then how can WS GRAM distribute the task to different Grid Nodes for me?

Can anyone give some directions? A detailed example would be much appreciated.

There are several ways, none too easy.

1. Write a small MPI program and submit a job of type 'mpi' to run it. Within your program you distribute work to the nodes using MPI calls and gather the results. This is the best performing solution, but it forces you to program in C/Fortran, and it depends on MPI being available and correctly configured at the target site.

2. Submit two jobs, each of which does a part of the computation and stores away the results (say, in a file). Submit a third job which combines the results. Because you have three interdependent jobs, you will already need a metascheduler/workflow engine to coordinate the submissions automatically.

3. Submit a job of type 'multiple' which then does all the process coordination on site. Because the job type 'multiple' simply runs the specified executable, with the same command-line arguments, on n nodes, you will need some mechanism to compute the process numbers within that executable in order to allocate work to processes. It should also synchronize executions because you want the results to be combined in the end. You could use our 'MultiJob' module, described at https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-BigJobs. The examples provided on this page assume that you're using Condor as your job submission client; however, they could be adapted to use globusrun-ws (see also http://www.teragridforum.org/mediawiki/index.php?title=WS-Gram).
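
Option 2 can be prototyped locally with plain shell before wiring it into real job submissions — a sketch (the file names and n are illustrative; in a real setup each step would be a separate globusrun-ws job, with the result files staged between nodes):

```shell
n=100                                   # illustrative problem size: sum 1..2n

# "Job 1": sum 1..n, store the partial result in a file
seq 1 "$n" | awk '{ s += $1 } END { print s }' > part1.out

# "Job 2": sum n+1..2n, store its partial result in a second file
seq $((n+1)) $((2*n)) | awk '{ s += $1 } END { print s }' > part2.out

# "Job 3": combine the partial results
total=$(( $(cat part1.out) + $(cat part2.out) ))
echo "sum 1..$((2*n)) = $total"         # → sum 1..200 = 20100
```

The third step is where the dependency lives: it must not run until both partial-result files exist, which is exactly what the metascheduler/workflow engine has to enforce.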

Regards,
Jan Ploski




--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================

