Hi,

demingyin wrote:
Hi all,

Jan & Martin & Tino, thanks for your help. I tried MPICH and also read the
instructions of 'Submitting Condor Jobs to Globus Toolkit 4' at
https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4. But I'm still
quite confused.

What I really want to do is similar to what is described at the beginning of
'Chapter 4 Execution Management' in 'GT4_Primer_0.6.pdf', that is:
So you want to:
   * Make a program available as a network service (with size varying)
   * Dispatch
   * Run an executable on a remote computer.
   * Run a parallel program across multiple distributed computers.
   * Run a set of loosely coupled tasks
   * Steer a computation (?)
These tasks all fall within the purview of execution management.

I'm just going to try out the idea of distributed computing, no need of
parallel computing. For example, the server sends out some commands to the
grid nodes, and after the grid nodes execute the commands, results are
collected back to the server.
If your command executions are relatively long (minutes for a small cluster, hours for a large cluster), then interacting directly with GRAM/PBS/Condor/SGE is a good approach; interfacing to GRAM4 is a good way to make your system work on most Grids, as they usually have GRAM4 deployed as a front end to the LRMs running at each site.

If your command executions are too short to get good utilization of your resources (due to the high per-job overhead of production LRMs), or you are finding queue times to be too long, then a glide-in approach might work better, where you allocate some resources at a coarse granularity but then manage them yourself and dispatch fine-grained tasks to each processor. Condor has support for glide-ins; one project that comes to mind is MyCluster (http://www.tacc.utexas.edu/services/userguides/mycluster/). Another project that supports glide-ins is Falkon (http://dev.globus.org/wiki/Incubator/Falkon), a Globus Incubator project, which also supports the efficient dispatch and execution of short tasks. Falkon supports the following things from your list above:

   * Dispatch
   * Run an executable on a remote computer.
   * Run a set of loosely coupled tasks
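The per-job overhead argument above can be made concrete with a quick back-of-the-envelope calculation (the 5-second per-job overhead below is a purely hypothetical figure; real LRM overheads vary by site and scheduler):

```shell
# Back-of-the-envelope: utilization = runtime / (runtime + per-job overhead).
# The 5-second overhead is a hypothetical figure for illustration only.
overhead=5
for runtime in 1 60 3600; do
    awk -v r="$runtime" -v o="$overhead" \
        'BEGIN { printf "%4ds tasks: %.1f%% utilization\n", r, 100*r/(r+o) }'
done
# → 1s tasks reach only 16.7% utilization; 3600s tasks reach 99.9%
```

With 1-second tasks most of the resource time is lost to scheduling overhead, which is exactly the regime where a glide-in or Falkon-style dispatcher pays off.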

Falkon supports a "black box" execution model typically found in batch-scheduled systems. There are projects out there that allow a black-box application to be converted to a network or web service by defining the inputs and outputs of the application. This takes care of "Make a program available as a network service (with size varying)".

To support "Steer a computation (?)", you'd likely need a workflow system. A few that come to mind are Swift, Taverna, etc.

Falkon currently does not support (it's possible that Condor glide-ins do, but I am not sure):

   * Run a parallel program across multiple distributed computers.


Hope all this helps!

Cheers,
Ioan


Have I made myself clear? I hope for your guidance and instructions.
Regards,
Denny(Deming Yin)
The grid really makes me confused and exhausted... :)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Jan Ploski
Sent: Thursday, 8 May 2008 5:44 PM
To: demingyin
Cc: [email protected]
Subject: Re: [gt-user] some problem of WS GRAM

[EMAIL PROTECTED] wrote on 05/08/2008 08:19:04 AM:

Hi all,

These days I'm trying to use WS GRAM to submit some jobs. But I'm still not quite understanding the mechanism of WS GRAM. I can now submit a dummy job on my Grid node, such as:
'globusrun-ws -submit -c /bin/touch touched_it'
or 'globusrun-ws -submit -S -f a.rsl'.
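
(For reference, a.rsl could be a simple GT4 job description document along these lines — a hedged sketch: the element names follow the GT4 WS GRAM job description schema as I recall it, and the output paths are illustrative:)

```xml
<job>
    <executable>/bin/touch</executable>
    <argument>touched_it</argument>
    <stdout>${GLOBUS_USER_HOME}/touch.stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/touch.stderr</stderr>
</job>
```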

But for example, if I want to sum the integers from 1 to 2n, and in order to speed up the process, I want to sum 1 to n on Grid Node 1 and n+1 to 2n on Grid Node 2, how could I do that? Maybe first I should write a web service following the online documentation 'Submitting a job in Java using WS GRAM', but then how can WS GRAM distribute the task to different Grid Nodes for me?

Can anyone give some directions? A detailed example would be much appreciated.

There are several ways, none too easy.

1. Write a small MPI program and submit a job of type 'mpi' to run it. Within your program you distribute work to the nodes using MPI calls and gather the results. This is the best performing solution, but it forces you to program in C/Fortran, and it depends on MPI being available and correctly configured at the target site.

2. Submit two jobs, each of which does a part of the computation and stores away the results (say, in a file). Submit a third job which combines the results. Because you have three interdependent jobs, you will already need a metascheduler/workflow engine to coordinate the submissions automatically.

3. Submit a job of type 'multiple' which then does all the process coordination on site. Because the job type 'multiple' simply runs the specified executable, with the same command-line arguments, on n nodes, you will need some mechanism to compute the process numbers within that executable in order to allocate work to processes. It should also synchronize executions because you want the results to be combined in the end. You could use our 'MultiJob' module, described at https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-BigJobs. The examples provided on this page assume that you're using Condor as your job submission client; however, they could be adapted to use globusrun-ws (see also http://www.teragridforum.org/mediawiki/index.php?title=WS-Gram).
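
Option 2 can be prototyped locally with plain shell before wiring it into real job submissions — a sketch (the file names and n are illustrative; in a real setup each step would be a separate globusrun-ws job, with the result files staged between nodes):

```shell
n=100                                   # illustrative problem size: sum 1..2n

# "Job 1": sum 1..n, store the partial result in a file
seq 1 "$n" | awk '{ s += $1 } END { print s }' > part1.out

# "Job 2": sum n+1..2n, store its partial result in a second file
seq $((n+1)) $((2*n)) | awk '{ s += $1 } END { print s }' > part2.out

# "Job 3": combine the partial results
total=$(( $(cat part1.out) + $(cat part2.out) ))
echo "sum 1..$((2*n)) = $total"         # → sum 1..200 = 20100
```

The third step is where the dependency lives: it must not run until both partial-result files exist, which is exactly what the metascheduler/workflow engine has to enforce.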

Regards,
Jan Ploski




--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================

