Re: [gt-user] How to set cores-per-node in WS job submission?

Stuart Martin Thu, 08 May 2008 08:09:54 -0700


On May 8, 2008, at 5:36 AM, Steve White wrote:

Stuart

On  7.05.08, Stuart Martin wrote:

On May 7, 2008, at 7:19 AM, Steve White wrote:

These are things we'd like to do, but have not been able to get to them.


        1) the lack of a "prologue/epilogue script" in the job submission


We had a prototype of a GRAM service Java RM API that included
prologue and epilogue callouts, but we have not been able to give it
the attention needed to get it into release form.

I'm not sure I understand "RM API" and "callouts".

I have expanded on my idea of a good user interface (for JDD anyway)
in Jan's bug report: http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5698

Excellent! Thanks for the ideas on the prologue/epilogue. Yeah, I
think that could be very useful.

Thus far, the memory directives in GRAM4 JDD have not been used much
(to my knowledge). Changes based on your and Jan's suggestions would
be welcome. I have not been able to digest the entire conversation
yet into something concrete.

        2) generic control of RAM-per-process


We would like to move to JSDL, where I would think this would be
covered, but after scanning the spec, it looks like it isn't.
jsdl-posix has MemoryLimit, but that is for the job and not for each
process in the job.  So I don't think even JSDL provides this.


This would suffice if implemented properly.

The memory per process would be
        mem_per_process = MemoryLimit / count

The number of cores to assign per node on a cluster with multi-core
nodes could be calculated as

        available_RAM_per_node / mem_per_process
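
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration and not part of GRAM or JSDL: the function names
and the sample values are hypothetical, and rounding the per-process
share up (and the core count down) is an assumption made so that the
result errs on the safe side.

```python
GIB = 1024 ** 3  # bytes per GiB


def mem_per_process(memory_limit: int, count: int) -> int:
    """Per-process share of a job-wide MemoryLimit, in bytes.

    Rounds up so that count * share is never less than the limit
    (an assumption; the thread only gives MemoryLimit / count).
    """
    if memory_limit <= 0 or count <= 0:
        raise ValueError("memory_limit and count must be positive")
    return -(-memory_limit // count)  # ceiling division


def cores_per_node(available_ram_per_node: int,
                   memory_limit: int,
                   count: int) -> int:
    """Number of processes that fit into one node's RAM.

    Returns 0 if a single process does not fit on a node.
    """
    share = mem_per_process(memory_limit, count)
    return available_ram_per_node // share  # round down


if __name__ == "__main__":
    # Hypothetical job: 8 processes, 16 GiB for the whole job,
    # on nodes that each offer 8 GiB to jobs.
    print(mem_per_process(16 * GIB, 8) // GIB)       # GiB per process
    print(cores_per_node(8 * GIB, 16 * GIB, 8))      # processes per node
```

With these sample values each process gets 2 GiB, so four processes
would be placed on each 8 GiB node.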

Cheers!

From the JSDL 1.0 doc

jsdl-posix:MemoryLimit

8.1.14 MemoryLimit Element

8.1.14.1 Definition
This element is a positive integer that describes the maximum amount
of physical memory that the job should use when executing. The amount
is given in bytes. If this is not present then the consuming system
MAY choose its default value.
<<<<


I regard these omissions as bugs.


Generally, it is bad policy to add another layer of software to
compensate for bugs in a lower layer.  It is to put bandages on
bandages.

On the other hand, if we can fix these middle-layer problems, much
better higher-layer software can be made, much more easily.

Cheers!


On  6.05.08, Jan Ploski wrote:

Steve White wrote:

Jan,

I agree with your assessment that the need to adjust the memory use
per process is a general one in cluster job submission, and that it is
in some way implemented by any underlying job management system, and
that these extensions ought not to be PBS-specific.

I also looked at your "messy solution".  (The code looks very
professional, really.)  It won't do for my purposes, because I need to
present a minimal, easily understood solution.

Let me explain my situation:

None of the compute resources is under my control. I can point out
problems to admins, that is all.

I have been assigned two jobs.

I and our users are familiar with doing conventional cluster job
submission. One job was to bring them into the grid fold, showing
them the advantages of globusrun-ws.  If it can be shown to be really
a cross-platform solution, giving them the ability to (almost)
effortlessly switch between grid clusters, the effort will be a
success.

My other job is to write a report on practical MPI job submission
over the grid.

We have come a long way, but still have to deal with a couple of
practical details.  At this point, it looks like both of them will end
up as work-arounds to incomplete implementation of a job submission
interface in Globus.

If with a future release of Globus, these issues can be dealt with,
grid job submission will look very attractive to real researchers.


Hi,

Based on my experience with Globus, you might be following a wrong
route (the route to disenchantment). I view Globus more as a
middleware that has to be adapted (as in: "wrapped around" or
"slightly modified") according to your users' needs and which plays an
important role behind the scenes, but it probably should not be
exposed directly to users as a drop-in replacement for their familiar
job submission tools.

There is a reason for that more important than the limitations you
have discovered so far: Globus doesn't ship with command-line job
management commands on par with those of TORQUE/Maui, Condor or SGE.
If you let users submit jobs with globus-job-submit, the next thing
they are going to ask you is "how can I see what jobs I have
submitted", "how can I cancel the job or resubmit it elsewhere", "is
my job running or not", "why is my job not running", "when is my job
going to start", etc.

You need something in front of Globus to make your users' life
bearable. Some projects lean toward application-specific web portals
(I think that's AstroGrid's approach). In our project, we have
deployed a largely application-agnostic frontend based on Condor-G,
but even so there was some customization and some user training
required. The Condor-G approach might be relevant for you because it
covers the scenario of making a transparent transition from a local
batch system to a Grid - the Condor tools for submitting jobs and
status querying are pretty much the same regardless of whether your
job goes to a machine from a local pool (equivalent to an SGE or
PBS-managed cluster) or to a pool of Globus hosts. (In fact, Condor
can submit to GT2 [gLite], GT4, Unicore, and some more Grid
middlewares.)

The disadvantage of Condor is that it is a rather huge software
product and trying to understand all of it can be daunting. Still, I
suppose you could get the Grid submission piece of it running in a
couple of hours if you wish to give it a try (by following our
tutorials and asking questions where necessary).

Regards,
Jan Ploski


--
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
Steve White                                       +49(331)7499-202
e-Science / AstroGrid-D                           Zi. 35  Bg. 20
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
Astrophysikalisches Institut Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam

Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz

Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

