Steve White <[EMAIL PROTECTED]> wrote on 05/08/2008 02:34:48 PM:

> Jan,
> 
> On  8.05.08, Jan Ploski wrote:
> > [EMAIL PROTECTED] wrote on 05/08/2008 12:36:23 PM:
> > 
> > > > >   2) generic control of RAM-per-process
> > > > 
> > > > We would like to move to JSDL where I would think this would be 
> > > > covered, but after scanning, it looks like it isn't.  jsdl posix has
> > > > MemoryLimit, but that is for the job and not for each process in the
> > > > job.  So I don't think even JSDL provides this.
> > 
> > > > 8.1.14.1 Definition
> > > > This element is a positive integer that describes the maximum amount
> > > > of physical memory that the job should use when executing.
> > > > The amount is given in bytes. If this is not present then the
> > > > consuming system MAY choose its default value.
> > > > 
> > > 
> > > This would suffice if implemented properly. 
> > > 
> > > The memory per process would be 
> > >    mem_per_process = MemoryLimit / count
> > > 
> > > The number of cores to assign per node on a cluster with multi-core nodes
> > > could be calculated as
> > > 
> > >    available_RAM_per_node / mem_per_process
> > 
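Spelled out, the rule of thumb quoted above might look like the following
Python sketch. Treating "count" as the total number of processes in the job,
rounding down, and capping the result by the number of cores per node are
assumptions on my part, and the function names are hypothetical.

```python
GIB = 1 << 30  # bytes in one gibibyte


def mem_per_process(memory_limit_bytes, count):
    """Split a job-wide MemoryLimit evenly over `count` processes (assumed rule)."""
    if count <= 0:
        raise ValueError("count must be positive")
    return memory_limit_bytes // count


def processes_per_node(available_ram_per_node_bytes, per_process_bytes, cores_per_node):
    """How many processes fit on one node, by RAM, capped by the core count."""
    if per_process_bytes <= 0:
        raise ValueError("per_process_bytes must be positive")
    by_ram = available_ram_per_node_bytes // per_process_bytes
    # Assumption: never place more processes on a node than it has cores.
    return min(cores_per_node, by_ram)


# Example with made-up numbers: a 64 GiB job limit spread over 16 processes.
per_proc = mem_per_process(64 * GIB, 16)
print(per_proc // GIB, "GiB per process")
print(processes_per_node(8 * GIB, per_proc, 8), "processes per 8 GiB node")
```

With these made-up numbers, only 2 of the 8 cores on an 8 GiB node would be
used, which is the trade-off the user is asking to control.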
> > Steve,
> > 
> > I'm not sure whether it would be a correct implementation or another quick
> > hack. JSDL says nothing about the relationship of a POSIXApplication to a
> > group of processes launched by MPI. As a matter of fact, it is remarkably
> > silent about the relationships between jobs and processes and says nothing
> > about relationships among processes. Maybe no one familiar with MPI
> > participated in writing JSDL or maybe - more likely - the tough issue was
> > put off "until later".
> > 
> These are just suggestions.  As to correctness (as you point out) the
> question is "according to what"?  My intent here is to help the Globus 
> developers to find a solution, by explaining the need.
> 
> The need is pretty clear, although some of the details are fuzzy.
> 
> These days, a cluster user can effectively quadruple their memory 
> per process, say by specifying the processes per node.  For certain
> applications, this can be crucial.
> 
> > Anyway, one can reason about the execution of an MPI application as a
> > scenario involving the execution of n instances of a POSIXApplication.
> > This interpretation would fit quite well the actual MPI runner
> > implementations whose job is always to launch n processes of the
> > user-specified executable on m <= n machines, using whatever
> > system-specific means are available. Therefore, I would suggest that if
> > JSDL is used, the MemoryLimit in the POSIXApplication element is not some
> > aggregate "physical memory that the job should use when executing" to be
> > divided among processes using a rule of thumb. Instead, treat it as a
> > specification which applies to each single process of a multi-process job;
> > it *is*, after all, a description of an executable POSIX process. For
> > maximum flexibility, one should probably be able to specify a different
> > POSIXApplication element for each MPI process.
> > 
> > Apart from these considerations, I am not sure if your "RAM per process"
> > requirement is covered by the intent of MemoryLimit. MemoryLimit basically
> > translates to "ulimit -m" in bash (JSDL authors also forgot to mention
> > whether the hard or soft limit was meant). Is this what you are looking
> > for? Or do you want to guarantee that a certain amount of memory can be
> > allocated by a process without incurring paging activity during its entire
> > execution? Perhaps both?
> > 
> The user wants to set a minimum amount of RAM available to each process.
> That might be specified in more than one way, in principle.

Ok, that would be the second of my options above.

> The issues on conventional clusters are different from those on
> shared-memory machines, but the request is the same. 
> 
> There are at least two uses of these parameters.  They are unfortunately
> not clearly stated.
> 
> One is a contract with a resource allocator/scheduler.  For instance, a
> maximum memory requirement, like a maximum walltime requirement, can be
> used by the allocator to efficiently and effectively allocate resources for
> and schedule the job.

That would be a promise on behalf of a job not to exceed a certain limit. 
That's what I meant with my first option, and that's what ulimit in bash 
(more generally, the setrlimit POSIX system call) and MemoryLimit in JSDL 
were intended for.

> Another is what you call a "hard limit": my user wants a certain amount
> of RAM per process, no less.

No, my distinction between 'soft limit' and 'hard limit' also refers to the 
POSIX case above. A process may raise its own soft limit, up to the hard 
limit, by explicitly using the setrlimit system call. A hard limit can only 
be raised by the administrator (that is, by a privileged process).
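
To make the soft/hard distinction concrete, here is a minimal sketch in
Python using the standard resource module, which wraps getrlimit and
setrlimit. The helper names clamp_to_hard and set_soft_limit are made up for
illustration. RLIMIT_RSS is used on the assumption that it is the limit
"ulimit -m" manipulates; as far as I know, current Linux kernels accept this
limit but do not enforce it, so treat this as a demonstration of the
mechanism only.

```python
import resource


def clamp_to_hard(requested, hard):
    """Largest soft limit not above `requested` that respects the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested  # no ceiling: any soft value is allowed
    if requested == resource.RLIM_INFINITY or requested > hard:
        return hard  # an unprivileged process cannot go beyond the hard limit
    return requested


def set_soft_limit(kind, requested):
    """Set the soft limit for `kind`, leaving the hard limit untouched."""
    _soft, hard = resource.getrlimit(kind)
    resource.setrlimit(kind, (clamp_to_hard(requested, hard), hard))
    return resource.getrlimit(kind)


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_RSS)
    lowered = set_soft_limit(resource.RLIMIT_RSS, 1 << 30)  # ask for 1 GiB
    restored = set_soft_limit(resource.RLIMIT_RSS, before[0])  # raise it back
    print("before:", before, "lowered:", lowered, "restored:", restored)
```

The process lowers and then re-raises its own soft limit without any special
rights. Passing a hard value above the current one to setrlimit would be
refused unless the process is privileged.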

> This is like the requirement of number of
> processes.  In our case, the job should fail immediately (with an
> informative message) if that requirement can't be satisfied.

That would indeed be a requirement, corresponding to your desired 
"minimum amount of RAM available to each process".

> The Globus WS-GRAM documentation is completely silent on the intended
> purpose of such parameters as minMemory.  Better documentation alone
> would help a great deal.

Indeed, the description of "purpose" is lacking. minMemory is a good 
example:

"Explicitly set the minimum amount of memory for a single execution of the 
executable. The units is in Megabytes. The value will go through an atoi() 
conversion in order to get an integer. If the GRAM scheduler cannot set 
minMemory, then an error will be returned."

To explain the "purpose", one has to rely on some "causal model", where 
user actions (of setting that particular parameter) affect the 
user-observable states of some user-perceivable entities... so that the 
user can effectively simulate the effect of an action in his mind. To 
enable this kind of reasoning, up-front crisp definitions of new concepts, 
their relationships, and connections to older concepts already familiar to 
the user (such as POSIX processes?) would be necessary. There is a 
rudimentary attempt at it in the WS-GRAM documentation ("Key Concepts"), 
but it fails. The document is littered with important-sounding 
abstractions from developer jargon without ever making connections to 
the user's prior knowledge. Fundamental concepts such as "job" are invoked 
without ever being explained. The silent assumption is that the users have 
experience with the same batch processing software as the author and will 
intuitively associate the same meaning with that concept. But as we see, 
merely avoiding talk about implementation details doesn't mean that these 
implementation details become irrelevant to users.

Apart from that: the atoi() remark in the description is clearly 
out of scope. As for the reference to a GRAM scheduler not being able to 
"set minMemory": why and when would that condition occur, and what could a 
user do about it? What is guaranteed by Globus in this context?

> Maybe I'll make another bug report about documentation.

You'd probably need to describe a desired correction in a bug report. I 
think this is non-trivial, as the issue is not spelling or unfortunate 
grammar. Here, the intended and actual meanings of concepts are at issue, 
and to clarify them you'd have to recursively clarify many other concepts. 
I could imagine a whole project around that... After all, this is more or 
less what standardization/specification efforts are about.

> The amount of memory used by a scientific process is typically quite well
> known by its user.  It is not at all magical, and is something they
> regularly calculate.  There may be cases where some experimentation is
> required, but then they still know the value.

Yes, what they want is a guarantee from the system "that the given amount 
of memory can be allocated by a process without degrading performance".

Regards,
Jan Ploski
