Re: [O-MPI devel] [PATCH] ompi_info doesn't show use_mem_hooks flag
On Tue, Dec 06, 2005 at 11:07:44AM -0500, Brian Barrett wrote:
> On Dec 6, 2005, at 10:53 AM, Gleb Natapov wrote:
> > On Tue, Dec 06, 2005 at 08:33:32AM -0700, Tim S. Woodall wrote:
> > > > Also memfree hooks decrease cache efficiency; the better
> > > > solution would be to catch brk() system calls and remove memory
> > > > from the cache only then, but there is no way to do that for
> > > > now.
> > >
> > > We are looking at other options, including catching brk/munmap
> > > system calls, and will be experimenting with these on the trunk.
> > >
> > This will be really interesting. How are you going to catch
> > brk/munmap without kernel help? Last time I checked, preload tricks
> > don't work if the syscall is done from inside libc itself.
>
> All of the tricks we are looking at assume that nothing in libc calls
> munmap.

glibc does call mmap/munmap internally for big allocations, as an
strace of this program shows:

    #include <stdlib.h>

    int main(void)
    {
        void *p = malloc(1024 * 1024);
        free(p);
        return 0;
    }

> We can successfully catch free() calls from inside libc without any
> problems. The LAM/MPI team and Myricom (with MPICH-gm) have been
> doing this for many years without any problems. On the small
> percentage of MPI applications that require some linker tricks (some
> of the commercial apps are this way), we won't be able to intercept
> any free/munmap calls, so we're going to fall back to our RDMA
> pipeline algorithm.

Yes, but catching free is not good enough. That way we sometimes evict
cache entries that could safely remain in the cache. Ideally we should
be able to catch the events that return memory to the OS (munmap/brk)
and remove the memory from the cache only then.

--
			Gleb.
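For reference, the preload trick under discussion is an interposer
library loaded via LD_PRELOAD. Below is a minimal sketch; cache_evict
is a hypothetical stand-in for the real registration-cache callback
(build with "gcc -shared -fPIC shim.c -o shim.so -ldl"). Calls that
resolve through the PLT hit the interposer first; direct syscalls made
from inside libc bypass it entirely, which is exactly the limitation
Gleb describes:

    /* shim.c: intercept munmap() via LD_PRELOAD */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <sys/mman.h>

    /* hypothetical: tell the registration cache this range is gone */
    static void cache_evict(void *addr, size_t len)
    {
        fprintf(stderr, "evict [%p, +%zu)\n", addr, len);
    }

    int munmap(void *addr, size_t len)
    {
        static int (*real_munmap)(void *, size_t);

        if (real_munmap == NULL)
            real_munmap =
                (int (*)(void *, size_t)) dlsym(RTLD_NEXT, "munmap");

        cache_evict(addr, len);   /* before the mapping disappears */
        return real_munmap(addr, len);
    }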
Re: [O-MPI devel] [PATCH] ompi_info doesn't show use_mem_hooks flag
On Dec 7, 2005, at 9:44 AM, Gleb Natapov wrote:
> On Tue, Dec 06, 2005 at 11:07:44AM -0500, Brian Barrett wrote:
> > On Dec 6, 2005, at 10:53 AM, Gleb Natapov wrote:
> > > On Tue, Dec 06, 2005 at 08:33:32AM -0700, Tim S. Woodall wrote:
> > > > > Also memfree hooks decrease cache efficiency; the better
> > > > > solution would be to catch brk() system calls and remove
> > > > > memory from the cache only then, but there is no way to do
> > > > > that for now.
> > > >
> > > > We are looking at other options, including catching brk/munmap
> > > > system calls, and will be experimenting with these on the
> > > > trunk.
> > > >
> > > This will be really interesting. How are you going to catch
> > > brk/munmap without kernel help? Last time I checked, preload
> > > tricks don't work if the syscall is done from inside libc itself.
> >
> > All of the tricks we are looking at assume that nothing in libc
> > calls munmap.
>
> glibc does call mmap/munmap internally for big allocations, as an
> strace of this program shows:
>
>     int main(void) { void *p = malloc(1024 * 1024); free(p); }

Ah, yes, I wasn't clear. On Linux, we actually ship our own version of
ptmalloc2 (the allocator used by glibc on Linux). We use the standard
linker search-order tricks to have the linker choose our versions of
malloc, calloc, realloc, valloc, and free, which come from ptmalloc2.
We've modified our version of ptmalloc2 such that any time it calls
mmap or sbrk with a positive argument, it immediately lets the cache
know about the allocation. Any time it is about to call munmap or sbrk
with a negative argument, it informs the cache code before giving the
memory back to the OS.

We also catch mmap and munmap so that we can track when the user calls
mmap / munmap. Note that we arrange ptmalloc2's code such that it
calls our mmap (which either uses the syscall interface directly or
calls __mmap, depending on what the system supports), so we don't
intercept that call to mmap twice or anything like that.

This works pretty well (like I said, it's worked fine for LAM and
MPICH-gm for years), but it has the problem of requiring the user to
either use the wrapper compilers or add -lmpi -lorte -lopal to the
link line (i.e., they can't use shared library dependencies to load in
libopal.so); otherwise our ptmalloc2 / mmap / munmap isn't used. We
can detect that this has happened pretty easily, and then we fall back
to the pipelined RDMA code, which doesn't offer the same performance
but also doesn't have a pinning problem.

> > We can successfully catch free() calls from inside libc without
> > any problems. The LAM/MPI team and Myricom (with MPICH-gm) have
> > been doing this for many years without any problems. On the small
> > percentage of MPI applications that require some linker tricks
> > (some of the commercial apps are this way), we won't be able to
> > intercept any free/munmap calls, so we're going to fall back to
> > our RDMA pipeline algorithm.
>
> Yes, but catching free is not good enough. That way we sometimes
> evict cache entries that could safely remain in the cache. Ideally
> we should be able to catch the events that return memory to the OS
> (munmap/brk) and remove the memory from the cache only then.

This is essentially what we do on Linux - we only tell the rcache code
about allocations / deallocations when we are talking about getting
memory from, or giving memory back to, the operating system.

On Mac OS X / Darwin, due to its two-level namespaces, we can't
replace malloc / free with a customized version of the Darwin
allocator the way we could with ptmalloc2.
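In outline, the hooks Brian describes amount to routing the
allocator's OS-level calls through thin wrappers. The sketch below is
illustrative only: the hook names (mem_hook_alloc / mem_hook_release)
are hypothetical, the real code lives in Open MPI's modified ptmalloc2
and memory-hooks layer, and the sbrk bookkeeping is simplified
(single-threaded):

    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* hypothetical callbacks into the registration cache */
    void mem_hook_alloc(void *addr, size_t len);
    void mem_hook_release(void *addr, size_t len);

    /* what the modified allocator calls instead of mmap(2) */
    static void *allocator_mmap(void *addr, size_t len, int prot,
                                int flags, int fd, off_t off)
    {
        void *p = mmap(addr, len, prot, flags, fd, off);
        if (p != MAP_FAILED)
            mem_hook_alloc(p, len);   /* memory came from the OS */
        return p;
    }

    /* what it calls instead of munmap(2) */
    static int allocator_munmap(void *addr, size_t len)
    {
        mem_hook_release(addr, len);  /* inform cache BEFORE unmapping */
        return munmap(addr, len);
    }

    /* what it calls instead of sbrk(2) */
    static void *allocator_sbrk(intptr_t incr)
    {
        if (incr < 0) {
            /* heap shrinks: the top |incr| bytes go back to the OS */
            void *top = (char *) sbrk(0) + incr;
            mem_hook_release(top, (size_t) -incr);
        }
        void *p = sbrk(incr);
        if (incr > 0 && p != (void *) -1)
            mem_hook_alloc(p, (size_t) incr);
        return p;
    }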
There are some things you can do to simulate such behavior, but it
requires linking in a flat namespace and doing some other things that
nearly caused the Darwin engineers to pass out when I was talking to
them about said tricks. So instead, we use the Darwin hooks for
catching malloc / free / etc. It's not optimal, but it's the best we
can do in the situation. And it doesn't force us to link all OMPI
applications in a flat namespace, which is always nice. Of course, we
still intercept mmap / munmap in the traditional linker-tricks style.
But again, there are very few function calls in libSystem.dylib that
call mmap that we care about (malloc / free are already taken care of
by the standard hooks), so this doesn't cause a problem.

Hopefully this made some sense. If not, on to the next round of
e-mails :).

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/
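The Darwin hooks Brian mentions can be pictured as installing a
replacement function pointer in the default malloc zone (the zone
structure is declared in <malloc/malloc.h>). A minimal,
era-appropriate sketch, with cache_evict again a hypothetical
stand-in; note it fires on every free(), not only when memory is
returned to the OS, which is exactly the inefficiency discussed above
(newer versions of Mac OS X write-protect the zone structure, so treat
this as historical):

    #include <malloc/malloc.h>
    #include <stddef.h>

    static void (*real_zone_free)(malloc_zone_t *, void *);

    /* hypothetical: tell the registration cache about the free */
    static void cache_evict(void *addr)
    {
        (void) addr;
    }

    static void hooked_free(malloc_zone_t *zone, void *ptr)
    {
        if (ptr != NULL)
            cache_evict(ptr);
        real_zone_free(zone, ptr);   /* hand off to the real allocator */
    }

    void install_free_hook(void)
    {
        malloc_zone_t *zone = malloc_default_zone();
        real_zone_free = zone->free;
        zone->free = hooked_free;
    }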
[O-MPI devel] Fwd: (j3.2005) Re: Derived types according to MPI2
Begin forwarded message:

From: Aleksandar Donev
Date: November 21, 2005 9:30:18 AM MST
To: J3
Subject: (j3.2005) Re: Derived types according to MPI2

Hello,

Malcolm Cohen wrote:
> Which just goes to show that the authors of MPI2 didn't understand
> Fortran, since that is completely and utterly false in every sense
> that matters.
Yes, but the interesting thing is that neither Van nor I was aware of
what the standard actually allows in terms of derived types and the
storage for the components, and presumably we know Fortran better. Can
storage for the components be separated from the scalar derived type
itself? This probably makes no visible difference for scalars, but for
arrays it does. Again, I am asking what STORAGE_SIZE for derived types
should mean.

Dan Nagle wrote:
> Please be aware that the "external world" of the MPI standard is
> really the virtual machine of the C standard.
Yes, of course; I am certainly not proposing binding to hardware.

> When defining a programming language, the "needless abstraction"
I should have qualified that with "some needless abstractions". Of
course abstractions are good, especially when it does not matter to
the user how something is done, as long as it is done well. But if you
want to pass an array of derived types to a parallel IO routine that
is not compiled by your super-smart Fortran compiler that chooses to
scatter the components across virtual-address space (yes, I mean
virtual), then you do NOT want that abstraction. It is about choice.

Leave preaching to the preachers. Programming is a profession for a
reason---programmers are experienced and educated, understand the
issues, and don't need lectures on abstractions.

Aleks
[O-MPI devel] Fwd: (j3.2005) Re: Derived types according to MPI2
Begin forwarded message:

From: Bill Long
Date: November 21, 2005 11:03:46 AM MST
To: Malcolm Cohen
Cc: J3
Subject: (j3.2005) Re: Derived types according to MPI2
Reply-To: lo...@cray.com

Malcolm Cohen wrote:
> Aleksandar Donev said:
> > (like MPI standard).
> Gak. Just because MPI is a load of dingo's kidneys doesn't mean
> everyone else should make a horrible mess.

MPI is a bad idea that spun out of control. It is mainly useful as an
example of what to avoid. It certainly is diametrically opposed to the
programmer productivity goals being pushed by DARPA. One hopes that
the combination of Fortran 2008 and UPC will finally let users abandon
this archaic monstrosity.

Cheers,
Bill

--
Bill Long                                  lo...@cray.com
Fortran Technical Support  &               voice: 651-605-9024
Bioinformatics Software Development        fax:   651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120
[O-MPI devel] Fwd: (j3.2005) Re: Derived types according to MPI2
Begin forwarded message:

From: Malcolm Cohen
Date: November 21, 2005 11:23:59 AM MST
To: Aleksandar Donev
Cc: J3
Subject: (j3.2005) Re: Derived types according to MPI2

Aleksandar Donev said:
> Yes, but the interesting thing is that neither Van nor I was aware
> of what the standard actually allows in terms of derived types and
> the storage for the components, and presumably we know Fortran
> better.
One might have hoped so.

> Can storage for the components be separated from the scalar derived
> type itself?
Hey, when *I* am the Fortran processor there's no contiguous storage,
or for that matter addressable storage! Don't take too limited a view
of current "hard" ware.

> how something is done as long as it is done well. But if you want to
> pass an array of derived types to a parallel IO routine that is not
> compiled by your super-smart Fortran compiler that chooses to
> scatter the components across virtual-address space (yes, I mean
> virtual), then you do NOT want that abstraction.
You cannot be serious. You do realise that there is no requirement for
any array, even of intrinsic data types, to contain the "actual data".
Is that a problem in practice? No, of course not.

The Fortran standard doesn't mention virtual addressing, physical
addressing, or any of these things. Is that a problem? No. What the
standard should do (and usually does) is specify the behaviour of the
Fortran "virtual machine", i.e. the meaning of the program. How that
program gets mapped to hardware is way outside the scope of the
standard.

> It is about choice. Leave preaching to the preachers. Programming is
> a profession for a reason---programmers are experienced and educated
> and understand the issues and don't need lectures on abstractions.
Apparently not.

Cheers,
--
...Malcolm Cohen, NAG Ltd., Oxford, U.K. (malc...@nag.co.uk)