We should absolutely not change this.  For simple applications, yes, things 
work if large blocks are allocated on the heap.  However, ptmalloc (and most 
allocators, really) can't rationally cope with repeated allocations and 
deallocations of large blocks.  It would be *really bad* (as we've seen before) 
to change the behavior of our version of ptmalloc from that which is provided 
by Linux.  Pain and suffering is all that path has ever led to.

Just my $0.02, of course.

Brian

________________________________________
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of 
Eugene Loh [eugene....@sun.com]
Sent: Saturday, January 09, 2010 9:55 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)

Jeff Squyres wrote:

>I'm not sure I follow -- are you saying that Open MPI is disabling the large 
>mmap allocations, and we shouldn't?
>
>
Basically the reverse.  The default (I think this is a Linux default,
regardless of compiler: gcc, gfortran, Sun f90, etc.) is for malloc to
use mmap for large allocations.  We don't change this, but arguably we
should.

Try this:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
  size_t size, nextsize;
  char  *ptr, *nextptr;

  /* Allocate successively larger blocks (about 10% bigger each time,
     never freed) and report where each one lands. */
  size = 1;
  ptr  = malloc(size);
  while ( size < 1000000 ) {
    nextsize = 1.1 * size + 1;
    nextptr  = malloc(nextsize);
    /* request size (dec), request size (hex),
       distance to the next pointer (hex), this pointer (hex);
       pointers are cast to unsigned long for printing (fine on Linux) */
    printf("%9lu %18lx %18lx %18lx\n",
           (unsigned long) size, (unsigned long) size,
           (unsigned long) (nextptr - ptr), (unsigned long) ptr);
    size = nextsize;
    ptr  = nextptr;
  }

  return 0;
}
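
(This is a plain C program; I compiled it with no special options.  The
exact numbers below will of course vary from system to system, but the
pattern should be similar.)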

Here is sample output:

   # bytes         #bytes (hex)           #bytes          ptr (hex)
                                       to next ptr
                                          (hex)

    58279               e3a7               e3b0             58f870
    64107               fa6b               fa80             59dc20
    70518              11376              11380             5ad6a0
    77570              12f02              12f10             5bea20
    85328              14d50              14d60             5d1930
    93861              16ea5              16eb0             5e6690
   103248              19350              19360             5fd540
   113573              1bba5              1bbb0             6168a0
   124931              1e803       2b3044655bc0             632450
   137425              218d1              22000       2b3044c88010
   151168              24e80              25000       2b3044caa010
   166285              2898d              29000       2b3044ccf010
   182914              2ca82              2d000       2b3044cf8010
   201206              311f6             294000       2b3044d25010
   221327              3608f              37000       2b3044fb9010
   243460              3b704              3c000       2b3044ff0010

So, for allocations below 128K, pointers come back at successively
higher addresses, each one just far enough past the previous one to make
room for that allocation.  E.g., an allocation of 0xE3A7 bytes pushes
the "high-water mark" up by 0xE3B0: the request plus a small header,
rounded up to a 16-byte multiple.

Beyond 128K, allocations are page aligned.  The pointers all end in
0x010.  That is, a whole number of pages is mapped and the returned
address is 16 bytes (0x10) into the first page.  The size of each
allocation is the requested amount, plus a few bytes of overhead,
rounded up to the next whole-page multiple.  E.g., the 0x218d1-byte
(137425-byte) request above occupies 0x22000 bytes, or 34 4K pages.
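
One consequence worth spelling out: a block above the threshold is
obtained from the kernel with mmap() and handed back with munmap() when
it is freed, so a code that repeatedly allocates and frees such a block
can pay for a system-call pair, plus zero-filling and re-faulting the
pages, on every iteration.  Here is a minimal sketch of that pattern
(just an illustration I put together, not the code where I actually saw
the slowdown):

#include <stdlib.h>
#include <string.h>

int main(void) {
  size_t size = 256 * 1024;     /* above the default 128K mmap threshold */
  int    i;

  for ( i = 0; i < 100000; i++ ) {
    char *buf = malloc(size);   /* likely an mmap() with default settings */
    if ( buf == NULL ) return 1;
    memset(buf, 0, size);       /* touch every page */
    free(buf);                  /* an mmap'd block is munmap'd here */
  }
  return 0;
}

Timing this (or watching it under strace) with and without
MALLOC_MMAP_MAX_=0 in the environment should show the difference.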

The motivation to change, in my case, is performance.  I don't know how
widespread this problem is, but...

>On Jan 8, 2010, at 9:25 AM, Sylvain Jeaugey wrote:
>
>
>>On Thu, 7 Jan 2010, Eugene Loh wrote:
>>
>>>setenv MALLOC_MMAP_MAX_        0
>>>setenv MALLOC_TRIM_THRESHOLD_ -1
>>>
>>>
>>But yes, this set of settings is the number one tweak on HPC code that I'm
>>aware of.
>>
>>
Wow!  I might vote for "compiling with -O", but let's not pick nits here.
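
For completeness: as far as I know, the same two settings can also be
applied from inside the application (before the big allocations happen)
through glibc's mallopt() interface.  A minimal sketch, which should be
equivalent to the environment variables quoted above:

#include <malloc.h>   /* mallopt() and the M_* constants; glibc-specific */

/* Same intent as MALLOC_MMAP_MAX_=0 and MALLOC_TRIM_THRESHOLD_=-1:
   never satisfy malloc() with mmap(), and never trim heap memory back
   to the kernel. */
static void keep_large_blocks_on_heap(void) {
  mallopt(M_MMAP_MAX, 0);
  mallopt(M_TRIM_THRESHOLD, -1);
}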
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


