Re: [O-MPI devel] couple of problems in openib mpool.
Hey Gleb,

Sorry for the delay. We have been doing a bit of reworking of the pml/btl so that the btl's can be shared outside of just the pml (collectives, etc). I have added the bug fix (old_reg). Will look at the assumption of non-null registration next.

Thanks (and keep them coming ;-) ,

Galen

On Aug 11, 2005, at 8:27 AM, Gleb Natapov wrote:

> Hello,
>
> There are a couple of bugs/typos in openib mpool. The first one is fixed by the included patch. The second one is in the function mca_mpool_openib_free(). This function assumes that registration is never NULL, but there are callers that think differently (ompi/class/ompi_fifo.h, ompi/class/ompi_circular_buffer_fifo.h).
>
> Index: ompi/mca/mpool/openib/mpool_openib_module.c
> ===
> --- ompi/mca/mpool/openib/mpool_openib_module.c (revision 6806)
> +++ ompi/mca/mpool/openib/mpool_openib_module.c (working copy)
> @@ -127,7 +127,7 @@
>      mca_mpool_base_registration_t* old_reg = *registration;
>      void* new_mem = mpool->mpool_alloc(mpool, size, 0, registration);
>      memcpy(new_mem, addr, old_reg->bound - old_reg->base);
> -    mpool->mpool_free(mpool, addr, &old_reg);
> +    mpool->mpool_free(mpool, addr, old_reg);
>      return new_mem;
>  }
>
> --
> Gleb.
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
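
The two issues Gleb reports (passing `&old_reg` where `old_reg` is expected, and a free routine that assumes a non-NULL registration) can be illustrated with a small stand-alone sketch. Everything below is hypothetical: `toy_registration_t`, `toy_alloc`, `toy_free` and `toy_realloc` are made-up stand-ins rather than the openib mpool code, and the meaning of `bound` (one past the last byte) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mca_mpool_base_registration_t: only the two
 * fields used by the patch are modelled. */
typedef struct toy_registration {
    unsigned char *base;   /* first byte of the registered region */
    unsigned char *bound;  /* one past the last byte (assumption) */
} toy_registration_t;

/* Allocate size bytes and hand back a registration describing them. */
void *toy_alloc(size_t size, toy_registration_t **registration)
{
    unsigned char *mem = malloc(size);
    toy_registration_t *reg = malloc(sizeof(*reg));
    if (mem == NULL || reg == NULL) {
        free(mem);
        free(reg);
        *registration = NULL;
        return NULL;
    }
    reg->base = mem;
    reg->bound = mem + size;
    *registration = reg;
    return mem;
}

/* Takes the registration itself, not its address. Real code would
 * dereference the registration here, so callers that pass NULL need
 * this guard. */
void toy_free(void *addr, toy_registration_t *registration)
{
    if (registration != NULL) {
        free(registration);
    }
    free(addr);
}

/* Same shape as the patched function: remember the old registration,
 * allocate, copy, then free the old block with old_reg (not &old_reg).
 * The copy length is clamped to the new size, which the original patch
 * does not do. */
void *toy_realloc(void *addr, size_t size, toy_registration_t **registration)
{
    toy_registration_t *old_reg = *registration;
    void *new_mem = toy_alloc(size, registration);
    if (new_mem == NULL) {
        *registration = old_reg;   /* leave the caller's state untouched */
        return NULL;
    }
    size_t old_size = (size_t)(old_reg->bound - old_reg->base);
    memcpy(new_mem, addr, old_size < size ? old_size : size);
    toy_free(addr, old_reg);
    return new_mem;
}
```

With a prototype in scope, a C compiler would normally warn about passing `&old_reg` (a pointer to a pointer) to a parameter of the registration pointer type, so a mistake of this kind may only slip through when the call goes through a loosely typed function pointer.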
[O-MPI devel] build warnings..
Current build warnings:

mca_base_parse_paramfile_lex.c:1664: warning: 'yy_flex_realloc' defined but not used
qsort.c:163: warning: cast from pointer to integer of different size
show_help_lex.c:1606: warning: 'yy_flex_realloc' defined but not used
rmgr_proxy.c:237: warning: ISO C forbids conversion of object pointer to function pointer type
rmgr_proxy.c:356: warning: ISO C forbids conversion of function pointer to object pointer type
rmgr_urm.c:184: warning: ISO C forbids conversion of object pointer to function pointer type
rmgr_urm.c:309: warning: ISO C forbids conversion of function pointer to object pointer type
comm_cid.c:167: warning: comparison between signed and unsigned
fake_stack.c:46: warning: no previous prototype for 'ompi_convertor_create_stack_with_pos_general'
Re: [O-MPI devel] build warnings..
On Aug 12, 2005, at 9:36 AM, Galen Shipman wrote:

> Current build warnings:
>
> mca_base_parse_paramfile_lex.c:1664: warning: 'yy_flex_realloc' defined but not used

This one is pretty much impossible to fix (it's in a file generated by lex, and isn't easy to deal with).

> qsort.c:163: warning: cast from pointer to integer of different size

This one is pretty difficult to get rid of without doing really dumb things. But that code really shouldn't be built on platforms where the native qsort works. I'll fix this today.

> show_help_lex.c:1606: warning: 'yy_flex_realloc' defined but not used

Same as the other lex.

> rmgr_proxy.c:237: warning: ISO C forbids conversion of object pointer to function pointer type
> rmgr_proxy.c:356: warning: ISO C forbids conversion of function pointer to object pointer type
> rmgr_urm.c:184: warning: ISO C forbids conversion of object pointer to function pointer type
> rmgr_urm.c:309: warning: ISO C forbids conversion of function pointer to object pointer type
> comm_cid.c:167: warning: comparison between signed and unsigned

This could have been due to my MAX_CID changes - I'll have a look and make it right.

> fake_stack.c:46: warning: no previous prototype for 'ompi_convertor_create_stack_with_pos_general'

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
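
For readers unfamiliar with two of the warnings discussed above, here is a minimal sketch of the usual ways to avoid them. It is not the code from qsort.c or comm_cid.c, which is not shown in the thread; `is_aligned` and `index_in_range` are hypothetical helpers written only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when p is aligned to `align` bytes (a power of two).
 * Going through uintptr_t avoids "cast from pointer to integer of
 * different size" on platforms where int is narrower than a pointer. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Avoids "comparison between signed and unsigned": check the sign
 * first, then compare in the unsigned type. */
int index_in_range(int idx, size_t count)
{
    return idx >= 0 && (size_t)idx < count;
}
```

Whether either pattern fits the actual files depends on what those lines do, which the warnings alone do not reveal.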
[O-MPI devel] OMPI 32bit on a 64bit Linux box
We've got a 64bit Linux (SUSE) box here. For a variety of reasons (Java, JNI, linking in with OMPI libraries, etc which I won't get into) I need to compile OMPI 32 bit (or get 64bit versions of a lot of other libraries). I get various compile errors when I try different things, but first let me explain the system we have:

[sparkplug]~/ompi > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64 x86_64 x86_64 GNU/Linux
[sparkplug]~/ompi >
[sparkplug]~/ompi > cat /etc/issue
Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).
[sparkplug]~/ompi >

I tried the obvious:

./configure CFLAGS=-m32 FFLAGS=-m32 ..

The make then bailed out with compile errors:

gcc -m32 -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -Werror-implicit-function-declaration -fno-strict-aliasing -c atomic-asm.s -o atomic-asm.o
atomic-asm.s: Assembler messages:
atomic-asm.s:6: Error: suffix or operands invalid for `push'
atomic-asm.s:7: Error: suffix or operands invalid for `movq'
atomic-asm.s:16: Error: suffix or operands invalid for `push'
atomic-asm.s:17: Error: suffix or operands invalid for `movq'
atomic-asm.s:26: Error: suffix or operands invalid for `push'
atomic-asm.s:27: Error: suffix or operands invalid for `movq'
atomic-asm.s:36: Error: suffix or operands invalid for `push'
atomic-asm.s:37: Error: suffix or operands invalid for `movq'
atomic-asm.s:38: Error: `-8(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:39: Error: `-12(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:40: Error: `-16(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:41: Error: `-16(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:42: Error: `-8(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:43: Error: `-12(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:45: Error: `(%rdx)' is not a valid 32 bit base/index expression
atomic-asm.s:47: Error: `-24(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:48: Error: `-24(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:49: Error: `-28(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:50: Error: `-28(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:51: Error: `-12(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:54: Error: `-28(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:55: Error: `-28(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:64: Error: suffix or operands invalid for `push'
atomic-asm.s:65: Error: suffix or operands invalid for `movq'
atomic-asm.s:66: Error: `-8(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:67: Error: `-16(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:68: Error: `-24(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:69: Error: `-24(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:70: Error: `-8(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:71: Error: `-16(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:73: Error: `(%rdx)' is not a valid 32 bit base/index expression
atomic-asm.s:76: Error: `-32(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:77: Error: `-32(%rbp)' is not a valid 32 bit base/index expression
atomic-asm.s:78: Error: `-16(%rbp)' is not a valid 32 bit base/index expression
make[2]: *** [atomic-asm.lo] Error 1
make[2]: Leaving directory `/home/ndebard/ompi/opal/asm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/ndebard/ompi/opal'
make: *** [all-recursive] Error 1

Greg Watson then suggested I add to my configure:

--build=i586-suse-linux

That got the Make further, but now it dies saying:

/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: warning: i386 architecture of input file `../../../opal/.libs/libopal.a(memory.o)' is incompatible with i386:x86-64 output
../../../ompi/.libs/libmpi.a(mca_io_romio_dist_ad_nfs_read.o)(.text+0x106e): In function `mca_io_romio_dist_ADIOI_NFS_ReadStrided':
/home/ndebard/ompi/ompi/mca/io/romio/romio-dist/adio/ad_nfs/mca_io_romio_dist_ad_nfs_read.c:230: undefined reference to `__divdi3'
../../../ompi/.libs/libmpi.a(mca_io_romio_dist_ad_nfs_read.o)(.text+0x108f):/home/ndebard/ompi/ompi/mca/io/romio/romio-dist/adio/ad_nfs/mca_io_romio_dist_ad_nfs_read.c:231: undefined reference to `__moddi3'
../../../ompi/.libs/libmpi.a(mca_io_romio_dist_ad_nfs_write.o)(.text+0xf76): In function `mca_io_romio_dist_ADIOI_NFS_WriteStrided':
/home/ndebard/ompi/ompi/mca/io/romio/romio-dist/adio/ad_nfs/mca_io_romio_dist_ad_nfs_write.c:268: undefined reference to `__divdi3'
../../../ompi/.libs/libmpi.a(mca_io_romio_dist_ad_nfs_write.o)(.text+0xf97):/home/ndebard/ompi/ompi/mca/io/romio/romio-dist/adio/ad_nfs/mca_io_romio_dist_ad_nfs_write.c:269:
Re: [O-MPI devel] OMPI 32bit on a 64bit Linux box
On Aug 12, 2005, at 3:13 PM, Nathan DeBardeleben wrote:

> We've got a 64bit Linux (SUSE) box here. For a variety of reasons (Java, JNI, linking in with OMPI libraries, etc which I won't get into) I need to compile OMPI 32 bit (or get 64bit versions of a lot of other libraries). I get various compile errors when I try different things, but first let me explain the system we have:
>
> This goes on and on and on actually. And the 'is incompatible with i386:x86-64 output' looks to be repeated for every line before this error which actually caused the Make to bomb.
>
> Any suggestions at all? Surely someone must have tried to force OMPI to build in 32bit mode on a 64bit machine.

I don't think anyone has tried to build 32 bit on an Opteron, which is the cause of the problems...

I think I know how to fix this, but it won't happen until later in the weekend. I can't think of a good workaround until then. Well, one possibility is to set the target like you were doing and disable ROMIO. Actually, you'll also need to disable Fortran 77. So something like:

./configure [usual options] --build=i586-suse-linux --disable-io-romio --disable-f77

might just do the trick.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
Re: [O-MPI devel] OMPI 32bit on a 64bit Linux box
Thanks, trying that now. While I'd like those things in the long run, they're not needed right now to test what I'm trying to test. Will let you know how it goes!

(What's the problem, by the way?)

-- Nathan
Correspondence
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
Re: [O-MPI devel] OMPI 32bit on a 64bit Linux box
On Aug 12, 2005, at 3:22 PM, Nathan DeBardeleben wrote:

> Thanks, trying that now. While I'd like those things in the long run, they're not needed right now to test what I'm trying to test. Will let you know how it goes!
>
> (What's the problem, by the way?)

The problem is that I key off the target host string to decide what assembly to use for the atomic operations. For most 64 bit platforms, the architecture string is the same for 32/64 bit and then you use sizeof(long) to determine whether to use 32 or 64 bit instructions. So what I need to add to the configure script is a check so that, if we're on x86_64 and sizeof(long) == 4, we use the assembly for x86, not x86_64.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
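
The selection rule Brian describes can be sketched as a small C function. This is only an illustration of the decision logic, not the configure check itself: `pick_atomic_asm` and the returned labels are hypothetical names, and the real script presumably works on the host triplet in shell/m4 rather than in C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decide which assembly flavour to use from the host CPU string and the
 * measured sizeof(long). The labels are hypothetical; the names used by
 * the real configure script are not given in the thread. */
const char *pick_atomic_asm(const char *host_cpu, size_t sizeof_long)
{
    if (strcmp(host_cpu, "x86_64") == 0) {
        /* A 64-bit host string with a 4-byte long means a 32-bit
         * (-m32) build, so the x86 assembly is the right choice. */
        return sizeof_long == 4 ? "x86" : "x86_64";
    }
    if (host_cpu[0] == 'i' && strlen(host_cpu) == 4 &&
        strcmp(host_cpu + 2, "86") == 0) {
        return "x86";   /* i386, i486, i586, i686 */
    }
    return "unknown";
}
```

For example, `pick_atomic_asm("x86_64", sizeof(long))` compiled with `-m32` would select the x86 assembly, which is the case that failed in the build log above.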
Re: [O-MPI devel] OMPI 32bit on a 64bit Linux box
OK, so I reconfigured, made, etc:

137 14:29 ./configure CFLAGS=-m32 FFLAGS=-m32 --build=i586-suse-linux --enable-static --disable-shared --without-threads --prefix=/home/ndebard/local/ompi --with-devel-headers --disable-io-romio --disable-f77
138 14:48 make clean all install

But mpicc now segfaults immediately:

[sparkplug]~/ompi > /home/ndebard/local/ompi/bin/mpicc
Segmentation fault
[sparkplug]~/ompi > gdb /home/ndebard/local/ompi/bin/mpicc
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...DW_FORM_strp pointing outside of .debug_str section [in module /home/ndebard/local/ompi/bin/mpicc]
Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) run
Starting program: /home/ndebard/local/ompi/bin/mpicc
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x00408d4a in ?? ()
(gdb) where
#0 0x00408d4a in ?? ()
Cannot access memory at address 0xbfffecf8
(gdb)
[sparkplug]~/ompi > /home/ndebard/local/ompi/bin/mpic++
Segmentation fault
[sparkplug]~/ompi >

-- Nathan
Correspondence
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
Re: [O-MPI devel] OMPI 32bit on a 64bit Linux box
That's a neat one. mpicc shouldn't care about any of this stuff -- it's a trivial C++ program that invokes none of the MCA framework stuff, etc. I'll try to replicate.

Just out of curiosity -- do other C++ applications work nicely in 32 bit on that machine? (particularly ones that use std::vector and std::string)

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
Re: [O-MPI devel] OMPI 32bit on a 64bit Linux box
Actually, Brian just pointed out the problem -- you also need to set CXXFLAGS=-m32.

--
{+} Jeff Squyres
{+} jsquy...@lam-mpi.org
{+} http://www.lam-mpi.org/
[O-MPI devel] Memory manager changes
Hi all -

For those not on the telecon Tuesday, we finally broke down and decided we needed to do all the system nastiness to intercept free() and munmap() and the like for high speed interconnects so that we can do pinned page caching and not take the pinning performance hit on applications like NetPIPE (and, to be fair, many user applications). Unlike LAM, however, we're going to try to make this not be the center of all pain and suffering ;). While we'll support the ptmalloc2 trick that LAM and MPICH-gm use, it will not be on by default and we're trying to find better alternatives.

Below are your current choices for intercepting memory releases back to the operating system. The default is malloc_hooks on platforms that support it when threads aren't enabled. Otherwise the current default is "none". In all cases, in addition to dealing with free() and realloc(), we provide intercepts for munmap() to catch the user doing his own memory management. We may also want to intercept SysV shared memory functions.

You can choose exactly which "memory manager" to use with the --with-memory-manager=TYPE option to configure, where TYPE is one of "ptmalloc2", "malloc_hooks", "darwin7", or "ldpreload". Of course, you can also use --without-memory-manager or --with-memory-manager=none to completely disable the things.

* PTMALLOC2
  + Very fast implementation of the full malloc/free suite. Directly used by glibc as their memory manager.
  + Works properly in threaded environment
  + Only call unpin callbacks when giving memory back to the OS (ie, when sbrk() or munmap() are called)
  - Does not work properly in some situations (abacus linker tricks, for example) that appear to be within the spirit of using the MPI library
  - Does not work on many platforms (everywhere but linux, really)
  - Feels massively icky

* MALLOC_HOOKS
  + Use the hooks provided by ptmalloc2 (and therefore glibc) to get callbacks when free(), realloc(), etc are called
  + No "corner cases" that cause unexpected behavior like with ptmalloc2
  - Does not support threads (disables itself if either progress or mpi threads are enabled)
  - Have to call unpin callbacks when memory is free()d or realloc()ed, not when giving back to OS
  - Very low performance impact (1-2%) on calling free() when there are no mpools registering callbacks

* LDPRELOAD
  + Thread safe
  + No "corner cases" that cause unexpected behavior like with ptmalloc2
  + Should work on every platform that supports LD Preload and dlsym()
  - Requires doing ldpreload tricks
  - On some platforms, have to call unpin callbacks when memory is free()d or realloc()ed, not when giving back to the OS
  - Did I mention, it requires doing ldpreload?
  + If LDPRELOAD doesn't succeed, opal can properly determine this and will just say free() interception is unavailable

* DARWIN7
  + Thread safe
  - Requires some nasty linker tricks to make work. User application must be linked with mpicc or a long list of special flags
  + If application is not linked with the special sauce, opal should be able to properly determine this and just say free() interception is unavailable.
  - Total hack of linker tricks

LD Preload is not yet implemented, but should be by the end of the weekend. The initial version will most likely only support making callbacks every time free() / realloc() is called, rather than every time memory is given back to the OS. Not optimal, but better than nothing.

I'm going to talk with some Darwin developers about better ways to do things on Darwin, but probably won't have any results on that front until sometime middle of next week.

Brian

--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/
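
The "unpin callback" idea that all four options share can be sketched in plain C: some layer learns that memory is being released and notifies whoever registered interest, so that a cached pinned region can be dropped first. The sketch below is not the opal implementation. `release_cb_register`, `hooked_free` and `toy_pool_t` are hypothetical names, and `hooked_free` merely stands in for whatever mechanism (malloc hooks, preload, linker tricks) actually intercepts free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_RELEASE_CBS 8

/* Callback invoked just before addr is handed back to the allocator. */
typedef void (*release_cb_t)(void *addr, void *ctx);

static struct { release_cb_t fn; void *ctx; } release_cbs[MAX_RELEASE_CBS];
static size_t num_release_cbs = 0;

/* Register a callback; returns 0 on success, -1 when the table is full. */
int release_cb_register(release_cb_t fn, void *ctx)
{
    if (num_release_cbs == MAX_RELEASE_CBS) {
        return -1;
    }
    release_cbs[num_release_cbs].fn = fn;
    release_cbs[num_release_cbs].ctx = ctx;
    num_release_cbs++;
    return 0;
}

/* Stand-in for an intercepted free(): notify every registered callback,
 * then release the memory. */
void hooked_free(void *addr)
{
    if (addr != NULL) {
        for (size_t i = 0; i < num_release_cbs; i++) {
            release_cbs[i].fn(addr, release_cbs[i].ctx);
        }
    }
    free(addr);
}

/* Toy pool that caches a single "pinned" address and forgets it when
 * that address is released. */
typedef struct {
    void *pinned;
    int unpin_count;
} toy_pool_t;

void toy_pool_on_release(void *addr, void *ctx)
{
    toy_pool_t *pool = ctx;
    if (pool->pinned == addr) {
        pool->pinned = NULL;
        pool->unpin_count++;
    }
}
```

This version calls back on every release, which corresponds to the "every time free() / realloc() is called" behaviour described for the initial LD Preload version, not to the cheaper "only when memory goes back to the OS" behaviour listed for ptmalloc2.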
[O-MPI devel] Fwd: Memory manager changes
Brian,

Sounds like I got off the call a bit too early ;-)

Can we choose to use standard platform libraries, or are we pinning ourselves into a corner? I.e., is this optional?

What sort of problems are we getting into playing with pre-load options? I would be VERY careful here, and do plenty of testing, especially with c++ codes, before you decide to do this. We used to use some of these tricks in LA-MPI, but backed off because of loader ordering issues.

As you can tell, I am VERY leery of these sort of tricks for a production grade bit of code. If it is easy to decide at run-time whether to use these tricks (w/o a performance penalty), this is a different question.

Rich

Begin forwarded message:

From: Brian Barrett
Date: August 12, 2005 7:47:45 PM MDT
To: Open MPI Developers
Subject: [O-MPI devel] Memory manager changes
Reply-To: Open MPI Developers
Re: [O-MPI devel] Fwd: Memory manager changes
On Aug 12, 2005, at 9:43 PM, Rich L. Graham wrote: Sounds like I got off the call a bit too early ;-) Can we choose to use standard platform libraries, or are we pinning ourselves into a corner ? I.e., is this optional ? Yes - the code is all built around trying to use the standard platform. And yes, everything is optional. In many cases (pretty much everywhere but single threaded Linux), the default will be to not do any memory manager tricks at all. Of course, not having any memory manager hooks lessens the performance of the BTLs since we have to do pin/rdma pipelining, but that's the price we have to pay. What sort of problems are we getting into playing with pre-load options ? I would be VERY careful here, and do plenty of testing, especially with c++ codes, before you decide to do this. We used to use some of these tricks in LA- MPI, but backed off because of loader ordering issues. Agreed - I'm one of the ones who was very against doing it in the first place :). Currently, the default on everywhere but single threaded Linux is to not have any memory manager hooks at all. On single threaded Linux, we use the hooks provided by glibc for doing "something" before the actual free/realloc occurs. Because these are official, recommended ways of doing things, they should work on any C, C++, and Fortran codes, even if they are statically linked. I've tested them with C++ apps, and they work as the documentation implies they would. I don't think that the ldpreload tricks should ever be the default. I'd like to provide them, because on threaded builds (where the glibc hooks aren't available), they provide a much better solution than using ptmalloc2. The sysadmin/user would have to setup his environment to load the preload library. If the module fails to preload, there is a facility in place for the memory code to tell the mpools that there is no memory manager interrupt and to fall back to the unpin after use mode. 
Further, the ldpreload module (not yet committed, but half written) can run just fine even if the app started isn't an opal code (with little if any performance difference). I don't envision us ever explicitly setting the LD_PRELOAD in the pls components or anything like that. Instead, I see us documenting "Add this to your LD_PRELOAD or /etc/ld.preload and OMPI goes faster". As you can tell, I am VERY leery of these sort of tricks for a production grade bit of code. If it is easy to decide at run-time if to use these tricks (w/o a performance penalty), this is a different question. Some of these will be very difficult to turn off at runtime (the LD_PRELOAD probably being the exception - you can at least turn that off any time before the application starts running). However, I don't think this is a problem because the defaults are going to be so pessimistic that we shouldn't get in a situation where the user is going to have to turn them off. I'm thinking big, annoying warnings in the installation document about turning the less-safe ones on. Brian Begin forwarded message: From: Brian Barrett Date: August 12, 2005 7:47:45 PM MDT To: Open MPI Developers Subject: [O-MPI devel] Memory manager changes Reply-To: Open MPI Developers Hi all - For those not on the telecon Tuesday, we finally broke down and decided we needed to do all the system nastiness to intercept free() and munmap() and the like for high speed interconnects so that we can do pinned page caching and not take the pinning performance hit on applications like NetPIPE (and, to be fair, many user applications). Unlike LAM, however, we're going to try to make this not be the center of all pain and suffering ;). While we'll support the ptmalloc2 trick that LAM and MPICH-gm use, it will not be on by default and we're trying to find better alternatives. Below are your current choices for intercepting memory releases back to the operating system. 
The default is malloc_hooks on platforms that support it when threads aren't enabled. Otherwise the current default is "none". In all cases, in addition to dealing with free() and realloc(), we provide intercepts for munmap() to catch the user doing his own memory management. We may also want to intercept SysV shared memory functions.

You can choose exactly which "memory manager" to use with the --with-memory-manager=TYPE option to configure, where TYPE is one of "ptmalloc2", "malloc_hooks", "darwin7", or "ldpreload". Of course, you can also use --without-memory-manager or --with-memory-manager=none to completely disable the things.

* PTMALLOC2
  + Very fast implementation of the full malloc/free suite. Directly used by glibc as their memory manager.
  + Works properly in threaded environment
  + Only call unpin callbacks when giving memory back to the OS (ie, when sbrk() or munmap() are called)
  - Does not work properly in some situations
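
For readers following the thread, here is a minimal sketch of the idea discussed in the message above: run "unpin" callbacks just before memory goes back to the OS, and let a pool ask whether such an intercept is active so it can fall back to unpin-after-use. Every name in it (sketch_release_register, sketch_munmap, sketch_pool_t, and so on) is hypothetical and is not Open MPI's actual interface. A real intercept would presumably be named munmap() itself and be installed by one of the mechanisms listed above; this sketch simply wraps the system munmap() and is not thread-safe.

```c
#define _DEFAULT_SOURCE   /* lets callers see MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical callback type: invoked with the range about to be released. */
typedef void (*sketch_release_cb_t)(void *addr, size_t len, void *ctx);

#define SKETCH_MAX_CALLBACKS 8

static struct {
    sketch_release_cb_t fn;
    void *ctx;
} sketch_callbacks[SKETCH_MAX_CALLBACKS];
static int sketch_num_callbacks = 0;
static int sketch_hooks_active = 0;   /* 0 = no intercept installed */

/* Register a callback; returns 0 on success, -1 if the table is full. */
int sketch_release_register(sketch_release_cb_t fn, void *ctx)
{
    assert(fn != NULL);
    if (sketch_num_callbacks == SKETCH_MAX_CALLBACKS) {
        return -1;
    }
    sketch_callbacks[sketch_num_callbacks].fn = fn;
    sketch_callbacks[sketch_num_callbacks].ctx = ctx;
    sketch_num_callbacks++;
    return 0;
}

/* Whatever installs the intercept (or fails to) records the outcome here. */
void sketch_set_hooks_active(int active)
{
    sketch_hooks_active = active ? 1 : 0;
}

/* A pool may cache pinned registrations only if releases are intercepted;
 * otherwise it should unpin after each use. */
int sketch_can_cache_registrations(void)
{
    return sketch_hooks_active;
}

/* Stand-in for an intercepted munmap(): notify every registered callback
 * first, then really give the memory back to the OS. */
int sketch_munmap(void *addr, size_t len)
{
    for (int i = 0; i < sketch_num_callbacks; i++) {
        sketch_callbacks[i].fn(addr, len, sketch_callbacks[i].ctx);
    }
    return munmap(addr, len);
}

/* Toy "pool" that only records the last released range; a real pool would
 * drop any cached registration overlapping [addr, addr + len). */
typedef struct {
    void *last_addr;
    size_t last_len;
    int calls;
} sketch_pool_t;

void sketch_pool_on_release(void *addr, size_t len, void *ctx)
{
    sketch_pool_t *pool = ctx;
    pool->last_addr = addr;
    pool->last_len = len;
    pool->calls++;
}
```

A caller would register sketch_pool_on_release with a sketch_pool_t as context, consult sketch_can_cache_registrations() once at startup to pick its mode, and route releases through sketch_munmap(). Keeping the "is the hook active" flag separate from the callback table mirrors the fallback described in the thread: pools still work when no intercept could be installed, just without the registration cache.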
Re: [O-MPI devel] Fwd: Memory manager changes
Sound reasonable - I am for being able to turn on optional things that will improve performance...

Thanks,
Rich

On Aug 12, 2005, at 9:14 PM, Brian Barrett wrote:
> [...]