[OMPI devel] shared-memory allocations
For shared memory communications, each on-node connection (non-self, sender-receiver pair) gets a circular buffer during MPI_Init(). Each CB requires the following allocations:

*) ompi_cb_fifo_wrapper_t (roughly 64 bytes)
*) ompi_cb_fifo_ctl_t head (roughly 12 bytes)
*) ompi_cb_fifo_ctl_t tail (roughly 12 bytes)
*) queue (roughly 1024 bytes)

Importantly, the current code lays these four allocations out on three separate pages. (The tail and queue are aggregated together.) So, for example, the "head" allocation (12 bytes) ends up consuming a full page.

As one goes to more and more on-node processes -- say, for a large SMP or a multicore system -- the number of non-self connections grows as n*(n-1). So these circular-buffer allocations end up consuming a lot of shared memory. For example, for a 4K pagesize and n=512 on-node processes, the circular buffers consume 3 Gbyte of memory -- 90% of which is empty and simply used for page alignment.

I'd like to aggregate more of these allocations so that:

*) shared-memory consumption is reduced
*) the number of allocations (and hence the degree of lock contention) during MPI_Init is reduced

Any comments? I'd like to understand the original rationale for these page alignments. I expect this is related to memory placement of pages. So, I imagine three scenarios. Which is it?

A) There really is a good reason for each allocation to have its own page, and any attempt to aggregate is doomed.

B) There is actual benefit to placing things carefully in memory, but substantial aggregation is still possible. That is, for n processes, we need at most n different allocations -- not 3*n*(n-1).

C) There is no actual justification for having everything on different pages. That is, allowing different parts of a FIFO CB to be mapped differently to physical memory sounded to someone like a good idea at the time, but no one really did any performance measurements to justify it. Or, if they did, it was only on one platform and we have no evidence that the same behavior exists on all platforms.

Personally, I've played with some simple experiments on one (or more?) platforms and found no performance variations due to the placement of shared variables that two processes use for communication. I guess it's possible that data is moving cache-to-cache and doesn't care where the backing memory is.

Note that I only want to reduce the number of page-aligned allocations. I'd preserve cacheline alignment. So, no worry about false sharing due to a sender thrashing on one end of a FIFO and a receiver on the other.
Re: [OMPI devel] RFC: merge windows branch into trunk
Ralph,

we delayed the COB for this to Dec. 9, and announced yesterday that we were preparing to commit today. We updated to pick up the new buglet fixes, and tested twice on Windows (shared & static) and on Linux to see that nothing breaks.

Now we are ready to commit, and we might as well get r20106, which once again touches quite a bit of the code base ;-]

Thanks,
Rainer

On Thursday, 20 November 2008, Ralph Castain wrote:
> Hmmm... I was just typing this up when Tim's note hit. I also have two
> concerns that somewhat echo his:
>
> 1. since nearly everyone is at SC08, and since next week is a holiday,
> the timing of this merge is poor. I would really urge that you delay
> it until at least Dec 5 so people actually know about it - and have
> time to even think about it
>
> 2. how does this fit into our overall release schedule? There was talk
> at one time (when we thought 1.3 was going out soon) about having a
> short release cycle to get Windows support out for 1.4. Now this is
> coming into the trunk even before 1.3 goes out.
>
> So is 1.3 going to have a lifecycle of a month? Or are we going to
> delay 1.3 (if it even needs to be delayed) so it can include this code?
>
> Reason I ask: last time we rolled Windows support into the system it
> created a complete code fork, making support for the current stable
> release nearly impossible. This generated a lot of unhappiness and
> argument within the community until we finally released a new version.
>
> From what I have seen as we've discussed things during devel, these
> are fairly well-contained changes. However, it -will- make maintaining
> 1.3 more difficult if people attempt to do it the old way - making
> changes in the trunk and patching across to 1.3. If we instead use
> isolated 1.3 branches for maintaining the code, then this isn't an
> issue.
>
> Merits more thought than one week can provide.
>
> Ralph
>
> On Nov 20, 2008, at 6:53 AM, Tim Mattox wrote:
> > I have two concerns. First is that we really need to focus on
> > getting 1.3 stable and released. My second concern with
> > this is how it will affect merging of bugfixes for 1.3 from the
> > trunk once we release 1.3. Will the following modified files
> > cause merge conflicts for CMRs? How big is this diff? Can
> > you send it to the list, or otherwise make it available?
> >
> >> M ompi/runtime/ompi_mpi_init.c
> >> M opal/event/event.c
> >> M opal/event/WIN32-Code/win32.c
> >> M opal/mca/base/mca_base_param.c
> >> M opal/mca/installdirs/windows/opal_installdirs_windows.c
> >> M opal/runtime/opal_cr.c
> >> M opal/win32/ompi_misc.h
> >> M opal/win32/win_compat.h
> >> M orte/mca/plm/ccp/plm_ccp_component.c
> >> M orte/mca/plm/ccp/plm_ccp_module.c
> >> M orte/mca/plm/process/plm_process_module.c
> >> M orte/mca/ras/ccp/ras_ccp_component.c
> >> M orte/mca/ras/ccp/ras_ccp_module.c
> >> M orte/runtime/orte_wait.c
> >> M orte/tools/orterun/orterun.c
> >> M orte/util/hnp_contact.c
> >
> > I would ask that you consider breaking these
> > modifications into parts that "could" be harmlessly
> > brought over independently to 1.3, in case a subsequent
> > non-windows bugfix to one of those files needs to
> > be brought over that will only merge cleanly if some
> > of your changes to the same file are also brought over.
> > For example, it would be a real pain to have to use
> > patchfiles to resolve merge conflicts simply because
> > of an #ifdef or white-space change here or there.
> > Hopefully that made sense...
> >
> > Although I don't use windows myself, I appreciate your
> > and others' efforts to expand the number of platforms
> > we can run on. Great work!
> > --
> > Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> > tmat...@gmail.com || timat...@open-mpi.org
> > I'm a bright... http://www.the-brights.net/
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
HLRS                       Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19            Fax: ++49 (0)711-685 6 5832
70550 Stuttgart            email: kel...@hlrs.de
Germany                    AIM/Skype: rusraink
Re: [OMPI devel] RFC: merge windows branch into trunk
On Dec 10, 2008, at 2:01 PM, Rainer Keller wrote:

> Ralph,
>
> we delayed the COB for this to Dec. 9, and announced yesterday that we were preparing to commit today. We updated to pick up the new buglet fixes, and tested twice on Windows (shared & static) and on Linux to see that nothing breaks.

Sounds great!

> Now we are ready to commit, and we might as well get r20106, which once again touches quite a bit of the code base ;-]

Actually, r20106 is pretty well confined to the iof area (the changes outside iof are rather trivial) and mostly just restores what was there a few days ago. So I would be surprised to see a conflict, other than perhaps in how Windows handles iof.

Glad to see this come over! Should be an interesting few days of MTT results... :-))

Ralph
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20003 (Solaris malloc.h issue)
Hi Patrick,

r20003 seems to break MX support on Solaris.

  $ cd ompi/mca/common/mx
  $ make
  ...
  "/usr/include/malloc.h", line 46: syntax error before or at: (
  "/usr/include/malloc.h", line 47: syntax error before or at: (
  "/usr/include/malloc.h", line 48: syntax error before or at: (
  "/usr/include/malloc.h", line 48: cannot have void object: size_t
  "/usr/include/malloc.h", line 48: identifier redeclared: size_t
  ...
  <4000 more lines of compiler errors>
  ...

The patch below makes common_mx.c use opal/util/malloc.h instead of /usr/include/malloc.h, and the compiler errors go away. (I also needed to include errno.h.) Would this be okay to do?

  diff -r 347f52a3713f ompi/mca/common/mx/common_mx.c
  --- ompi/mca/common/mx/common_mx.c
  +++ ompi/mca/common/mx/common_mx.c
  @@ -23,9 +23,8 @@
   #include "ompi/constants.h"
   #include "common_mx.h"
   
  -#ifdef HAVE_MALLOC_H
  -#include <malloc.h>
  -#endif
  +#include <errno.h>
  +#include "opal/util/malloc.h"
   #include "opal/memoryhooks/memory.h"
   #include "opal/mca/base/mca_base_param.h"
   #include "ompi/runtime/params.h"

I tested the above on Solaris and Linux with SunStudio.

Regards,
Ethan

On Fri, Nov/14/2008 11:17:59PM, patr...@osl.iu.edu wrote:
> Author: patrick
> Date: 2008-11-14 23:17:58 EST (Fri, 14 Nov 2008)
> New Revision: 20003
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20003
>
> Log:
> Define a "fake" mpool to provide a memory release callback for the
> memory hooks (munmap) and initialize the mallopt component, and
> nothing else.
> Use this mpool in the MX common initialization, supporting both BTL
> and MTL. Automatically set the MX_RCACHE environment variable to
> enable registration cache in MX.
>
> Tested with success for munmap() and large free().
>
> Added:
>    trunk/ompi/mca/mpool/fake/
>    trunk/ompi/mca/mpool/fake/Makefile.am
>    trunk/ompi/mca/mpool/fake/configure.params
>    trunk/ompi/mca/mpool/fake/mpool_fake.h
>    trunk/ompi/mca/mpool/fake/mpool_fake_component.c
>    trunk/ompi/mca/mpool/fake/mpool_fake_module.c
> Text files modified:
>    trunk/ompi/mca/common/mx/common_mx.c |    56 +++
>    1 files changed, 55 insertions(+), 1 deletions(-)
>
> Modified: trunk/ompi/mca/common/mx/common_mx.c
> ==============================================================================
> --- trunk/ompi/mca/common/mx/common_mx.c    (original)
> +++ trunk/ompi/mca/common/mx/common_mx.c    2008-11-14 23:17:58 EST (Fri, 14 Nov 2008)
> @@ -9,6 +9,8 @@
>   *                         University of Stuttgart.  All rights reserved.
>   * Copyright (c) 2004-2006 The Regents of the University of California.
>   *                         All rights reserved.
> + * Copyright (c) 2008      Myricom. All rights reserved.
> + *
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -21,11 +23,29 @@
>  #include "ompi/constants.h"
>  #include "common_mx.h"
>  
> +#ifdef HAVE_MALLOC_H
> +#include <malloc.h>
> +#endif
> +#include "opal/memoryhooks/memory.h"
> +#include "opal/mca/base/mca_base_param.h"
> +#include "ompi/runtime/params.h"
> +#include "ompi/mca/mpool/mpool.h"
> +#include "ompi/mca/mpool/base/base.h"
> +#include "ompi/mca/mpool/fake/mpool_fake.h"
> +
> +
> +int mx__regcache_clean(void *ptr, size_t size);
> +
>  static int ompi_common_mx_initialize_ref_cnt = 0;
> +static mca_mpool_base_module_t *ompi_common_mx_fake_mpool = 0;
> +
>  int
>  ompi_common_mx_initialize(void)
>  {
>      mx_return_t mx_return;
> +    struct mca_mpool_base_resources_t mpool_resources;
> +    int index, value;
> +
>      ompi_common_mx_initialize_ref_cnt++;
>  
>      if(ompi_common_mx_initialize_ref_cnt == 1) {
> @@ -35,7 +55,37 @@
>           * library does not exit the application.
>           */
>          mx_set_error_handler(MX_ERRORS_RETURN);
> -
> +
> +        /* If we have a memory manager available, and
> +           mpi_leave_pinned == -1, then set mpi_leave_pinned to 1.
> +
> +           We have a memory manager if:
> +           - we have both FREE and MUNMAP support
> +           - we have MUNMAP support and the linux mallopt */
> +        value = opal_mem_hooks_support_level();
> +        if (((value & (OPAL_MEMORY_FREE_SUPPORT | OPAL_MEMORY_MUNMAP_SUPPORT))
> +             == (OPAL_MEMORY_FREE_SUPPORT | OPAL_MEMORY_MUNMAP_SUPPORT))
> +            || ((value & OPAL_MEMORY_MUNMAP_SUPPORT) &&
> +                OMPI_MPOOL_BASE_HAVE_LINUX_MALLOPT)) {
> +            index = mca_base_param_find("mpi", NULL, "leave_pinned");
> +            if (index >= 0)
> +                if ((mca_base_param_lookup_int(index, &value) == OPAL_SUCCESS)
> +                    && (value == -1)) {
> +
> +                    ompi_mpi_leave_pinned = 1;
> +                    setenv("MX_RCACHE", "2", 1);
> +                    mpool_resources.regcache_clean = mx__regcache_clean;
> +                    ompi_common_mx_fake_mpool =
> +                        mca_mpool_base_module_create("fake", NULL, &mpool_resources);
> +                    if (!ompi_common_mx_fake_mpool) {
> +                        omp
[OMPI devel] RFC: windows branch merge
Hi all,

We just now merged the windows branch into the trunk, split into 4 patches (r20108 to r20111) to keep them separate. Incoming changes to the trunk caused some compile errors on Windows, which we fixed. Before committing, we tested the following:

*) Windows x86-64, static libs: compilation and running
*) Windows x86-64, shared libs: compilation and running
*) Linux x86-64: compilation and running

The Windows tests were done using CMake, selecting C, C++ and Fortran. The ompi wrappers have been tested with Visual Studio; the orte tools seem to work.

The MCA components that work under Windows are now marked with a .windows file in their corresponding folders.

To keep track of the proposed merge into a v1.3.x release, ticket #1708 has been opened. If it is decided to add this to a later release, additional patches may be added to the ticket.

Thank you very much.

With Best Regards,
Rainer and Shiqing
[OMPI devel] 1.3 staging area?
Hi all,

I'm a tad concerned about our ability to test proposed CMRs for the 1.3 branch. Given the long delays in getting 1.3 out, and the rapidly looming 1.4 milestones that many of us have in our individual projects, it is clear that the trunk is going to diverge quickly and significantly from what is in the 1.3 branch. In addition, we are going to see quite a few commits occurring within a short time period.

Thus, whether or not a proposed change passes MTT tests on the trunk at some given point in time is no longer a reliable indicator of its behavior in 1.3. Likewise, it will be difficult to establish that "this commit is okay" when MTT can really only tell us the state of the aggregated code base.

Let me hasten to point out that this has been a recurring problem with every major release. We have discussed the problem on several occasions, but failed to reach consensus on a solution.

I would like to propose that we create a 1.3 staging branch. This branch would be opened to one individual at a time, who would commit their proposed CMRs for the 1.3 branch there. We would ask that people please include the staging branch in their MTT testing whenever a change has been made to it. Once the proposed change has been validated, it can be brought over to the 1.3 release branch as a single (and easy) merge.

I realize this may slow the passage of bug fixes somewhat, and obviously we should apply this on a case-by-case basis (e.g., a simple removal of an unused variable would hardly merit such a step). However, I believe that the IOF patch that eventually needs to move to 1.3, and the Windows upgrade, are examples that probably do merit this step.

Just a suggestion - hope it helps.
Ralph
[OMPI devel] Fwd: [OMPI users] Onesided + derived datatypes
Hi all -

I looked into this, and it appears to be datatype related. If the displacements are set to 3, 2, 1, 0, then the datatype fails the type checks for one-sided because is_overlapped() returns 1 for the datatype. My reading of the standard seems to indicate this should not be the case. I haven't looked into the problems with the displacements set to 0, 1, 2, 3, but I'm guessing it has something to do with the reverse problem.

This looks like a datatype issue, so it's out of my realm of expertise. Can someone else take a look?

Brian

Begin forwarded message:

From: doriankrause
Date: December 10, 2008 4:07:55 PM MST
To: us...@open-mpi.org
Subject: [OMPI users] Onesided + derived datatypes
Reply-To: Open MPI Users

Hi List,

I have an MPI program which uses one-sided communication with derived datatypes (MPI_Type_create_indexed_block). I developed the code with MPICH2 and unfortunately didn't think about trying it out with OpenMPI. Now that I'm "porting" the application to OpenMPI, I'm facing some problems. On most machines I get a SIGSEGV in MPI_Win_fence; sometimes an invalid datatype error shows up. I ran the program in Valgrind and didn't get anything valuable.

Since I can't see a reason for this problem (at least if I understand the standard correctly), I wrote the attached test program. Here are my experiences:

* If I compile without ONESIDED defined, everything works and V1 and V2 give the same results.
* If I compile with ONESIDED and V2 defined (MPI_Type_contiguous), it works.
* ONESIDED + V1 + O2: No errors, but obviously nothing is sent? (Am I right in assuming that V1+O2 and V2 should be equivalent?)
* ONESIDED + V1 + O1:
  [m02:03115] *** An error occurred in MPI_Put
  [m02:03115] *** on win
  [m02:03115] *** MPI_ERR_TYPE: invalid datatype
  [m02:03115] *** MPI_ERRORS_ARE_FATAL (goodbye)

I didn't get a segfault as in the "real life example", but if ompitest.cc is correct, it means that OpenMPI is buggy when it comes to one-sided communication and (some) derived datatypes, so it is probably not a problem in my code.

I'm using OpenMPI-1.2.8 with the newest gcc 4.3.2, but the same behaviour can be seen with gcc-3.3.1 and intel 10.1.

Please correct me if ompitest.cc contains errors. Otherwise I would be glad to hear how I should report these problems to the developers (if they don't read this).

Thanks + best regards
Dorian

ompitest.tar.gz
Description: GNU Zip compressed data
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users