Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On Jun 9, 2010, at 11:57 AM, Jeff Squyres wrote:

> Interesting. This, of course, begs the question of whether we should use sysv shmem or not. It seems like the order of preference should be:
>
> - sysv
> - mmap in a tmpfs
> - mmap in a "regular" (but not networked) fs
>
> The big downer, of course, is the whole "what happens if the job crashes?" issue. With mmap, an rm -rf will clean up any leftover files (although looking for them in /dev/shm might be a bit non-obvious). With sysv, you have to use the ipc* commands to look for and whack any orphan shmem segments.

System V shared memory cleanup is a concern only if a process dies in between shmat and shmctl IPC_RMID. Shared memory segment cleanup should happen automagically in most cases, including abnormal process termination.

--
Samuel K. Gutierrez
Los Alamos National Laboratory

> Right now, the orted/hnp won't clean up any leftover sysv segments. This seems like something we should fix. But even with that, if the orted/hnp is killed, sysv segments can get left over. Hrm.
>
> On Jun 9, 2010, at 11:58 AM, Sylvain Jeaugey wrote:
>
>> As stated at the conf call, I did some performance testing on a 32-core node. Here is a graph showing 500 timings of an allreduce operation (repeated 15,000 times for good timing) with sysv, mmap on /dev/shm, and mmap on /tmp.
>>
>> What it shows:
>> - sysv has the best performance;
>> - having the mmap file in /dev/shm is very close to sysv. We only have +0.1 us for a complete allreduce operation, but it seems stable. The noise is identical to sysv (must be OS noise);
>> - having the mmap file in /tmp (ext3) decreases performance (+0.4 us compared to /dev/shm) and seems prone to some "other" noise.
>>
>> Warning: the graph does not start at 0.
>>
>> Sylvain
>>
>> On Tue, 27 Apr 2010, Samuel K. Gutierrez wrote:
>>
>>> Hi,
>>>
>>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI. I have conducted some preliminary tests on our systems, but would like to get test results from a broader audience.
>>>
>>> As it stands, mmap is the default, but System V shared memory can be activated using: -mca mpi_common_sm sysv
>>>
>>> Repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm
>>>
>>> Input is greatly appreciated!
>>>
>>> --
>>> Samuel K. Gutierrez
>>> Los Alamos National Laboratory
>
> --
> Jeff Squyres
> jsquy...@cisco.com
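For reference, the pattern being relied on here - create, attach, then immediately mark for destruction - looks roughly like the following single-process sketch (illustrative only; not the component's actual code, and error handling is simplified):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    size_t size = 1 << 20;  /* illustrative segment size */

    /* Create a new segment. */
    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); return EXIT_FAILURE; }

    /* Attach to it. */
    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1) { perror("shmat"); return EXIT_FAILURE; }

    /* Mark the segment for destruction while attached.  On Linux this
     * only flags it; the memory remains usable until the last detach,
     * so the kernel reclaims it even if the process dies abnormally. */
    if (shmctl(shmid, IPC_RMID, NULL) < 0) { perror("shmctl"); return EXIT_FAILURE; }

    memset(addr, 0, size);  /* use the memory normally */

    shmdt(addr);            /* last detach: the kernel frees the segment */
    return EXIT_SUCCESS;
}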
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
If anyone is up for it, another interesting performance comparison could be start-up time. That is, consider a fat node with many on-node processes and a large shared-memory area: how long does it take for all that shared memory to be set up? Arguably, start-up time is a "second-order effect", but it ends up mattering to someone sometime. I can't find my notes on this, but I thought SysV had some nice advantages in this area, though there were many other complicating factors (even beyond the clean-up issues Jeff mentions).

Jeff Squyres wrote:

> Interesting. This, of course, begs the question of whether we should use sysv shmem or not. It seems like the order of preference should be:
>
> - sysv
> - mmap in a tmpfs
> - mmap in a "regular" (but not networked) fs
>
> The big downer, of course, is the whole "what happens if the job crashes?" issue. With mmap, an rm -rf will clean up any leftover files (although looking for them in /dev/shm might be a bit non-obvious). With sysv, you have to use the ipc* commands to look for and whack any orphan shmem segments. Right now, the orted/hnp won't clean up any leftover sysv segments. This seems like something we should fix. But even with that, if the orted/hnp is killed, sysv segments can get left over. Hrm.

On Jun 9, 2010, at 11:58 AM, Sylvain Jeaugey wrote:

> As stated at the conf call, I did some performance testing on a 32-core node. Here is a graph showing 500 timings of an allreduce operation (repeated 15,000 times for good timing) with sysv, mmap on /dev/shm, and mmap on /tmp.
>
> What it shows:
> - sysv has the best performance;
> - having the mmap file in /dev/shm is very close to sysv. We only have +0.1 us for a complete allreduce operation, but it seems stable. The noise is identical to sysv (must be OS noise);
> - having the mmap file in /tmp (ext3) decreases performance (+0.4 us compared to /dev/shm) and seems prone to some "other" noise.
>
> Warning: the graph does not start at 0.

On Tue, 27 Apr 2010, Samuel K. Gutierrez wrote:

> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI. I have conducted some preliminary tests on our systems, but would like to get test results from a broader audience.
>
> As it stands, mmap is the default, but System V shared memory can be activated using: -mca mpi_common_sm sysv
>
> Repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm
>
> Input is greatly appreciated!
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Interesting. This, of course, begs the question of whether we should use sysv shmem or not. It seems like the order of preference should be:

- sysv
- mmap in a tmpfs
- mmap in a "regular" (but not networked) fs

The big downer, of course, is the whole "what happens if the job crashes?" issue. With mmap, an rm -rf will clean up any leftover files (although looking for them in /dev/shm might be a bit non-obvious). With sysv, you have to use the ipc* commands to look for and whack any orphan shmem segments.

Right now, the orted/hnp won't clean up any leftover sysv segments. This seems like something we should fix. But even with that, if the orted/hnp is killed, sysv segments can get left over. Hrm.

On Jun 9, 2010, at 11:58 AM, Sylvain Jeaugey wrote:

> As stated at the conf call, I did some performance testing on a 32-core node. Here is a graph showing 500 timings of an allreduce operation (repeated 15,000 times for good timing) with sysv, mmap on /dev/shm and mmap on /tmp.
>
> What it shows:
> - sysv has the best performance;
> - having the mmap file in /dev/shm is very close to sysv. We only have +0.1 us for a complete allreduce operation, but it seems stable. The noise is identical to sysv (must be OS noise);
> - having the mmap file in /tmp (ext3) decreases performance (+0.4 us compared to /dev/shm) and seems prone to some "other" noise.
>
> Warning: the graph does not start at 0.
>
> Sylvain
>
> On Tue, 27 Apr 2010, Samuel K. Gutierrez wrote:
>
>> Hi,
>>
>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI. I have conducted some preliminary tests on our systems, but would like to get test results from a broader audience.
>>
>> As it stands, mmap is the default, but System V shared memory can be activated using: -mca mpi_common_sm sysv
>>
>> Repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm
>>
>> Input is greatly appreciated!
>>
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory

--
Jeff Squyres
jsquy...@cisco.com
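Acting on that preference order would also mean knowing whether a candidate directory is tmpfs-backed. On Linux, one possible check (just a sketch; nothing like this exists in the sm code today) is statfs():

#include <stdio.h>
#include <sys/vfs.h>    /* statfs(), Linux-specific */

#ifndef TMPFS_MAGIC
#define TMPFS_MAGIC 0x01021994
#endif

/* Return 1 if 'path' lives on a tmpfs filesystem (Linux only). */
static int path_is_tmpfs(const char *path)
{
    struct statfs fs;
    if (statfs(path, &fs) != 0)
        return 0;
    return fs.f_type == TMPFS_MAGIC;
}

int main(void)
{
    printf("/dev/shm on tmpfs: %d\n", path_is_tmpfs("/dev/shm"));
    printf("/tmp on tmpfs:     %d\n", path_is_tmpfs("/tmp"));
    return 0;
}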
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Thanks, Sylvain!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jun 9, 2010, at 9:58 AM, Sylvain Jeaugey wrote:

> As stated at the conf call, I did some performance testing on a 32-core node. Here is a graph showing 500 timings of an allreduce operation (repeated 15,000 times for good timing) with sysv, mmap on /dev/shm, and mmap on /tmp.
>
> What it shows:
> - sysv has the best performance;
> - having the mmap file in /dev/shm is very close to sysv. We only have +0.1 us for a complete allreduce operation, but it seems stable. The noise is identical to sysv (must be OS noise);
> - having the mmap file in /tmp (ext3) decreases performance (+0.4 us compared to /dev/shm) and seems prone to some "other" noise.
>
> Warning: the graph does not start at 0.
>
> Sylvain
>
> On Tue, 27 Apr 2010, Samuel K. Gutierrez wrote:
>
>> Hi,
>>
>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI. I have conducted some preliminary tests on our systems, but would like to get test results from a broader audience.
>>
>> As it stands, mmap is the default, but System V shared memory can be activated using: -mca mpi_common_sm sysv
>>
>> Repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm
>>
>> Input is greatly appreciated!
>>
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
As stated at the conf call, I did some performance testing on a 32-core node.

Here is a graph showing 500 timings of an allreduce operation (repeated 15,000 times for good timing) with sysv, mmap on /dev/shm, and mmap on /tmp.

What it shows:
- sysv has the best performance;
- having the mmap file in /dev/shm is very close to sysv. We only have +0.1 us for a complete allreduce operation, but it seems stable. The noise is identical to sysv (must be OS noise);
- having the mmap file in /tmp (ext3) decreases performance (+0.4 us compared to /dev/shm) and seems prone to some "other" noise.

Warning: the graph does not start at 0.

Sylvain

On Tue, 27 Apr 2010, Samuel K. Gutierrez wrote:

> Hi,
>
> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI. I have conducted some preliminary tests on our systems, but would like to get test results from a broader audience.
>
> As it stands, mmap is the default, but System V shared memory can be activated using: -mca mpi_common_sm sysv
>
> Repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm
>
> Input is greatly appreciated!
>
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
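Sylvain's exact benchmark code isn't attached; a typical loop for producing this kind of plot (assumed methodology, shown here only for context) would be:

#include <stdio.h>
#include <mpi.h>

#define NSAMPLES 500    /* timing samples, as in the plot */
#define NREPS    15000  /* allreduce calls per sample     */

int main(int argc, char **argv)
{
    double in = 1.0, out;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int s = 0; s < NSAMPLES; s++) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < NREPS; r++)
            MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double dt = (MPI_Wtime() - t0) / NREPS;  /* mean time per allreduce */
        if (rank == 0)
            printf("%d %g\n", s, dt * 1e6);      /* report in microseconds */
    }

    MPI_Finalize();
    return 0;
}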
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Hi all,

Does anyone know of a relatively portable solution for querying a given system for the shmctl behavior that I am relying on, or is this going to be a nightmare? Because, if I am reading this thread correctly, the presence of shmget and Linux is not sufficient for determining an adequate level of sysv support.

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On May 2, 2010, at 7:48 AM, N.M. Maclaren wrote:

> On May 2 2010, Ashley Pittman wrote:
>> On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:
>>
>> As to performance there should be no difference in use between sys-V shared memory and file-backed shared memory, the instructions issued and the MMU flags for the page should both be the same so the performance should be identical.
>
> Not necessarily, and possibly not so even for far-future Linuces. On at least one system I used, the poxious kernel wrote the complete file to disk before returning - all right, it did that for System V shared memory, too, just to a 'hidden' file! But, if I recall, on another it did that only for file-backed shared memory - however, it's a decade ago now and I may be misremembering. Of course, that's a serious issue mainly for large segments. I was using multi-GB ones. I don't know how big the ones you need are.
>
>> The one area you do need to keep an eye on for performance is on numa machines where it's important which process on a node touches each page first, you can end up using different areas (pages, not regions) for communicating in different directions between the same pair of processes. I don't believe this is any different to mmap backed shared memory though.
>
> On some systems it may be, but in bizarre, inconsistent, undocumented and unpredictable ways :-( Also, there are usually several system (and sometimes user) configuration options that change the behaviour, so you have to allow for that. My experience of trying to use those is that different uses have incompatible requirements, and most of the critical configuration parameters apply to ALL uses! In my view, the configuration variability is the number one nightmare for trying to write portable code that uses any form of shared memory. ARMCI seem to agree.
>
>> Because of this, sysv support may be limited to Linux systems - that is, until we can get a better sense of which systems provide the shmctl IPC_RMID behavior that I am relying on.
>
> And, I suggest, whether they have an evil gotcha on one of the areas that Ashley Pittman noted.
>
> Regards,
> Nick Maclaren.
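One possibility would be to probe for the required semantics directly, at configure or run time, instead of keying off the platform: create a segment, attach, call shmctl IPC_RMID, and see whether a forked child can still attach. The following is only a sketch of such a probe, not something that exists in the Open MPI configury:

/* Probe: after shmctl(IPC_RMID) on an attached segment, can another
 * process still attach via the shmid?  Exit status 0 means "yes",
 * i.e. the Linux-style deferred-destruction semantics are available. */
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0) return 1;

    char *addr = shmat(shmid, NULL, 0);
    if (addr == (char *)-1) return 1;

    /* Mark for destruction while still attached. */
    if (shmctl(shmid, IPC_RMID, NULL) < 0) return 1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: try to attach to the already-marked segment. */
        char *caddr = shmat(shmid, NULL, 0);
        if (caddr == (char *)-1) _exit(1);
        shmdt(caddr);
        _exit(0);
    }

    int status = 1;
    if (pid > 0) waitpid(pid, &status, 0);

    shmdt(addr);
    return (pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}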
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On May 2 2010, Ashley Pittman wrote:

> On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:
>
> As to performance there should be no difference in use between sys-V shared memory and file-backed shared memory, the instructions issued and the MMU flags for the page should both be the same so the performance should be identical.

Not necessarily, and possibly not so even for far-future Linuces. On at least one system I used, the poxious kernel wrote the complete file to disk before returning - all right, it did that for System V shared memory, too, just to a 'hidden' file! But, if I recall, on another it did that only for file-backed shared memory - however, it's a decade ago now and I may be misremembering.

Of course, that's a serious issue mainly for large segments. I was using multi-GB ones. I don't know how big the ones you need are.

> The one area you do need to keep an eye on for performance is on numa machines where it's important which process on a node touches each page first, you can end up using different areas (pages, not regions) for communicating in different directions between the same pair of processes. I don't believe this is any different to mmap backed shared memory though.

On some systems it may be, but in bizarre, inconsistent, undocumented and unpredictable ways :-( Also, there are usually several system (and sometimes user) configuration options that change the behaviour, so you have to allow for that. My experience of trying to use those is that different uses have incompatible requirements, and most of the critical configuration parameters apply to ALL uses!

In my view, the configuration variability is the number one nightmare for trying to write portable code that uses any form of shared memory. ARMCI seem to agree.

> Because of this, sysv support may be limited to Linux systems - that is, until we can get a better sense of which systems provide the shmctl IPC_RMID behavior that I am relying on.

And, I suggest, whether they have an evil gotcha on one of the areas that Ashley Pittman noted.

Regards,
Nick Maclaren.
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On 02/05/10 06:49, Ashley Pittman wrote:

> I think you should look into this a little deeper, it certainly used to be the case on Linux that setting IPC_RMID would also prevent any further processes from attaching to the segment.

That certainly appears to be the case in the current master of the kernel: IPC_PRIVATE is set on the segment with the comment:

/* Do not find it any more */

That flag means that ipcget() - used by sys_shmget() - takes a different code path and now calls ipcget_new() rather than ipcget_public().

cheers,
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On 01/05/10 23:03, Samuel K. Gutierrez wrote:

> I call shmctl IPC_RMID immediately after one process has attached to the segment because, at least on Linux, this only marks the segment for destruction.

That's correct. Looking at the kernel code (at least in the current git master), the function that handles this - do_shm_rmid() in ipc/shm.c - only destroys the segment if nobody is attached to it; otherwise it marks the segment as IPC_PRIVATE to stop others finding it, and with SHM_DEST so that it is automatically destroyed on the last detach.

cheers,
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
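That kernel behaviour is also observable from user space: the existing mapping keeps working after IPC_RMID, but the key no longer resolves to the segment. A small illustrative test (the key value is arbitrary; this shows the Linux behaviour described above and is only a sketch):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    key_t key = 0x4f4d5049;  /* arbitrary key, for illustration only */

    int shmid = shmget(key, 4096, IPC_CREAT | IPC_EXCL | 0600);
    if (shmid < 0) { perror("shmget"); return 1; }

    char *addr = shmat(shmid, NULL, 0);
    if (addr == (char *)-1) { perror("shmat"); return 1; }

    /* Mark the segment for destruction (sets SHM_DEST, hides the key). */
    if (shmctl(shmid, IPC_RMID, NULL) < 0) { perror("shmctl"); return 1; }

    /* The existing attachment is still usable ... */
    strcpy(addr, "still usable");
    printf("mapping says: %s\n", addr);

    /* ... but the key can no longer be looked up. */
    if (shmget(key, 4096, 0600) < 0 && errno == ENOENT)
        printf("shmget() on the old key fails after IPC_RMID, as expected\n");

    shmdt(addr);  /* last detach: the kernel destroys the segment */
    return 0;
}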
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On 2 May 2010, at 04:03, Samuel K. Gutierrez wrote:

> As far as I can tell, calling shmctl IPC_RMID is immediately destroying the shared memory segment even though there is at least one process attached to it. This is interesting and confusing because Solaris 10's behavior description of shmctl IPC_RMID is similar to that of Linux'.
>
> I call shmctl IPC_RMID immediately after one process has attached to the segment because, at least on Linux, this only marks the segment for destruction. The segment is only actually destroyed after all attached processes have terminated. I'm relying on this behavior for resource cleanup upon application termination (normal/abnormal).

I think you should look into this a little deeper; it certainly used to be the case on Linux that setting IPC_RMID would also prevent any further processes from attaching to the segment.

You're right that minimising the window during which the region exists without that bit set is good, both in terms of wall-clock time and lines of code. What we used to do here was to have all processes on a node perform an out-of-band intra-node barrier before creating the segment and another in-band barrier immediately after creating it. Without this, if one process on a node has problems and aborts during startup before it gets to the shared memory code, then you are almost guaranteed to leave an unattached segment behind.

As to performance, there should be no difference in use between sys-V shared memory and file-backed shared memory; the instructions issued and the MMU flags for the page should both be the same, so the performance should be identical. The one area you do need to keep an eye on for performance is on NUMA machines, where it's important which process on a node touches each page first: you can end up using different areas (pages, not regions) for communicating in different directions between the same pair of processes. I don't believe this is any different to mmap-backed shared memory though.

> Because of this, sysv support may be limited to Linux systems - that is, until we can get a better sense of which systems provide the shmctl IPC_RMID behavior that I am relying on.

Ashley,

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
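On the NUMA point, the usual mitigation is for each local process to first-touch only the slice of the segment it will write into, once everyone has attached, so those pages fault in on that process's own node. A sketch of the idea (the slice layout and parameter names are invented for illustration, not taken from the sm code):

#include <stddef.h>
#include <string.h>

/* First-touch the slice of an already-attached shared region that this
 * local rank will produce into, so those pages are faulted in (and thus
 * placed) from this rank's NUMA node.  The same idea applies whether the
 * region came from shmat() or from mmap() of a backing file. */
static void first_touch_my_slice(void *base, size_t segment_size,
                                 int local_rank, int local_nprocs)
{
    size_t slice = segment_size / (size_t)local_nprocs;
    char  *mine  = (char *)base + (size_t)local_rank * slice;

    memset(mine, 0, slice);  /* writing the pages binds them (first touch) */
}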
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Hi Ethan,

Sorry about the lag.

As far as I can tell, calling shmctl IPC_RMID is immediately destroying the shared memory segment even though there is at least one process attached to it. This is interesting and confusing because Solaris 10's behavior description of shmctl IPC_RMID is similar to that of Linux'.

I call shmctl IPC_RMID immediately after one process has attached to the segment because, at least on Linux, this only marks the segment for destruction. The segment is only actually destroyed after all attached processes have terminated. I'm relying on this behavior for resource cleanup upon application termination (normal/abnormal).

Because of this, sysv support may be limited to Linux systems - that is, until we can get a better sense of which systems provide the shmctl IPC_RMID behavior that I am relying on. Any other ideas are greatly appreciated.

Thanks for testing!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

> On Thu, Apr/29/2010 02:52:24PM, Samuel K. Gutierrez wrote:
>> Hi Ethan,
>>
>> Bummer. What does the following command show?
>>
>> sysctl -a | grep shm
>
> In this case, I think the Solaris equivalent to sysctl is prctl, e.g.,
>
> $ prctl -i project group.staff
> project: 10: group.staff
> NAME                    PRIVILEGE    VALUE  FLAG  ACTION  RECIPIENT
> ...
> project.max-shm-memory
>                         privileged  3.92GB     -  deny            -
>                         system      16.0EB   max  deny            -
> project.max-shm-ids
>                         privileged     128     -  deny            -
>                         system       16.8M   max  deny            -
> ...
>
> Is that the info you need?
>
> -Ethan
>
>> Thanks!
>>
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory
>>
>> On Apr 29, 2010, at 1:32 PM, Ethan Mallove wrote:
>>
>>> Hi Samuel,
>>>
>>> I'm trying to run off your HG clone, but I'm seeing issues with c_hello, e.g.,
>>>
>>> $ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
>>> --------------------------------------------------------------------------
>>> A system call failed during shared memory initialization that should not have. It is likely that your MPI job will now either abort or experience performance degradation.
>>>
>>>   Local host:  burl-ct-v440-2
>>>   System call: shmat(2)
>>>   Process:     [[43408,1],1]
>>>   Error:       Invalid argument (errno 22)
>>> --------------------------------------------------------------------------
>>> ^Cmpirun: killing job...
>>>
>>> $ uname -a
>>> SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V440
>>>
>>> The same test works okay if I s/sysv/mmap/.
>>>
>>> Regards,
>>> Ethan
>>>
>>> On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:
>>>> Hi,
>>>>
>>>> Faster component initialization/finalization times is one of the main motivating factors of this work. The general idea is to get away from creating a rather large backing file. With respect to module bandwidth and latency, mmap and sysv seem to be comparable - at least that is what my preliminary tests have shown. As it stands, I have not come across a situation where the mmap SM component doesn't work or is slower.
>>>>
>>>> Hope that helps,
>>>>
>>>> --
>>>> Samuel K. Gutierrez
>>>> Los Alamos National Laboratory
>>>>
>>>> On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:
>>>>
>>>>> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez wrote:
>>>>>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI.
>>>>>
>>>>> What is the motivation for this work? Are there situations where the mmap based SM component doesn't work or is slow(er)?
>>>>>
>>>>> Kind regards,
>>>>> Bogdan
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On Thu, Apr/29/2010 02:52:24PM, Samuel K. Gutierrez wrote:
> Hi Ethan,
>
> Bummer. What does the following command show?
>
> sysctl -a | grep shm

In this case, I think the Solaris equivalent to sysctl is prctl, e.g.,

$ prctl -i project group.staff
project: 10: group.staff
NAME                    PRIVILEGE    VALUE  FLAG  ACTION  RECIPIENT
...
project.max-shm-memory
                        privileged  3.92GB     -  deny            -
                        system      16.0EB   max  deny            -
project.max-shm-ids
                        privileged     128     -  deny            -
                        system       16.8M   max  deny            -
...

Is that the info you need?

-Ethan

> Thanks!
>
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Apr 29, 2010, at 1:32 PM, Ethan Mallove wrote:
>
>> Hi Samuel,
>>
>> I'm trying to run off your HG clone, but I'm seeing issues with c_hello, e.g.,
>>
>> $ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
>> --------------------------------------------------------------------------
>> A system call failed during shared memory initialization that should not have. It is likely that your MPI job will now either abort or experience performance degradation.
>>
>>   Local host:  burl-ct-v440-2
>>   System call: shmat(2)
>>   Process:     [[43408,1],1]
>>   Error:       Invalid argument (errno 22)
>> --------------------------------------------------------------------------
>> ^Cmpirun: killing job...
>>
>> $ uname -a
>> SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V440
>>
>> The same test works okay if I s/sysv/mmap/.
>>
>> Regards,
>> Ethan
>>
>> On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:
>>> Hi,
>>>
>>> Faster component initialization/finalization times is one of the main motivating factors of this work. The general idea is to get away from creating a rather large backing file. With respect to module bandwidth and latency, mmap and sysv seem to be comparable - at least that is what my preliminary tests have shown. As it stands, I have not come across a situation where the mmap SM component doesn't work or is slower.
>>>
>>> Hope that helps,
>>>
>>> --
>>> Samuel K. Gutierrez
>>> Los Alamos National Laboratory
>>>
>>> On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:
>>>
>>>> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez wrote:
>>>>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI.
>>>>
>>>> What is the motivation for this work? Are there situations where the mmap based SM component doesn't work or is slow(er)?
>>>>
>>>> Kind regards,
>>>> Bogdan
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Hi Ethan,

Bummer. What does the following command show?

sysctl -a | grep shm

Thanks!

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Apr 29, 2010, at 1:32 PM, Ethan Mallove wrote:

> Hi Samuel,
>
> I'm trying to run off your HG clone, but I'm seeing issues with c_hello, e.g.,
>
> $ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
> --------------------------------------------------------------------------
> A system call failed during shared memory initialization that should not have. It is likely that your MPI job will now either abort or experience performance degradation.
>
>   Local host:  burl-ct-v440-2
>   System call: shmat(2)
>   Process:     [[43408,1],1]
>   Error:       Invalid argument (errno 22)
> --------------------------------------------------------------------------
> ^Cmpirun: killing job...
>
> $ uname -a
> SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V440
>
> The same test works okay if I s/sysv/mmap/.
>
> Regards,
> Ethan
>
> On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:
>> Hi,
>>
>> Faster component initialization/finalization times is one of the main motivating factors of this work. The general idea is to get away from creating a rather large backing file. With respect to module bandwidth and latency, mmap and sysv seem to be comparable - at least that is what my preliminary tests have shown. As it stands, I have not come across a situation where the mmap SM component doesn't work or is slower.
>>
>> Hope that helps,
>>
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory
>>
>> On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:
>>
>>> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez wrote:
>>>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI.
>>>
>>> What is the motivation for this work? Are there situations where the mmap based SM component doesn't work or is slow(er)?
>>>
>>> Kind regards,
>>> Bogdan
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Hi Samuel,

I'm trying to run off your HG clone, but I'm seeing issues with c_hello, e.g.,

$ mpirun -mca mpi_common_sm sysv --mca btl self,sm,tcp --host burl-ct-v440-2,burl-ct-v440-2 -np 2 ./c_hello
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  burl-ct-v440-2
  System call: shmat(2)
  Process:     [[43408,1],1]
  Error:       Invalid argument (errno 22)
--------------------------------------------------------------------------
^Cmpirun: killing job...

$ uname -a
SunOS burl-ct-v440-2 5.10 Generic_118833-33 sun4u sparc SUNW,Sun-Fire-V440

The same test works okay if I s/sysv/mmap/.

Regards,
Ethan

On Wed, Apr/28/2010 07:16:12AM, Samuel K. Gutierrez wrote:
> Hi,
>
> Faster component initialization/finalization times is one of the main motivating factors of this work. The general idea is to get away from creating a rather large backing file. With respect to module bandwidth and latency, mmap and sysv seem to be comparable - at least that is what my preliminary tests have shown. As it stands, I have not come across a situation where the mmap SM component doesn't work or is slower.
>
> Hope that helps,
>
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:
>
>> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez wrote:
>>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI.
>>
>> What is the motivation for this work? Are there situations where the mmap based SM component doesn't work or is slow(er)?
>>
>> Kind regards,
>> Bogdan
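(The c_hello source isn't part of the thread; a minimal stand-in that triggers the same shared-memory setup at MPI_Init time would be something like the following.)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    /* The sm BTL's shared memory is set up during MPI_Init, so even a
     * hello-world exercises the code path that failed above. */
    printf("Hello, world, I am %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}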
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Hi,

Faster component initialization/finalization times is one of the main motivating factors of this work. The general idea is to get away from creating a rather large backing file. With respect to module bandwidth and latency, mmap and sysv seem to be comparable - at least that is what my preliminary tests have shown. As it stands, I have not come across a situation where the mmap SM component doesn't work or is slower.

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Apr 28, 2010, at 5:35 AM, Bogdan Costescu wrote:

> On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez wrote:
>> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI.
>
> What is the motivation for this work? Are there situations where the mmap based SM component doesn't work or is slow(er)?
>
> Kind regards,
> Bogdan
Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
On Tue, Apr 27, 2010 at 7:55 PM, Samuel K. Gutierrez wrote:
> With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI.

What is the motivation for this work? Are there situations where the mmap based SM component doesn't work or is slow(er)?

Kind regards,
Bogdan
[OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing
Hi,

With Jeff and Ralph's help, I have completed a System V shared memory component for Open MPI. I have conducted some preliminary tests on our systems, but would like to get test results from a broader audience.

As it stands, mmap is the default, but System V shared memory can be activated using: -mca mpi_common_sm sysv

Repository: http://bitbucket.org/samuelkgutierrez/ompi_sysv_sm

Input is greatly appreciated!

--
Samuel K. Gutierrez
Los Alamos National Laboratory