[OMPI devel] rankfile questions
I notice that rankfile didn't compile properly on some platforms and issued warnings on other platforms. Thanks to Ralph for cleaning it up...

1. I see a getenv("slot_list") in the MPI side of the code; it looks like $slot_list is set by the odls for the MPI process. Why isn't it an MCA parameter? That's what all other values passed by the orted to the MPI process appear to be.

2. I see that ompi_mpi_params.c is now registering 2 rmaps-level MCA parameters. Why? Shouldn't these be in ORTE somewhere?

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] rankfile questions
On Mar 18, 2008, at 9:32 AM, Jeff Squyres wrote: [...]

A few more notes:

3. Most of the files in orte/mca/rmaps/rankfile do not obey the prefix rule. I think that they should be renamed.

4. A quick look through rankfile_lex.l seems to show that there are global variables that are not protected by the prefix rule (or static). Ditto in rmaps_rf.c. These should be fixed.

5. rank_file_done was instantiated in both rankfile_lex.l and rmaps_rf.c (causing a duplicate symbol linker error on OS X). I removed it from rmaps_rf.c (it was declared "extern" in rankfile_lex.h, presumably to indicate that it is "owned" by the lex.l file...?).

6. svn:ignore was not set in the new rankfile directory.

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] xensocket btl and migration
Muhammad, With regard to your question on migration: you will likely have to reload the BTL components when a migration occurs. Open MPI currently assumes that once the set of BTLs is decided upon in a process, they are to be used until the application completes.

There is some limited support for failover, in which if one BTL 'fails' then it is disregarded and a previously defined alternative path is used. For example, if between two peers Open MPI has the choice of using tcp or openib, then it will use openib. If openib were to fail during the running of the job, then it may be possible for Open MPI to fail over and use just tcp. I'm not sure how well tested this ability is; others can comment if you are interested in this.

However, failover is not really what you are looking for. What it seems you are looking for is the ability to tell two processes that they should no longer communicate over tcp, but continue communication over xensockets, or vice versa. One technique would be, upon migration, to unload the BTLs (component_close), then reopen (component_open) and reselect (component_select), then re-exchange the modex; the processes should settle into the new configuration. You will have to make sure that any state Open MPI has cached, such as network addresses and node name data, is refreshed upon restart. Take a look at the checkpoint/restart logic for how I do this in the code base ([opal|orte|ompi]/runtime/*_cr.c).

It is likely that there is another, more efficient method, but I don't have anything to point you to at the moment. One idea would be to add a refresh function to the modex which would force the re-exchange of a single process's address set. There are a slew of problems with this that you will have to overcome, including race conditions, but I think they can be surmounted. I'd be interested in hearing your experiences implementing this in Open MPI. Let me know if I can be of any more help.

Cheers, Josh

On Mar 9, 2008, at 6:13 AM, Muhammad Atif wrote: Okay guys.. 
with all your support and help in understanding ompi architecture, I was able to get Xensocket to work. Only minor changes to the xensocket kernel module made it compatible with libevent. I am getting results which are bad, but I am sure I have to clean up the code. At least my results have improved over native netfront-netback of xen for messages of size larger than 1 MB. I started with making minor changes in the TCP btl, but it seems it is not the best way, as the changes are quite large and it is better to have a separate dedicated btl for xensockets.

As you guys might be aware, Xen supports live migration, and now I have one stupid question. My knowledge so far suggests that a btl component is initialized only once. The scenario here is that my guest OS is migrated from one physical node to another, and the communicating processes realize that they are now on one physical host and should abandon use of the TCP btl and make use of the Xensocket btl. I am sure it would not happen out of the box, but is it possible without making heavy changes in the openmpi architecture? With the current design, I am running a mix of tcp and xensocket btls, and endpoints check periodically whether they are on the same physical host or not. This has quite a big penalty in terms of time.

Another question (good thing I am using email, otherwise you guys would beat the hell outta me, it's such a basic question): I am not able to track the MPI_Recv(...) API call and calls like it. In the code of MPI_Recv(..) we give a call to rc = MCA_PML_CALL(recv(buf, count ... )). This call goes to the macro, and pml.recv(..) gets invoked (mca_pml_base_module_recv_fn_t pml_recv;). Where can I find the actual function? I get totally lost when trying to pinpoint what exactly is happening. Basically, I am looking for the place where the tcp btl recv gets called with all the goodies and parameters which were passed by the MPI programmer. I hope I have made my question understandable. 
Best Regards, Muhammad Atif - Original Message From: Brian W. Barrett To: Open MPI Developers Sent: Wednesday, February 6, 2008 2:57:31 AM Subject: Re: [OMPI devel] xensocket - callbacks through OPAL/libevent On Mon, 4 Feb 2008, Muhammad Atif wrote: > I am trying to port xensockets to openmpi. In principle, I have the > framework and everything, but there seems to be a small issue, I cannot > get libevent (or OPAL) to give callbacks for receive (or send) for > xensockets. I have tried to implement native code for xensockets with > libevent library, again the same issue. No call backs! . With normal > sockets, callbacks do come easily. > > So question is, do the socket/file descriptors have to have some special > mechanism attached to them to support callbacks for libevent/opal? i.e > some structure/magic?. i.e. maybe the developers of xensockets did not > add that callback
[OMPI devel] 1.2.6 man page fixes: done
Terry -- Per the teleconf today (I wanted to ensure that some man page fixes were included in 1.2.6): I checked SVN; the man page fixes submitted by the Debian OMPI package maintainers were committed to the 1.2 branch almost a month ago. So I think we're clear for 1.2.6rc3. -- Jeff Squyres Cisco Systems
[OMPI devel] libevent-merge tarball
Per the RFC posted yesterday, we plan to merge in the new libevent over this upcoming weekend. Please test the /tmp-public/libevent-merge SVN branch! For convenience, I have posted a tarball from this branch if it would make it easier for you to test: http://www.open-mpi.org/~jsquyres/unofficial/ The SVN branch is: http://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge -- Jeff Squyres Cisco Systems
Re: [OMPI devel] RFC: libevent update
I'm testing with checkpoint/restart and the new libevent seems to be messing up the checkpoints generated by BLCR. I'll be taking a look at it over the next couple of days, but just thought I'd let people know. Unfortunately I don't have any more details at the moment. -- Josh

On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:

WHAT: Bring new version of libevent to the trunk.

WHY: Newer version, slightly better performance (lower overheads / lighter weight), properly integrate the use of epoll and other scalable fd monitoring mechanisms.

WHERE: 98% of the changes are in opal/event; there's a few changes to configury and one change to the orted.

TIMEOUT: COB, Friday, 21 March 2008

DESCRIPTION: George/UTK has done the bulk of the work to integrate a new version of libevent on the following tmp branch: https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge

** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS BRANCH! **

Cisco ran MTT on this branch on Friday and everything checked out (i.e., no more failures than on the trunk). We just made a few more minor changes today and I'm running MTT again now, but I'm not expecting any new failures (MTT will take several hours). We would like to bring the new libevent in over this upcoming weekend, but would very much appreciate if others could test on their platforms (Cisco tests mainly 64 bit RHEL4U4). This new libevent *should* be a fairly side-effect free change, but it is possible that since we're now using epoll and other scalable fd monitoring tools, we'll run into some unanticipated issues on some platforms. Here's a consolidated diff if you want to see the changes: https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842 Thanks. -- Jeff Squyres Cisco Systems

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] Switching away from SVN?
It's been loosely proposed that we switch away from SVN into a different system. This probably warrants some discussion to a) figure out if we want to move, and b) *if* we want to move, which system should we move to? One system has been proposed: Mercurial -- several OMPI developers are using it with good success. I know that some OMPI developers use Git, too. Are there other systems that we should consider?

Primary reasons for doing the switch are:

- distributed repositories are attractive/useful
- git/Mercurial branching and merging are *way* better than SVN --> note that SVN v1.5 is supposed to be *much* better than v1.4

Primary reasons for staying with SVN are:

- aside from branching/merging, SVN works pretty well
- branching/merging is not "bad" in SVN (but if you used git/hg, you know it can be much, much better)

This is likely not a near-term issue, but we might as well start some low-frequency discussions about it. Several issues would need to be figured out if we decide to switch away from SVN:

- integration with trac
- integration with user/account management
- how to import all the SVN history to the new system
- ...and probably others

This might make a good topic for the next post-MPI-Forum meeting in Chicago: have someone stand up and give a 30 min overview of each system (Mercurial, Git, ...?) and we can have developer-level discussions (and hands-on testing) of the various systems to see what we like / don't like. If this sounds like a reasonable idea, let's figure out who wants to speak about the systems, etc. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] RFC: libevent update
Crud, ok. Keep us posted.

On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote: I'm testing with checkpoint/restart and the new libevent seems to be messing up the checkpoints generated by BLCR. I'll be taking a look at it over the next couple of days, but just thought I'd let people know. Unfortunately I don't have any more details at the moment. -- Josh

On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote: [...]

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] RFC: libevent update
Jeff / George - Did you add a way to specify which event modules are used? Because epoll pushes the socket list into the kernel, I can see how it would screw up BLCR. I bet everything would work if we forced the use of poll / select.

Brian

On Tue, 18 Mar 2008, Jeff Squyres wrote: Crud, ok. Keep us posted. On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote: [...]
Re: [OMPI devel] RFC: libevent update
If avoiding epoll() makes Josh's problems go away, PLEASE let me know because that might indicate a deficiency in BLCR that I would want to address. -Paul

Brian W. Barrett wrote: > Jeff / George - > > Did you add a way to specify which event modules are used? Because epoll > pushes the socket list into the kernel, I can see how it would screw up > BLCR. I bet everything would work if we forced the use of poll / select. > > Brian > [...]

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] RFC: libevent update
George added an MCA parameter for it (opal_event_include is a string that can be set to "select" or "poll"), but it has to be set before opal_init().

Josh: could you try running with the MCA parameter opal_event_include set to "select"? This would confirm Brian's hypothesis...

Given that opal_init() is the first thing that happens in ompi_mpi_init(), this may not be enough -- you could *detect* that we can't do BLCR, but this mechanism doesn't allow libmpi to set something saying "reset libevent to be able to only use select()." George -- is that hard to add? I would imagine that it could be kinda difficult to reset libevent after there are already users of it, fd's and other events that may have been added, etc...?

On Mar 18, 2008, at 4:29 PM, Brian W. Barrett wrote: [...]

-- Jeff Squyres Cisco Systems
Re: [OMPI devel] RFC: libevent update
It's like rewriting libevent from scratch. I guess it can be done, but it will be a long and painful process. How about the following solution:

- the daemons are aware that checkpointing is enabled. They can set the environment variable which will force opal_event_include to be set to select.
- as environment variables have a higher priority than the configuration file, this will work in most cases (except when the user adds the mca parameter by hand).
- in the checkpoint/restart code, we can add a test that checks the value of opal_event_include, prints a message if the value is not select, and disables the checkpoint/restart functionality.

george.

On Mar 18, 2008, at 4:59 PM, Jeff Squyres wrote: [...]
Re: [OMPI devel] RFC: libevent update
I have some more data from the field.

Leaving "opal_event_include" unset (default), BLCR would give me the following error when trying to restart a 2 process 'noop' MPI application:

shell$ ompi-restart ompi_global_snapshot_8587.ckpt
Restart failed: Bad file descriptor
Restart failed: Bad file descriptor
shell$

If I set "opal_event_include" to "select" then I get a different message, this one from Open MPI:

shell$ ompi-restart ompi_global_snapshot_8543.ckpt
[warn] select: Bad file descriptor
[odin001.cs.indiana.edu:18027] opal_event_base_loop: ompi_evesel->dispatch() failed.
[warn] select: Bad file descriptor
[odin001.cs.indiana.edu:18027] opal_event_base_loop: ompi_evesel->dispatch() failed.
[warn] select: Bad file descriptor
...

This repeats until I kill the restarted job. I've figured out what is outputting the error message, but I can't say exactly why at the moment. Still digging.

If I set "opal_event_include" to "poll" then everything is fine. The restart works as expected in all scenarios. :)

I'm currently using BLCR 0.6.0 Beta 6 on this machine. I've requested that BLCR be upgraded on this machine so I can test the latest version to see if the poll/epoll problem persists. I'll work with Paul if this turns up anything.

As far as what Open MPI needs to do, I don't think we need to do anything at the moment. I can add the MCA parameter to the 'ft-enable-cr' AMCA file which will work as a temporary fix. Thanks for all your help in tracking this problem.

Cheers, Josh

On Mar 18, 2008, at 5:19 PM, George Bosilca wrote: [...]
Re: [OMPI devel] Switching away from SVN?
> It's been loosely proposed that we switch away from SVN into a
> different system. This probably warrants some discussion to a) figure
> out if we want to move, and b) *if* we want to move, which system
> should we move to? One system has been proposed: Mercurial -- several
> OMPI developers are using it with good success. I know that some OMPI
> developers use Git, too. Are there other systems that we should
> consider?

As an ompi bystander, I would strongly endorse a switch away from svn. I think that git, hg and bzr are all roughly equivalent -- they each have their enthusiastic partisans, but in reality they're all probably fine. And the difference between svn and any of the newer distributed systems, especially for a big codebase like ompi, is pretty huge.

> Primary reasons for doing the switch are:
>
> - distributed repositories are attractive/useful
> - git/Mercurial branching and merging are *way* better than SVN
>   --> note that SVN v1.5 is supposed to be *much* better than v1.4

Also, svn is much slower for lots of things, to the point where it becomes a usability issue. And supporting disconnected operation (aka "working on a plane") is another really nice bonus.

> - how to import all the SVN history to the new system

Should not be a big problem -- since svn at least has atomic changesets, you avoid all the pain of parsing cvs repositories, and there are fairly mature svn importers for distributed systems.

- R.
Re: [OMPI devel] RFC: libevent update
I found another problem with the libevent branch. If I set "-mca btl tcp,self" on the command line then I get a segfault when sending messages > 16 KB. I can try to make a smaller repeater, but if you use the "progress" or "simple" tests in ompi-tests below: https://svn.open-mpi.org/svn/ompi-tests/trunk/iu/ft/correctness
To build: shell$ make
To run with failure: shell$ mpirun -np 2 -mca btl tcp,self progress -s 16 -v 1
To run without failure: shell$ mpirun -np 2 -mca btl tcp,self progress -s 15 -v 1
This program will display the message "Checkpoint at any time...". If you send mpirun SIGUSR2 it will progress to the next stage of the test. The failure occurs on the first message, before this becomes an issue though. I was using Odin, and if I do not specify the btls then the test will pass as normal. The backtrace is below:
--
...
Core was generated by `progress -s 16 -v 1'.
Program terminated with signal 11, Segmentation fault.
#0  0x002a9793318b in mca_bml_base_free (bml_btl=0x736275705f61636d, des=0x559700) at ../../../../ompi/mca/bml/bml.h:267
267     bml_btl->btl_free( bml_btl->btl, des );
(gdb) bt
#0  0x002a9793318b in mca_bml_base_free (bml_btl=0x736275705f61636d, des=0x559700) at ../../../../ompi/mca/bml/bml.h:267
#1  0x002a9793304d in mca_pml_ob1_put_completion (btl=0x5598c0, ep=0x0, des=0x559700, status=0) at pml_ob1_recvreq.c:190
#2  0x002a97930069 in mca_pml_ob1_recv_frag_callback (btl=0x5598c0, tag=64 '@', des=0x2a989d2b00, cbdata=0x0) at pml_ob1_recvfrag.c:149
#3  0x002a97d5f3e0 in mca_btl_tcp_endpoint_recv_handler (sd=10, flags=2, user=0x5a5df0) at btl_tcp_endpoint.c:696
#4  0x002a95a0ab93 in event_process_active (base=0x508c80) at event.c:591
#5  0x002a95a0af59 in opal_event_base_loop (base=0x508c80, flags=2) at event.c:763
#6  0x002a95a0ad2b in opal_event_loop (flags=2) at event.c:670
#7  0x002a959fadf8 in opal_progress () at runtime/opal_progress.c:169
#8  0x002a9792caae in opal_condition_wait (c=0x2a9587d940, m=0x2a9587d9c0) at ../../../../opal/threads/condition.h:93
#9  0x002a9792c9dd in ompi_request_wait_completion (req=0x5a5380) at ../../../../ompi/request/request.h:381
#10 0x002a9792c920 in mca_pml_ob1_recv (addr=0x5baf70, count=16384, datatype=0x503770, src=1, tag=1001, comm=0x5039a0, status=0x0) at pml_ob1_irecv.c:104
#11 0x002a956f1f00 in PMPI_Recv (buf=0x5baf70, count=16384, type=0x503770, source=1, tag=1001, comm=0x5039a0, status=0x0) at precv.c:75
#12 0x0040211f in exchange_stage1 (ckpt_num=1) at progress.c:414
#13 0x00401295 in main (argc=5, argv=0x7fbfffe668) at progress.c:131
(gdb) p bml_btl
$1 = (mca_bml_base_btl_t *) 0x736275705f61636d
(gdb) p *bml_btl
Cannot access memory at address 0x736275705f61636d
--
-- Josh
On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote: WHAT: Bring new version of libevent to the trunk. WHY: Newer version, slightly better performance (lower overheads / lighter weight), properly integrate the use of epoll and other scalable fd monitoring mechanisms. WHERE: 98% of the changes are in opal/event; there's a few changes to configury and one change to the orted. TIMEOUT: COB, Friday, 21 March 2008 DESCRIPTION: George/UTK has done the bulk of the work to integrate a new version of libevent on the following tmp branch: https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge ** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS BRANCH! ** Cisco ran MTT on this branch on Friday and everything checked out (i.e., no more failures than on the trunk). We just made a few more minor changes today and I'm running MTT again now, but I'm not expecting any new failures (MTT will take several hours). We would like to bring the new libevent in over this upcoming weekend, but would very much appreciate if others could test on their platforms (Cisco tests mainly 64 bit RHEL4U4). 
This new libevent *should* be a fairly side-effect free change, but it is possible that since we're now using epoll and other scalable fd monitoring tools, we'll run into some unanticipated issues on some platforms. Here's a consolidated diff if you want to see the changes: https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842 Thanks. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] RFC: libevent update
This has been fixed in the trunk, but not yet merged in the branch. george. On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote: > I found another problem with the libevent branch. If I set "-mca btl tcp,self" on the command line then I get a segfault when sending messages > 16 KB. [snip]
Re: [OMPI devel] Switching away from SVN?
On Mar 18, 2008, at 7:02 PM, Roland Dreier wrote: Primary reasons for doing the switch are: - distributed repositories are attractive/useful - git/Mercurial branching and merging are *way* better than SVN --> note that SVN v1.5 is supposed to be *much* better than v1.4 Also, svn is much slower for lots of things, to the point where it becomes a usability issue. And supporting disconnected operation (aka "working on a plane") is another really nice bonus. This is a good point - I've [briefly] used both git and Mercurial; part of their "*way* better support for branching and merging" is speed. A goodly-sized merge in SVN can take an hour or more. I've done goodly-sized merges in git and hg in seconds (or minutes). - how to import all the SVN history to the new system Should not be a big problem -- since svn at least has atomic changesets, you avoid all the pain of parsing cvs repositories, and there are fairly mature svn importers for distributed systems. Agreed -- I'm sure it *can* be done; we just have to spend a few cycles to figure out how to do it properly. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] RFC: libevent update
After taking a look at how epoll is implemented in the Linux kernel, I can say with 100% certainty that BLCR will not restore the epoll fd correctly. I hope to fix that eventually, but have too many other things on my plate to address it now. Since I cannot promise how soon BLCR may be able to resolve this problem, I suggest that Josh continue exploring the alternatives. At least "opal_event_include" set to "poll" appears to work. It is not clear to me if the "select" problem is related to BLCR or not. I am guessing that I don't get a say as to whether the BLCR/epoll problems should delay the libevent merge, but I trust the rest of you to determine what is in the best interest of OMPI. -Paul Josh Hursey wrote: > I have some more data from the field. > > Leaving "opal_event_include" unset (Default) BLCR would give me the > following error when trying to restart a 2 process 'noop' MPI > application: > > shell$ ompi-restart ompi_global_snapshot_8587.ckpt > Restart failed: Bad file descriptor > Restart failed: Bad file descriptor > shell$ > [snip] -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
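[Editor's note: Paul's diagnosis hinges on where epoll's state lives. The set of fds an epoll instance watches (its "interest list") is kernel-side state reachable only through the epoll fd, so a user-level checkpointer like BLCR has nothing in process memory to save and restore; poll()/select() avoid this because the fd set is handed to the kernel from user space on every call. A minimal sketch of that distinction, using Python's select.epoll purely for illustration -- Open MPI itself drives epoll from C via libevent:]

```python
# Illustration only: the epoll "interest list" is kernel state, not user
# memory, which is why a user-level checkpoint/restart cannot recreate it.
import os
import select

r, w = os.pipe()                    # a pipe to generate a readiness event

ep = select.epoll()
ep.register(r, select.EPOLLIN)      # interest-list entry: lives in the kernel

os.write(w, b"x")                   # make the read end readable
events = ep.poll(timeout=1)         # kernel reports readiness for fd r
print(events == [(r, select.EPOLLIN)])

# After a restart, the epoll fd would come back "empty": every register()
# call would have to be replayed by user code. poll()/select() sidestep
# this because their fd sets are passed in fresh on every call.
ep.close()
os.close(r)
os.close(w)
```

Falling back with "-mca opal_event_include poll", as Paul and Josh note, keeps the event state in user space at some scalability cost.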
Re: [OMPI devel] RFC: libevent update
When did you fix it? I merged the trunk down to the libevent-merge branch late this afternoon (r17869). On Mar 18, 2008, at 7:29 PM, George Bosilca wrote: This has been fixed in the trunk, but not yet merged in the branch. george. On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote: > I found another problem with the libevent branch. If I set "-mca btl tcp,self" on the command line then I get a segfault when sending messages > 16 KB. [snip]
Re: [OMPI devel] rankfile questions
Not trying to pile on here...but I do have a question. This commit inserted a bunch of affinity-specific code in ompi_mpi_init.c. Was this truly necessary? It seems to me this violates our code architecture. Affinity-specific code belongs in the opal_p[m]affinity functions. Why aren't we just calling an "opal_paffinity_set_my_processor" function (or whatever name you like) in mpi_init, and doing all this paffinity stuff there? It would make mpi_init a lot cleaner, and preserve the code standards we have had since the beginning. In addition, the code that has been added returns ORTE error and success codes. Given the location, it should be OMPI error and success codes - if we move it to where I think it belongs (in OPAL), then those codes should obviously be OPAL codes. If I'm missing some reason why these things can't be done, please enlighten me. Otherwise, it would be nice if this could be cleaned up. Thanks Ralph On 3/18/08 8:39 AM, "Jeff Squyres" wrote: > On Mar 18, 2008, at 9:32 AM, Jeff Squyres wrote: > >> I notice that rankfile didn't compile properly on some platforms and >> issued warnings on other platforms. Thanks to Ralph for cleaning it >> up... [snip]
Re: [OMPI devel] RFC: libevent update
Commit 17872 is the one you're looking for. https://svn.open-mpi.org/trac/ompi/changeset/17872 george. On Mar 18, 2008, at 9:12 PM, Jeff Squyres wrote: When did you fix it? I merged the trunk down to the libevent-merge branch late this afternoon (r17869). On Mar 18, 2008, at 7:29 PM, George Bosilca wrote: This has been fixed in the trunk, but not yet merged in the branch. george. [snip]