Re: [OMPI devel] IOF repair
On Thu, 10 Jul 2008, Ralph Castain wrote: We would appreciate it if people could test this to the extent possible over the next few days. Please let us know (good or bad) so we can decide whether or not to move it to the 1.3 release branch. I've tested with r18878 and the strange behaviour mentioned in a previous e-mail (which happened with 1.3a1r18769) has disappeared, CHARMM can again read its instructions properly from stdin. Thanks for the quick resolution! -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.coste...@iwr.uni-heidelberg.de
Re: [OMPI devel] open ib dependency question
On Thu, 10 Jul 2008, Pavel Shamis (Pasha) wrote: FYI the issue was resolved - https://svn.open-mpi.org/trac/ompi/ticket/1376 Indeed, no more IBCM error message displayed with r18878. Thank you ! -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.coste...@iwr.uni-heidelberg.de
Re: [OMPI devel] === CREATE FAILURE ===
I can find no reason that this failed. :-\

I am unable to duplicate the problem, and this area of code has not changed in a while -- I don't know why plpa/src/plpa-taskset/parser.c would suddenly be left behind.

On Jul 10, 2008, at 9:24 PM, MPI Team wrote:

> ERROR: Command returned a non-zero exist status
> make distcheck
>
> Start time: Thu Jul 10 21:00:14 EDT 2008
> End time: Thu Jul 10 21:24:55 EDT 2008
>
> [... previous lines snipped ...]
> test -z "" || rm -f
> rm -f class/.deps/.dirstamp
> rm -f class/.dirstamp
> rm -f dss/.deps/.dirstamp
> rm -f dss/.dirstamp
> rm -f memoryhooks/.deps/.dirstamp
> rm -f memoryhooks/.dirstamp
> rm -f runtime/.deps/.dirstamp
> rm -f runtime/.dirstamp
> rm -f threads/.deps/.dirstamp
> rm -f threads/.dirstamp
> rm -f win32/.deps/.dirstamp
> rm -f win32/.dirstamp
> rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
> make[3]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/opal'
> rm -rf class/.deps dss/.deps memoryhooks/.deps runtime/.deps threads/.deps win32/.deps
> rm -f Makefile
> make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/opal'
> Making distclean in contrib
> make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/contrib'
> test -z "*~ .#*" || rm -f *~ .#*
> rm -rf .libs _libs
> rm -f *.lo
> test -z "" || rm -f
> rm -f Makefile
> make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/contrib'
> Making distclean in config
> make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/config'
> test -z "*~ .#*" || rm -f *~ .#*
> rm -rf .libs _libs
> rm -f *.lo
> test -z "" || rm -f
> rm -f Makefile
> make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build/config'
> Making distclean in .
> make[2]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build'
> test -z "*~ .#*" || rm -f *~ .#*
> rm -rf .libs _libs
> rm -f *.lo
> test -z "ompi/include/ompi/version.h orte/include/orte/version.h opal/include/opal/version.h" || rm -f ompi/include/ompi/version.h orte/include/orte/version.h opal/include/opal/version.h
> rm -f libtool
> rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
> make[2]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build'
> rm -f config.status config.cache config.log configure.lineno config.status.lineno
> rm -f Makefile
> ERROR: files left in build directory after distclean:
> ./opal/mca/paffinity/linux/plpa/src/plpa-taskset/parser.c
> make[1]: *** [distcleancheck] Error 1
> make[1]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/v1.3/create-r18869/ompi/openmpi-1.3a1r18869/_build'
> make: *** [distcheck] Error 2
>
> Your friendly daemon,
> Cyrador
> ___
> testing mailing list
> test...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/testing

--
Jeff Squyres
Cisco Systems
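For readers unfamiliar with the failing step: as far as I know, Automake's distcleancheck simply lists the regular files that remain in the build tree after "make distclean" and fails if the list is not empty. The sketch below imitates that check so a leftover such as parser.c can be hunted locally. The function names are made up for illustration, and the walk is only an approximation of what the real make rule does.

```python
import os


def leftover_files(build_dir):
    """Return the files still present under build_dir as sorted './relative' paths."""
    found = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), build_dir)
            found.append("./" + rel.replace(os.sep, "/"))
    return sorted(found)


def distcleancheck(build_dir):
    """Raise if anything survived the clean, in the spirit of the log above."""
    files = leftover_files(build_dir)
    if files:
        raise RuntimeError(
            "files left in build directory after distclean:\n" + "\n".join(files)
        )
```

Running `distcleancheck("_build")` right after a local "make distclean" in a VPATH build directory would show whether the generated parser.c is reproducibly left behind.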
Re: [OMPI devel] v1.3 RM: need a ruling
Jeff Squyres wrote:
> Check that -- Ralph and I talked more about #1383 and have come up with a decent/better solution that a) is not wonky and b) does not involve MCA parameter synonyms. We're working on it in an hg and will put it back when done (probably within a business day or three).
>
> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.

I am not dead yet!!!

So, there was also the name change of pls_rsh_agent to plm_rsh_agent because the pls's were sucked into plm's (or so I believe). Anyways, this seems like another case to support synonyms. Also, are there other pls MCA parameters that have had their names changed to plm?

--td

> I think the MCA param synonyms and "deprecated" stuff is useful for the future, but at this point, nothing in v1.3 would use it. So my $0.02 is that we should leave it out.
>
> On Jul 10, 2008, at 2:00 PM, Jeff Squyres (jsquyres) wrote:
>> K, will do. Note that it turns out that we did not yet solve the mpi_paffinity_alone issue, but we're working on it. I'm working on the IOF issue ATM; will return to mpi_paffinity_alone in a bit...
>>
>> On Jul 10, 2008, at 1:56 PM, George Bosilca wrote:
>>> I'm 100% with Brad on this. Please go ahead and include this feature in the 1.3.
>>>
>>> george.
>>>
>>> On Jul 10, 2008, at 11:33 AM, Brad Benton wrote:
>>>> I think this is very reasonable to go ahead and include for 1.3. I find that preferable to a 1.3-specific "wonky" workaround. Plus, this sounds like something that is very good to have in general.
>>>>
>>>> --brad
>>>>
>>>> On Wed, Jul 9, 2008 at 8:49 PM, Jeff Squyres wrote:
>>>>> v1.3 RMs: Due to some recent work, the MCA parameter mpi_paffinity_alone disappeared -- it was moved and renamed to be opal_paffinity_alone. This is Bad because we have a lot of historical precedent based on the MCA param name "mpi_paffinity_alone" (FAQ, PPT presentations, e-mails on public lists, etc.). So it needed to be restored for v1.3. I just noticed that I hadn't opened a ticket on this -- sorry -- I opened #1383 tonight.
>>>>>
>>>>> For a variety of reasons described in the commit message r1383, Lenny and I first decided that it would be best to fix this problem by the functionality committed in r18770 (have the ability to find out where an MCA parameter was set). This would allow us to register two MCA params: mpi_paffinity_alone and opal_paffinity_alone, and generally do the Right Thing (because we could then tell if a user had set a value or whether it was a default MCA param value). This functionality will also be useful in the openib BTL, where there is a blend of MCA parameters and INI file parameters.
>>>>>
>>>>> However, after doing that, it seemed like only a few more steps to implement an overall better solution: implement "synonyms" for MCA parameters. I.e., register the name "mpi_paffinity_alone" as a synonym for opal_paffinity_alone. Along the way, it was trivial to add a "deprecated" flag for MCA parameters that we no longer want to use (this deprecated flag is also useful in the OB1 PML and openib BTL).
>>>>>
>>>>> So to fix a problem that needed to be fixed for v1.3 (restore the MCA parameter "mpi_paffinity_alone"), I ended up implementing new functionality.
>>>>>
>>>>> Can this go into v1.3, or do we need to implement some kind of alternate fix? (I admit to not having thought through what it would take to fix without the new MCA parameter functionality -- it might be kinda wonky)
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> Cisco Systems
>
> --
> Jeff Squyres
> Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] v1.3 RM: need a ruling
On 7/11/08 7:48 AM, "Terry Dontje" wrote:

> Jeff Squyres wrote:
>> Check that -- Ralph and I talked more about #1383 and have come up with a decent/better solution that a) is not wonky and b) does not involve MCA parameter synonyms. We're working on it in an hg and will put it back when done (probably within a business day or three).
>>
>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>
> I am not dead yet!!!
>
> So, there was also the name change of pls_rsh_agent to plm_rsh_agent because the pls's were sucked into plm's (or so I believe). Anyways, this seems like another case to support synonyms. Also are there other pls mca parameters that have had their names changed to plm?

I think you're opening a really ugly can of worms. How far back do you want to go? v1.0? v0.1? We have a history of changing mca param names across major releases, so trying to keep everything alive could well become a nightmare.

I would hate to try and figure out all the changes - and what about the params that simply have disappeared, or had their functionality absorbed by some combination of other params?

My head aches already... :-)

Ralph
[OMPI devel] PLM consistency: launch agent param
Since the question of backward compatibility of params came up... ;-)

I've been perusing the various PLM modules to check consistency. One thing I noted right away is that -every- PLM module registers an MCA param to let the user specify an orted cmd. I believe this specifically was done so people could insert their favorite debugger in front of the "orted" on the spawned command line - e.g., "valgrind orted".

The problem is that this forces the user to have to figure out the name of the PLM module being used as the param is called "-mca plm_rsh_agent", or "-mca plm_lsf_orted", or...you name it.

For users that only ever operate in one environment, who cares. However, many users (at least around here) operate in multiple environments, and this creates confusion.

I propose to create a single MCA param name for this value - something like "-mca plm_launch_agent" or whatever - and get rid of all these individual registrations to reduce the user confusion.

Comments? I'll put my helmet on

Ralph
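To make the proposal concrete, here is a small sketch of how a single framework-wide name could be resolved. It is not Open MPI code: "plm_launch_agent" is only the name floated in this message, the per-module names are the two quoted above, and the fallback to them is an assumption added to contrast today's behaviour with the proposed one.

```python
import shlex

# "plm_launch_agent" is the proposed name, not an existing parameter.
UNIFIED_PARAM = "plm_launch_agent"
# Per-module names as quoted in the message above.
PER_MODULE_PARAMS = {"rsh": "plm_rsh_agent", "lsf": "plm_lsf_orted"}


def launch_agent_argv(mca_params, module, default="orted"):
    """Return the argv prefix used to start the daemon for the given module."""
    value = mca_params.get(UNIFIED_PARAM)
    if value is None:
        # Today's situation: the right name depends on the module in use.
        per_module = PER_MODULE_PARAMS.get(module)
        value = mca_params.get(per_module, default) if per_module else default
    return shlex.split(value)


# With the unified name, one setting works in every environment.
params = {"plm_launch_agent": "valgrind orted"}
print(launch_agent_argv(params, "rsh"))
print(launch_agent_argv(params, "lsf"))
```

With only per-module names, a user who sets "plm_lsf_orted" and then lands in an rsh environment silently gets the plain "orted", which is the confusion described above.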
Re: [OMPI devel] v1.3 RM: need a ruling
On Jul 11, 2008, at 9:48 AM, Terry Dontje wrote:

>> Check that -- Ralph and I talked more about #1383 and have come up with a decent/better solution that a) is not wonky and b) does not involve MCA parameter synonyms. We're working on it in an hg and will put it back when done (probably within a business day or three).
>>
>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>
> I am not dead yet!!!
>
> So, there was also the name change of pls_rsh_agent to plm_rsh_agent because the pls's were sucked into plm's (or so I believe). Anyways, this seems like another case to support synonyms. Also are there other pls mca parameters that have had their names changed to plm?

All of them, right? The whole pls framework is gone -- replaced by plm. There are some OB1 and openib parameters that got renamed, too (probably in other BTLs as well -- the pipeline parameters, etc.).

So if we want to bring this functionality over, we can, but we should then also commit to adding deprecated synonyms for all the old names. It's not hard to do (2 function calls per deprecated name: 1) lookup the index of the new name, 2) register a deprecated synonym for that new name), but it does involve some menial labor.

Ralph raises a good point -- perhaps we should have a definitive policy (that starts in v1.3) about MCA parameters. I know that Sun has examples for this stuff. Is there one that we can implement easily?

--
Jeff Squyres
Cisco Systems
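The "2 function calls per deprecated name" pattern can be illustrated with a toy registry. This is a sketch only: the class and method names are hypothetical and are not the MCA parameter API, and the default value "ssh" is a placeholder. It shows an old name resolving to the same storage slot as the new one while leaving a deprecation notice.

```python
import sys


class ParamRegistry:
    """Toy MCA-style parameter table (hypothetical, not Open MPI's API)."""

    def __init__(self):
        self._values = []   # index -> current value
        self._names = {}    # name -> (index, deprecated flag)
        self.notices = []   # deprecation messages issued so far

    def register(self, name, default):
        self._values.append(default)
        index = len(self._values) - 1
        self._names[name] = (index, False)
        return index

    def find(self, name):
        """Look up the index of an already registered name."""
        return self._names[name][0]

    def register_synonym(self, index, old_name, deprecated=True):
        """Make old_name refer to the slot at index."""
        self._names[old_name] = (index, deprecated)

    def _resolve(self, name):
        index, deprecated = self._names[name]
        if deprecated:
            message = f"MCA param {name} is deprecated"
            self.notices.append(message)
            print(message, file=sys.stderr)
        return index

    def set_value(self, name, value):
        self._values[self._resolve(name)] = value

    def get_value(self, name):
        return self._values[self._resolve(name)]


params = ParamRegistry()
params.register("plm_rsh_agent", "ssh")          # placeholder default
index = params.find("plm_rsh_agent")             # call 1: look up the new name
params.register_synonym(index, "pls_rsh_agent")  # call 2: add the old name

params.set_value("pls_rsh_agent", "rsh")         # old scripts keep working
print(params.get_value("plm_rsh_agent"))         # both names share one slot
```

The menial part Jeff mentions is visible here: every renamed parameter needs its own find/register_synonym pair.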
Re: [OMPI devel] v1.3 RM: need a ruling
Ralph H Castain wrote:
> On 7/11/08 7:48 AM, "Terry Dontje" wrote:
>> Jeff Squyres wrote:
>>> Check that -- Ralph and I talked more about #1383 and have come up with a decent/better solution that a) is not wonky and b) does not involve MCA parameter synonyms. We're working on it in an hg and will put it back when done (probably within a business day or three).
>>>
>>> So I think the MCA synonym stuff *isn't* needed for v1.3 after all.
>>
>> I am not dead yet!!!
>>
>> So, there was also the name change of pls_rsh_agent to plm_rsh_agent because the pls's were sucked into plm's (or so I believe). Anyways, this seems like another case to support synonyms. Also are there other pls mca parameters that have had their names changed to plm?
>
> I think you're opening a really ugly can of worms. How far back do you want to go? v1.0? v0.1? We have a history of changing mca param names across major releases, so trying to keep everything alive could well become a nightmare.

I am only asking to be compatible with the last release (however, that might have an interpretation of infinity :-). Seriously, though, I think we need to be very careful about renaming MCA parameters, because this will screw production sites and ISVs that use scripts to launch their apps. A change could render their scripts useless (the paffinity param is a perfect example of this). I don't really want to promote keeping everything alive forever, but in cases where the only change is a 3-4 letter prefix, it almost looks random to people outside of the community.

> I would hate to try and figure out all the changes - and what about the params that simply have disappeared, or had their functionality absorbed by some combination of other params?

So, if a functionality is not supported, or the way you drive it is completely different, then I agree with you: trying to make a round peg fit in a square hole is silly. But if the feature is one-for-one except in name, then I think we need to ask ourselves if we really want/need to drop the original name.

> My head aches already... :-)

Take two aspirins...

--td
Re: [OMPI devel] PLM consistency: launch agent param
Sounds good to me. We've done similar things in other frameworks -- put in MCA base params for things that all components could use. How about plm_base_launch_agent?

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] PLM consistency: launch agent param
For something as fundamental as launch, do we still need to specify the component? Could it just be "launch_agent"?

Jeff Squyres wrote:
> Sounds good to me. We've done similar things in other frameworks -- put in MCA base params for things that all components could use. How about plm_base_launch_agent?
Re: [OMPI devel] PLM consistency: launch agent param
I suppose we could even just make it an mpirun cmd line param, at that point. As an MCA param, though, we have typically insisted on a particular syntax that includes framework and component...

On 7/11/08 8:41 AM, "Don Kerr" wrote:

> For something as fundamental as launch do we still need to specify the component, could it just be "launch_agent"?
[OMPI devel] PLM consistency: priority
Okay, another fun one. Some of the PLM modules use MCA params to adjust their relative selection priority. This can lead to very unexpected behavior as which module gets selected will depend on the priorities of the other selectable modules - which changes from release to release as people independently make adjustments and/or new modules are added.

Fortunately, this doesn't bite us too often since many environments only support one module, and since there is nothing to tell the user that the plm module whose priority they raised actually -didn't- get used!

However, in the interest of "least astonishment", some of us working on the RTE had changed our coding approach to avoid this confusion. Given that we have this nice mca component select logic that takes the specified module - i.e., "-mca plm foo" always yields foo if it can run, or errors out if it can't - then the safest course is to remove MCA params that adjust module priorities and have the user simply tell us which module they want us to use.

Do we want to make this consistent, at least in the PLM? Or do you want to leave the user guessing? :-)

Ralph
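The difference between the two selection styles can be shown with a toy model. Everything here is illustrative: the function is hypothetical and the priority numbers are invented, not values taken from any release.

```python
def select_module(modules, requested=None):
    """Pick a launcher module.

    modules: dict mapping name -> (priority, can_run); all values invented.
    requested: name given by the user (the "-mca plm foo" case), or None.
    """
    if requested is not None:
        # Explicit selection: the named module or an error, never a surprise.
        _priority, can_run = modules.get(requested, (0, False))
        if not can_run:
            raise RuntimeError(f"requested module {requested!r} cannot run here")
        return requested
    # Priority selection: the winner depends on every other module's priority.
    runnable = {name: prio for name, (prio, ok) in modules.items() if ok}
    if not runnable:
        raise RuntimeError("no module can run here")
    return max(runnable, key=runnable.get)


# The user raised rsh from some lower value to 50, but slurm still wins at 75.
modules = {"rsh": (50, True), "slurm": (75, True), "tm": (75, False)}
print(select_module(modules))         # priority-based choice
print(select_module(modules, "rsh"))  # explicit choice
```

In the priority case nothing tells the user that the module they boosted was not used; in the explicit case the request either succeeds or fails loudly.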
Re: [OMPI devel] PLM consistency: priority
We don't want the user to have to select by hand the best PML. The logic inside the current selection process selects the best pml for the underlying network. However changing the priority is pretty meaningless from the user's point of view. So while retaining the selection process including priorities, we might want to remove the priority parameter, and use only the pml=ob1,cm syntax from the user's point of view.

Aurelien
Re: [OMPI devel] PLM consistency: priority
Ummm...I actually was talking about the "PLM", not the "PML". But I believe what you suggest concurs with what I said. In the PLM, you could still provide multiple components you want considered, though it has less meaning there.

My suggestion really is only that we eliminate the params to adjust relative priority as they are just confusing the user and potentially misleading them as to what is going to happen.

Ralph

On 7/11/08 9:07 AM, "Aurélien Bouteiller" wrote:

> We don't want the user to have to select by hand the best PML. The logic inside the current selection process selects the best pml for the underlying network. However changing the priority is pretty meaningless from the user's point of view. So while retaining the selection process including priorities, we might want to remove the priority parameter, and use only the pml=ob1,cm syntax from the user's point of view.
>
> Aurelien
[OMPI devel] SM latency regression
Has anyone else seen the trunk incur approximately a 10% increase in latency? I think this has happened in the last couple weeks. I have verified that it isn't due to the recheck put into the sm_component_progress. I am about ready to try and track this down but wanted to throw this out there just in case someone is ahead of me. --td
[OMPI devel] User request: add envar?
Yo folks

For those not following the user list, this request was generated today:

>>>> Absolutely, these are useful time and time again so should be part of the API and hence stable. Care to mention what they are and I'll add it to my note as something to change when upgrading to 1.3 (we are looking at testing a snapshot in the near future).
>>>
>>> Surely:
>>>
>>> OMPI_COMM_WORLD_SIZE         #procs in the job
>>> OMPI_COMM_WORLD_LOCAL_SIZE   #procs in this job that are sharing the node
>>> OMPI_UNIVERSE_SIZE           total #slots allocated to this user (across all nodes)
>>> OMPI_COMM_WORLD_RANK         proc's rank
>>> OMPI_COMM_WORLD_LOCAL_RANK   local rank on node - lowest rank'd proc on the node is given local_rank=0
>>>
>>> If there are others that would be useful, now is definitely the time to speak up!
>>
>> The only other one I'd like to see is some kind of global identifier for the job but as far as I can see I don't believe that openmpi has such a concept.
>
> Not really - of course, many environments have a jobid they assign at time of allocation. We could create a unified identifier from that to ensure a consistent name was always available, but the problem would be that not all environments provide it (e.g., rsh). To guarantee that the variable would always be there, we would have to make something up in those cases.

I could easily create such an envar, even for non-managed environments. The plan would be to use the RM-provided jobid where one was available, and to use the mpirun jobid where not.

My thought was to call it OMPI_JOB_ID, unless someone has another suggestion.

Any objection to my doing so, and/or suggestions on alternative implementations?

Ralph
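A minimal sketch of how a wrapper script or application might consume the variables listed above. The helper name read_ompi_env is made up for illustration, and OMPI_JOB_ID is only the name proposed in this message, so the code treats every variable as optional rather than assuming the launcher sets it.

```python
import os

# Integer-valued variables named in the user-list exchange above.
INT_VARS = (
    "OMPI_COMM_WORLD_SIZE",
    "OMPI_COMM_WORLD_LOCAL_SIZE",
    "OMPI_UNIVERSE_SIZE",
    "OMPI_COMM_WORLD_RANK",
    "OMPI_COMM_WORLD_LOCAL_RANK",
)


def read_ompi_env(environ=None):
    """Return the launcher-provided values, or None for any that are unset."""
    env = os.environ if environ is None else environ
    info = {}
    for name in INT_VARS:
        raw = env.get(name)
        info[name] = int(raw) if raw is not None else None
    # Proposed name only (assumption): kept as a string because an
    # RM-provided job id is not guaranteed to be numeric.
    info["OMPI_JOB_ID"] = env.get("OMPI_JOB_ID")
    return info
```

Called with no argument, read_ompi_env() inspects the real process environment; passing a plain dict makes it easy to exercise outside of mpirun.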