Re: [OMPI devel] [Fwd: LAM: undefined reference to `mpi_bcast__']
I think the user meant that PelicanHPC lacked clear OMPI-specific documentation.

-Paul

Eugene Loh wrote:

I guess a bunch of you already saw this on the lam mail alias. The part that caught my eye was a user choosing LAM over OMPI due to lack of "clear documentation" for OMPI.

Original Message
Subject: LAM: undefined reference to `mpi_bcast__'
Date: Thu, 28 May 2009 08:32:46 -0700 (PDT)
From: Silviu Groza
Reply-To: General LAM/MPI mailing list
To: l...@lam-mpi.org

Hello,

I am trying to install a quantum chemistry program (Dalton) with LAM-MPI under PelicanHPC. PelicanHPC has both LAM-MPI and OpenMPI. I have chosen LAM-MPI due to lack of clear documentation of OpenMPI, and because the LAM-MPI environment is the default on PelicanHPC.

So, I have the following outputs:

user@pelican:~$ mpif77 -c foo.c
user@pelican:~$ mpif77 -show
gfortran -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio -llamf77mpi -lmpi -llam -lutil -ldl
user@pelican:~$ mpicc -show
gcc -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio -llamf77mpi -lmpi -llam -lutil -ldl

Therefore, my Makefile.config is:

ARCH= linux
#
#
CPPFLAGS = -DVAR_G77 -DSYS_LINUX -DVAR_MFDS -D'INSTALL_WRKMEM=1' -D'INSTALL_BASDIR="/mnt/sda8/home/dan/Daltonsubpelican/dalton-2.0/basis/"' -DVAR_MPI -DIMPLICIT_NONE
F77 = mpif77
CC= mpicc
RM= rm -f
FFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore
SAFEFFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore
CFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -std=c99 -DRESTRICT=restrict
INCLUDES = -I../include
LIBS = -L/usr/lib -llapack -lblas
INSTALLDIR= /mnt/sda8/home/dalton-2.0/bin
PDPACK_EXTRAS = linpack.o eispack.o
GP_EXTRAS =
AR= ar
ARFLAGS = rvs
# flags for ftnchek on Dalton /hjaaj
CHEKFLAGS = -nopure -nopretty -nocommon -nousage -noarray -notruncation -quiet -noargumants -arguments=number -usage=var-unitialized
# -usage=var-unitialized:arg-const-modified:arg-alias
# -usage=var-unitialized:var-set-unused:arg-unused:arg-const-modified:arg-alias
#
default : linuxparallel.x
#
# Parallel initialization
#
MPI_INCLUDE_DIR = -I/usr/lib/lam/include
MPI_LIB_PATH= -L/usr/lib/lam/lib
MPI_LIB = -lmpi
#
#
# Suffix rules
# hjaaj Oct 04: .g is a "cheat" suffix, for debugging.
# 'make x.g' will create x.o from x.F or x.c with -g debug flag set.
#
.SUFFIXES : .F .o .c .i .g
.F.o:
    $(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -c $*.F
.F.g:
    $(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -g -c $*.F
.c.o:
    $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -c $*.c
.c.g:
    $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -g -c $*.c
.F.i:
    $(F77) $(INCLUDES) $(CPPFLAGS) -E $*.F > $*.i

and the errors are:

---> Linking parallel dalpar.x ...
mpif77 -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore \
-o /mnt/sda8/home/dalton-2.0/bin/dalpar.x abacus/dalton.o cc/crayio.o abacus/linux_mem_allo.o \
abacus/herpar.o eri/eri2par.o amfi/amfi.o amfi/symtra.o -Labacus -labacus -Lrsp -lrsp -Lsirius -lsirius -labacus -Leri -leri -Ldensfit -ldensfit -Lcc -lcc -Ldft -ldft -Lgp -lgp -Lpdpack -lpdpack -L/usr/lib -llapack -lblas \
-L/usr/lib/lam/lib -lmpi
abacus/dalton.o: In function `getmmbas_':
dalton.F:(.text+0x379): undefined reference to `mpi_bcast__'
abacus/dalton.o: In function `MAIN__':
dalton.F:(.text+0x739): undefined reference to `mpi_bcast__'
abacus/libabacus.a(dalgnr.o): In function `parion_':
dalgnr.F:(.text+0x223): undefined reference to `mpi_bcast__'
dalgnr.F:(.text+0x3ea): undefined reference to `mpi_bcast__'
dalgnr.F:(.text+0x438): undefined reference to `mpi_bcast__'
abacus/libabacus.a(dalgnr.o):dalgnr.F:(.text+0x686): more undefined references to `mpi_bcast__' follow
dft/libdft.a(dft_ksm.o): In function `ksmcollect_':
dft_ksm.F:(.text+0x8c): undefined reference to `mpi_reduce__'
dft_ksm.F:(.text+0xd7): undefined reference to `mpi_reduce__'
dft/libdft.a(dft_ksm.o): In function `ksmsync_':
dft_ksm.F:(.text+0x12c): undefined reference to `mpi_bcast__'
dft/libdft.a(dft_ksm.o): In function `kick_ksm_slaves_alive__':
dft_ksm.F:(.text+0x27d): undefined reference to `mpi_bcast__'
dft_ksm.F:(.text+0x29f): undefined reference to `mpi_bcast__'
dft/libdft.a(dft_mag.o): In function `dft_suscep_collect__':
dft_mag.F:(.text+0x21b0): undefined reference to `mpi_reduce__'
dft/libdft.a(dft_mag.o): In function `kick_slaves_suscep__':
dft_mag.F:(.text+0x231d): undefined reference to `mpi_bcast__'
dft_mag.F:(.text+0x233f): undefined reference to `mpi_bcast__'
dft/libdft.a(dft_mag.o): In function `dft
[OMPI devel] [Fwd: LAM: undefined reference to `mpi_bcast__']
I guess a bunch of you already saw this on the lam mail alias. The part that caught my eye was a user choosing LAM over OMPI due to lack of "clear documentation" for OMPI.

Original Message
Subject: LAM: undefined reference to `mpi_bcast__'
Date: Thu, 28 May 2009 08:32:46 -0700 (PDT)
From: Silviu Groza
Reply-To: General LAM/MPI mailing list
To: l...@lam-mpi.org

Hello,

I am trying to install a quantum chemistry program (Dalton) with LAM-MPI under PelicanHPC. PelicanHPC has both LAM-MPI and OpenMPI. I have chosen LAM-MPI due to lack of clear documentation of OpenMPI, and because the LAM-MPI environment is the default on PelicanHPC.

So, I have the following outputs:

user@pelican:~$ mpif77 -c foo.c
user@pelican:~$ mpif77 -show
gfortran -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio -llamf77mpi -lmpi -llam -lutil -ldl
user@pelican:~$ mpicc -show
gcc -I/usr/lib/lam/include -pthread -L/usr/lib/lam/lib -llammpio -llamf77mpi -lmpi -llam -lutil -ldl

Therefore, my Makefile.config is:

ARCH= linux
#
#
CPPFLAGS = -DVAR_G77 -DSYS_LINUX -DVAR_MFDS -D'INSTALL_WRKMEM=1' -D'INSTALL_BASDIR="/mnt/sda8/home/dan/Daltonsubpelican/dalton-2.0/basis/"' -DVAR_MPI -DIMPLICIT_NONE
F77 = mpif77
CC= mpicc
RM= rm -f
FFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore
SAFEFFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore
CFLAGS= -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -std=c99 -DRESTRICT=restrict
INCLUDES = -I../include
LIBS = -L/usr/lib -llapack -lblas
INSTALLDIR= /mnt/sda8/home/dalton-2.0/bin
PDPACK_EXTRAS = linpack.o eispack.o
GP_EXTRAS =
AR= ar
ARFLAGS = rvs
# flags for ftnchek on Dalton /hjaaj
CHEKFLAGS = -nopure -nopretty -nocommon -nousage -noarray -notruncation -quiet -noargumants -arguments=number -usage=var-unitialized
# -usage=var-unitialized:arg-const-modified:arg-alias
# -usage=var-unitialized:var-set-unused:arg-unused:arg-const-modified:arg-alias
#
default : linuxparallel.x
#
# Parallel initialization
#
MPI_INCLUDE_DIR = -I/usr/lib/lam/include
MPI_LIB_PATH= -L/usr/lib/lam/lib
MPI_LIB = -lmpi
#
#
# Suffix rules
# hjaaj Oct 04: .g is a "cheat" suffix, for debugging.
# 'make x.g' will create x.o from x.F or x.c with -g debug flag set.
#
.SUFFIXES : .F .o .c .i .g
.F.o:
    $(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -c $*.F
.F.g:
    $(F77) $(INCLUDES) $(CPPFLAGS) $(FFLAGS) -g -c $*.F
.c.o:
    $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -c $*.c
.c.g:
    $(CC) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) -g -c $*.c
.F.i:
    $(F77) $(INCLUDES) $(CPPFLAGS) -E $*.F > $*.i

and the errors are:

---> Linking parallel dalpar.x ...
mpif77 -march=x86-64 -O3 -ffast-math -fexpensive-optimizations -funroll-loops -fno-range-check -fsecond-underscore \
-o /mnt/sda8/home/dalton-2.0/bin/dalpar.x abacus/dalton.o cc/crayio.o abacus/linux_mem_allo.o \
abacus/herpar.o eri/eri2par.o amfi/amfi.o amfi/symtra.o -Labacus -labacus -Lrsp -lrsp -Lsirius -lsirius -labacus -Leri -leri -Ldensfit -ldensfit -Lcc -lcc -Ldft -ldft -Lgp -lgp -Lpdpack -lpdpack -L/usr/lib -llapack -lblas \
-L/usr/lib/lam/lib -lmpi
abacus/dalton.o: In function `getmmbas_':
dalton.F:(.text+0x379): undefined reference to `mpi_bcast__'
abacus/dalton.o: In function `MAIN__':
dalton.F:(.text+0x739): undefined reference to `mpi_bcast__'
abacus/libabacus.a(dalgnr.o): In function `parion_':
dalgnr.F:(.text+0x223): undefined reference to `mpi_bcast__'
dalgnr.F:(.text+0x3ea): undefined reference to `mpi_bcast__'
dalgnr.F:(.text+0x438): undefined reference to `mpi_bcast__'
abacus/libabacus.a(dalgnr.o):dalgnr.F:(.text+0x686): more undefined references to `mpi_bcast__' follow
dft/libdft.a(dft_ksm.o): In function `ksmcollect_':
dft_ksm.F:(.text+0x8c): undefined reference to `mpi_reduce__'
dft_ksm.F:(.text+0xd7): undefined reference to `mpi_reduce__'
dft/libdft.a(dft_ksm.o): In function `ksmsync_':
dft_ksm.F:(.text+0x12c): undefined reference to `mpi_bcast__'
dft/libdft.a(dft_ksm.o): In function `kick_ksm_slaves_alive__':
dft_ksm.F:(.text+0x27d): undefined reference to `mpi_bcast__'
dft_ksm.F:(.text+0x29f): undefined reference to `mpi_bcast__'
dft/libdft.a(dft_mag.o): In function `dft_suscep_collect__':
dft_mag.F:(.text+0x21b0): undefined reference to `mpi_reduce__'
dft/libdft.a(dft_mag.o): In function `kick_slaves_suscep__':
dft_mag.F:(.text+0x231d): undefined reference to `mpi_bcast__'
dft_mag.F:(.text+0x233f): undefined reference to `mpi_bcast__'
dft/libdft.a(dft_mag.o): In function `dft_brhs_colle
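
The symbol names in the linker output follow a visible pattern: Fortran names without an underscore get one trailing underscore (`getmmbas_`, `ksmcollect_`), while names that already contain an underscore get two (`mpi_bcast__`, `kick_ksm_slaves_alive__`). That matches gfortran's documented behaviour for `-fsecond-underscore`, which is set in FFLAGS above. A minimal sketch of that naming rule follows; `fortran_symbol` is a hypothetical helper written for illustration only. Whether the LAM Fortran bindings on PelicanHPC export a different form (for example `mpi_bcast_`) is an assumption here, not something confirmed in this thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the linker symbol gfortran would emit for an
 * external Fortran name. With second_underscore set (mirroring
 * -fsecond-underscore), names that already contain '_' receive two trailing
 * underscores and all other names receive one. Without it, every name
 * receives exactly one. Returns 0 on success, -1 if out is too small. */
int fortran_symbol(const char *name, int second_underscore,
                   char *out, size_t cap)
{
    size_t len = strlen(name);
    size_t extra =
        (second_underscore && strchr(name, '_') != NULL) ? 2 : 1;

    if (len + extra + 1 > cap) {
        return -1;
    }
    /* Fortran is case-insensitive; the emitted symbol is lower case. */
    for (size_t i = 0; i < len; i++) {
        out[i] = (char)tolower((unsigned char)name[i]);
    }
    memset(out + len, '_', extra);
    out[len + extra] = '\0';
    return 0;
}
```

If that assumption holds, comparing the output of `nm` on the MPI Fortran library with the names in the error messages would show which convention the library was built with, and the flag in FFLAGS could then be adjusted to match.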
Re: [OMPI devel] problem in the ORTE notifier framework
On May 28, 2009, at 8:48 AM, Jeff Squyres (jsquyres) wrote:

> Yes, the opal-sos branch has a variant of this as well.

One thing I didn't mention: the opal-sos hg tree is unfortunately unrelated to the main ompi-svn-mirror, so you can't just push/pull between them. :-(

Most of us OMPI developers make specific SVN+Mercurial branches (see https://svn.open-mpi.org/trac/ompi/wiki/UsingMercurial) rather than branch from the ompi-svn-mirror, so that we can more easily commit back to the SVN trunk later. Eventually, Open MPI will be moving to 100% Mercurial and this issue will go away (current status is that Indiana U. is working on revamping our hosting infrastructure to support Mercurial).

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] problem in the ORTE notifier framework
On May 28, 2009, at 2:55 AM, Nadia Derbey wrote:

> Well, it didn't because from what I understood, the MPI program need to
> be changed (register a callback routine for the event, activate the
> event, etc), and this is something we wanted to avoid.

Combined with what Terry and Ralph already said, I just wanted to make sure this point is crystal clear: what we're proposing is that you use peruse *internally* -- there's no need to change MPI applications.

> Now, if we are allowed to
> 1. define new "internal" PERUSE events,
> 2. internally set the associated callback routines

Peruse was designed to be extensible, I believe. So adding new events into its infrastructure may not be too terrible (I didn't work on the peruse stuff; George/Rainer would have to comment on that).

The bigger issue is adding hooks to call those peruse events in the main progression engines. Adding them to error paths (or already-slow paths) might not be too bad. But I'm sure that many of us would scrutinize such changes closely -- as previously stated, we don't want to negatively impact performance for those who will not be using this functionality.

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] problem in the ORTE notifier framework
On May 28, 2009, at 7:53 AM, Ralph Castain wrote:

> The code in 1.3 is definitely different from the trunk as it lags quite a
> bit behind. However, the trunk definitely does include the code I
> referenced. Not sure why the hg mirror wouldn't have it. I would have to
> defer to Jeff on that question - could be a bug in the update macro that
> maintains the mirror?

FWIW: I see the code right here:

http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/file/tip/orte/mca/notifier/base/notifier_base_select.c#l72

> I haven't checked the opal_sos branch to see if it has the code in it, but
> I would have thought those guys were tracking the trunk that closely -
> that code was committed in r19209.

Yes, the opal-sos branch has a variant of this as well. Note that we changed the notifier framework in the opal-sos branch to be many-of-many, not one-of-many. Specifically: the trunk will select the *one* available notifier with the highest priority. The opal-sos branch will select *all* available notifiers and then subsequently invoke them in priority order.

--
Jeff Squyres
Cisco Systems
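
The difference between the two selection policies described above can be sketched roughly as follows. This is an illustration only: `notifier_t`, `select_one` and `select_all` are hypothetical names, not the real ORTE types or functions, and "higher number means higher priority" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a notifier module; not the real ORTE type. */
typedef struct {
    const char *name;
    int priority;   /* assumption: larger value means higher priority */
    int available;  /* nonzero if the module can be used */
} notifier_t;

/* One-of-many (trunk behaviour as described): return the single available
 * module with the highest priority, or NULL if none is available. */
const notifier_t *select_one(const notifier_t *mods, size_t n)
{
    const notifier_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (mods[i].available &&
            (best == NULL || mods[i].priority > best->priority)) {
            best = &mods[i];
        }
    }
    return best;
}

/* qsort comparator over an array of module pointers: descending priority. */
static int by_priority_desc(const void *a, const void *b)
{
    const notifier_t *x = *(const notifier_t *const *)a;
    const notifier_t *y = *(const notifier_t *const *)b;
    return (y->priority > x->priority) - (y->priority < x->priority);
}

/* Many-of-many (opal-sos behaviour as described): collect every available
 * module into out[] (capacity at least n), ordered by descending priority
 * so the caller can invoke them in that order. Returns the count. */
size_t select_all(const notifier_t *mods, size_t n, const notifier_t **out)
{
    size_t count = 0;
    assert(mods != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (mods[i].available) {
            out[count++] = &mods[i];
        }
    }
    qsort(out, count, sizeof(out[0]), by_priority_desc);
    return count;
}
```

With the first policy a caller gets at most one module to invoke; with the second it loops over `out[0..count-1]` and invokes each in turn.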
Re: [OMPI devel] problem in the ORTE notifier framework
On Thu, 2009-05-28 at 05:57 -0600, Ralph Castain wrote: > I agree with Terry here about being careful in pursuing this path. > What I wouldn't want to have happen is to force anyone wanting to be > notified of error events to have to also turn on peruse, which impacts > the non-error code path. Agreed, I missed that part! Regards, Nadia > > Again, I'm not entirely sure what you are trying to do here. As I > understood the original RFC, it sounded like you wanted to track > errors but only report them when they occurred a controlled number of > times (as opposed to every time). I think this would better be done > outside of peruse. > > If you are trying to track normal performance (e.g., trying to alert > sys admins when networks aren't running as fast as they should), then > that probably should be done inside of peruse. However, that > definitely will impact the critical code path, so Terry's caution is > definitely a concern. > > > On Thu, May 28, 2009 at 12:55 AM, Nadia Derbey > wrote: > On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote: > > Excellent points; Ralph and I chatted about this on the > phone today -- > > we concur with George. > > > > Bull -- would peruse work for you? I think you mentioned > before that > > it didn't seem attractive to you. > > > Well, it didn't because from what I understood, the MPI > program need to > be changed (register a callback routine for the event, > activate the > event, etc), and this is something we wanted to avoid. > > Now, if we are allowed to > 1. define new "internal" PERUSE events, > 2. internally set the associated callback routines > why not using peruse? This combined with the orte notifier > framework, > could do the job I think. > > Regards, > Nadia > > > > I think George's point is that we > > already have lots of hooks in place in the PML -- and > they're called > > peruse. 
So if we could use those hooks, then a) they're > run-time > > selectable already, and b) there's no additional cost in > performance > > critical/not-critical code paths (for the case where these > stats are > > not being collected) because PERUSE has been in the code > base for a > > long time. > > > > I think the idea is that your callbacks could be invoked by > the peruse > > hooks and then they can do whatever they want -- increment > counters, > > conditionally invoke the ORTE notifier system, etc. > > > > > > > > On May 27, 2009, at 11:34 AM, George Bosilca wrote: > > > > > What is a generic threshold? And what is a counter? We > have a policy > > > against such coding standards, and to be honest I would > like to stick > > > to it. The reason is that the PML is a very complex piece > of code, and > > > I would like to keep it as easy to understand as possible. > If people > > > start adding #if/#endif all over the code, we diverging > from this > > > goal. > > > > > > The only way to make this work is to call the notifier or > some other > > > framework in this "slow path" and let this other framework > do it's own > > > logic to determine what and when to print. Of course the > cost of this > > > is a function call plus an atomic operation (which is > already not > > > cheap). It's starting to get expensive, even for a "slow > path", which > > > in this particular context is just one insertion in an > atomic FIFO. > > > > > > If instead of counting in number of times we try to send > the fragment, > > > and switch to a time base approach, this can be solved > with the PERUSE > > > calls. There is a callback when the request is created, > and another > > > callback when the first fragment is pushed successfully > into the > > > network. Computing the time between these two, allow a > tool to figure > > > out how much time the request was waiting in some internal > queues, and > > > therefore how much delay this added to the execution time. 
> > > > > >george. > > > > > > On May 27, 2009, at 06:59 , Ralph Castain wrote: > > > > > > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...) > > > > > > > > #if WANT_NOTIFIER_VERBOSE > > > > opal_atomic_increment(counter); > > > > if (counter > threshold) { > > > > orte_notifier.api(.) > > > > } > > > > #en
Re: [OMPI devel] problem in the ORTE notifier framework
I agree with Terry here about being careful in pursuing this path. What I wouldn't want to have happen is to force anyone wanting to be notified of error events to have to also turn on peruse, which impacts the non-error code path. Again, I'm not entirely sure what you are trying to do here. As I understood the original RFC, it sounded like you wanted to track errors but only report them when they occurred a controlled number of times (as opposed to every time). I think this would better be done outside of peruse. If you are trying to track normal performance (e.g., trying to alert sys admins when networks aren't running as fast as they should), then that probably should be done inside of peruse. However, that definitely will impact the critical code path, so Terry's caution is definitely a concern. On Thu, May 28, 2009 at 12:55 AM, Nadia Derbey wrote: > On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote: > > Excellent points; Ralph and I chatted about this on the phone today -- > > we concur with George. > > > > Bull -- would peruse work for you? I think you mentioned before that > > it didn't seem attractive to you. > > Well, it didn't because from what I understood, the MPI program need to > be changed (register a callback routine for the event, activate the > event, etc), and this is something we wanted to avoid. > > Now, if we are allowed to > 1. define new "internal" PERUSE events, > 2. internally set the associated callback routines > why not using peruse? This combined with the orte notifier framework, > could do the job I think. > > Regards, > Nadia > > > I think George's point is that we > > already have lots of hooks in place in the PML -- and they're called > > peruse. 
So if we could use those hooks, then a) they're run-time > > selectable already, and b) there's no additional cost in performance > > critical/not-critical code paths (for the case where these stats are > > not being collected) because PERUSE has been in the code base for a > > long time. > > > > I think the idea is that your callbacks could be invoked by the peruse > > hooks and then they can do whatever they want -- increment counters, > > conditionally invoke the ORTE notifier system, etc. > > > > > > > > On May 27, 2009, at 11:34 AM, George Bosilca wrote: > > > > > What is a generic threshold? And what is a counter? We have a policy > > > against such coding standards, and to be honest I would like to stick > > > to it. The reason is that the PML is a very complex piece of code, and > > > I would like to keep it as easy to understand as possible. If people > > > start adding #if/#endif all over the code, we diverging from this > > > goal. > > > > > > The only way to make this work is to call the notifier or some other > > > framework in this "slow path" and let this other framework do it's own > > > logic to determine what and when to print. Of course the cost of this > > > is a function call plus an atomic operation (which is already not > > > cheap). It's starting to get expensive, even for a "slow path", which > > > in this particular context is just one insertion in an atomic FIFO. > > > > > > If instead of counting in number of times we try to send the fragment, > > > and switch to a time base approach, this can be solved with the PERUSE > > > calls. There is a callback when the request is created, and another > > > callback when the first fragment is pushed successfully into the > > > network. Computing the time between these two, allow a tool to figure > > > out how much time the request was waiting in some internal queues, and > > > therefore how much delay this added to the execution time. > > > > > >george. 
> > > > > > On May 27, 2009, at 06:59 , Ralph Castain wrote: > > > > > > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...) > > > > > > > > #if WANT_NOTIFIER_VERBOSE > > > > opal_atomic_increment(counter); > > > > if (counter > threshold) { > > > > orte_notifier.api(.) > > > > } > > > > #endif > > > > > > ___ > > > devel mailing list > > > de...@open-mpi.org > > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > > > > > -- > Nadia Derbey > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
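
A minimal sketch of the counter-and-threshold macro quoted at the end of this message, written so that it compiles on its own. It assumes C11 `<stdatomic.h>` in place of `opal_atomic_increment`, and uses hypothetical names (`NOTIFIER_VERBOSE`, `fake_notify`, `notifications_sent`) rather than anything from the ORTE notifier API. The build-time switch is defined inline here; in the proposal it would come from a configure option.

```c
#include <stdatomic.h>

/* Build-time switch; defaulted here so the sketch is self-contained. */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 1
#endif

/* Hypothetical stand-in for a notifier call: it only counts invocations. */
static int notifications_sent = 0;
static void fake_notify(const char *msg)
{
    (void)msg;
    notifications_sent++;
}

#if WANT_NOTIFIER_VERBOSE
/* Atomically bump the counter and notify only once the number of
 * occurrences seen so far exceeds the threshold. */
#define NOTIFIER_VERBOSE(notify_fn, counter, threshold, msg)      \
    do {                                                          \
        int seen_ = atomic_fetch_add(&(counter), 1) + 1;          \
        if (seen_ > (threshold)) {                                \
            notify_fn(msg);                                       \
        }                                                         \
    } while (0)
#else
/* Compiled out entirely: no counter update and no branch. */
#define NOTIFIER_VERBOSE(notify_fn, counter, threshold, msg)      \
    do { } while (0)
#endif
```

Even in this form the enabled path costs one atomic operation plus a branch per call, which is the overhead George's reply is concerned about; the disabled path costs nothing because the macro expands to an empty statement.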
Re: [OMPI devel] problem in the ORTE notifier framework
The code in 1.3 is definitely different from the trunk as it lags quite a bit behind. However, the trunk definitely does include the code I referenced. Not sure why the hg mirror wouldn't have it. I would have to defer to Jeff on that question - could be a bug in the update macro that maintains the mirror? I haven't checked the opal_sos branch to see if it has the code in it, but I would have thought those guys were tracking the trunk that closely - that code was committed in r19209. Ralph On Thu, May 28, 2009 at 1:45 AM, Sylvain Jeaugey wrote: > To be more complete, we pull Hg from > http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/ ; are we mistaken > ? > > If not, the code in v1.3 seems to be different from the code in the trunk > ... > > Sylvain > > > On Thu, 28 May 2009, Nadia Derbey wrote: > > On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote: >> >>> First, to answer Nadia's question: you will find that the init >>> function for the module is already called when it is selected - see >>> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the >>> trunk. >>> >> >> Strange? Our repository is a clone of the trunk? >> >>> >>> It's true that if I "hg update" to v1.3 I see that the fix is there. >> >> Regards, >> Nadia >> >> It would be a good idea to tie into the sos work to avoid conflicts >>> when it all gets merged back together, assuming that isn't a big >>> problem for you. >>> >>> As for Jeff's suggestion: dealing with the performance hit problem is >>> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the >>> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when >>> the system is built for it - maybe using a --with-notifier-verbose >>> configuration option. Frankly, some organizations would happily pay a >>> small performance penalty for the benefits. >>> >>> I would personally recommend that the notifier framework keep the >>> stats so things can be compact and self-contained. 
We still get >>> atomicity by allowing each framework/component/whatever specify the >>> threshold. Creating yet another system to do nothing more than track >>> error/warning frequencies to decide whether or not to notify seems >>> wasteful. >>> >>> Perhaps worth a phone call to decide path forward? >>> >>> >>> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres >>> wrote: >>>Nadia -- >>> >>>Sorry I didn't get to jump in on the other thread earlier. >>> >>>We have made considerable changes to the notifier framework in >>>a branch to better support "SOS" functionality: >>> >>> >>> https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos >>> >>>Cisco and Indiana U. have been working on this branch for a >>>while. A description of the SOS stuff is here: >>> >>> https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages >>> >>>As for setting up an external web server with hg, don't bother >>>-- just get an account at bitbucket.org. They're free and >>>allow you to host hg repositories there. I've used bitbucket >>>to collaborate on code before it hits OMPI's SVN trunk with >>>both internal and external OMPI developers. >>> >>>We can certainly move the opal-sos repo to bitbucket (or >>>branch again off opal-sos to bitbucket -- whatever makes more >>>sense) to facilitate collaborating with you. >>> >>>Back on topic... >>> >>>I'd actually suggest a combination of what has been discussed >>>in the other thread. The notifier can be the mechanism that >>>actually sends the output message, but it doesn't have to be >>>the mechanism that tracks the stats and decides when to output >>>a message. That can be separate logic, and therefore be more >>>fine-grained (and potentially even specific to the MPI layer). >>> >>>The Big Question will how to do this with zero performance >>>impact when it is not being used. This has always been the >>>difficult issue when trying to implement any kind of >>>monitoring inside the core OMPI performance-sensitive paths. 
>>> Even adding individual branches has met with resistance (in >>>performance-critical code paths)... >>> >>> >>> >>> >>> >>>On May 26, 2009, at 10:59 AM, Nadia Derbey wrote: >>> >>> >>> >>>Hi, >>> >>>While having a look at the notifier framework under >>>orte, I noticed that >>>the way it is written, the init routine for the >>>selected module cannot >>>be called. >>> >>>Attached is a small patch that fixes this issue. >>> >>>Regards, >>>Nadia >>> >>> >>> >>> >>> >>>-- >>>Jeff Squyres >>>Cisco Systems >>> >>>___ >>>devel mailing list >>>de...@open-m
Re: [OMPI devel] Remove IMB 2.3 from ompi-tests?
On May 28, 2009, at 3:10 AM, Holger Mickler wrote:

> > We have a few one-sided tests in the ompi-test repository (which I think
> > Dresden has access to?), but I'm not 100% sure that they're correct...
>
> Yes, we do have access. We'll try the tests and see how far we can get :)

Look in ompi-tests/trunk/onesided. Like I said, I won't vouch for the correctness of those tests. :-)

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] problem in the ORTE notifier framework
Nadia Derbey wrote:

> On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
> > Excellent points; Ralph and I chatted about this on the phone today --
> > we concur with George.
> >
> > Bull -- would peruse work for you? I think you mentioned before that
> > it didn't seem attractive to you.
>
> Well, it didn't because from what I understood, the MPI program need to
> be changed (register a callback routine for the event, activate the
> event, etc), and this is something we wanted to avoid.
>
> Now, if we are allowed to
> 1. define new "internal" PERUSE events,
> 2. internally set the associated callback routines
> why not using peruse? This combined with the orte notifier framework,
> could do the job I think.

FWIW, I did a prototype of some dtrace probes piggybacking on the PERUSE macros and letting those changes be enabled/disabled at configure time. One word of caution: if you start adding if statements to all the PERUSE macros, you will more than likely end up significantly slowing down performance. So be careful and keep an eye on the overhead as you add stuff to the macros.

--td

> Regards,
> Nadia
>
> > I think George's point is that we
> > already have lots of hooks in place in the PML -- and they're called
> > peruse. So if we could use those hooks, then a) they're run-time
> > selectable already, and b) there's no additional cost in performance
> > critical/not-critical code paths (for the case where these stats are
> > not being collected) because PERUSE has been in the code base for a
> > long time.
> >
> > I think the idea is that your callbacks could be invoked by the peruse
> > hooks and then they can do whatever they want -- increment counters,
> > conditionally invoke the ORTE notifier system, etc.
> >
> > On May 27, 2009, at 11:34 AM, George Bosilca wrote:
> >
> > > What is a generic threshold? And what is a counter? We have a policy
> > > against such coding standards, and to be honest I would like to stick
> > > to it. The reason is that the PML is a very complex piece of code, and
> > > I would like to keep it as easy to understand as possible. If people
> > > start adding #if/#endif all over the code, we diverging from this
> > > goal.
> > >
> > > The only way to make this work is to call the notifier or some other
> > > framework in this "slow path" and let this other framework do it's own
> > > logic to determine what and when to print. Of course the cost of this
> > > is a function call plus an atomic operation (which is already not
> > > cheap). It's starting to get expensive, even for a "slow path", which
> > > in this particular context is just one insertion in an atomic FIFO.
> > >
> > > If instead of counting in number of times we try to send the fragment,
> > > and switch to a time base approach, this can be solved with the PERUSE
> > > calls. There is a callback when the request is created, and another
> > > callback when the first fragment is pushed successfully into the
> > > network. Computing the time between these two, allow a tool to figure
> > > out how much time the request was waiting in some internal queues, and
> > > therefore how much delay this added to the execution time.
> > >
> > >   george.
> > >
> > > On May 27, 2009, at 06:59 , Ralph Castain wrote:
> > >
> > > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...)
> > > >
> > > > #if WANT_NOTIFIER_VERBOSE
> > > > opal_atomic_increment(counter);
> > > > if (counter > threshold) {
> > > >     orte_notifier.api(.)
> > > > }
> > > > #endif

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
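
The time-based approach George describes in the quoted text (one callback when the request is created, another when the first fragment reaches the network, and the difference taken as queueing delay) could look roughly like the sketch below. It deliberately does not use the real PERUSE API: `request_timing_t` and the three functions are hypothetical, and timestamps are passed in as plain seconds so that the logic can be read and checked in isolation.

```c
#include <stddef.h>

/* Hypothetical per-request bookkeeping; not a PERUSE or PML structure. */
typedef struct {
    double created_at;     /* seconds, recorded when the request is created */
    double first_sent_at;  /* seconds, recorded when the first fragment is sent */
    int    sent;           /* nonzero once first_sent_at is valid */
} request_timing_t;

/* Would be called from the "request created" callback. */
void on_request_created(request_timing_t *t, double now)
{
    t->created_at = now;
    t->first_sent_at = 0.0;
    t->sent = 0;
}

/* Would be called from the "first fragment pushed to the network" callback.
 * Only the first call is recorded. */
void on_first_fragment_sent(request_timing_t *t, double now)
{
    if (!t->sent) {
        t->first_sent_at = now;
        t->sent = 1;
    }
}

/* Time the request spent waiting in internal queues, in seconds, or a
 * negative value if no fragment has been sent yet. */
double queue_delay(const request_timing_t *t)
{
    return t->sent ? (t->first_sent_at - t->created_at) : -1.0;
}

/* Decide whether the delay is large enough to be worth reporting. */
int delay_exceeds(const request_timing_t *t, double limit)
{
    return t->sent && queue_delay(t) > limit;
}
```

A tool built this way would pay nothing in the PML itself beyond the existing hooks; the decision about when to call the notifier lives entirely in the callback code.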
Re: [OMPI devel] problem in the ORTE notifier framework
To be more complete, we pull Hg from http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/ ; are we mistaken ? If not, the code in v1.3 seems to be different from the code in the trunk ... Sylvain On Thu, 28 May 2009, Nadia Derbey wrote: On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote: First, to answer Nadia's question: you will find that the init function for the module is already called when it is selected - see the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the trunk. Strange? Our repository is a clone of the trunk? It's true that if I "hg update" to v1.3 I see that the fix is there. Regards, Nadia It would be a good idea to tie into the sos work to avoid conflicts when it all gets merged back together, assuming that isn't a big problem for you. As for Jeff's suggestion: dealing with the performance hit problem is why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when the system is built for it - maybe using a --with-notifier-verbose configuration option. Frankly, some organizations would happily pay a small performance penalty for the benefits. I would personally recommend that the notifier framework keep the stats so things can be compact and self-contained. We still get atomicity by allowing each framework/component/whatever specify the threshold. Creating yet another system to do nothing more than track error/warning frequencies to decide whether or not to notify seems wasteful. Perhaps worth a phone call to decide path forward? On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres wrote: Nadia -- Sorry I didn't get to jump in on the other thread earlier. We have made considerable changes to the notifier framework in a branch to better support "SOS" functionality: https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos Cisco and Indiana U. have been working on this branch for a while. 
A description of the SOS stuff is here: https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages As for setting up an external web server with hg, don't bother -- just get an account at bitbucket.org. They're free and allow you to host hg repositories there. I've used bitbucket to collaborate on code before it hits OMPI's SVN trunk with both internal and external OMPI developers. We can certainly move the opal-sos repo to bitbucket (or branch again off opal-sos to bitbucket -- whatever makes more sense) to facilitate collaborating with you. Back on topic... I'd actually suggest a combination of what has been discussed in the other thread. The notifier can be the mechanism that actually sends the output message, but it doesn't have to be the mechanism that tracks the stats and decides when to output a message. That can be separate logic, and therefore be more fine-grained (and potentially even specific to the MPI layer). The Big Question will how to do this with zero performance impact when it is not being used. This has always been the difficult issue when trying to implement any kind of monitoring inside the core OMPI performance-sensitive paths. Even adding individual branches has met with resistance (in performance-critical code paths)... On May 26, 2009, at 10:59 AM, Nadia Derbey wrote: Hi, While having a look at the notifier framework under orte, I noticed that the way it is written, the init routine for the selected module cannot be called. Attached is a small patch that fixes this issue. Regards, Nadia -- Jeff Squyres Cisco Systems ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Nadia Derbey ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] problem in the ORTE notifier framework
On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
> First, to answer Nadia's question: you will find that the init
> function for the module is already called when it is selected - see
> the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the
> trunk.

Strange? Our repository is a clone of the trunk?
It's true that if I "hg update" to v1.3 I see that the fix is there.

Regards,
Nadia

> It would be a good idea to tie into the sos work to avoid conflicts
> when it all gets merged back together, assuming that isn't a big
> problem for you.
>
> As for Jeff's suggestion: dealing with the performance hit problem is
> why I suggested ORTE_NOTIFIER_VERBOSE, modeled after the
> OPAL_OUTPUT_VERBOSE model. The idea was to compile it in -only- when
> the system is built for it - maybe using a --with-notifier-verbose
> configuration option. Frankly, some organizations would happily pay a
> small performance penalty for the benefits.
>
> I would personally recommend that the notifier framework keep the
> stats so things can be compact and self-contained. We still get
> atomicity by allowing each framework/component/whatever to specify
> the threshold. Creating yet another system to do nothing more than
> track error/warning frequencies to decide whether or not to notify
> seems wasteful.
>
> Perhaps worth a phone call to decide path forward?
>
> On Tue, May 26, 2009 at 1:06 PM, Jeff Squyres wrote:
> Nadia --
>
> Sorry I didn't get to jump in on the other thread earlier.
>
> We have made considerable changes to the notifier framework in
> a branch to better support "SOS" functionality:
>
> https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos
>
> Cisco and Indiana U. have been working on this branch for a
> while. A description of the SOS stuff is here:
>
> https://svn.open-mpi.org/trac/ompi/wiki/ErrorMessages
>
> As for setting up an external web server with hg, don't bother
> -- just get an account at bitbucket.org. They're free and
> allow you to host hg repositories there. I've used bitbucket
> to collaborate on code before it hits OMPI's SVN trunk with
> both internal and external OMPI developers.
>
> We can certainly move the opal-sos repo to bitbucket (or
> branch again off opal-sos to bitbucket -- whatever makes more
> sense) to facilitate collaborating with you.
>
> Back on topic...
>
> I'd actually suggest a combination of what has been discussed
> in the other thread. The notifier can be the mechanism that
> actually sends the output message, but it doesn't have to be
> the mechanism that tracks the stats and decides when to output
> a message. That can be separate logic, and therefore be more
> fine-grained (and potentially even specific to the MPI layer).
>
> The Big Question will be how to do this with zero performance
> impact when it is not being used. This has always been the
> difficult issue when trying to implement any kind of
> monitoring inside the core OMPI performance-sensitive paths.
> Even adding individual branches has met with resistance (in
> performance-critical code paths)...
>
> On May 26, 2009, at 10:59 AM, Nadia Derbey wrote:
>
> Hi,
>
> While having a look at the notifier framework under orte, I
> noticed that the way it is written, the init routine for the
> selected module cannot be called.
>
> Attached is a small patch that fixes this issue.
>
> Regards,
> Nadia
>
> --
> Jeff Squyres
> Cisco Systems

--
Nadia Derbey
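The compile-it-in-only-when-configured idea Ralph describes can be sketched roughly as follows, assuming a build flag that a configure option such as --with-notifier-verbose would turn on. The macro name `NOTIFIER_VERBOSE`, the helper `demo_notify`, and the `WANT_NOTIFIER_VERBOSE` default are illustrative assumptions here, not the actual ORTE macro or build symbol.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Assumption: the build system defines this to 1 when the feature is
 * requested. It defaults to 0 so the macro below compiles to nothing. */
#ifndef WANT_NOTIFIER_VERBOSE
#define WANT_NOTIFIER_VERBOSE 0
#endif

unsigned notifications = 0;

/* Hypothetical stand-in for a real notifier entry point. */
void demo_notify(const char *msg)
{
    notifications++;
    fprintf(stderr, "notify: %s\n", msg);
}

/* When enabled: bump the counter atomically and notify once this call's
 * count exceeds the threshold. atomic_fetch_add returns the old value,
 * so adding 1 gives the count that includes this call.
 * When disabled: expands to an empty statement, so the counter is never
 * touched and no branch is added to the calling code. */
#if WANT_NOTIFIER_VERBOSE
#define NOTIFIER_VERBOSE(counter, threshold, msg)                \
    do {                                                         \
        if (atomic_fetch_add(&(counter), 1) + 1 > (threshold)) { \
            demo_notify(msg);                                    \
        }                                                        \
    } while (0)
#else
#define NOTIFIER_VERBOSE(counter, threshold, msg) do { } while (0)
#endif
```

Built with `-DWANT_NOTIFIER_VERBOSE=1`, a call site such as `NOTIFIER_VERBOSE(retries, 3u, "send retried")` starts notifying on the fourth occurrence. Built without it, the call site disappears entirely, which addresses the zero-cost-when-unused concern at the price of needing a separate build.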
Re: [OMPI devel] Remove IMB 2.3 from ompi-tests?
Jeff Squyres wrote:
> On May 27, 2009, at 6:49 AM, Holger Mickler wrote:
>
>> would you mind sharing this patch? We'd like to test our current VT
>> version with some MPI RMA code :)
>
> No problem-o. I've submitted this patch upstream to Intel as well.
> Note that the patch slightly changed between 3.1 and 3.2; this is the
> 3.2 patch:
>
> --- imb/src/IMB_window.c 2008-10-21 01:17:31.0 -0700
> +++ IMB_3.2/src/IMB_window.c 2009-05-26 05:29:15.0 -0700
> @@ -140,6 +140,9 @@
>          c_info->rank, 0, 1, c_info->r_data_type, c_info->WIN);
>      MPI_ERRHAND(ierr);
>      }
> +    /* JMS Added a call to MPI_WIN_FENCE, per MPI-2.1 11.2.1 */
> +    ierr = MPI_Win_fence(0, c_info->WIN);
> +    MPI_ERRHAND(ierr);
>      ierr = MPI_Win_free(&c_info->WIN);
>      MPI_ERRHAND(ierr);
>  }

Great, works fine!

>> Does anyone know of some (small) code/benchmark that uses all
>> available MPI RMA functionality? As far as I see, IMB only uses
>> fence and put/get/accumulate. No locks or
>> post/wait/start/complete...
>
> We have a few one-sided tests in the ompi-test repository (which I think
> Dresden has access to?), but I'm not 100% sure that they're correct...

Yes, we do have access. We'll try the tests and see how far we can get :)

Thanks a lot!
Holger
Re: [OMPI devel] problem in the ORTE notifier framework
On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
> Excellent points; Ralph and I chatted about this on the phone today --
> we concur with George.
>
> Bull -- would peruse work for you? I think you mentioned before that
> it didn't seem attractive to you.

Well, it didn't because from what I understood, the MPI program needs to
be changed (register a callback routine for the event, activate the
event, etc), and this is something we wanted to avoid.

Now, if we are allowed to
1. define new "internal" PERUSE events,
2. internally set the associated callback routines
why not use PERUSE? This, combined with the orte notifier framework,
could do the job I think.

Regards,
Nadia

> I think George's point is that we
> already have lots of hooks in place in the PML -- and they're called
> peruse. So if we could use those hooks, then a) they're run-time
> selectable already, and b) there's no additional cost in performance
> critical/not-critical code paths (for the case where these stats are
> not being collected) because PERUSE has been in the code base for a
> long time.
>
> I think the idea is that your callbacks could be invoked by the peruse
> hooks and then they can do whatever they want -- increment counters,
> conditionally invoke the ORTE notifier system, etc.
>
> On May 27, 2009, at 11:34 AM, George Bosilca wrote:
>
> > What is a generic threshold? And what is a counter? We have a policy
> > against such coding standards, and to be honest I would like to stick
> > to it. The reason is that the PML is a very complex piece of code, and
> > I would like to keep it as easy to understand as possible. If people
> > start adding #if/#endif all over the code, we are diverging from this
> > goal.
> >
> > The only way to make this work is to call the notifier or some other
> > framework in this "slow path" and let this other framework do its own
> > logic to determine what and when to print. Of course the cost of this
> > is a function call plus an atomic operation (which is already not
> > cheap). It's starting to get expensive, even for a "slow path", which
> > in this particular context is just one insertion in an atomic FIFO.
> >
> > If instead of counting the number of times we try to send the
> > fragment, we switch to a time-based approach, this can be solved with
> > the PERUSE calls. There is a callback when the request is created, and
> > another callback when the first fragment is pushed successfully into
> > the network. Computing the time between these two allows a tool to
> > figure out how much time the request was waiting in some internal
> > queues, and therefore how much delay this added to the execution time.
> >
> >   george.
> >
> > On May 27, 2009, at 06:59 , Ralph Castain wrote:
> >
> > > ORTE_NOTIFIER_VERBOSE(api, counter, threshold,...)
> > >
> > > #if WANT_NOTIFIER_VERBOSE
> > > opal_atomic_increment(counter);
> > > if (counter > threshold) {
> > >     orte_notifier.api(.)
> > > }
> > > #endif

--
Nadia Derbey
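The time-based approach George outlines (one callback when the request is created, another when the first fragment reaches the network) could be sketched along these lines. This is only an illustration under stated assumptions: `request_timing`, `on_request_created` and `on_first_fragment_sent` are hypothetical names, not PERUSE events or Open MPI types, and timestamps are passed in as nanoseconds so the logic stays independent of any particular clock source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request record; not a PERUSE or Open MPI type. */
typedef struct {
    uint64_t created_ns;      /* timestamp taken when the request was created */
    uint64_t queued_ns;       /* delay until the first fragment was sent */
    bool     first_frag_sent; /* set once the second callback has run */
} request_timing;

/* Would be invoked from the "request created" hook. The caller supplies
 * the current time, ideally read from a monotonic clock. */
static void on_request_created(request_timing *r, uint64_t now_ns)
{
    r->created_ns = now_ns;
    r->queued_ns = 0;
    r->first_frag_sent = false;
}

/* Would be invoked from the "first fragment pushed to the network" hook.
 * Stores the time the request spent waiting in internal queues and
 * returns true if that delay exceeded limit_ns, in which case a tool
 * could decide to call the notifier. */
static bool on_first_fragment_sent(request_timing *r, uint64_t now_ns,
                                   uint64_t limit_ns)
{
    r->queued_ns = now_ns - r->created_ns;
    r->first_frag_sent = true;
    return r->queued_ns > limit_ns;
}
```

With this shape, nothing is counted on the send path itself: the only work happens inside the two callbacks, so when no callbacks are registered the cost is whatever the existing hooks already cost.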