Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r29733 - in trunk: config oshmem
But how will it work once the oshmem/ folder is merged into the existing folder layout and there is no longer a common root for all shmem files?

On Nov 24, 2013 6:03 AM, "Barrett, Brian W" wrote:

> I'm pretty sure I was clear it's a hack. But removing from subdirs is how
> we disable a project, not by adding a big fixed around a makefile (see
> ORTE).
>
> Brian
>
> -Original Message-
> From: Mike Dubman [mi...@dev.mellanox.co.il]
> Sent: Saturday, November 23, 2013 07:49 PM Mountain Standard Time
> To: Open MPI Developers
> Subject: [EXTERNAL] Re: [OMPI devel] [OMPI svn] svn:open-mpi r29733 - in trunk: config oshmem
>
> Hi
> Looking into Brian's fix, it seems more hack than fix.
> Could you please explain what is wrong with the one we committed?
>
> Thanks
>
> On Nov 22, 2013 10:44 AM, "Ralph Castain" wrote:
>
>> Hey Mike
>>
>> This fix is incorrect - Brian already committed the correct way to
>> disable oshmem. Please revert this.
>>
>> Thanks
>> Ralph
>>
>> On Nov 22, 2013, at 5:51 AM, svn-commit-mai...@open-mpi.org wrote:
>>
>> > Author: miked (Mike Dubman)
>> > Date: 2013-11-22 08:51:46 EST (Fri, 22 Nov 2013)
>> > New Revision: 29733
>> > URL: https://svn.open-mpi.org/trac/ompi/changeset/29733
>> >
>> > Log:
>> > add support for --without-oshmem in configure
>> >
>> > Text files modified:
>> > trunk/config/oshmem_configure_options.m4 |16
>> > trunk/oshmem/Makefile.am | 5 +
>> > 2 files changed, 21 insertions(+), 0 deletions(-)
>> >
>> > Modified: trunk/config/oshmem_configure_options.m4
>> > ==
>> > --- trunk/config/oshmem_configure_options.m4 Fri Nov 22 07:37:31 2013 (r29732)
>> > +++ trunk/config/oshmem_configure_options.m4 2013-11-22 08:51:46 EST (Fri, 22 Nov 2013) (r29733)
>> > @@ -25,6 +25,22 @@
>> >  [Disable building the OpenSHMEM interface])])
>> >
>> >  #
>> > +# OSHMEM support
>> > +#
>> > +AC_MSG_CHECKING([if want OSHMEM support])
>> > +AC_ARG_WITH([oshmem],
>> > +[AC_HELP_STRING([--with-oshmem],
>> > +[Build with OSHMEM support (default=yes)])])
>> > +if test "$with_oshmem" != "no"; then
>> > +AC_MSG_RESULT([yes])
>> > +oshmem_with_support=1
>> > +else
>> > +AC_MSG_RESULT([no])
>> > +oshmem_with_support=0
>> > +fi
>> > +AM_CONDITIONAL(OSHMEM_SUPPORT, test "$oshmem_with_support" = 1)
>> > +
>> > +#
>> >  # Enable compatibility mode
>> >  #
>> >  AC_MSG_CHECKING([if want SGI/Quadrix compatibility mode])
>> >
>> > Modified: trunk/oshmem/Makefile.am
>> > ==
>> > --- trunk/oshmem/Makefile.am Fri Nov 22 07:37:31 2013 (r29732)
>> > +++ trunk/oshmem/Makefile.am 2013-11-22 08:51:46 EST (Fri, 22 Nov 2013) (r29733)
>> > @@ -9,6 +9,9 @@
>> >  # $HEADER$
>> >  #
>> >
>> > +# Do we need to build OSHMEM?
>> > +if OSHMEM_SUPPORT
>> > +
>> >  # Do we have profiling?
>> >  if OSHMEM_PROFILING
>> >  c_pshmem_lib = shmem/c/profile/libshmem_c_pshmem.la
>> > @@ -99,3 +102,5 @@
>> >  # Remove the generated man pages
>> >  distclean-local:
>> >  rm -f $(nodist_man_MANS) $(dir_stamp)
>> > +
>> > +endif # OSHMEM_SUPPORT
>> > ___
>> > svn mailing list
>> > s...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/svn
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
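The shell test in the quoted r29733 diff can be tried in isolation. The sketch below is not part of the commit. It assumes the usual autoconf behaviour, where AC_ARG_WITH leaves $with_oshmem unset when the option is absent, "yes" for --with-oshmem, and "no" for --without-oshmem. The function name oshmem_support is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Stand-alone imitation of the test from the r29733 diff.
# oshmem_support is a hypothetical helper, not something configure defines.
oshmem_support() {
    with_oshmem="$1"            # value autoconf would leave in $with_oshmem
    if test "$with_oshmem" != "no"; then
        echo 1                  # option absent, "yes", or any other value
    else
        echo 0                  # only an explicit "no" disables the project
    fi
}

oshmem_support ""      # option not given on the command line
oshmem_support yes     # --with-oshmem
oshmem_support no      # --without-oshmem
```

Anything other than an explicit "no" enables the build, so the default is "yes". Whether the resulting conditional should wrap oshmem/Makefile.am or the directory should be dropped from the subdirs list is the design question debated in the thread, and this sketch does not settle it.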
Re: [OMPI devel] CUDA support not working?
On 23.11.2013, at 22:56, Dmitry N. Mikushin wrote:

> VT is getting out of sync with CUDA from time to time, this already
> happened before.

Yes, that’s what I thought, and that’s why I didn’t mention it as my main issue.

I’m rather stuck because cuda support and ob1 don’t seem to fit together, at least on my systems.

j

> - D.
>
> 2013/11/24 Jörg Bornschein :
>> On 23.11.2013, at 21:42, Jörg Bornschein wrote:
>>
>> Sorry,
>>
>>> I’m typically compiling with
>>>
>>> ./configure --with-cuda
>>
>> I’m actually compiling with
>>
>> ./configure --with-cuda --disable-vt
>>
>> because otherwise I get a compile time error:
>>
>> make[5]: Entering directory `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
>> CC libvt_la-vt_cudart.lo
>> CC libvt_mpi_la-vt_pform_linux.lo
>> CC libvt_mpi_la-vt_thrd.lo
>> CC libvt_mpi_la-vt_trc.lo
>> CC libvt_mpi_la-vt_user_comment.lo
>> CC libvt_mpi_la-vt_user_control.lo
>> CC libvt_mpi_la-vt_user_count.lo
>> CC libvt_mpi_la-vt_user_marker.lo
>> vt_cudart.c: In function 'cudaLaunch':
>> vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first use in this function)
>> vt_cudart.c:2725:15: note: each undeclared identifier is reported only once for each function it appears in
>>
>> j
>>
>>> but I tried combining it with various other options. OMPI builds fine, but
>>> when I try to run programs compiled against it I always get:
>>>
>>> /a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so:
>>> undefined symbol: progress_one_cuda_htod_event
>>>
>>> That error even seems to make sense, because the code in ompi/mca/pml/ob1/
>>> refers to common_cuda.[ch], but it does not
>>> seem to link against its dynamic binary.
>>>
>>> Am I missing something?
>>>
>>> Thanks!
>>>
>>> jb
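The suspicion in the quoted report (ob1 refers to a symbol from common_cuda but is not linked against it) can be checked from the shell. The following is only a sketch: the library path and symbol name are copied from the error message above, is_undefined is a hypothetical helper, and it assumes the binutils `nm` and the `ldd` tool are available on the system.

```shell
#!/bin/sh
# is_undefined SYMBOL: reads `nm -D` output on stdin and succeeds only if
# SYMBOL is listed with type "U" (referenced but not defined in the object).
# Hypothetical helper for illustration.
is_undefined() {
    awk -v sym="$1" '$1 == "U" && $2 == sym { found = 1 } END { exit !found }'
}

# Path taken from the error message; adjust to the local install prefix.
lib=/usr/local/lib/openmpi/mca_pml_ob1.so

if [ -r "$lib" ]; then
    if nm -D "$lib" | is_undefined progress_one_cuda_htod_event; then
        echo "symbol is referenced but not defined in $lib"
    fi
    # List the shared-library dependencies and look for anything CUDA-related.
    ldd "$lib" | grep -i cuda || echo "no CUDA-related dependency listed"
fi
```

If the symbol shows up as undefined and no common_cuda library appears among the dependencies, that would be consistent with the missing-link explanation, although it does not prove where the link flag should have been added.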
Re: [OMPI devel] CUDA support not working?
I disable VT using “--enable-contrib-no-build=vt”, and it works. You should try this instead of “--disable-vt”.

George.

On Nov 24, 2013, at 16:11 , Jörg Bornschein wrote:

> On 23.11.2013, at 22:56, Dmitry N. Mikushin wrote:
>
>> VT is getting out of sync with CUDA from time to time, this already
>> happened before.
>
> Yes, that’s what I thought, and that’s why I didn’t mention it as my main issue.
>
> I’m rather stuck because cuda support and ob1 don’t seem to fit together, at
> least on my systems.
>
> j
Re: [OMPI devel] CUDA support not working?
The cuda support in the 1.7 series has been evolving - a number of patches have been applied since 1.7.3 was released, and I see another (for optimization) scheduled.

You might try the 1.7.4 nightly tarball and see if the problem has been fixed.

On Nov 24, 2013, at 7:11 AM, Jörg Bornschein wrote:

> On 23.11.2013, at 22:56, Dmitry N. Mikushin wrote:
>
>> VT is getting out of sync with CUDA from time to time, this already
>> happened before.
>
> Yes, that’s what I thought, and that’s why I didn’t mention it as my main issue.
>
> I’m rather stuck because cuda support and ob1 don’t seem to fit together, at
> least on my systems.
>
> j
Re: [OMPI devel] CUDA support not working?
On 24.11.2013, at 10:22, Ralph Castain wrote:

> The cuda support in the 1.7 series has been evolving - a number of patches
> have been applied since 1.7.3 was released, and I see another (for
> optimization) scheduled.
>
> You might try the 1.7.4 nightly tarball and see if the problem has been fixed.

Same problem with 1.7.4-nightly.

But I compiled and started my little test program on a machine with actual InfiniBand hardware and the problem disappeared! I guess on machines with InfiniBand hardware OB1 is not selected at runtime? Is this correct?

I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*, although it should be. I’m slightly overwhelmed by automake, so I don’t know how to add this reference and try it myself.

j
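One way to test the guess that ob1 is simply not selected on the InfiniBand machine is to pin the PML explicitly and raise the framework's verbosity. The sketch below only prints the command line and does not run it. `--mca pml ob1` and `pml_base_verbose` are standard Open MPI MCA settings, while the helper name pml_cmd, the process count, and the verbosity level are illustrative assumptions.

```shell
#!/bin/sh
# pml_cmd PML PROGRAM: print (do not execute) an mpirun command line that
# forces the given PML and asks the pml framework to log its selection.
# Hypothetical helper for illustration.
pml_cmd() {
    printf 'mpirun --mca pml %s --mca pml_base_verbose 10 -np 2 %s\n' "$1" "$2"
}

pml_cmd ob1 ./a.out

# On a machine with Open MPI installed, the printed line could be run with:
#   eval "$(pml_cmd ob1 ./a.out)"
# and the available PML components listed with:
#   ompi_info | grep pml
```

If forcing ob1 on the InfiniBand machine brings the symbol lookup error back, the difference between the two systems would likely be component selection rather than the build itself.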
Re: [OMPI devel] CUDA support not working?
On Nov 24, 2013, at 8:30 AM, Jörg Bornschein wrote:

> On 24.11.2013, at 10:22, Ralph Castain wrote:
>
>> The cuda support in the 1.7 series has been evolving - a number of patches
>> have been applied since 1.7.3 was released, and I see another (for
>> optimization) scheduled.
>>
>> You might try the 1.7.4 nightly tarball and see if the problem has been fixed.
>
> Same problem with 1.7.4-nightly.
>
> But I compiled and started my little test program on a machine with actual
> InfiniBand hardware and the problem disappeared! I guess on machines with
> InfiniBand hardware OB1 is not selected at runtime? Is this correct?

Sounds like a bug to me - if cuda is being used, we need to select ob1 regardless. I'll have to let Rolf figure that one out.

> I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*,
> although it should be. I’m slightly overwhelmed by automake, so I don’t know
> how to add this reference and try it myself.

Try the attached - should fix the problem.

pml.diff
Description: Binary data
Re: [OMPI devel] CUDA support not working?
On 24.11.2013, at 12:08, Ralph Castain wrote:

> On Nov 24, 2013, at 8:30 AM, Jörg Bornschein wrote:
>
>> I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*,
>> although it should be. I’m slightly overwhelmed by automake, so I don’t know
>> how to add this reference and try it myself.
>
> Try the attached - should fix the problem.

That one indeed fixed it. Thanks!

j