On 24.11.2013, at 10:22, Ralph Castain <r...@open-mpi.org> wrote:

> The cuda support in the 1.7 series has been evolving - a number of patches
> have been applied since 1.7.3 was released, and I see another (for
> optimization) scheduled.
>
> You might try the 1.7.4 nightly tarball and see if the problem has been fixed.

Same problem with 1.7.4-nightly.

But I compiled and started my little test program on a machine with actual InfiniBand hardware and the problem disappeared! I guess on machines with InfiniBand hardware OB1 is not selected at runtime? Is this correct?

I still believe that ompi/mca/pml/ob1/* is not linked against common_cuda.*, although it should be. I’m slightly overwhelmed by automake, so I don’t know how to add this reference and try it myself.

j

> On Nov 24, 2013, at 7:11 AM, Jörg Bornschein <j...@capsec.org> wrote:
>
>> On 23.11.2013, at 22:56, Dmitry N. Mikushin <maemar...@gmail.com> wrote:
>>
>>> VT is getting out of sync with CUDA from time to time, this already
>>> happened before.
>>
>> Yes, that’s what I thought and that’s why I didn’t mention it as my main
>> issue.
>>
>> I’m rather stuck because cuda support and ob1 don’t seem to fit together,
>> at least on my systems.
>>
>> j
>>
>>> - D.
>>>
>>> 2013/11/24 Jörg Bornschein <j...@capsec.org>:
>>>> On 23.11.2013, at 21:42, Jörg Bornschein <j...@capsec.org> wrote:
>>>>
>>>> Sorry,
>>>>
>>>>> I’m typically compiling with
>>>>>
>>>>> ./configure --with-cuda
>>>>
>>>> I’m actually compiling with
>>>>
>>>> ./configure --with-cuda --disable-vt
>>>>
>>>> because otherwise I get a compile time error:
>>>>
>>>> make[5]: Entering directory
>>>> `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
>>>> CC libvt_la-vt_cudart.lo
>>>> CC libvt_mpi_la-vt_pform_linux.lo
>>>> CC libvt_mpi_la-vt_thrd.lo
>>>> CC libvt_mpi_la-vt_trc.lo
>>>> CC libvt_mpi_la-vt_user_comment.lo
>>>> CC libvt_mpi_la-vt_user_control.lo
>>>> CC libvt_mpi_la-vt_user_count.lo
>>>> CC libvt_mpi_la-vt_user_marker.lo
>>>> vt_cudart.c: In function 'cudaLaunch':
>>>> vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first
>>>> use in this function)
>>>> vt_cudart.c:2725:15: note: each undeclared identifier is reported only
>>>> once for each function it appears in
>>>>
>>>> j
>>>>
>>>>> but
>>>>> I tried combining it with various other options. OMPI builds fine,
>>>>> but when I try to run programs compiled against it I always get:
>>>>>
>>>>> /a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so:
>>>>> undefined symbol: progress_one_cuda_htod_event
>>>>>
>>>>> That error even seems to make sense, because the code in
>>>>> ompi/mca/pml/ob1/ refers to common_cuda.[ch], but it does not
>>>>> seem to link against its dynamic binary.
>>>>>
>>>>> Am I missing something?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> jb
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
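
The suspicion in this thread, that mca_pml_ob1.so references progress_one_cuda_htod_event while no library it pulls in exports it, can be checked by comparing `nm -D` listings. The following is only a rough sketch: the helper names (`parse_nm`, `unresolved`, `nm_dynamic`) are made up for illustration, and the thread does not say which shared object is supposed to export the symbol, so the provider paths have to be supplied by whoever runs it.

```python
import subprocess


def parse_nm(nm_output):
    """Split the text printed by `nm -D` into (undefined, defined) name sets."""
    undefined, defined = set(), set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            # Undefined symbols have no address column: "U name".
            undefined.add(parts[1].split("@")[0])
        elif len(parts) == 3:
            # Defined symbols look like "address type name".
            defined.add(parts[2].split("@")[0])
    return undefined, defined


def unresolved(plugin_nm, provider_nm_outputs):
    """Return names the plugin needs that none of the given providers export."""
    needed, _ = parse_nm(plugin_nm)
    provided = set()
    for output in provider_nm_outputs:
        provided |= parse_nm(output)[1]
    return needed - provided


def nm_dynamic(path):
    """Run `nm -D` on a shared object and return its output as text."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `unresolved(nm_dynamic(plugin_path), [nm_dynamic(p) for p in provider_paths])` lists every name the plugin expects from elsewhere that the given providers do not export. Symbols from libc and other system libraries will also show up unless those libraries are included among the providers, and the check says nothing about whether the plugin actually records the provider as a dependency; `ldd` on the plugin would be the complementary check.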