On 24.11.2013, at 10:22, Ralph Castain <[email protected]> wrote:
> The cuda support in the 1.7 series has been evolving - a number of patches
> have been applied since 1.7.3 was released, and I see another (for
> optimization) scheduled.
>
> You might try the 1.7.4 nightly tarball and see if the problem has been fixed.
Same problem with 1.7.4-nightly.
But I compiled and started my little test program on a machine with actual
InfiniBand hardware, and the problem disappeared! I guess on machines with
InfiniBand hardware OB1 is not selected at runtime? Is this correct?
I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*,
although it should be. I’m slightly overwhelmed by automake, so I don’t know
how to add this reference and try it myself.
j
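
One way to check the linking suspicion above without touching automake is to
inspect the installed component directly. The following is only a sketch: the
plugin path is the one from the error message quoted further down and may
differ on other installs, and the helper functions list_undefined and
has_undefined are hypothetical names introduced here for illustration. It
relies only on the standard ldd and nm tools.

```shell
#!/bin/sh
# Hypothetical helper: read `nm -D` output on stdin and print the names of
# undefined symbols (lines whose type column is "U").
list_undefined() {
    awk '$1 == "U" { print $2 }'
}

# Hypothetical helper: succeed if the symbol named in $1 is among the
# undefined symbols in the `nm -D` output read on stdin.
has_undefined() {
    list_undefined | grep -qxF "$1"
}

# Path taken from the error message in this thread; adjust for your prefix.
PLUGIN=/usr/local/lib/openmpi/mca_pml_ob1.so

if [ -r "$PLUGIN" ]; then
    # Show which shared libraries the ob1 component is linked against.
    ldd "$PLUGIN"
    # Report whether the CUDA progress symbol is left undefined in ob1.
    if nm -D "$PLUGIN" | has_undefined progress_one_cuda_htod_event; then
        echo "progress_one_cuda_htod_event is undefined in $PLUGIN"
    fi
fi
```

If the symbol shows up as undefined and none of the libraries listed by ldd
provides it, that would be consistent with the missing link, although it does
not by itself say where in the build system the reference should be added.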
>
> On Nov 24, 2013, at 7:11 AM, Jörg Bornschein <[email protected]> wrote:
>
>> On 23.11.2013, at 22:56, Dmitry N. Mikushin <[email protected]> wrote:
>>
>>> VT is getting out of sync with CUDA from time to time, this already
>>> happened before.
>>
>> Yes, that’s what I thought, and that’s why I didn’t mention it as my main
>> issue.
>>
>>
>>
>> I’m rather stuck because CUDA support and ob1 don’t seem to fit together,
>> at least on my systems.
>>
>>
>> j
>>
>>
>>
>>> - D.
>>>
>>>
>>> 2013/11/24 Jörg Bornschein <[email protected]>:
>>>> On 23.11.2013, at 21:42, Jörg Bornschein <[email protected]> wrote:
>>>>
>>>> Sorry,
>>>>
>>>>> I’m typically compiling with
>>>>>
>>>>> ./configure --with-cuda
>>>>
>>>>
>>>> I’m actually compiling with
>>>>
>>>> ./configure --with-cuda --disable-vt
>>>>
>>>> because otherwise I get a compile time error:
>>>>
>>>> make[5]: Entering directory `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
>>>> CC libvt_la-vt_cudart.lo
>>>> CC libvt_mpi_la-vt_pform_linux.lo
>>>> CC libvt_mpi_la-vt_thrd.lo
>>>> CC libvt_mpi_la-vt_trc.lo
>>>> CC libvt_mpi_la-vt_user_comment.lo
>>>> CC libvt_mpi_la-vt_user_control.lo
>>>> CC libvt_mpi_la-vt_user_count.lo
>>>> CC libvt_mpi_la-vt_user_marker.lo
>>>> vt_cudart.c: In function 'cudaLaunch':
>>>> vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first use in this function)
>>>> vt_cudart.c:2725:15: note: each undeclared identifier is reported only once for each function it appears in
>>>>
>>>>
>>>>
>>>> j
>>>>
>>>>
>>>>
>>>>> but I tried combining it with various other options. OMPI builds fine,
>>>>> but when I try to run programs compiled against it I always get:
>>>>>
>>>>> /a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
>>>>>
>>>>> That error even seems to make sense, because the code in
>>>>> ompi/mca/pml/ob1/ refers to common_cuda.[ch], but it does not
>>>>> seem to link against its shared library.
>>>>>
>>>>> Am I missing something?
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> jb
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> [email protected]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel