Re: [OMPI devel] [OMPI users] configuring open mpi 1.10.2 with cuda on NVIDIA TK1

2016-01-22 Thread Sylvain Jeaugey

[Moving To Devel]

I took a look at the configure script to understand why the hwloc part
failed to get the CUDA path. I guess the --with-cuda information is not
propagated to the hwloc part of the configure.


If an m4 expert has an idea of how to do this The Right Way, that
would help.


Thanks,
Sylvain

On 01/22/2016 10:07 AM, Sylvain Jeaugey wrote:
It looks like the errors are produced by the hwloc configure; this one
somehow can't find CUDA (I have to check whether that's a problem, btw).
Anyway, later in the configure, the VT configure finds CUDA correctly,
so it seems specific to the hwloc configure.


On 01/22/2016 10:01 AM, Kuhl, Spencer J wrote:


Hi Sylvain,


The configure does not stop; 'make all install' completes.  After
rebuilding, ignoring the configure errors, and confirming both a
functional CUDA install and a functional Open MPI install, I went to
the /usr/local/cuda/samples directory, ran 'make', and successfully
ran the 'simpleMPI' sample provided by NVIDIA.  The output suggested
that everything works fine between Open MPI and CUDA on my Jetson TK1
install.  Because of this, I think it is as you suspected; it was just
./configure output noise.
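
For reference, the verification amounts to something like the
following sketch, assuming the CUDA 6.5 samples tree that ships on the
TK1 (the exact simpleMPI path can vary between CUDA releases):

# Build and run NVIDIA's simpleMPI sample to confirm that Open MPI and
# CUDA work together; path assumes the CUDA 6.5 samples layout.
cd /usr/local/cuda/samples/0_Simple/simpleMPI
make
mpirun -np 2 ./simpleMPI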



What a frustrating exercise.  Thanks for the suggestion.  I think I
can say 'case closed'.



Spencer





*From:* users  on behalf of Sylvain 
Jeaugey 

*Sent:* Friday, January 22, 2016 11:34 AM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] configuring open mpi 1.10.2 with cuda on 
NVIDIA TK1

Hi Spencer,

Could you be more specific about what fails? Did the configure stop
at some point? Or is it a compile error during the build?


I'm not sure the errors you are seeing in config.log are actually the 
real problem (I'm seeing the same error traces on a perfectly working 
machine). Not pretty, but maybe just noise.


Thanks,
Sylvain

On 01/22/2016 06:48 AM, Kuhl, Spencer J wrote:


Thanks for the suggestion, Ryan; I will remove the symlinks and try
again.  I checked config.log, and it appears that configure finds CUDA
support (result: yes), but once configure checks for cuda.h usability,
conftest.c reports a fatal error: 'cuda.h: No such file or directory.'



I have copied some grepped output from config.log here:


$ ./configure --prefix=/usr/local --with-cuda=/usr/local/cuda-6.5 
--enable-mpi-java

configure:9829: checking if --with-cuda is set
configure:9883: result: found (/usr/local/cuda-6.5/include/cuda.h)
| #include <cuda.h>
configure:10055: checking if have cuda support
configure:10058: result: yes (-I/usr/local/cuda-6.5)
configure:66435: result: '--prefix=/usr/local'
'--with-cuda=/usr/local/cuda-6.5' '--enable-mpi-java'

configure:74182: checking cuda.h usability
conftest.c:643:18: fatal error: cuda.h: No such file or directory
 #include <cuda.h>
| #include <cuda.h>
configure:74182: checking cuda.h presence
conftest.c:610:18: fatal error: cuda.h: No such file or directory
 #include <cuda.h>
| #include <cuda.h>
configure:74182: checking for cuda.h
configure:74265: checking cuda_runtime_api.h usability
conftest.c:643:30: fatal error: cuda_runtime_api.h: No such file or directory
 #include <cuda_runtime_api.h>
| #include <cuda_runtime_api.h>
configure:74265: checking cuda_runtime_api.h presence
conftest.c:610:30: fatal error: cuda_runtime_api.h: No such file or directory
 #include <cuda_runtime_api.h>
| #include <cuda_runtime_api.h>
configure:74265: checking for cuda_runtime_api.h
configure:97946: running /bin/bash './configure' --disable-dns
--disable-http --disable-rpc --disable-openssl --enable-thread-support
--disable-evport '--prefix=/usr/local'
'--with-cuda=/usr/local/cuda-6.5' '--enable-mpi-java'
--cache-file=/dev/null --srcdir=. --disable-option-checking

configure:187066: result: verbs_usnic, ugni, sm, verbs, cuda
configure:193532: checking for MCA component common:cuda compile mode
configure:193585: checking if MCA component common:cuda can compile




*From:* users  on behalf of Novosielski, 
Ryan 

*Sent:* Friday, January 22, 2016 1:20 AM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] configuring open mpi 1.10.2 with cuda on 
NVIDIA TK1
I would check config.log carefully to see what specifically failed 
or wasn't found where. I would never mess around with the contents 
of /usr/include. That is sloppy stuff and likely to get you into 
trouble someday.
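
A couple of commands along those lines, assuming config.log sits in
the build directory (the configure line numbers will differ from run
to run):

# List every CUDA-related line in config.log, with line numbers.
grep -n -i cuda config.log
# Show the context around the failing header check.
grep -B 2 -A 6 'checking cuda.h usability' config.log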


 *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
     `'

On Jan 21, 2016, at 17:45, Kuhl, Spencer J  
wrote:




Open MPI 1.10.2

cuda.h and cuda_runtime_api.h exist in /usr/local/cuda-6.5/include

using the configure flag ./configure --with-cuda does not find
cuda.h or cuda_runtime_api.h

Re: [OMPI devel] [OMPI users] configuring open mpi 1.10.2 with cuda on NVIDIA TK1

2016-01-22 Thread Brice Goglin
Hello
hwloc doesn't have any CUDA-specific configure variables. We just use
standard variables like LIBS and CPPFLAGS. I guess OMPI could propagate
the --with-cuda directories to hwloc by setting LIBS and CPPFLAGS before
running the hwloc m4 functions, but I don't think OMPI actually cares
about hwloc reporting CUDA device locality anyway, and OMPI might stop
embedding hwloc in the near future anyway.
Brice
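
In the meantime, a possible (untested) workaround sketch along those
lines, assuming the CUDA 6.5 paths from this thread: since the
embedded hwloc configure honors the standard variables, passing them
on the top-level configure line should let its header checks find
cuda.h:

# Untested sketch: hand the standard variables to the top-level
# configure so the embedded hwloc checks can see the CUDA headers and
# libraries.  On the 32-bit TK1 the library directory is 'lib'.
./configure --prefix=/usr/local --with-cuda=/usr/local/cuda-6.5 \
    --enable-mpi-java \
    CPPFLAGS="-I/usr/local/cuda-6.5/include" \
    LDFLAGS="-L/usr/local/cuda-6.5/lib"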



On 22/01/2016 23:34, Sylvain Jeaugey wrote:
> [Moving To Devel]
>
> I took a look at the configure script to understand why the hwloc part
> failed to get the CUDA path. I guess the --with-cuda information is not
> propagated to the hwloc part of the configure.
>
> If an m4 expert has an idea of how to do this The Right Way, that
> would help.
>
> Thanks,
> Sylvain

Re: [OMPI devel] [OMPI users] configuring open mpi 1.10.2 with cuda on NVIDIA TK1

2016-01-22 Thread Jeff Squyres (jsquyres)
On Jan 22, 2016, at 5:47 PM, Brice Goglin  wrote:
> 
> hwloc doesn't have any CUDA-specific configure variables. We just use 
> standard variables like LIBS and CPPFLAGS. I guess OMPI could propagate 
> the --with-cuda directories to hwloc by setting LIBS and CPPFLAGS before 
> running the hwloc m4 functions, but I don't think OMPI actually cares 
> about hwloc reporting CUDA device locality

I guess that's a question for NVIDIA -- do you guys use (or want to use) CUDA 
device locality in the Open MPI hwloc information?

If so, it might be appropriate to do what Brice suggests -- in 
opal/mca/hwloc*/configure.m4, (temporarily) explode --with-cuda into CPPFLAGS 
and LDFLAGS.
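
A hypothetical sketch of what that could look like inside
opal/mca/hwloc*/configure.m4 -- the shell variable names here are
illustrative, not the actual OMPI macros:

dnl Hypothetical sketch only: fold the --with-cuda paths into the
dnl standard variables around the embedded hwloc checks, then restore.
AS_IF([test -n "$with_cuda" && test "$with_cuda" != "no"],
      [opal_hwloc_cuda_save_CPPFLAGS=$CPPFLAGS
       opal_hwloc_cuda_save_LDFLAGS=$LDFLAGS
       CPPFLAGS="$CPPFLAGS -I$with_cuda/include"
       LDFLAGS="$LDFLAGS -L$with_cuda/lib"])
dnl ... run the embedded hwloc configury here ...
AS_IF([test -n "$opal_hwloc_cuda_save_CPPFLAGS"],
      [CPPFLAGS=$opal_hwloc_cuda_save_CPPFLAGS
       LDFLAGS=$opal_hwloc_cuda_save_LDFLAGS])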

> anyway, and OMPI might stop embedding hwloc in the near future anyway.

Mmm... I'm not convinced of that.  :-)

Regardless of what we decide w.r.t. embedding hwloc, I think it would be worth 
exploding --with-cuda in the hwloc configure.m4, if NVIDIA wants CUDA devices 
in the Open MPI hwloc info (my $0.02: NVIDIA: I think you should want this 
information :-) ).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/