Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
Roland,

the easiest way is to use an external hwloc that is configured with
--disable-nvml. Another option is to hack the embedded hwloc configure.m4
and pass --disable-nvml to the embedded hwloc configure; note this
requires you to run autogen.sh, and hence you need recent autotools.

I guess Open MPI 1.8 embeds an older hwloc that is not aware of NVML,
hence the lack of warning.

Cheers,

Gilles

On Wednesday, March 22, 2017, Roland Fehrenbacher wrote:
> [quoted text snipped; Roland's message appears in full below]
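Gilles' first suggestion (an external hwloc built without NVML) can be sketched roughly as follows; the hwloc version, tarball name, and install prefix are illustrative placeholders, not taken from the thread:

```shell
# Build an external hwloc with the NVML backend disabled
# (version number and prefix are hypothetical).
tar xf hwloc-1.11.6.tar.gz
cd hwloc-1.11.6
./configure --prefix=$HOME/opt/hwloc --disable-nvml
make && make install

# Then configure Open MPI against that external hwloc instead of the
# embedded copy, so hwloc never probes NVML at run time.
cd /path/to/openmpi-2.1.0rc4
./configure --with-hwloc=$HOME/opt/hwloc [other options...]
```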
Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
> "SJ" == Sylvain Jeaugeywrites: SJ> If you installed CUDA libraries and includes in /usr, then it's SJ> not surprising hwloc finds them even without defining CFLAGS. Well, that's the place where distribution packages install to :) I don't think a build system should misbehave, if libraries are installed in default places. SJ> I'm just saying I think you won't get the error message if Open SJ> MPI finds CUDA but hwloc does not. OK, so I think I need to ask the original question again: Is there a way to suppress these warnings with a "normal" build? I guess the answer must be yes, since 1.8.x didn't have this problem. The real question then would be how ... Thanks, Roland SJ> On 03/21/2017 11:05 AM, Roland Fehrenbacher wrote: >>> "SJ" == Sylvain Jeaugey writes: >> Hi Silvain, >> >> I get the "NVIDIA : ..." run-time error messages just by >> compiling with "--with-cuda=/usr": >> >> ./configure --prefix=${prefix} \ --mandir=${prefix}/share/man \ >> --infodir=${prefix}/share/info \ >> --sysconfdir=/etc/openmpi/${VERSION} --with-devel-headers \ >> --disable-memchecker \ --disable-vt \ --with-tm --with-slurm >> --with-pmi --with-sge \ --with-cuda=/usr \ >> --with-io-romio-flags='--with-file-system=nfs+lustre' \ >> --with-cma --without-valgrind \ --enable-openib-connectx-xrc \ >> --enable-orterun-prefix-by-default \ --disable-java >> >> Roland >> SJ> Hi Siegmar, I think this "NVIDIA : ..." error message comes from SJ> the fact that you add CUDA includes in the C*FLAGS. If you just SJ> use --with-cuda, Open MPI will compile with CUDA support, but SJ> hwloc will not find CUDA and that will be fine. However, setting SJ> CUDA in CFLAGS will make hwloc find CUDA, compile CUDA support SJ> (which is not needed) and then NVML will show this error message SJ> when not run on a machine with CUDA devices. >> SJ> I guess gcc picks the environment variable, while cc does not SJ> hence the different behavior. 
So again, there is no need to add SJ> all those CUDA includes, --with-cuda is enough. >> SJ> About the opal_list_remove_item, we'll try to reproduce the SJ> issue and see where it comes from. >> SJ> Sylvain >> SJ> On 03/21/2017 12:38 AM, Siegmar Gross wrote: >> >> Hi, >> >> >> >> I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise >> >> Server >> >> 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes I get >> >> once >> >> more a warning about a missing item for one of my small >> >> programs (it doesn't matter if I use my cc or gcc version). My >> >> gcc version also displays the message "NVIDIA: no NVIDIA >> >> devices found" for the server without NVIDIA devices (I don't >> >> get the message for my cc version). I used the following >> >> commands to build the package (${SYSTEM_ENV} is Linux and >> >> ${MACHINE_ENV} is x86_64). >> >> >> >> >> >> mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc cd >> >> openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc >> >> >> >> ../openmpi-2.1.0rc4/configure \ >> >> --prefix=/usr/local/openmpi-2.1.0_64_cc \ >> >> --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \ >> >> --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ >> >> --with-jdk-headers=/usr/local/jdk1.8.0_66/include \ >> >> JAVA_HOME=/usr/local/jdk1.8.0_66 \ LDFLAGS="-m64 -mt -Wl,-z >> >> -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/ lib64" \ >> >> CC="cc" CXX="CC" FC="f95" \ CFLAGS="-m64 -mt >> >> -I/usr/local/include -I/usr/local/cuda/include" \ >> >> CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" >> >> \ FCFLAGS="-m64" \ CPP="cpp -I/usr/local/include >> >> -I/usr/local/cuda/include" \ CXXCPP="cpp -I/usr/local/include >> >> -I/usr/local/cuda/include" \ --enable-mpi-cxx \ >> >> --enable-cxx-exceptions \ --enable-mpi-java \ >> >> --with-cuda=/usr/local/cuda \ >> >> --with-valgrind=/usr/local/valgrind \ >> >> --enable-mpi-thread-multiple \ --with-hwloc=internal \ >> >> --without-verbs \ --with-wrapper-cflags="-m64 -mt" \ >> >> 
--with-wrapper-cxxflags="-m64" \ --with-wrapper-fcflags="-m64" >> >> \ --with-wrapper-ldflags="-mt" \ --enable-debug \ |& tee >> >> log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc >> >> >> >> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc rm -r >> >> /usr/local/openmpi-2.1.0_64_cc.old mv >> >> /usr/local/openmpi-2.1.0_64_cc >> >> /usr/local/openmpi-2.1.0_64_cc.old make install |& tee >> >> log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc make check |& >> >> tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc >> >> >> >> >> >> Sometimes everything works as expected. >> >> >> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 >> >>
Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
Hi Akshay,

> Would it be possible for you to provide the source to reproduce the
> issue?

Yes, I've appended the file.

Kind regards

Siegmar

> Thanks
>
> On Tue, Mar 21, 2017 at 9:52 AM, Sylvain Jeaugey wrote:
> [quoted text snipped; Sylvain's message appears in full below]
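Siegmar's spawn_intra_comm source was only attached to the mail and is not in this archive. As a rough, hypothetical sketch of the pattern its output suggests (one parent spawns two children, then everyone merges the parent/child intercommunicator into a COMM_ALL_PROCESSES intra-communicator), one might write something like the following; all names here are assumptions, not Siegmar's actual code:

```c
/* Hypothetical sketch of an MPI_Comm_spawn + MPI_Intercomm_merge test.
 * The same binary acts as parent or child, decided by
 * MPI_Comm_get_parent().  Build with: mpicc spawn_sketch.c -o spawn_sketch */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, intercomm, all_processes;
    int ntasks, mytid, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn two children running this same program. */
        printf("Parent process 0: I create 2 slave processes\n");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        /* high = 0: parent ranks ordered first in the merged communicator */
        MPI_Intercomm_merge(intercomm, 0, &all_processes);
    } else {
        /* Child: merge via the intercommunicator to the parent. */
        MPI_Intercomm_merge(parent, 1, &all_processes);
    }

    MPI_Get_processor_name(host, &len);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
    printf("running on %s, MPI_COMM_WORLD ntasks: %d\n", host, ntasks);

    MPI_Comm_size(all_processes, &ntasks);
    MPI_Comm_rank(all_processes, &mytid);
    printf("COMM_ALL_PROCESSES ntasks: %d, mytid: %d\n", ntasks, mytid);

    MPI_Comm_free(&all_processes);
    MPI_Finalize();
    return 0;
}
```

Launched as in the thread (mpiexec -np 1 --host loki,nfs1,nfs2 ./spawn_sketch), each of the three processes would report its size and rank in the merged communicator; running it requires an installed MPI implementation.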
Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
If you installed CUDA libraries and includes in /usr, then it's not
surprising hwloc finds them even without defining CFLAGS.

I'm just saying I think you won't get the error message if Open MPI
finds CUDA but hwloc does not.

On 03/21/2017 11:05 AM, Roland Fehrenbacher wrote:
> [quoted text snipped; Roland's message appears in full below]
Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
> "SJ" == Sylvain Jeaugeywrites: Hi Silvain, I get the "NVIDIA : ..." run-time error messages just by compiling with "--with-cuda=/usr": ./configure --prefix=${prefix} \ --mandir=${prefix}/share/man \ --infodir=${prefix}/share/info \ --sysconfdir=/etc/openmpi/${VERSION} --with-devel-headers \ --disable-memchecker \ --disable-vt \ --with-tm --with-slurm --with-pmi --with-sge \ --with-cuda=/usr \ --with-io-romio-flags='--with-file-system=nfs+lustre' \ --with-cma --without-valgrind \ --enable-openib-connectx-xrc \ --enable-orterun-prefix-by-default \ --disable-java Roland SJ> Hi Siegmar, I think this "NVIDIA : ..." error message comes from SJ> the fact that you add CUDA includes in the C*FLAGS. If you just SJ> use --with-cuda, Open MPI will compile with CUDA support, but SJ> hwloc will not find CUDA and that will be fine. However, setting SJ> CUDA in CFLAGS will make hwloc find CUDA, compile CUDA support SJ> (which is not needed) and then NVML will show this error message SJ> when not run on a machine with CUDA devices. SJ> I guess gcc picks the environment variable, while cc does not SJ> hence the different behavior. So again, there is no need to add SJ> all those CUDA includes, --with-cuda is enough. SJ> About the opal_list_remove_item, we'll try to reproduce the SJ> issue and see where it comes from. SJ> Sylvain SJ> On 03/21/2017 12:38 AM, Siegmar Gross wrote: >> Hi, >> >> I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise >> Server >> 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes I get >> once >> more a warning about a missing item for one of my small programs >> (it doesn't matter if I use my cc or gcc version). My gcc version >> also displays the message "NVIDIA: no NVIDIA devices found" for >> the server without NVIDIA devices (I don't get the message for my >> cc version). I used the following commands to build the package >> (${SYSTEM_ENV} is Linux and ${MACHINE_ENV} is x86_64). 
>> >> >> mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc cd >> openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc >> >> ../openmpi-2.1.0rc4/configure \ >> --prefix=/usr/local/openmpi-2.1.0_64_cc \ >> --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \ >> --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ >> --with-jdk-headers=/usr/local/jdk1.8.0_66/include \ >> JAVA_HOME=/usr/local/jdk1.8.0_66 \ LDFLAGS="-m64 -mt -Wl,-z >> -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/ lib64" \ >> CC="cc" CXX="CC" FC="f95" \ CFLAGS="-m64 -mt -I/usr/local/include >> -I/usr/local/cuda/include" \ CXXFLAGS="-m64 -I/usr/local/include >> -I/usr/local/cuda/include" \ FCFLAGS="-m64" \ CPP="cpp >> -I/usr/local/include -I/usr/local/cuda/include" \ CXXCPP="cpp >> -I/usr/local/include -I/usr/local/cuda/include" \ >> --enable-mpi-cxx \ --enable-cxx-exceptions \ --enable-mpi-java \ >> --with-cuda=/usr/local/cuda \ --with-valgrind=/usr/local/valgrind >> \ --enable-mpi-thread-multiple \ --with-hwloc=internal \ >> --without-verbs \ --with-wrapper-cflags="-m64 -mt" \ >> --with-wrapper-cxxflags="-m64" \ --with-wrapper-fcflags="-m64" \ >> --with-wrapper-ldflags="-mt" \ --enable-debug \ |& tee >> log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc >> >> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc rm -r >> /usr/local/openmpi-2.1.0_64_cc.old mv >> /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old >> make install |& tee >> log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc make check |& tee >> log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc >> >> >> Sometimes everything works as expected. 
>> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 >> spawn_intra_comm Parent process 0: I create 2 slave processes >> >> Parent process 0 running on loki MPI_COMM_WORLD ntasks: 1 >> COMM_CHILD_PROCESSES ntasks_local: 1 COMM_CHILD_PROCESSES >> ntasks_remote: 2 COMM_ALL_PROCESSES ntasks: 3 mytid in >> COMM_ALL_PROCESSES: 0 >> >> Child process 0 running on nfs1 MPI_COMM_WORLD ntasks: 2 >> COMM_ALL_PROCESSES ntasks: 3 mytid in COMM_ALL_PROCESSES: 1 >> >> Child process 1 running on nfs2 MPI_COMM_WORLD ntasks: 2 >> COMM_ALL_PROCESSES ntasks: 3 mytid in COMM_ALL_PROCESSES: 2 >> >> >> >> More often I get a warning. >> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 >> spawn_intra_comm Parent process 0: I create 2 slave processes >> >> Parent process 0 running on loki MPI_COMM_WORLD ntasks: 1 >> COMM_CHILD_PROCESSES ntasks_local: 1 COMM_CHILD_PROCESSES >> ntasks_remote: 2 COMM_ALL_PROCESSES ntasks: 3 mytid in >> COMM_ALL_PROCESSES: 0 >> >> Child process 0 running on nfs1 MPI_COMM_WORLD ntasks: 2
Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
Hi Siegmar,

Would it be possible for you to provide the source to reproduce the
issue?

Thanks

On Tue, Mar 21, 2017 at 9:52 AM, Sylvain Jeaugey wrote:
> [quoted text snipped; Sylvain's message appears in full below]
Re: [OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
Hi Siegmar,

I think this "NVIDIA : ..." error message comes from the fact that you
add CUDA includes in the C*FLAGS. If you just use --with-cuda, Open MPI
will compile with CUDA support, but hwloc will not find CUDA, and that
will be fine. However, setting CUDA in CFLAGS will make hwloc find CUDA,
compile CUDA support (which is not needed), and then NVML will show this
error message when not run on a machine with CUDA devices.

I guess gcc picks up the environment variable while cc does not, hence
the different behavior. So again, there is no need to add all those CUDA
includes; --with-cuda is enough.

About the opal_list_remove_item, we'll try to reproduce the issue and
see where it comes from.

Sylvain

On 03/21/2017 12:38 AM, Siegmar Gross wrote:
> [quoted text snipped; Siegmar's original report appears in full below]
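Sylvain's explanation can be condensed into two configure sketches (paths illustrative): with plain --with-cuda, only Open MPI's CUDA support is built; putting the CUDA include path into C*FLAGS additionally lets the embedded hwloc detect CUDA/NVML, which produces the run-time message on machines without GPUs:

```shell
# Open MPI gets CUDA support; the embedded hwloc does not see the CUDA
# headers, so its NVML backend is not built:
./configure --with-cuda=/usr/local/cuda ...

# Here hwloc's configure also finds the CUDA/NVML headers via CFLAGS,
# builds its NVML backend, and "NVIDIA: no NVIDIA devices found" can
# appear at run time on GPU-less nodes:
./configure --with-cuda=/usr/local/cuda \
    CFLAGS="-I/usr/local/cuda/include" \
    LDFLAGS="-L/usr/local/cuda/lib64" ...
```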
[OMPI users] "Warning :: opal_list_remove_item" with openmpi-2.1.0rc4
Hi,

I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise Server
12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes I once more get
a warning about a missing item for one of my small programs (it doesn't
matter if I use my cc or gcc version). My gcc version also displays the
message "NVIDIA: no NVIDIA devices found" for the server without NVIDIA
devices (I don't get the message for my cc version). I used the
following commands to build the package (${SYSTEM_ENV} is Linux and
${MACHINE_ENV} is x86_64).

mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

../openmpi-2.1.0rc4/configure \
  --prefix=/usr/local/openmpi-2.1.0_64_cc \
  --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/lib64" \
  CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
  CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
  FCFLAGS="-m64" \
  CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
  CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --with-cuda=/usr/local/cuda \
  --with-valgrind=/usr/local/valgrind \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
rm -r /usr/local/openmpi-2.1.0_64_cc.old
mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc

Sometimes everything works as expected.

loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
Parent process 0: I create 2 slave processes

Parent process 0 running on loki
  MPI_COMM_WORLD ntasks:              1
  COMM_CHILD_PROCESSES ntasks_local:  1
  COMM_CHILD_PROCESSES ntasks_remote: 2
  COMM_ALL_PROCESSES ntasks:          3
  mytid in COMM_ALL_PROCESSES:        0

Child process 0 running on nfs1
  MPI_COMM_WORLD ntasks:              2
  COMM_ALL_PROCESSES ntasks:          3
  mytid in COMM_ALL_PROCESSES:        1

Child process 1 running on nfs2
  MPI_COMM_WORLD ntasks:              2
  COMM_ALL_PROCESSES ntasks:          3
  mytid in COMM_ALL_PROCESSES:        2

More often I get a warning.

loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
Parent process 0: I create 2 slave processes

Parent process 0 running on loki
  MPI_COMM_WORLD ntasks:              1
  COMM_CHILD_PROCESSES ntasks_local:  1
  COMM_CHILD_PROCESSES ntasks_remote: 2
  COMM_ALL_PROCESSES ntasks:          3
  mytid in COMM_ALL_PROCESSES:        0

Child process 0 running on nfs1
  MPI_COMM_WORLD ntasks:              2
  COMM_ALL_PROCESSES ntasks:          3

Child process 1 running on nfs2
  MPI_COMM_WORLD ntasks:              2
  COMM_ALL_PROCESSES ntasks:          3
  mytid in COMM_ALL_PROCESSES:        2
  mytid in COMM_ALL_PROCESSES:        1
Warning :: opal_list_remove_item - the item 0x25a76f0 is not on the list 0x7f96db515998
loki spawn 144

I would be grateful if somebody could fix the problem. Do you need
anything else? Thank you very much for any help in advance.

Kind regards

Siegmar

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users