[OMPI devel] Build fails for Git versions (master and v4.0.x)

2019-07-31 Thread Jan Bierbaum via devel
Hello!

After I ran into problems with a self-compiled OpenMPI 4.0.1 and CP2K
('make test' fails for the latter and also a couple of input files are
dysfunctional with the MPI version), I thought it might help to give the
Git version of OpenMPI a try. However, I can build neither 'v4.0.x'
(673ddae) nor 'master' (7b7ad5e). Both fail during the linking of
'libopen-pal.so'. Is this expected?

The error for 'master' is ('v4.0.x' shows a different line number in the
Makefile):

> make[2]: Entering directory '/dev/shm/Setup/build/openmpi-git/opal/tools/wrappers'
>   CC   opal_wrapper.o
>   CCLD opal_wrapper
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_crs_none_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_reachable_netlink_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_pstat_linux_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_shmem_posix_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_btl_tcp_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_patcher_overwrite_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_btl_uct_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_allocator_bucket_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_shmem_sysv_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_pmix_isolated_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_btl_vader_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_shmem_mmap_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_pmix_pmix4x_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_btl_self_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_allocator_basic_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_rcache_grdma_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_mpool_hugepage_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_btl_sm_component'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_reachable_weighted_component'
> collect2: error: ld returned 1 exit status
> Makefile:1836: recipe for target 'opal_wrapper' failed


Software used:

- automake (GNU automake) 1.15
- m4 (GNU M4) 1.4.18
- autoconf (GNU Autoconf) 2.69
- libtoolize (GNU libtool) 2.4.6
- flex 2.6.1
- gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
- UCT version=1.5.1 revision 7e67a4b


Build process:

> $ git clone … ompi; git checkout $BRANCH
> $ cd ompi
> $ ./autogen.pl &> auto.log
> $ ./configure --prefix=$DIR --disable-timing --disable-mpi-cxx \
>     --enable-shared --enable-weak-symbols --enable-binaries --enable-mpi \
>     --enable-mpi-interface-warning --enable-mpi-fortran --enable-c11-atomics \
>     --enable-builtin-atomics --enable-fast-install --enable-mpi1-compatibility \
>     --without-cuda --without-verbs --with-ucx=${PATH_TO_UCX} --disable-debug \
>     --disable-mem-debug &> configure.log
> $ make -j 8 &> make.log


I also tried a serial build to avoid potential races in the build
process but to no avail. The respective log files are attached in
compressed form and, for your convenience, also available online

auto.log -> https://pastebin.com/2w5RDNdc
configure.log -> https://pastebin.com/chWtk4pw
make.log -> https://pastebin.com/kYWscGYD


As a side question: Are there any functionality tests for OpenMPI in the
sense that they check whether communication works properly, i.e. no lost
messages, message contents unchanged, …?


Regards, Jan


ompi-master.tar.bz2
Description: Binary data
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Build fails for Git versions (master and v4.0.x)

2019-07-31 Thread Jan Bierbaum via devel
On 31.07.19 22:12, Jeff Squyres (jsquyres) wrote:
> Just to make sure you're not dealing with anything left over from an old /
> stale build:
> 
> cd top-of-source-tree
> git clean -dfx
> ./autogen.pl |& tee auto.out
> ./configure ... |& tee config.out
> make V=1 -j 8 |& tee make.out
Thanks a lot. This completely fixed those build problems. I used 'git
clean -df' (without x) before and could have sworn I also tried a fresh
clone … well, obviously I hadn't.

Any suggestions for my question about a test suite for (Open)MPI that
also covers correct communication? It would be great to have some way to
check my setup “layer by layer”.


Regards, Jan

Re: [OMPI devel] Build fails for Git versions (master and v4.0.x)

2019-07-31 Thread Jan Bierbaum via devel
On 31.07.19 23:54, Jeff Squyres (jsquyres) wrote:
> We don't really have any test suites that just test, for example, the
> BTLs.  We usually rely on the usual MPI benchmarks and test suites
> (e.g., the Intel MPI benchmarks have a correctness-checking mode).
I guess I'll also move in this direction. Thanks again for your help!


Regards, Jan


[OMPI devel] Debug options break build

2019-09-19 Thread Jan Bierbaum via devel
Switching on various debug options, my builds of OpenMPI with UCX fail 
(and this time I made sure it's not due to my own stupidity … I hope).


The problematic options and respective compiler errors are '--enable-timing'


Making all in mca/ess/pmi
make[2]: Entering directory '/dev/shm/openmpi-4.0.2rc2/build/orte/mca/ess/pmi'
  CC   ess_pmi_component.lo
  CC   ess_pmi_module.lo
In file included from ../../../../../orte/mca/ess/pmi/ess_pmi_module.c:57:
../../../../../orte/mca/ess/pmi/ess_pmi_module.c: In function ‘rte_init’:
../../../../../orte/mca/ess/pmi/ess_pmi_module.c:467:26: error: ‘ess_base_setup’ undeclared (first use in this function); did you mean ‘event_base_set’?
 OPAL_TIMING_ENV_NEXT(ess_base_setup, "state_framework_open");
  ^~
../../../../../opal/util/timings.h:103:13: note: in definition of macro ‘OPAL_TIMING_ENV_NEXT’
 if( h->enabled ){ \
 ^
../../../../../orte/mca/ess/pmi/ess_pmi_module.c:467:26: note: each undeclared identifier is reported only once for each function it appears in
 OPAL_TIMING_ENV_NEXT(ess_base_setup, "state_framework_open");
  ^~
../../../../../opal/util/timings.h:103:13: note: in definition of macro ‘OPAL_TIMING_ENV_NEXT’
 if( h->enabled ){ \
 ^
make[2]: *** [Makefile:1857: ess_pmi_module.lo] Error 1
make[2]: Leaving directory '/dev/shm/openmpi-4.0.2rc2/build/orte/mca/ess/pmi'



and '--enable-mem-debug'


Making all in profile
make[3]: Entering directory '/dev/shm/ompi/build/oshmem/shmem/c/profile'
  LN_S pshmem_init.c
  LN_S pshmem_finalize.c

[…]

  CC   pshmem_put.lo
  CC   pshmem_g.lo
pshmem_free.c: In function ‘_shfree’:
pshmem_free.c:65:39: error: macro "free" passed 2 arguments, but takes just 1
 rc = s->allocator->free(s, ptr);
   ^
pshmem_free.c:65:12: warning: assignment to ‘int’ from ‘int (*)(map_segment_t *, void *)’ {aka ‘int (*)(struct map_segment *, void *)’} makes integer from pointer without a cast [-Wint-conversion]
 rc = s->allocator->free(s, ptr);
^
make[3]: *** [Makefile:1964: pshmem_free.lo] Error 1
make[3]: *** Waiting for unfinished jobs
pshmem_realloc.c: In function ‘_shrealloc’:
pshmem_realloc.c:59:56: error: macro "realloc" passed 4 arguments, but takes just 2
 rc = s->allocator->realloc(s, size, ptr, &pBuff);
^
pshmem_realloc.c:59:12: warning: assignment to ‘int’ from ‘int (*)(map_segment_t *, size_t,  void *, void **)’ {aka ‘int (*)(struct map_segment *, long unsigned int,  void *, void **)’} makes integer from pointer without a cast [-Wint-conversion]
 rc = s->allocator->realloc(s, size, ptr, &pBuff);
^
make[3]: *** [Makefile:1964: pshmem_realloc.lo] Error 1
make[3]: Leaving directory '/dev/shm/ompi/build/oshmem/shmem/c/profile'
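
The two '--enable-mem-debug' errors have the shape of a function-like macro
colliding with a struct member of the same name: the preprocessor sees 'free('
followed by two arguments and rejects it before the compiler ever looks at the
member access. The sketch below reproduces that mechanism in isolation. It
assumes, without this having been checked against the Open MPI headers, that
the option turns 'free' into a one-argument macro. 'debug_free',
'struct segment', 'struct allocator', 'release_from_segment' and 'demo' are
hypothetical names, not Open MPI code, and parenthesizing the member name is
only one generic way around such a collision, not necessarily the fix the
project would choose.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical debugging hook; counts calls and forwards to libc. */
static int debug_free_calls = 0;

static void debug_free(void *ptr, const char *file, int line)
{
    (void)file;
    (void)line;
    ++debug_free_calls;
    free(ptr); /* the macro below is not defined yet, so this is libc free */
}

/* ASSUMPTION: a memory-debugging build defines a one-argument macro like this. */
#define free(ptr) debug_free((ptr), __FILE__, __LINE__)

struct segment;

struct allocator {
    /* 'free' is followed by ')' here, so it is not treated as a macro call. */
    int (*free)(struct segment *seg, void *ptr);
};

struct segment {
    struct allocator *allocator;
    int released;
};

static int segment_free(struct segment *seg, void *ptr)
{
    (void)ptr;
    seg->released++;
    return 0;
}

int release_from_segment(struct segment *s, void *ptr)
{
    /* Writing s->allocator->free(s, ptr) would fail to preprocess:
     * macro "free" passed 2 arguments, but takes just 1.
     * With the member parenthesized, 'free' is no longer directly
     * followed by '(' and the macro is not expanded. */
    return (s->allocator->free)(s, ptr);
}

/* Returns 1 if both the member call and the macro-routed call behaved as expected. */
int demo(void)
{
    struct allocator a = { segment_free };
    struct segment s = { &a, 0 };
    int dummy = 0;
    int before = debug_free_calls;
    void *heap = malloc(16);

    int rc = release_from_segment(&s, &dummy);
    free(heap); /* expands to debug_free((heap), __FILE__, __LINE__) */

    assert(rc == 0);
    return s.released == 1 && debug_free_calls == before + 1;
}
```

Removing the parentheses in 'release_from_segment' should give the same kind
of diagnostic as in the log above, including the follow-up warning about
assigning a function pointer to an 'int'.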



Preparing this report, I just noticed that the '--enable-timing' bug has 
already been fixed on 'master' with commit 
8e7d874e14a5485dceff836419e36b6b24a66f48. Would be nice if this could 
make it into the 'v4.0.x' branch.
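
The '--enable-timing' error follows a common pattern: a macro that expands to
nothing in a default build hides a reference to an identifier that was never
declared, and the mistake only surfaces once the option is switched on. A
minimal sketch of that pattern, using hypothetical macro names ('TIMING_INIT',
'TIMING_NEXT', 'TIMING_EVENTS') rather than the real ones from
opal/util/timings.h, and assuming the handle is supposed to be declared by an
init-style macro earlier in the function:

```c
#include <assert.h>

#ifndef ENABLE_TIMING
#define ENABLE_TIMING 1 /* set to 0 to mimic a build without timing support */
#endif

struct timing_handle {
    int enabled;
    int events;
    const char *last;
};

#if ENABLE_TIMING
/* Declares the handle that the other macros dereference. */
#define TIMING_INIT(name) \
    struct timing_handle name##_val = { 1, 0, "" }, *name = &name##_val
#define TIMING_NEXT(h, desc) \
    do { if ((h)->enabled) { (h)->events++; (h)->last = (desc); } } while (0)
#define TIMING_EVENTS(h) ((h)->events)
#else
/* Everything vanishes, so the handle name is never seen by the compiler. */
#define TIMING_INIT(name)    ((void)0)
#define TIMING_NEXT(h, desc) ((void)0)
#define TIMING_EVENTS(h)     0
#endif

/* Hypothetical stand-in for an init routine instrumented with timing marks. */
int init_runtime(void)
{
    TIMING_INIT(setup);
    TIMING_NEXT(setup, "state_framework_open");
    TIMING_NEXT(setup, "state_framework_select");
    return TIMING_EVENTS(setup);
}

void self_check(void)
{
    assert(init_runtime() == (ENABLE_TIMING ? 2 : 0));
}
```

Deleting the 'TIMING_INIT(setup);' line still compiles with -DENABLE_TIMING=0,
but with ENABLE_TIMING set to 1 the compiler reports 'setup' as undeclared,
which is the same class of failure as in the 'ess_pmi_module.c' log.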



Software used:

- automake (GNU automake) 1.16.1
- m4 (GNU M4) 1.4.18
- autoconf (GNU Autoconf) 2.69
- libtoolize (GNU libtool) 2.4.6
- flex 2.6.4
- gcc (Debian 8.3.0-6) 8.3.0
- UCT version=1.6.1-rc2


Build process:


$ git clone https://github.com/open-mpi/ompi.git
$ cd ompi
$ ./autogen.pl &> auto.log
$ ./configure --prefix=${DIR} --with-ucx=${PATH_TO_UCX} --enable-mem-debug &> configure.log
$ make -j 8 &> make.log



The respective log files are attached in compressed form and, for your 
convenience, also available online


auto.log   -> https://pastebin.com/cysbi3Vx
configure.log  -> https://pastebin.com/rEcngh6D
make.log   -> https://pastebin.com/HMETcSVA



Regards, Jan


logs.tar.bz2
Description: Binary data

Re: [OMPI devel] Debug options break build

2019-09-19 Thread Jan Bierbaum via devel

On 19.09.19 22:40, Jeff Squyres (jsquyres) wrote:
> I am unable to reproduce these issues on master HEAD; assumedly they
> have something to do with UCX...?
>
> I filed https://github.com/open-mpi/ompi/issues/6995 to track the issue.

Yes, builds using '--enable-mem-debug' fail only when they also involve
UCX. Sorry for not pointing that out explicitly.

'--enable-timing' breaks regardless of UCX. As mentioned this has
already been fixed in 'master' (commit
8e7d874e14a5485dceff836419e36b6b24a66f48). It would be great to also
have this fix in 'v4.0.x'.


Regards, Jan