r27078 (ML collective component) broke some Solaris OMPI builds.
1) In ompi/mca/coll/ml/coll_ml_lmngr.c
199 #ifdef HAVE_POSIX_MEMALIGN
200     if((errno = posix_memalign(&lmngr->base_addr,
201                                lmngr->list_alignment,
202                                lmngr->list_size * lmngr->list_block_size)) != 0) {
203         ML_ERROR(("Failed to allocate memory: %s [%d]", errno, strerror(errno)));
204         return OMPI_ERROR;
205     }
206 #else
207     lmngr->base_addr =
208         malloc(lmngr->list_size * lmngr->list_block_size + lmngr->list_alignment);
209     if(NULL == lmngr->base_addr) {
210         ML_ERROR(("Failed to allocate memory: %s [%d]", errno, strerror(errno)));
211         return OMPI_ERROR;
212     }
213
214     lmngr->base_addr = (void*)OPAL_ALIGN((uintptr_t)lmngr->base_addr,
215                                          lmngr->list_align, uintptr_t);
216 #endif
The "#else" code path has multiple problems -- specifically at the
statement on lines 214-215:
- OPAL_ALIGN needs to be defined (e.g., #include "opal/align.h")
- uintptr_t needs to be defined (e.g., #include "opal_stdint.h")
- list_align should be list_alignment
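
For reference, here is a minimal standalone sketch of what the "#else" path seems to be aiming for once those three items are fixed. It is not the actual coll_ml code: SKETCH_ALIGN is a hypothetical stand-in for OPAL_ALIGN (assumed to round up to a power-of-two boundary), struct sketch_lmngr is a made-up reduction of the real list manager, and the alloc_base field is an addition of mine. As quoted, the original overwrites base_addr with the aligned pointer, so the pointer returned by malloc() appears to be lost; the sketch keeps it so the block can be freed.

```c
#include <assert.h>
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>   /* malloc, free */

/* Hypothetical stand-in for OPAL_ALIGN: round x up to the next multiple
 * of a, where a is assumed to be a power of two. */
#define SKETCH_ALIGN(x, a, type) \
    (((x) + ((type)(a) - 1)) & ~((type)(a) - 1))

/* Hypothetical reduction of the list manager: only the fields the quoted
 * code touches, plus alloc_base (not in the original) for free(). */
struct sketch_lmngr {
    void  *alloc_base;       /* pointer returned by malloc() */
    void  *base_addr;        /* aligned pointer handed to users */
    size_t list_size;
    size_t list_block_size;
    size_t list_alignment;   /* note: list_alignment, not list_align */
};

/* Fallback allocation when posix_memalign() is unavailable.
 * Returns 0 on success, -1 on allocation failure. */
static int sketch_lmngr_alloc(struct sketch_lmngr *lmngr)
{
    size_t bytes = lmngr->list_size * lmngr->list_block_size;

    /* The mask trick in SKETCH_ALIGN only works for powers of two. */
    assert(lmngr->list_alignment != 0 &&
           (lmngr->list_alignment & (lmngr->list_alignment - 1)) == 0);

    /* Over-allocate by one alignment unit so rounding up stays in bounds. */
    lmngr->alloc_base = malloc(bytes + lmngr->list_alignment);
    if (NULL == lmngr->alloc_base) {
        return -1;
    }

    lmngr->base_addr = (void *) SKETCH_ALIGN((uintptr_t) lmngr->alloc_base,
                                             lmngr->list_alignment, uintptr_t);
    return 0;
}

/* Release the block through the pointer malloc() actually returned. */
static void sketch_lmngr_free(struct sketch_lmngr *lmngr)
{
    free(lmngr->alloc_base);
    lmngr->alloc_base = NULL;
    lmngr->base_addr = NULL;
}
```

With list_size = 8, list_block_size = 100 and list_alignment = 64, sketch_lmngr_alloc() requests 864 bytes and returns a base_addr that is a multiple of 64 and fewer than 64 bytes past alloc_base.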
I could fix those, but I need help with the following:
2) http://www.open-mpi.org/mtt/index.php?do_redir=2089
Somehow, coll_ml is getting pulled into libmpi.so. E.g., this doesn't look right:
% nm ompi/.libs/libmpi.so | grep mca_coll_ml
[13161] | 2556704| 172|FUNC |LOCL |0 |11 |mca_coll_ml_alloc_op_prog_single_frag_dag
[13171] | 2555488| 344|FUNC |LOCL |0 |11 |mca_coll_ml_buffer_recycling
[13173] | 2555392| 92|FUNC |LOCL |0 |11 |mca_coll_ml_err
[23992] | 0| 0|FUNC |GLOB |0 |UNDEF |mca_coll_ml_memsync_intra
The UNDEF symbol is causing a problem, but I'm guessing none of that
mca_coll_ml_ stuff should be in there in the first place. This is on one
Solaris system; another doesn't see the problem and builds fine.