Issue #2 was fixed in r27178.
Paul - thanks for the help!

Regards,

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Aug 21, 2012, at 11:36 AM, Eugene Loh wrote:

r27078 (ML collective component) broke some Solaris OMPI builds.

1)  In ompi/mca/coll/ml/coll_ml_lmngr.c
    199 #ifdef HAVE_POSIX_MEMALIGN
    200     if((errno = posix_memalign(&lmngr->base_addr,
    201                     lmngr->list_alignment,
    202                     lmngr->list_size * lmngr->list_block_size)) != 0) {
    203         ML_ERROR(("Failed to allocate memory: %s [%d]", errno, strerror(errno)));
    204         return OMPI_ERROR;
    205     }
    206 #else
    207     lmngr->base_addr =
    208         malloc(lmngr->list_size * lmngr->list_block_size + lmngr->list_alignment);
    209     if(NULL == lmngr->base_addr) {
    210         ML_ERROR(("Failed to allocate memory: %s [%d]", errno, strerror(errno)));
    211         return OMPI_ERROR;
    212     }
    213
    214     lmngr->base_addr = (void*)OPAL_ALIGN((uintptr_t)lmngr->base_addr,
    215             lmngr->list_align, uintptr_t);
    216 #endif
   The "#else" code path has multiple problems, specifically in the
statement on lines 214-215:
   - OPAL_ALIGN needs to be defined (e.g., #include "opal/align.h")
   - uintptr_t needs to be defined (e.g., #include "opal_stdint.h")
   - list_align should be list_alignment
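
For illustration, here is a minimal standalone sketch of what the "#else" fallback is trying to do: over-allocate with malloc, then round the address up to the requested alignment. It is not the actual coll_ml code. The struct, the function name, and the ALIGN_UP macro are hypothetical stand-ins (ALIGN_UP plays the role of OPAL_ALIGN and assumes a power-of-two alignment), and keeping the original malloc pointer in a separate field is my own addition, in case the buffer is later passed to free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round x up to the next multiple of a (a must be a power of two).
 * Hypothetical stand-in for OPAL_ALIGN, not the Open MPI macro itself. */
#define ALIGN_UP(x, a) \
    (((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

/* Hypothetical, trimmed-down version of the list manager state. */
struct lmngr_sketch {
    void  *alloc_base;       /* pointer returned by malloc, kept for free() */
    void  *base_addr;        /* aligned address actually used */
    size_t list_size;        /* number of blocks */
    size_t list_block_size;  /* bytes per block */
    size_t list_alignment;   /* required alignment, power of two */
};

/* Fallback allocation for systems without posix_memalign.
 * Returns 0 on success, -1 on allocation failure. */
static int lmngr_alloc_fallback(struct lmngr_sketch *l)
{
    /* The mask trick in ALIGN_UP only works for powers of two. */
    assert(l->list_alignment != 0 &&
           (l->list_alignment & (l->list_alignment - 1)) == 0);

    size_t bytes = l->list_size * l->list_block_size;

    /* Over-allocate so that an aligned address always fits. */
    l->alloc_base = malloc(bytes + l->list_alignment);
    if (NULL == l->alloc_base) {
        return -1;
    }

    /* Note the field name: list_alignment, not list_align. */
    l->base_addr = (void *)ALIGN_UP((uintptr_t)l->alloc_base,
                                    l->list_alignment);
    return 0;
}
```

The caller would release the buffer with free(l->alloc_base) rather than free(l->base_addr), since the aligned address may differ from the one malloc returned.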

I could fix those, but I need help with...

2)  http://www.open-mpi.org/mtt/index.php?do_redir=2089  Somehow,
coll_ml is getting pulled into libmpi.so.  E.g., this doesn't look right:

   % nm ompi/.libs/libmpi.so | grep mca_coll_ml
   [13161] |   2556704|       172|FUNC |LOCL |0    |11     |mca_coll_ml_alloc_op_prog_single_frag_dag
   [13171] |   2555488|       344|FUNC |LOCL |0    |11     |mca_coll_ml_buffer_recycling
   [13173] |   2555392|        92|FUNC |LOCL |0    |11     |mca_coll_ml_err
   [23992] |         0|         0|FUNC |GLOB |0    |UNDEF  |mca_coll_ml_memsync_intra

The UNDEF is causing a problem, but I'm guessing all that mca_coll_ml_
stuff shouldn't be in there at all in the first place.  This is on one
Solaris system, while another doesn't see the problem and builds fine.

