Hi Ralph,

I have a question about the MPI-RTE interface change in r29931. The change
is not reflected in the "ompi/mca/rte/rte.h" file.

I'm curious how the newly added "struct ompi_proc_t" relates to the
"struct ompi_process_info_t" that is described in the "rte.h" file.

I understand the general motivation for the API change, but it is less clear
to me how the information previously defined in the header changes (or does
not change).


  Thomas Naughton                                      naught...@ornl.gov
  Research Associate                                   (865) 576-4184

On Mon, 16 Dec 2013, svn-commit-mai...@open-mpi.org wrote:

Author: rhc (Ralph Castain)
Date: 2013-12-16 22:26:00 EST (Mon, 16 Dec 2013)
New Revision: 29931
URL: https://svn.open-mpi.org/trac/ompi/changeset/29931

Revert r29917 and replace it with a fix that resolves the thread deadlock while 
retaining the desired debug info. In an earlier commit, we had changed the 
modex accordingly:

* automatically retrieve the hostname (and all RTE info) for all procs during 
MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon 
the first call to modex_recv for that proc. This would provide the hostname for 
debugging purposes as we only report errors on messages, and so we must have 
called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first 
message - i.e., not during add_procs so we don't call it for every process in 
the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third 
requirement, but those include the Cray ones where jobs are big enough that 
launch times were becoming an issue. Other BTLs would hopefully be modified as 
time went on and interest in using them at scale arose. Meantime, those BTLs 
would call modex_recv on every proc, and we would therefore be no worse than 
the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of 
the ompi_process_name_t for the proc so that the hostname can be easily 
inserted. I have advised the ORNL folks of the change.
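
For readers following the thread, the retrieval policy and the interface
change described above can be illustrated with a minimal sketch. Every name
here (sketch_proc_t, sketch_rte_lookup_hostname, sketch_init_procs,
sketch_modex_recv) is a hypothetical stand-in invented for illustration; none
of it is the actual Open MPI or ORTE code, and the real ompi_proc_t carries
more than is shown. The sketch assumes only what the commit message states:
hostnames are fetched eagerly when nprocs < cutoff, otherwise on the first
modex_recv for a proc, and the receive path is handed the proc structure so
the hostname can be stored in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_PROCS 64

/* Hypothetical stand-in for a proc structure; NOT the real ompi_proc_t. */
typedef struct {
    unsigned rank;         /* stand-in for the process name */
    const char *hostname;  /* NULL until the RTE info has been retrieved */
} sketch_proc_t;

/* Counts how many (pretend) RTE lookups were performed. */
static int sketch_lookups = 0;

/* Hypothetical RTE query: returns a made-up hostname for a rank. */
static const char *sketch_rte_lookup_hostname(unsigned rank)
{
    static char names[SKETCH_MAX_PROCS][32];
    assert(rank < SKETCH_MAX_PROCS);
    snprintf(names[rank], sizeof names[rank], "node%u", rank);
    sketch_lookups++;
    return names[rank];
}

/* Init-time step: retrieve hostnames eagerly only when nprocs < cutoff. */
static void sketch_init_procs(sketch_proc_t *procs, unsigned nprocs,
                              unsigned cutoff)
{
    for (unsigned i = 0; i < nprocs; i++) {
        procs[i].rank = i;
        procs[i].hostname = NULL;
        if (nprocs < cutoff) {
            procs[i].hostname = sketch_rte_lookup_hostname(i);
        }
    }
}

/*
 * Stand-in for the modex receive path. Because it is given the proc
 * structure rather than only the process name, it has a place to store
 * the hostname the first time this proc is looked up.
 */
static const char *sketch_modex_recv(sketch_proc_t *proc)
{
    if (proc->hostname == NULL) {
        proc->hostname = sketch_rte_lookup_hostname(proc->rank);
    }
    return proc->hostname;
}
```

With 16 procs and a cutoff of 8, sketch_init_procs performs no lookups, and
calling sketch_modex_recv twice on the same proc performs exactly one. A
caller that only had the process name would need a separate table to cache
the hostname, which is presumably the inconvenience the interface change
avoids.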

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

Text files modified:
  trunk/ompi/mca/rte/orte/rte_orte.h        |     7 ++++---
  trunk/ompi/mca/rte/orte/rte_orte_module.c |    27 ++++++++++++++++++---------
  trunk/ompi/proc/proc.c                    |    26 ++++++++++++++++++++++----
  trunk/ompi/runtime/ompi_module_exchange.c |    10 +++++-----
  4 files changed, 49 insertions(+), 21 deletions(-)
