https://github.com/open-mpi/ompi/pull/7547 fixes it and explains why it wasn't catching us elsewhere in the MPI code.
On Mar 20, 2020, at 9:22 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote:

Odd - the topology object gets filled in during init, well before the fence (as it doesn't need the fence, being a purely local op). Let me take a look.

On Mar 20, 2020, at 9:15 AM, Barrett, Brian <bbarr...@amazon.com> wrote:

PMIx folks -

When using mpirun for launching, it looks like opal_hwloc_topology isn't filled in at the point where we need the information (mtl_ofi_component_init()). This would end up being before the modex fence, since the goal is to figure out which address the process should publish. I'm not sure that makes a difference here, but wanted to figure out if this was expected and, if so, if we had options for getting the right data from PMIx early enough in the process. Sorry, this is part of the runtime changes I haven't been following closely enough.

Brian

-----Original Message-----
From: devel <devel-boun...@lists.open-mpi.org> on behalf of Ralph Castain via devel <devel@lists.open-mpi.org>
Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
Date: Wednesday, March 18, 2020 at 2:08 PM
To: "Zhang, William" <wilzh...@amazon.com>
Cc: Ralph Castain <r...@open-mpi.org>, OpenMPI Devel <devel@lists.open-mpi.org>
Subject: RE: [EXTERNAL] [OMPI devel] Add multi nic support for ofi MTL using hwloc

Excellent - thanks! Now if only the OpenMP people would be so reasonable...sigh.

On Mar 18, 2020, at 10:26 AM, Zhang, William <wilzh...@amazon.com> wrote:

Hello,

We're getting the topology info using the opal_hwloc_topology object; we won't be doing our own discovery.

William

On 3/17/20, 11:54 PM, "devel on behalf of Ralph Castain via devel" <devel-boun...@lists.open-mpi.org on behalf of devel@lists.open-mpi.org> wrote:

Hey folks

I saw the referenced "new feature" on the v5 feature spreadsheet and wanted to ask a quick question. Is the OFI MTL going to be doing its own hwloc topology discovery for this feature? Or is it going to access the topology info via PMIx and the OPAL hwloc abstraction?

I ask because we know that having every proc do its own topology discovery is a major problem on large-core systems (e.g., KNL or Power9). If OFI is going to do an hwloc discovery operation, then we need to ensure this doesn't happen unless specifically requested by a user willing to pay that price (and it was significant).

Can someone from Amazon (as the item is assigned to them) please clarify?

Ralph
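The policy Ralph describes (reuse the topology the runtime already discovered during init, and only let a component run its own per-process discovery when a user explicitly opts in) can be sketched as below. This is a minimal, hypothetical C illustration: topology_t, shared_topology, allow_local_discovery, discover_topology_locally() and get_topology() are invented names, not the real opal_hwloc_topology object, the hwloc API, or any Open MPI MCA parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a topology object (NOT the real hwloc/OPAL type). */
typedef struct {
    int num_cores;
} topology_t;

/* Hypothetical: topology shared by the runtime, filled in during init
 * (a purely local op, before any fence). NULL if not yet available. */
static topology_t *shared_topology = NULL;

/* Hypothetical opt-in flag: the user agrees to pay for per-process discovery. */
static bool allow_local_discovery = false;

/* Storage for a locally discovered topology. */
static topology_t local_copy;

/* Stand-in for an expensive per-process discovery (e.g. a full hwloc load).
 * Here it just fills in a placeholder value. */
static topology_t *discover_topology_locally(void) {
    local_copy.num_cores = 4; /* placeholder, not real discovery */
    return &local_copy;
}

/* Prefer the shared topology; fall back to local discovery only if the
 * user opted in. Returns NULL when neither option is available. */
static topology_t *get_topology(void) {
    if (shared_topology != NULL) {
        return shared_topology;
    }
    if (allow_local_discovery) {
        return discover_topology_locally();
    }
    return NULL;
}
```

In this sketch a component would call get_topology() from its init routine. A NULL return means the shared topology is not populated yet and local discovery was not requested, which corresponds roughly to what Brian observed in mtl_ofi_component_init() before the fix.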