Thanks for your response, George.
Just having your confirmation that this should be okay to use iteratively is a huge help.
After further investigation, this only seems to occur on my test workstation
with the following configuration:
Open MPI repo revision: v4.0.2
Open MPI release date: Oct 07, 2019
Open RTE: 4.0.2
Configured architecture: x86_64-apple-darwin19.2.0
g++ --version
Apple clang version 11.0.3 (clang-1103.0.32.29)
Target: x86_64-apple-darwin19.2.0
Thread model: posix
I am not currently able to reproduce the errors on an actual Linux cluster with
Open MPI 4.0.2.
So, this is probably insignificant for most production use, but in case you are
interested, from what I can tell the following code block should reproduce the
error for Open MPI with Clang:
#include <mpi.h>
#include <stdio.h>

int main(int argc, const char * argv[]) {
    MPI_Init(NULL, NULL);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    for (int run = 1; run <= 30; run++) {
        MPI_Comm topology;
        /* one outgoing and one incoming edge per rank, ring-style */
        const int send[1]    = { world_rank == world_size-1 ? 0 : world_rank+1 };
        const int receive[1] = { world_rank > 0 ? world_rank-1 : world_size-1 };
        const int degrees[1] = { 1 };
        const int weights[1] = { 1 };

        printf("rank %d send -> %d\r\n", world_rank, send[0]);
        printf("rank %d receive -> %d\r\n", world_rank, receive[0]);

        MPI_Comm oldcomm = MPI_COMM_WORLD;
        /* reorder = 1: this is the call that eventually crashes in treematch */
        MPI_Dist_graph_create(oldcomm, 1, send, degrees, receive, weights,
                              MPI_INFO_NULL, 1, &topology);
    }

    MPI_Finalize();
    return 0;
}
Thanks,
-Bradley
On Apr 6, 2020, at 10:36 AM, George Bosilca <[email protected]> wrote:
Bradley,
You call them through a blocking MPI function, so the operation is completed by
the time you return from the MPI call. So, short story: you should be safe
calling dist_graph_create in a loop.
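As a rough sketch of that pattern (the helper name and the ring-style neighbors
below are purely illustrative, and the MPI_Comm_free each iteration is just
hygiene to avoid accumulating handles, not a correctness requirement):

#include <mpi.h>

/* Illustrative sketch: repeatedly rebuild a one-edge-per-rank distributed
 * graph.  MPI_Dist_graph_create is blocking, so each call has fully
 * completed before the next iteration starts. */
static void rebuild_topology(int rank, int size, int iterations)
{
    for (int run = 0; run < iterations; run++) {
        const int sources[1]      = { rank };
        const int degrees[1]      = { 1 };
        const int destinations[1] = { (rank + 1) % size };
        const int weights[1]      = { 1 };
        MPI_Comm topo;

        MPI_Dist_graph_create(MPI_COMM_WORLD, 1, sources, degrees,
                              destinations, weights, MPI_INFO_NULL,
                              1 /* reorder */, &topo);

        /* ... use topo, e.g. for neighborhood collectives ... */

        MPI_Comm_free(&topo);   /* release this iteration's graph communicator */
    }
}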
The segfault indicates a memory issue in some of the internals of the treematch
component. Do you have an example that reproduces this issue so that I can take
a look and fix it?
Thanks,
George.
On Mon, Apr 6, 2020 at 11:31 AM Bradley Morgan via devel <[email protected]> wrote:
Hello OMPI Developers and Community,
I am interested in investigating dynamic runtime optimization of MPI topologies
using an evolutionary approach.
My initial testing is resulting in segfaults/SIGABRTs when I attempt to
iteratively create a new communicator with reordering enabled, e.g.:
[88881] Signal: Segmentation fault: 11 (11)
[88881] Signal code: Address not mapped (1)
[88881] Failing at address: 0x0
[88881] [ 0] 0   libsystem_platform.dylib   0x00007fff69dff42d _sigtramp + 29
[88881] [ 1] 0   mpi_island_model_ea        0x0000000100000032 mpi_island_model_ea + 50
[88881] [ 2] 0   mca_topo_treematch.so      0x0000000105ddcbf9 free_list_child + 41
[88881] [ 3] 0   mca_topo_treematch.so      0x0000000105ddcbf9 free_list_child + 41
[88881] [ 4] 0   mca_topo_treematch.so      0x0000000105ddcd1f tm_free_tree + 47
[88881] [ 5] 0   mca_topo_treematch.so      0x0000000105dd6967 mca_topo_treematch_dist_graph_create + 9479
[88881] [ 6] 0   libmpi.40.dylib            0x00000001001992e0 MPI_Dist_graph_create + 640
[88881] [ 7] 0   mpi_island_model_ea        0x00000001000050c7 main + 1831
I see in some documentation that MPI_Dist_graph_create is not interrupt safe,
which I interpret to mean it is not really designed for iterative use without
some sort of safeguard to keep successive calls from overlapping.
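(To be concrete, by "safeguard" I just mean something crude like the sketch
below, i.e. releasing the previous communicator and synchronizing before each
new create. It is only illustrative and assumes world_rank and world_size have
already been obtained from MPI_Comm_rank/MPI_Comm_size.)

/* Purely illustrative: the kind of guard I was imagining around the create call */
MPI_Comm topology = MPI_COMM_NULL;
for (int run = 1; run <= 30; run++) {
    const int send[1]    = { (world_rank + 1) % world_size };
    const int receive[1] = { (world_rank + world_size - 1) % world_size };
    const int degrees[1] = { 1 };
    const int weights[1] = { 1 };

    if (topology != MPI_COMM_NULL)
        MPI_Comm_free(&topology);    /* drop the previous graph communicator */
    MPI_Barrier(MPI_COMM_WORLD);     /* make sure no rank is still in a create */

    MPI_Dist_graph_create(MPI_COMM_WORLD, 1, send, degrees, receive, weights,
                          MPI_INFO_NULL, 1 /* reorder */, &topology);
}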
I guess my question is: are the topology mapping functions really meant to be
called iteratively, or are they meant for single use?
If you guys think this is something that might be possible, do you have any
suggestions for calling the topology mapping functions iteratively, or any
hints, docs, etc. on what else might be going wrong here?
Thanks,
Bradley