Hi,
It looks like the rdma OSC component does not progress passive RMA operations at the target during calls to MPI_WIN_(UN)LOCK. As a sample case, take a master-worker program where each worker writes to an entry in an array exposed in the master's window:

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Master code
        MPI_Alloc_mem(size * sizeof(int), MPI_INFO_NULL, &array);
        MPI_Win_create(array, size * sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        do {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            // nonzeros = number of non-zero elements of array
            MPI_Win_unlock(0, win);
        } while (nonzeros < size-1);
        MPI_Win_free(&win);
        MPI_Free_mem(array);
    } else {
        // Worker code
        int one = 1;
        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        // Postpone the RMA with a rank-specific time
        sleep(rank);
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        MPI_Put(&one, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
        MPI_Win_unlock(0, win);
        MPI_Win_free(&win);
    }

Attached is a complete sample program.
The program hangs when run with the default MCA settings:

    $ mpirun -n 3 ./rma.x
    [1379003818.571960] 0 workers checked in
    [1379003819.571317] Worker 1 acquired lock
    [1379003819.571374] Worker 1 unlocking the window
    [1379003820.571342] Worker 2 acquired lock
    [1379003820.571384] Worker 2 unlocking the window
    <hangs>

On the other hand, it works as expected if pt2pt is forced:

    $ mpirun --mca osc pt2pt -n 3 ./rma.x | sort
    [1379003926.000442] 0 workers checked in
    [1379003926.998981] Worker 1 acquired lock
    [1379003926.999027] Worker 1 unlocking the window
    [1379003926.999076] Worker 1 synched
    [1379003926.999078] 1 workers checked in
    [1379003927.998917] Worker 2 acquired lock
    [1379003927.998940] Worker 2 unlocking the window
    [1379003927.998962] Worker 2 synched
    [1379003927.998964] 2 workers checked in
    [1379003927.998973] All workers checked in
    [1379003927.998996] Worker 1 done
    [1379003927.998996] Worker 2 done
    [1379003927.999099] Master finished

All processes are started on the same host. Open MPI is 1.6.4 without progression thread. The output from ompi_info is attached.

The same behaviour (hang with rdma, success with pt2pt) is observed when the tcp BTL is used and when all processes run on separate cluster nodes and talk via the openib BTL.

Is this a bug in the rdma OSC component or does the sample program violate the MPI correctness requirements for RMA operations?

Kind regards,
Hristo
--
Hristo Iliev, PhD - High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Package: Open MPI pk224...@linuxbmc0601.rz.rwth-aachen.de Distribution
Open MPI: 1.6.4
Open MPI SVN revision: r28081
Open MPI release date: Feb 19, 2013
Open RTE: 1.6.4
Open RTE SVN revision: r28081
Open RTE release date: Feb 19, 2013
OPAL: 1.6.4
OPAL SVN revision: r28081
OPAL release date: Feb 19, 2013
MPI API: 2.1
Ident string: 1.6.4
Prefix: /opt/MPI/openmpi-1.6.4/linux/intel
Configured architecture: x86_64-unknown-linux-gnu
Configure host: linuxbmc0601.rz.RWTH-Aachen.DE
Configured by: pk224850
Configured on: Wed May 22 17:01:57 CEST 2013
Configure host: linuxbmc0601.rz.RWTH-Aachen.DE
Built by: pk224850
Built on: Wed May 22 17:18:51 CEST 2013
Built host: linuxbmc0601.rz.RWTH-Aachen.DE
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: icc
C compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icc
C compiler family name: INTEL
C compiler version: 1110.20101201
C++ compiler: icpc
C++ compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icpc
Fortran77 compiler: ifort -nofor-main -f77rtl -fpconstant -intconstant
Fortran77 compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
Fortran90 compiler: ifort -nofor-main
Fortran90 compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: yes
Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: no
Heterogeneous support: yes
mpirun default --prefix: yes
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity example
FT Checkpoint support: no (checkpoint thread: no)
VampirTrace support: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.4)
MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.4)
MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.4)
MCA carto: file (MCA v2.0, API v2.0, Component v1.6.4)
MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.4)
MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.4)
MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.4)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.4)
MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.4)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.4)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.4)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.4)
MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.4)
MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.4)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.4)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.4)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.4)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: self (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.4)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.4)
MCA io: romio (MCA v2.0, API v2.0, Component v1.6.4)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.4)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.4)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.4)
MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.4)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.4)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.4)
MCA pml: v (MCA v2.0, API v2.0, Component v1.6.4)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.4)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.4)
MCA btl: self (MCA v2.0, API v2.0, Component v1.6.4)
MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.4)
MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.4)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.4)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.4)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.4)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.4)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.4)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.4)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.4)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.4)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.4)
MCA odls: default (MCA v2.0, API v2.0, Component v1.6.4)
MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.4)
MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.4)
MCA ras: lsf (MCA v2.0, API v2.0, Component v1.6.4)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.4)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.4)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.4)
MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.4)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.4)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.4)
MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.4)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.4)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.4)
MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.4)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.4)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.4)
MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.4)
MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.4)
MCA plm: lsf (MCA v2.0, API v2.0, Component v1.6.4)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.4)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.4)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.4)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: env (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: lsf (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.4)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.4)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.4)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.4)
MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.4)
MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.4)
MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.4)
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

int main (int argc, char **argv)
{
    MPI_Win win;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        int *array;

        MPI_Alloc_mem(size * sizeof(int), MPI_INFO_NULL, &array);
        memset(array, 0, size * sizeof(int));
        MPI_Win_create(array, size * sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        int ready, ready1 = -1;
        do
        {
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            for (int i = ready = 0; i < size; ready += array[i++]);
            if (ready != ready1)
            {
                printf("[%.6f] %d workers checked in\n", MPI_Wtime(), ready);
                ready1 = ready;
            }
            MPI_Win_unlock(0, win);
        } while (ready < size-1);
        printf("[%.6f] All workers checked in\n", MPI_Wtime());

        MPI_Win_free(&win);
        MPI_Free_mem(array);
        printf("[%.6f] Master finished\n", MPI_Wtime());
    }
    else
    {
        int one = 1;

        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        sleep(rank);

        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        printf("[%.6f] Worker %d acquired lock\n", MPI_Wtime(), rank);
        MPI_Put(&one, 1, MPI_INT, 0, rank, 1, MPI_INT, win);
        printf("[%.6f] Worker %d unlocking the window\n", MPI_Wtime(), rank);
        MPI_Win_unlock(0, win);
        printf("[%.6f] Worker %d synched\n", MPI_Wtime(), rank);

        MPI_Win_free(&win);
        printf("[%.6f] Worker %d done\n", MPI_Wtime(), rank);
    }

    MPI_Finalize();
    return 0;
}
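
The summation loop in the master branch of the program above is quite compact. As a minimal standalone sketch (no MPI needed), it should be equivalent to the helper below. The name count_checked_in is hypothetical and does not appear in the attached program. The sketch assumes every slot holds either 0 or 1, which holds here because each worker only ever puts the value 1 into its own slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the attached program.
 * Returns the sum of the first `size` entries of `array`.
 * Assumption: each entry is 0 (worker not yet checked in) or
 * 1 (worker has written its flag), so the sum equals the number
 * of workers that have checked in. */
int count_checked_in(const int *array, int size)
{
    assert(array != NULL || size == 0);

    int ready = 0;
    for (int i = 0; i < size; i++) {
        ready += array[i];
    }
    return ready;
}
```

In the master loop this would be called between MPI_Win_lock and MPI_Win_unlock as ready = count_checked_in(array, size).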