I'm not sure this is actually a bug, but the difference may surprise users. It seems that the implementation of MPI_Ireduce_scatter(MPI_IN_PLACE,...) (ab?)uses the recvbuf to compute the intermediate reduction, while MPI_Reduce_scatter(MPI_IN_PLACE,...) does not.
Look at the following code (set up to be run on up to 16 processes). While MPI_Reduce_scatter() does not change the second and following elements of recvbuf, the nonblocking variant does modify the second and following entries on some ranks (rank 0 in the run below).

[dalcinl@kw2060 openmpi]$ cat ireduce_scatter.c
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int i,size,rank;
  /* every rank contributes 1 in each slot; each rank receives one element */
  int recvbuf[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  int rcounts[] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (size > 16) MPI_Abort(MPI_COMM_WORLD,1);
#ifndef NBCOLL
#define NBCOLL 1
#endif
#if NBCOLL
  {
    MPI_Request request;
    MPI_Ireduce_scatter(MPI_IN_PLACE, recvbuf, rcounts, MPI_INT,
                        MPI_SUM, MPI_COMM_WORLD, &request);
    MPI_Wait(&request,MPI_STATUS_IGNORE);
  }
#else
  MPI_Reduce_scatter(MPI_IN_PLACE, recvbuf, rcounts, MPI_INT,
                     MPI_SUM, MPI_COMM_WORLD);
#endif
  /* element 0 holds the reduced result; the rest should stay at 1 */
  printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, 0, recvbuf[0], size);
  for (i=1; i<size; i++) {
    printf("[%d] rbuf[%d]=%2d  expected:%2d\n", rank, i, recvbuf[i], 1);
  }
  MPI_Finalize();
  return 0;
}

[dalcinl@kw2060 openmpi]$ mpicc -DNBCOLL=0 ireduce_scatter.c
[dalcinl@kw2060 openmpi]$ mpiexec -n 3 ./a.out | sort
[0] rbuf[0]= 3  expected: 3
[0] rbuf[1]= 1  expected: 1
[0] rbuf[2]= 1  expected: 1
[1] rbuf[0]= 3  expected: 3
[1] rbuf[1]= 1  expected: 1
[1] rbuf[2]= 1  expected: 1
[2] rbuf[0]= 3  expected: 3
[2] rbuf[1]= 1  expected: 1
[2] rbuf[2]= 1  expected: 1

[dalcinl@kw2060 openmpi]$ mpicc -DNBCOLL=1 ireduce_scatter.c
[dalcinl@kw2060 openmpi]$ mpiexec -n 3 ./a.out | sort
[0] rbuf[0]= 3  expected: 3
[0] rbuf[1]= 2  expected: 1
[0] rbuf[2]= 2  expected: 1
[1] rbuf[0]= 3  expected: 3
[1] rbuf[1]= 1  expected: 1
[1] rbuf[2]= 1  expected: 1
[2] rbuf[0]= 3  expected: 3
[2] rbuf[1]= 1  expected: 1
[2] rbuf[2]= 1  expected: 1

--
Lisandro Dalcin
---------------
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169
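PS: In case it helps anyone check this in their own codes, here is a minimal sketch of a helper that counts how many entries past a rank's own block differ from a snapshot taken before the collective. The name count_tail_changes and its parameters are my own invention for illustration; nothing like it exists in MPI or Open MPI, and it deliberately uses no MPI calls so it can be tried in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPI): count the entries of `after`
 * at index >= own_count that differ from the snapshot `before`.
 * `own_count` is the number of leading elements this rank is expected
 * to receive; `total` is the length of both arrays. */
static size_t count_tail_changes(const int *before, const int *after,
                                 size_t own_count, size_t total)
{
    size_t changed = 0;
    assert(own_count <= total);
    for (size_t k = own_count; k < total; k++) {
        if (before[k] != after[k])
            changed++;
    }
    return changed;
}
```

In the reproducer above one would copy recvbuf into a snapshot array before the collective and, after MPI_Wait() returns, call count_tail_changes(snapshot, recvbuf, (size_t)rcounts[rank], (size_t)size). With the rank 0 values reported for the nonblocking run ({3,2,2} against an initial {1,1,1}) the helper returns 2; with the blocking run's values ({3,1,1}) it returns 0.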