Hi George,

Having looked again, you're right that the two 2buf reductions are wrong. For now, I've updated my patch to nbc.c to copy buf1 into buf3 and then do buf3 OP= buf2 (see below).

Patching ompi_3buff_op_reduce to cope with user-defined operations is certainly possible, but I don't really understand the implications of doing that for the rest of the codebase (this is the first time I've looked at the internals of OpenMPI).

Best,
Rupert

if (ompi_op_is_intrinsic(opargs.op)) {
  /* This does buf3 = buf1 OP buf2 */
  ompi_3buff_op_reduce(opargs.op, buf1, buf2, buf3, opargs.count, opargs.datatype);
} else {
  /* Copy buf1 -> buf3 (if necessary)
   * then do buf3 OP= buf2
   * If the output is the same as the first input, we don't need to copy
   * This only applies to the second if the operator commutes */
  if (buf1 == buf3) {
    ompi_op_reduce(opargs.op, buf2, buf3, opargs.count, opargs.datatype);
  } else if (buf2 == buf3 && ompi_op_is_commute(opargs.op)) {
    ompi_op_reduce(opargs.op, buf1, buf3, opargs.count, opargs.datatype);
  } else {
    res = NBC_Copy(buf1, opargs.count, opargs.datatype, buf3, opargs.count, opargs.datatype, handle->comm);
    if (res != NBC_OK) {
      printf("NBC_Copy() failed (code: %i)\n", res);
      ret = res;
      goto error;
    }
    ompi_op_reduce(opargs.op, buf2, buf3, opargs.count, opargs.datatype);
  }
}

> Rupert,
>
> You are right, the code of any non-blocking reduce is not built with
> user-level op in mind. However, I'm not sure about your patch. One
> reason is that ompi_3buff is doing target = source1 op source2 while
> ompi_2buf is doing target op= source (notice the op=)
>
> Thus you can't replace ompi_3buff by 2 ompi_2buff because you
> basically replace target = source1 op source2 by target op= source1 op
> source2
>
> Moreover, a much nicer solution will be to patch directly the
> ompi_3buff_op_reduce function in op.h to fall back to a user defined
> function when necessary.
>
> George.
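
To make the distinction in George's message concrete, here is a minimal standalone sketch in C. It does not use Open MPI at all: reduce_2buf, reduce_3buf, emulate_wrong and emulate_copy are hypothetical stand-ins written for illustration, and the operation is hard-wired to integer addition. Addition commutes, so the sketch deliberately sidesteps operand order; for a non-commutative user op, which operand gets copied into the target depends on the 2-buffer convention (the MPI standard defines user functions as inoutvec[i] = invec[i] op inoutvec[i]).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a 2-buffer reduce: accumulates into target,
 * i.e. target[i] = source[i] + target[i]. The old target contributes. */
static void reduce_2buf(const int *source, int *target, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        target[i] = source[i] + target[i];
    }
}

/* Hypothetical stand-in for a 3-buffer reduce:
 * target[i] = source1[i] + source2[i]. The old target is ignored. */
static void reduce_3buf(const int *source1, const int *source2,
                        int *target, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        target[i] = source1[i] + source2[i];
    }
}

/* Incorrect emulation: two 2-buffer reductions fold the stale contents
 * of target into the result (target = source1 + source2 + old target). */
static void emulate_wrong(const int *source1, const int *source2,
                          int *target, size_t count)
{
    reduce_2buf(source1, target, count);
    reduce_2buf(source2, target, count);
}

/* Copy-then-reduce emulation: seed target with one operand, then apply a
 * single 2-buffer reduction with the other. The copy is skipped when the
 * target already aliases an operand. */
static void emulate_copy(const int *source1, const int *source2,
                         int *target, size_t count)
{
    if (target == source1) {
        reduce_2buf(source2, target, count);
    } else if (target == source2) {
        /* Swapping the operands is valid here only because + commutes. */
        reduce_2buf(source1, target, count);
    } else {
        memcpy(target, source1, count * sizeof *target);
        reduce_2buf(source2, target, count);
    }
}
```

With source1 = {1, 2}, source2 = {10, 20} and a target that initially holds {100, 200}, reduce_3buf and emulate_copy both leave {11, 22} in the target, whereas emulate_wrong leaves {111, 222}.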