Hi George,
Having looked again, you're correct that the two 2buff reductions are wrong.
For now, I've updated my patch to nbc.c so that it copies buf1 into buf3 and
then does buf3 OP= buf2 (see below).
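Here is a minimal standalone sketch of the copy-then-reduce idea. It assumes plain `int` buffers and a C function pointer in place of an MPI_Op. The names `elem_op`, `reduce_2buf` and `reduce_copy_then_op` are hypothetical and are not Open MPI functions. `reduce_2buf` simply follows the "target OP= source" convention used in this thread. The sketch uses a non-commutative operator (subtraction) so that the operand order is visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical element-wise operator standing in for an MPI_Op: returns a OP b. */
typedef int (*elem_op)(int a, int b);

/* Hypothetical two-buffer reduction: target[i] = target[i] OP source[i]. */
static void reduce_2buf(elem_op op, const int *source, int *target, int count)
{
    for (int i = 0; i < count; i++)
        target[i] = op(target[i], source[i]);
}

/* buf3 = buf1 OP buf2, built from one copy plus one two-buffer reduction. */
static void reduce_copy_then_op(elem_op op, const int *buf1, const int *buf2,
                                int *buf3, int count)
{
    /* If buf3 aliased buf2 (but not buf1), the copy would destroy buf2. */
    assert(buf2 != buf3 || buf1 == buf3);
    if (buf1 != buf3)
        memcpy(buf3, buf1, (size_t)count * sizeof *buf3); /* buf3 = buf1 */
    reduce_2buf(op, buf2, buf3, count);                    /* buf3 OP= buf2 */
}

/* A non-commutative operator, so swapped operands would change the result. */
static int op_sub(int a, int b) { return a - b; }
```

Take buf1 = {10, 20, 30}, buf2 = {1, 2, 3} and subtraction as the operator. `reduce_copy_then_op` then leaves buf3 = {9, 18, 27}, i.e. buf1 - buf2. The assertion rejects the case where buf3 aliases buf2 but not buf1, because copying buf1 into buf3 would overwrite the second operand.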
Patching ompi_3buff_op_reduce to cope with user-defined operations is certainly
possible, but I don't really understand what the implications would be for the
rest of the codebase (this is the first time I've looked at the internals of
Open MPI).
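Patching ompi_3buff_op_reduce along those lines would essentially move the same copy-then-reduce fallback into the three-buffer entry point. The sketch below shows the shape of such a dispatch. Its assumptions are hypothetical: `sketch_op`, `op_2buf_fn` and `op_3buf_fn` are invented types, not Open MPI's `ompi_op_t`. A missing three-buffer kernel stands in for "user-defined op".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel signatures; NOT Open MPI's real types. */
typedef void (*op_2buf_fn)(const int *source, int *target, int count);            /* target OP= source  */
typedef void (*op_3buf_fn)(const int *s1, const int *s2, int *target, int count); /* target = s1 OP s2 */

typedef struct {
    op_3buf_fn three_buf; /* NULL when no 3-buffer kernel exists (e.g. a user op) */
    op_2buf_fn two_buf;   /* always available */
} sketch_op;

/* target = s1 OP s2; falls back to copy + two-buffer kernel when needed. */
static void sketch_3buf_reduce(const sketch_op *op, const int *s1, const int *s2,
                               int *target, int count)
{
    if (op->three_buf != NULL) {
        op->three_buf(s1, s2, target, count);
        return;
    }
    /* The fallback is unsafe if target aliases s2 but not s1. */
    assert(s2 != target || s1 == target);
    if (s1 != target)
        memcpy(target, s1, (size_t)count * sizeof *target);
    op->two_buf(s2, target, count);
}

/* Example kernels for an addition operator. */
static void sum_2buf(const int *source, int *target, int count)
{
    for (int i = 0; i < count; i++)
        target[i] += source[i];
}

static void sum_3buf(const int *s1, const int *s2, int *target, int count)
{
    for (int i = 0; i < count; i++)
        target[i] = s1[i] + s2[i];
}
```

Given the same inputs, an op with a three-buffer kernel and one with only the two-buffer kernel should produce identical results. The fallback overwrites whatever the target held before rather than accumulating into it.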
Best,
Rupert
if (ompi_op_is_intrinsic(opargs.op)) {
    /* This does buf3 = buf1 OP buf2 */
    ompi_3buff_op_reduce(opargs.op, buf1, buf2, buf3, opargs.count,
                         opargs.datatype);
} else {
    /* Copy buf1 -> buf3 (if necessary), then do buf3 OP= buf2.
     * If the output is the same as the first input, we don't need to copy.
     * Skipping the copy when the output is the same as the second input
     * is only valid if the operator commutes. */
    if (buf1 == buf3) {
        ompi_op_reduce(opargs.op, buf2, buf3, opargs.count,
                       opargs.datatype);
    } else if (buf2 == buf3 && ompi_op_is_commute(opargs.op)) {
        ompi_op_reduce(opargs.op, buf1, buf3, opargs.count,
                       opargs.datatype);
    } else {
        res = NBC_Copy(buf1, opargs.count, opargs.datatype, buf3,
                       opargs.count, opargs.datatype, handle->comm);
        if (res != NBC_OK) {
            printf("NBC_Copy() failed (code: %i)\n", res);
            ret = res;
            goto error;
        }
        ompi_op_reduce(opargs.op, buf2, buf3, opargs.count,
                       opargs.datatype);
    }
}
> Rupert,
>
> You are right: none of the non-blocking reduce code was built with
> user-level ops in mind. However, I'm not sure about your patch. One
> reason is that ompi_3buff is doing target = source1 op source2, while
> ompi_2buff is doing target op= source (notice the op=).
>
> Thus you can't replace ompi_3buff with two ompi_2buff calls, because you
> basically replace target = source1 op source2 with target op= source1 op
> source2.
>
> Moreover, a much nicer solution would be to patch the
> ompi_3buff_op_reduce function in op.h directly so that it falls back to a
> user-defined function when necessary.
>
> George.
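George's objection can be checked with a small standalone sketch, using integer addition as the operator. `sum_3buf`, `sum_2buf` and `sum_3buf_wrong` are hypothetical stand-ins for the three-buffer form, the two-buffer form and the two-call substitution. None of them is an Open MPI function.

```c
#include <assert.h>

/* Three-buffer form: target = s1 + s2 (old target contents are ignored). */
static void sum_3buf(const int *s1, const int *s2, int *target, int count)
{
    for (int i = 0; i < count; i++)
        target[i] = s1[i] + s2[i];
}

/* Two-buffer form: target += source (old target contents are kept). */
static void sum_2buf(const int *source, int *target, int count)
{
    for (int i = 0; i < count; i++)
        target[i] += source[i];
}

/* The incorrect substitution: two 2-buffer calls in place of one 3-buffer call.
 * The result is target = old_target + s1 + s2, not s1 + s2. */
static void sum_3buf_wrong(const int *s1, const int *s2, int *target, int count)
{
    sum_2buf(s1, target, count);
    sum_2buf(s2, target, count);
}
```

Start from target = {100, 100} with source1 = {5, 6} and source2 = {7, 8}. `sum_3buf` produces {12, 14}, while `sum_3buf_wrong` produces {112, 114`}`. The stale target contents leak into the result, which is the "target op= source1 op source2" behaviour described above.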
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.