Rupert,

You are right, the non-blocking reduce code was not built with
user-level ops in mind. However, I'm not sure about your patch. One
reason is that ompi_3buff does target = source1 op source2, while
ompi_2buf does target op= source (notice the op=).

Thus you can't replace one ompi_3buff call with two ompi_2buf calls: you
would effectively turn target = source1 op source2 into target op= source1
op source2, so the stale contents of target get folded into the result.
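
To make the difference concrete, here is a tiny stand-alone C sketch (not
Open MPI code; op() is just a stand-in for an element-wise user op):

    #include <stdio.h>

    /* stand-in for an element-wise MPI_Op (here: sum) */
    static int op(int a, int b) { return a + b; }

    int main(void) {
        int source1 = 3, source2 = 5;

        /* 3-buffer semantics: target is overwritten, its old value is ignored */
        int target3 = 42;                 /* stale contents */
        target3 = op(source1, source2);   /* target = source1 op source2 -> 8 */

        /* two 2-buffer reductions: target accumulates, its old value leaks in */
        int target2 = 42;                 /* same stale contents */
        target2 = op(source1, target2);   /* target op= source1 */
        target2 = op(source2, target2);   /* target op= source2 -> 50, not 8 */

        printf("3buff: %d, two 2buff: %d\n", target3, target2);
        return 0;
    }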

Moreover, a much nicer solution would be to patch the
ompi_3buff_op_reduce function in op.h directly, so that it falls back to a
user-defined function when necessary.
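
Something along these lines (completely untested sketch; I'm reusing the
OMPI_OP_FLAGS_INTRINSIC check from your patch and assuming a datatype copy
helper such as ompi_datatype_copy_content_same_ddt is usable from op.h, so
the exact names and signatures would need checking):

    /* sketch of a fallback at the top of ompi_3buff_op_reduce() */
    if (0 == (op->o_flags & OMPI_OP_FLAGS_INTRINSIC)) {
        /* target = source1 ... */
        ompi_datatype_copy_content_same_ddt(dtype, count,
                                            (char *) target, (char *) source1);
        /* ... then target op= source2; ompi_op_reduce already knows how
         * to invoke user-defined functions */
        ompi_op_reduce(op, source2, target, count, dtype);
        return;
    }
    /* otherwise fall through to the existing intrinsic function-pointer table */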

  George.

On Wed, Apr 23, 2014 at 12:52 PM, Rupert Nash <rupert.n...@ed.ac.uk> wrote:
> Hello devel list
>
> I've been trying to use a non-blocking MPI_Iallreduce in a CFD application 
> I'm working on, but it kept segfaulting on me. I have reduced it to a simple 
> test case - see the gist here for the full code
>         https://gist.github.com/rupertnash/11222282
> build and run with:
>         mpicc test.c -o test && mpirun -n 2 ./test
>
> I am working on OS X Mavericks with open-mpi 1.8 built from the source 
> tarball.
>
> Through some debugging I have narrowed the problem down to
> ompi/mca/coll/libnbc/nbc.c, in NBC_Start_round, where the code switches on
> the type of operation that has been put in the schedule:
>
>       case OP:
>         NBC_DEBUG(5, "  OP   (offset %li) ", (long)ptr-(long)myschedule);
>         NBC_GET_BYTES(ptr,opargs);
>         NBC_DEBUG(5, "*buf1: %p, buf2: %p, count: %i, type: %lu)\n",
>                   opargs.buf1, opargs.buf2, opargs.count, (unsigned long)opargs.datatype);
>         /* get buffers */
>         /* SNIP */
> --->    ompi_3buff_op_reduce(opargs.op, buf1, buf2, buf3, opargs.count, opargs.datatype);
>         break;
>
> The line marked with the arrow ---> is the problem. The comment describing
> ompi_3buff_op_reduce states "This function will *only* be invoked on
> intrinsic MPI_Ops." Examining the code bears this out: it is clearly
> indexing into a table of function pointers, which are all NULL for a
> user-defined MPI_Op.
>
> Presumably the fix will be to replace the use of the 3buffer version with the 
> usual ompi_op_reduce, at least for non-intrinsic operations. I have made a 
> temporary patch by replacing the arrowed line with the following:
>         if (0 != (opargs.op->o_flags & OMPI_OP_FLAGS_INTRINSIC)) {
>           ompi_3buff_op_reduce(opargs.op, buf1, buf2, buf3, opargs.count, opargs.datatype);
>         } else {
>           ompi_op_reduce(opargs.op, buf1, buf3, opargs.count, opargs.datatype);
>           ompi_op_reduce(opargs.op, buf2, buf3, opargs.count, opargs.datatype);
>         }
> However, this is the first time I've looked under the hood of Open MPI, so
> hopefully you can patch it properly soon.
>
> Best wishes,
>
> Rupert
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14586.php
