It even says so in the code:

ompi/mca/mtl/psm/mtl_psm.c:

       /* Default error handling is enabled, errors will not be returned to
         * user.  PSM prints the error and the offending endpoint's hostname
         * and exits with -1 */

Disabling the default PSM error handler makes MPI_Cancel() fail
gracefully, but then no PSM errors are handled anymore.
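
To illustrate why the return value of the cancel wrapper cannot matter
while the default handler is active, here is a minimal, self-contained
model in C. All names in it (lib_cancel, mtl_cancel, handler_exit,
handler_return, current_handler) are hypothetical stand-ins and not the
PSM or Open MPI API. It only assumes what is described in this thread:
the library's default policy prints the error and exits, and a disabled
handler hands the error back to the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real PSM/Open MPI types. */
typedef enum { LIB_OK = 0, LIB_ERR_CANNOT_CANCEL = 1 } lib_err_t;
typedef lib_err_t (*lib_errhandler_t)(lib_err_t err, const char *msg);

/* Models the default policy: print the error and terminate the process. */
lib_err_t handler_exit(lib_err_t err, const char *msg)
{
    (void)err;
    fprintf(stderr, "%s\n", msg);
    exit(-1);               /* the caller never gets a return value */
}

/* Models a disabled handler: the error is returned to the caller. */
lib_err_t handler_return(lib_err_t err, const char *msg)
{
    (void)msg;
    return err;
}

/* The exiting handler is the default in this model. */
lib_errhandler_t current_handler = handler_exit;

/* Models a library call that refuses to cancel send requests. */
lib_err_t lib_cancel(int is_send)
{
    if (is_send) {
        return current_handler(LIB_ERR_CANNOT_CANCEL,
                               "Cannot cancel send requests");
    }
    return LIB_OK;
}

/* Models the wrapper: a request that cannot be cancelled is not an
 * error, it just stays uncancelled. This code is only reached if the
 * handler returned instead of exiting. */
int mtl_cancel(int is_send, int *cancelled)
{
    lib_err_t err = lib_cancel(is_send);
    *cancelled = (LIB_OK == err);
    return 0;               /* success either way */
}
```

With current_handler left at handler_exit, mtl_cancel(1, &flag) never
returns, whatever the wrapper would have returned. After setting
current_handler = handler_return, the same call returns 0 and leaves
flag at 0, which is the graceful failure. The cost is that every other
error reported by the library now has to be checked by the caller.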

                Adrian

On Thu, Jan 15, 2015 at 10:21:05PM +0100, Adrian Reber wrote:
> As PSM on master is still broken I applied it on 1.8.4. Unfortunately it
> does not work. The error is the same as before.
> 
> Looking at your patch I would also expect that this is the correct fix
> and I even tried to change ompi_mtl_psm_cancel() to always return
> OMPI_SUCCESS. MPI_Cancel() still fails.
> 
> Looking at the PSM code, it seems it can call exit(-1) directly, thus
> terminating the process and never returning to Open MPI. I do not see
> any debug output from Open MPI after "Cannot cancel send requests" from PSM.
> 
>               Adrian
> 
> On Thu, Jan 15, 2015 at 01:43:11PM -0500, George Bosilca wrote:
> > From the MPI standard's perspective, MPI_Cancel doesn't have to succeed; it
> > can also fail gracefully. However, the PSM MTL diverges from the MPI
> > standard: if a request cannot be canceled, an error is returned. Here is
> > a patch to fix this issue.
> > 
> > diff --git a/ompi/mca/mtl/psm/mtl_psm_cancel.c
> > b/ompi/mca/mtl/psm/mtl_psm_cancel
> > index 6da3386..277c761 100644
> > --- a/ompi/mca/mtl/psm/mtl_psm_cancel.c
> > +++ b/ompi/mca/mtl/psm/mtl_psm_cancel.c
> > @@ -37,10 +37,8 @@ int ompi_mtl_psm_cancel(struct mca_mtl_base_module_t* mtl,
> >      if(PSM_OK == err) {
> >        mtl_request->ompi_req->req_status._cancelled = true;
> >        mtl_psm_request->super.completion_callback(&mtl_psm_request->super);
> > -      return OMPI_SUCCESS;
> > -    } else {
> > -      return OMPI_ERROR;
> >      }
> > +    return OMPI_SUCCESS;
> >    } else if(PSM_MQ_INCOMPLETE == err) {
> >      return OMPI_SUCCESS;
> >    }
> > 
> >   George.
> > 
> > 
> > On Thu, Jan 15, 2015 at 1:30 PM, Adrian Reber <adr...@lisas.de> wrote:
> > 
> > > Doing
> > >
> > > MPI_Isend()
> > >
> > > followed by a
> > >
> > > MPI_Cancel()
> > >
> > > fails on my PSM based system with 1.8.4 like this:
> > >
> > > n040108:0.1.Cannot cancel send requests (req=0x2b6279787f80)
> > > n040108:0.0.Cannot cancel send requests (req=0x2b3a3dc92f80)
> > > -------------------------------------------------------
> > > Primary job  terminated normally, but 1 process returned
> > > a non-zero exit code.. Per user-direction, the job has been aborted.
> > > -------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > mpirun detected that one or more processes exited with non-zero status,
> > > thus causing
> > > the job to be terminated. The first process to do so was:
> > >
> > >   Process name: [[58364,1],1]
> > >   Exit code:    255
> > > --------------------------------------------------------------------------
> > >
> > > Is this something PSM actually cannot do or an Open MPI error?
> > >
> > >                 Adrian
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > Link to this post:
> > > http://www.open-mpi.org/community/lists/devel/2015/01/16783.php
> > >
> 

                Adrian

-- 
Adrian Reber <adr...@lisas.de>            http://lisas.de/~adrian/
C-3PO: 
        Don't call me a mindless philosopher, you overweight
        glob of grease!
