Thanks, George. It works!  In addition, the following code also causes a
problem.  The check for count == 0 should be moved to the beginning of
ompi/mpi/c/reduce.c and ireduce.c, or the issue should be fixed in some
other way; see the sketch after the test program below.

Dahai


#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int r[1], s[1];
    MPI_Init(&argc,&argv);

    s[0] = 1;
    r[0] = -1;
    MPI_Reduce(s,r,0,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
    printf("%d\n",r[0]);
    MPI_Reduce(NULL,NULL,0,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
    MPI_Finalize();
}
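
For illustration, here is a minimal sketch of the kind of early return I
have in mind (hypothetical, not the actual ompi/mpi/c/reduce.c source):

int MPI_Reduce(const void *sendbuf, void *recvbuf, int count,
               MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
{
    /* ... existing parameter checking ... */

    /* Hypothetical early return: a zero-count reduction is a no-op, so
       return before the (possibly NULL) buffers or the collective
       module are ever touched. */
    if (0 == count) {
        return MPI_SUCCESS;
    }

    /* ... normal reduction path ... */
}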


On Thu, May 4, 2017 at 9:18 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I was able to reproduce it (with the correct version of OMPI, i.e. the
> v2.x branch). The problem seems to be that we are missing part of commit
> fe68f230991, which removes a free() on a statically allocated array.
> Here is the corresponding patch:
>
> diff --git a/ompi/errhandler/errhandler_predefined.c b/ompi/errhandler/errhandler_predefined.c
> index 4d50611c12..54ac63553c 100644
> --- a/ompi/errhandler/errhandler_predefined.c
> +++ b/ompi/errhandler/errhandler_predefined.c
> @@ -15,6 +15,7 @@
>   * Copyright (c) 2010-2011 Oak Ridge National Labs.  All rights reserved.
>   * Copyright (c) 2012      Los Alamos National Security, LLC.
>   *                         All rights reserved.
> + * Copyright (c) 2016      Intel, Inc.  All rights reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -181,6 +182,7 @@ static void backend_fatal_aggregate(char *type,
>      const char* const unknown_error_code = "Error code: %d (no associated error message)";
>      const char* const unknown_error = "Unknown error";
>      const char* const unknown_prefix = "[?:?]";
> +    bool generated = false;
>
>      // these do not own what they point to; they're
>      // here to avoid repeating expressions such as
> @@ -211,6 +213,8 @@ static void backend_fatal_aggregate(char *type,
>                  err_msg = NULL;
>                  opal_output(0, "%s", "Could not write to err_msg");
>                  opal_output(0, unknown_error_code, *error_code);
> +            } else {
> +                generated = true;
>              }
>          }
>      }
> @@ -256,7 +260,9 @@ static void backend_fatal_aggregate(char *type,
>      }
>
>      free(prefix);
> -    free(err_msg);
> +    if (generated) {
> +        free(err_msg);
> +    }
>  }
>
>  /*
>
>   George.
>
>
>
> On Thu, May 4, 2017 at 10:03 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> Can you get a stack trace?
>>
>> > On May 4, 2017, at 6:44 PM, Dahai Guo <dahai....@gmail.com> wrote:
>> >
>> > Hi, George:
>> >
>> > Attached is the ompi_info output.  I built it on the Power8 architecture.
>> > The configure line is also simple:
>> >
>> > ../configure --prefix=${installdir} \
>> > --enable-orterun-prefix-by-default
>> >
>> > Dahai
>> >
>> > On Thu, May 4, 2017 at 4:45 PM, George Bosilca <bosi...@icl.utk.edu>
>> wrote:
>> > Dahai,
>> >
>> > You are right, the segfault is unexpected. I can't replicate this on my
>> > Mac. What architecture are you seeing this issue on? How was your OMPI
>> > compiled?
>> >
>> > Please post the output of ompi_info.
>> >
>> > Thanks,
>> > George.
>> >
>> >
>> >
>> > On Thu, May 4, 2017 at 5:42 PM, Dahai Guo <dahai....@gmail.com> wrote:
>> > Those messages are what I would like to see. But there are some other
>> > error messages and a core dump that I don't like, as attached in my
>> > previous email.  I think something might be wrong with the error handler
>> > in Open MPI.  A similar thing happened with MPI_Bcast, etc.
>> >
>> > Dahai
>> >
>> > On Thu, May 4, 2017 at 4:32 PM, Nathan Hjelm <hje...@me.com> wrote:
>> > By default MPI errors are fatal and abort. The error message says it all:
>> >
>> > *** An error occurred in MPI_Reduce
>> > *** reported by process [3645440001,0]
>> > *** on communicator MPI_COMM_WORLD
>> > *** MPI_ERR_COUNT: invalid count argument
>> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> > *** and potentially your MPI job)
>> >
>> > If you want different behavior, you have to change the default error
>> > handler on the communicator using MPI_Comm_set_errhandler. You can set it
>> > to MPI_ERRORS_RETURN and check the error code, or you can create your own
>> > handler function. See MPI 3.1, Chapter 8.
>> >
>> > -Nathan
>> >
>> > On May 04, 2017, at 02:58 PM, Dahai Guo <dahai....@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Using Open MPI 2.1, the following code resulted in a core dump, although
>> >> only a simple error message was expected.  Any idea what is wrong?  It
>> >> seems related to the error handler somewhere.
>> >>
>> >>
>> >> D.G.
>> >>
>> >>
>> >>  *** An error occurred in MPI_Reduce
>> >>  *** reported by process [3645440001,0]
>> >>  *** on communicator MPI_COMM_WORLD
>> >>  *** MPI_ERR_COUNT: invalid count argument
>> >>  *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
>> abort,
>> >>  ***    and potentially your MPI job)
>> >> ......
>> >>
>> >> [1,1]<stderr>:1000151c0000-1000151e0000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:1000151e0000-100015250000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015250000-100015270000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015270000-1000152e0000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:1000152e0000-100015300000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015300000-100015510000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015510000-100015530000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015530000-100015740000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015740000-100015760000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015760000-100015970000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015970000-100015990000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015990000-100015ba0000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015ba0000-100015bc0000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015bc0000-100015dd0000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015dd0000-100015df0000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100015df0000-100016000000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100016000000-100016020000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100016020000-100016230000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100016230000-100016250000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100016250000-100016460000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:100016460000-100016470000 rw-p 00000000 00:00 0
>> >> [1,1]<stderr>:3fffd4630000-3fffd46c0000 rw-p 00000000 00:00 0   [stack]
>> >> --------------------------------------------------------------------------
>> >>
>> >> #include <stdlib.h>
>> >> #include <stdio.h>
>> >> #include <mpi.h>
>> >> int main(int argc, char** argv)
>> >> {
>> >>
>> >>     int r[1], s[1];
>> >>     MPI_Init(&argc,&argv);
>> >>
>> >>     s[0] = 1;
>> >>     r[0] = -1;
>> >>     MPI_Reduce(s,r,-1,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
>> >>     printf("%d\n",r[0]);
>> >>     MPI_Finalize();
>> >> }
>> >>
>> >
>> > <opmi_info.txt>
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>>
>
>
>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
