[O-MPI devel] NPB- FT errors
When running the NPB FT benchmark on 128 nodes with problem size C, I get the following error with both btl_tcp and btl_mvapi:

-bash-3.00$ mpirun -np 128 -machinefile ~/dqlist -mca btl self,tcp -mca mpi_leave_pinned 0 ./bin/ft.C.128

NAS Parallel Benchmarks 2.3 -- FT Benchmark

No input file inputft.data. Using compiled defaults
Size                : 512x512x512
Iterations          : 20
Number of processes : 128
Processor array     : 1x128
Layout type         : 1D
[dq049:27360] *** An error occurred in MPI_Reduce
[dq049:27360] *** on communicator MPI_COMM_WORLD
[dq049:27360] *** MPI_ERR_OP: invalid reduce operation
[dq049:27360] *** MPI_ERRORS_ARE_FATAL (goodbye)
[dq048:27568] *** An error occurred in MPI_Reduce
[dq048:27568] *** on communicator MPI_COMM_WORLD
[dq048:27568] *** MPI_ERR_OP: invalid reduce operation
[dq048:27568] *** MPI_ERRORS_ARE_FATAL (goodbye)
[dq088:24879] *** An error occurred in MPI_Reduce
Re: [O-MPI devel] NPB- FT errors
Now fixed in SVN. Thanks!

On Oct 11, 2005, at 6:06 PM, Galen M. Shipman wrote:

> When running the NPB FT benchmark on 128 nodes with problem size C, I
> get the following error with both btl_tcp and btl_mvapi:
>
> -bash-3.00$ mpirun -np 128 -machinefile ~/dqlist -mca btl self,tcp -mca mpi_leave_pinned 0 ./bin/ft.C.128
>
> NAS Parallel Benchmarks 2.3 -- FT Benchmark
>
> No input file inputft.data. Using compiled defaults
> Size                : 512x512x512
> Iterations          : 20
> Number of processes : 128
> Processor array     : 1x128
> Layout type         : 1D
> [dq049:27360] *** An error occurred in MPI_Reduce
> [dq049:27360] *** on communicator MPI_COMM_WORLD
> [dq049:27360] *** MPI_ERR_OP: invalid reduce operation
> [dq049:27360] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [dq048:27568] *** An error occurred in MPI_Reduce
> [dq048:27568] *** on communicator MPI_COMM_WORLD
> [dq048:27568] *** MPI_ERR_OP: invalid reduce operation
> [dq048:27568] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [dq088:24879] *** An error occurred in MPI_Reduce

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
Re: [O-MPI devel] porting guide?
On Mon, Oct 10, 2005 at 11:26:41PM -0500, Brian Barrett wrote:
> On Oct 10, 2005, at 6:59 PM, Brooks Davis wrote:
>
> > The configure output ends with:
> >
> > ...
> > config.status: creating test/util/Makefile
> > config.status: creating include/ompi_config.h
> > config.status: creating include/mpi.h
> > config.status: include/mpi.h is unchanged
> > config.status: linking ./opal/mca/timer/base/timer_base_null.h to
> > opal/mca/timer/base/base_impl.h
> >
> > I've attached gzipped copies of the configure output and config.log.
>
> Ok, this was a silly error on our part - a header file wasn't shipped
> as part of the distribution tarball. I committed a patch to the
> trunk to fix this bug and it should be in the 1.0 as soon as the 1.0
> release manager gets a chance to review the commit (should be
> tomorrow). If you want to try a nightly, they are available here:
>
>     http://www.open-mpi.org/nightly/
>
> Of course, the 1.0 nightly for tomorrow morning will not have the fix
> just yet.

Thanks. With the following patches and passing
--disable-pretty-print-stacktrace to configure I was able to get trunk
at rev 7709 to build. I haven't done any testing yet, but that's a good
first step.

The libutil.h check is to get openpty(). The existing code checked for
the library, but not the header.

The f77_get_fortran_handle_max.m4 change is because FreeBSD's eval
appears to use ints internally so they overflow to negative numbers and
cause problems. Fortunately, they roll back over once properly escaped.

-- Brooks

Index: configure.ac
===================================================================
--- configure.ac	(revision 7709)
+++ configure.ac	(working copy)
@@ -1043,7 +1043,7 @@
 ompi_show_title "Header file tests"
 AC_CHECK_HEADERS([alloca.h aio.h arpa/inet.h dirent.h \
-    dlfcn.h execinfo.h err.h fcntl.h inttypes.h libgen.h \
+    dlfcn.h execinfo.h err.h fcntl.h inttypes.h libgen.h libutil.h \
     net/if.h netdb.h netinet/in.h netinet/tcp.h \
     poll.h pthread.h pty.h pwd.h sched.h stdint.h \
     string.h strings.h stropts.h sys/fcntl.h sys/ipc.h \
Index: config/f77_get_fortran_handle_max.m4
===================================================================
--- config/f77_get_fortran_handle_max.m4	(revision 7709)
+++ config/f77_get_fortran_handle_max.m4	(working copy)
@@ -34,7 +34,10 @@
         ompi_fint_max=`expr $ompi_fint_max \* 2`
         ompi_sizeof_fint=`expr $ompi_sizeof_fint - 1`
     done
-    ompi_fint_max=`expr $ompi_fint_max - 1`
+    # ompi_fint_max might be negative here due to integer rollover
+    # on some systems. Escape it just in case. This doesn't handle
+    # all possible cases, but hopefully it's good enough.
+    ompi_fint_max=`expr \( $ompi_fint_max \) - 1`
 fi
 # Get INT_MAX. Compute a SWAG if we are cross compiling or something
@@ -55,7 +58,10 @@
         ompi_cint_max=`expr $ompi_cint_max \* 2`
         ompi_sizeof_cint=`expr $ompi_sizeof_cint - 1`
     done
-    ompi_cint_max=`expr $ompi_cint_max - 1`])
+    # ompi_cint_max might be negative here due to integer rollover
+    # on some systems. Escape it just in case. This doesn't handle
+    # all possible cases, but hopefully it's good enough.
+    ompi_cint_max=`expr \( $ompi_cint_max \) - 1`])
 if test "$ompi_cint_max" = "0" ; then
     # wow - something went really wrong. Be conservative
Index: orte/mca/iof/base/iof_base_setup.c
===================================================================
--- orte/mca/iof/base/iof_base_setup.c	(revision 7709)
+++ orte/mca/iof/base/iof_base_setup.c	(working copy)
@@ -47,6 +47,9 @@
 # include
 # endif
 #endif
+#ifdef HAVE_LIBUTIL_H
+#include
+#endif
 #include "mca/iof/base/iof_base_setup.h"

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
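The escaping idiom in the f77_get_fortran_handle_max.m4 hunks can be tried on its own. The following is a minimal sketch, not the actual configure code: `minus_one` and `int_max_for_bits` are hypothetical helper names, and the loop shape is only inferred from the context lines of the diff. The message does not say why the unescaped form fails; one plausible cause (an assumption here) is that a value with a leading minus sign, passed as the first argument, is taken for an option by some `expr` implementations. Wrapping the operand in escaped parentheses avoids handing `expr` a bare leading `-`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Open MPI.

# Subtract one from a value that may be negative. The escaped
# parentheses mean the first argument expr sees is "(" rather than
# something that starts with "-".
minus_one() {
    expr \( "$1" \) - 1
}

# Double 1 the given number of times, then subtract one, in the
# spirit of the patched loop. For 31 doublings this gives the
# largest 32-bit signed value when expr computes in a wider type.
int_max_for_bits() {
    bits=$1
    value=1
    i=1
    while [ "$i" -le "$bits" ]; do
        value=`expr \( "$value" \) \* 2`
        i=`expr "$i" + 1`
    done
    minus_one "$value"
}
```

For example, `minus_one -8` prints `-9`, and `int_max_for_bits 15` prints `32767`. Note that `expr` exits with status 1 when its result is 0, so callers running under `set -e` should avoid inputs that produce zero.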
Re: [O-MPI devel] porting guide?
On Oct 11, 2005, at 11:01 PM, Brooks Davis wrote:

>> Of course, the 1.0 nightly for tomorrow morning will not have the fix
>> just yet.
>
> Thanks. With the following patches and passing
> --disable-pretty-print-stacktrace to configure I was able to get trunk
> at rev 7709 to build. I haven't done any testing yet, but that's a
> good first step.

Can you elaborate on why you needed that? If there's a problem with
the stacktrace stuff on BSD, I'd like to either disable it by default
or fix whatever is required for it to work properly on BSD.

> The libutil.h check is to get openpty(). The existing code checked for
> the library, but not the header.

Thanks!

> The f77_get_fortran_handle_max.m4 change is because FreeBSD's eval
> appears to use ints internally so they overflow to negative numbers and
> cause problems. Fortunately, they roll back over once properly escaped.

I don't quite understand this -- are you saying that $ompi_fint_max
becomes a negative number after all the *2's, and then when we escape
it and subtract one, it becomes positive? (ditto for ompi_cint_max)

Thanks for the patch!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
Re: [O-MPI devel] porting guide?
On Tue, Oct 11, 2005 at 11:20:57PM -0400, Jeff Squyres wrote:
> On Oct 11, 2005, at 11:01 PM, Brooks Davis wrote:
>
> >> Of course, the 1.0 nightly for tomorrow morning will not have the fix
> >> just yet.
> >
> > Thanks. With the following patches and passing
> > --disable-pretty-print-stacktrace to configure I was able to get trunk
> > at rev 7709 to build. I haven't done any testing yet, but that's a
> > good first step.
>
> Can you elaborate on why you needed that? If there's a problem with
> the stacktrace stuff on BSD, I'd like to either disable it by default
> or fix whatever is required for it to work properly on BSD.

There were a bunch of undefined symbols that I didn't track down.
Hopefully there's just a missing header file. I need to dig into it
more. I just disabled it because I was hoping that would be the only
issue. It wasn't, but I had to stop working before I could try again
with stack traces enabled.

> > The f77_get_fortran_handle_max.m4 change is because FreeBSD's eval
> > appears to use ints internally so they overflow to negative numbers and
> > cause problems. Fortunately, they roll back over once properly escaped.
>
> I don't quite understand this -- are you saying that $ompi_fint_max
> becomes a negative number after all the *2's, and then when we escape
> it and subtract one, it becomes positive? (ditto for ompi_cint_max)

On FreeBSD, eval is using 32-bit signed numbers internally on i386 (it
looks like it uses longs in general, but I haven't tested on a 64-bit
machine yet). This means that when you compute 2^31 you get INT_MAX + 1,
which wraps around to a negative number. Subtracting one wraps back to
INT_MAX, so you get the right value despite the sign weirdness.
Assuming I'm correct about longs being used, we'll also be OK on 64-bit
machines even if this code is used to compute the maximum value of a
64-bit signed integer. It won't work for unsigned numbers on either
system though.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
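The wraparound arithmetic described in this reply can be checked with a small simulation. This is a sketch under stated assumptions: it does not run FreeBSD's tool, it emulates 32-bit signed two's-complement wraparound using the shell's own arithmetic (assumed to be at least 64 bits wide, as in bash and dash on common platforms), and `wrap32`, `double_31_times`, and `int_max32` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers that emulate 32-bit signed arithmetic; they
# assume the shell's $(( )) arithmetic is wider than 32 bits.

# Reduce a value to the 32-bit signed range [-2^31, 2^31 - 1].
wrap32() {
    v=$(( $1 & 4294967295 ))          # keep the low 32 bits
    if [ "$v" -ge 2147483648 ]; then  # high bit set: negative value
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

# Start from 1 and double 31 times, wrapping after each step.
# The last doubling overflows past INT_MAX and lands on INT_MIN.
double_31_times() {
    value=1
    i=1
    while [ "$i" -le 31 ]; do
        value=$(wrap32 $(( value * 2 )))
        i=$(( i + 1 ))
    done
    echo "$value"
}

# Subtracting one from the wrapped value wraps back to INT_MAX.
int_max32() {
    wrapped=$(double_31_times)
    wrap32 $(( wrapped - 1 ))
}
```

Here `double_31_times` prints `-2147483648` and `int_max32` prints `2147483647`, matching the description: the intermediate value is negative, but the final subtraction recovers INT_MAX. The same trick cannot recover an unsigned maximum, since the extra bit is already lost by then.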