date:20051011

[O-MPI devel] NPB- FT errors

2005-10-11 Thread Galen M. Shipman

When running the NPB - FT using 128 nodes problem size C, I get the  
following error with both btl_tcp and btl_mvapi:


-bash-3.00$ mpirun -np 128 -machinefile ~/dqlist -mca btl self,tcp -mca mpi_leave_pinned 0 ./bin/ft.C.128



NAS Parallel Benchmarks 2.3 -- FT Benchmark

No input file inputft.data. Using compiled defaults
Size: 512x512x512
Iterations  :  20
Number of processes : 128
Processor array :   1x128
Layout type :  1D
[dq049:27360] *** An error occurred in MPI_Reduce
[dq049:27360] *** on communicator MPI_COMM_WORLD
[dq049:27360] *** MPI_ERR_OP: invalid reduce operation
[dq049:27360] *** MPI_ERRORS_ARE_FATAL (goodbye)
[dq048:27568] *** An error occurred in MPI_Reduce
[dq048:27568] *** on communicator MPI_COMM_WORLD
[dq048:27568] *** MPI_ERR_OP: invalid reduce operation
[dq048:27568] *** MPI_ERRORS_ARE_FATAL (goodbye)
[dq088:24879] *** An error occurred in MPI_Reduce

Re: [O-MPI devel] NPB- FT errors

2005-10-11 Thread Jeff Squyres


Now fixed in SVN.  Thanks!


On Oct 11, 2005, at 6:06 PM, Galen M. Shipman wrote:


When running the NPB - FT using 128 nodes problem size C, I get the
following error with both btl_tcp and btl_mvapi:

-bash-3.00$ mpirun -np 128 -machinefile ~/dqlist -mca btl self,tcp -mca mpi_leave_pinned 0 ./bin/ft.C.128


NAS Parallel Benchmarks 2.3 -- FT Benchmark

No input file inputft.data. Using compiled defaults
Size: 512x512x512
Iterations  :  20
Number of processes : 128
Processor array :   1x128
Layout type :  1D
[dq049:27360] *** An error occurred in MPI_Reduce
[dq049:27360] *** on communicator MPI_COMM_WORLD
[dq049:27360] *** MPI_ERR_OP: invalid reduce operation
[dq049:27360] *** MPI_ERRORS_ARE_FATAL (goodbye)
[dq048:27568] *** An error occurred in MPI_Reduce
[dq048:27568] *** on communicator MPI_COMM_WORLD
[dq048:27568] *** MPI_ERR_OP: invalid reduce operation
[dq048:27568] *** MPI_ERRORS_ARE_FATAL (goodbye)
[dq088:24879] *** An error occurred in MPI_Reduce

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/

Re: [O-MPI devel] porting guide?

2005-10-11 Thread Brooks Davis

On Mon, Oct 10, 2005 at 11:26:41PM -0500, Brian Barrett wrote:
> On Oct 10, 2005, at 6:59 PM, Brooks Davis wrote:
> 
> > The configure output ends with:
> >
> > ...
> > config.status: creating test/util/Makefile
> > config.status: creating include/ompi_config.h
> > config.status: creating include/mpi.h
> > config.status: include/mpi.h is unchanged
> > config.status: linking ./opal/mca/timer/base/timer_base_null.h to  
> > opal/mca/timer/base/base_impl.h
> >
> > I've attached gziped copies of the configure output and config.log.
> 
> Ok, this was a silly error on our part - a header file wasn't shipped  
> as part of the distribution tarball.  I committed a patch to the  
> trunk to fix this bug and it should be in the 1.0 as soon as the 1.0  
> release manager gets a chance to review the commit (should be  
> tomorrow).  If you want to try a nightly, they are available here:
> 
>     http://www.open-mpi.org/nightly/
> 
> Of course, the 1.0 nightly for tomorrow morning will not have the fix  
> just yet.

Thanks.  With the following patches and passing
--disable-pretty-print-stacktrace to configure I was able to get trunk
at rev 7709 to build.  I haven't done any testing yet, but that's a good
first step.

The libutil.h check is to get openpty().  The existing code checked for
the library, but not the header.
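
For illustration, a header check of this kind can be approximated by hand. This is only a sketch and not how AC_CHECK_HEADERS is implemented: `have_header` is a hypothetical helper, and it assumes a compiler driver is reachable as `$CC` or `cc`. The two header names are the ones that appear in the configure.ac hunk of the patch.

```shell
#!/bin/sh
# have_header NAME: succeed if the C preprocessor can find NAME.
# Assumption: a compiler driver is reachable as $CC or "cc"; without one,
# every probe simply reports "missing".
have_header() {
    printf '#include <%s>\n' "$1" |
        ${CC:-cc} -x c -E - >/dev/null 2>&1
}

# openpty() is declared in different headers on different systems, so
# probe for each candidate instead of assuming one of them.
for hdr in libutil.h pty.h; do
    if have_header "$hdr"; then
        echo "$hdr: found"
    else
        echo "$hdr: missing"
    fi
done
```

Code that calls openpty() would then include whichever header was found, guarded by the matching HAVE_..._H macro, as the iof_base_setup.c hunk does.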

The f77_get_fortran_handle_max.m4 change is because FreeBSD's eval
appears to use ints internally so they overflow to negative numbers and
cause problems.  Fortunately, they roll back over once properly escaped.
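
A minimal sketch of the escaping used in the patch, which calls `expr`. The thread does not say exactly how the unescaped form fails; one plausible reading, and it is an assumption here, is that an `expr` implementation may treat a first argument beginning with `-` as an option. The value -5 is arbitrary and only stands in for a wrapped-around intermediate result.

```shell
#!/bin/sh
# A negative intermediate value, standing in for an ompi_fint_max that has
# wrapped around.  The value -5 is arbitrary.
n=-5

# Parenthesized form, as in the patch: the first argument expr sees is "(",
# so the leading "-" of the operand cannot be mistaken for an option.
result=`expr \( "$n" \) - 1`
echo "$result"
```

The parentheses must be backslash-escaped (or quoted) so that the shell passes them to `expr` as literal arguments.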

-- Brooks

Index: configure.ac
===
--- configure.ac  (revision 7709)
+++ configure.ac  (working copy)
@@ -1043,7 +1043,7 @@
 ompi_show_title "Header file tests"
 
 AC_CHECK_HEADERS([alloca.h aio.h arpa/inet.h dirent.h \
-dlfcn.h execinfo.h err.h fcntl.h inttypes.h libgen.h \
+dlfcn.h execinfo.h err.h fcntl.h inttypes.h libgen.h libutil.h \
 net/if.h netdb.h netinet/in.h netinet/tcp.h \
 poll.h pthread.h pty.h pwd.h sched.h stdint.h \
 string.h strings.h stropts.h sys/fcntl.h sys/ipc.h \
Index: config/f77_get_fortran_handle_max.m4
===
--- config/f77_get_fortran_handle_max.m4  (revision 7709)
+++ config/f77_get_fortran_handle_max.m4  (working copy)
@@ -34,7 +34,10 @@
 ompi_fint_max=`expr $ompi_fint_max \* 2`
 ompi_sizeof_fint=`expr $ompi_sizeof_fint - 1`
 done
-ompi_fint_max=`expr $ompi_fint_max - 1`
+# ompi_fint_max might be negative here due to integer rollover
+# on some systems.  Escape it just in case.  This doesn't handle
+# all possible cases, but hopefully it's good enough.
+ompi_fint_max=`expr \( $ompi_fint_max \) - 1`
 fi
 
 # Get INT_MAX.  Compute a SWAG if we are cross compiling or something
@@ -55,7 +58,10 @@
 ompi_cint_max=`expr $ompi_cint_max \* 2`
 ompi_sizeof_cint=`expr $ompi_sizeof_cint - 1`
 done
-ompi_cint_max=`expr $ompi_cint_max - 1`])
+# ompi_cint_max might be negative here due to integer rollover
+# on some systems.  Escape it just in case.  This doesn't handle
+# all possible cases, but hopefully it's good enough.
+ompi_cint_max=`expr \( $ompi_cint_max \) - 1`])
 
 if test "$ompi_cint_max" = "0" ; then
 # wow - something went really wrong.  Be conservative
Index: orte/mca/iof/base/iof_base_setup.c
===
--- orte/mca/iof/base/iof_base_setup.c  (revision 7709)
+++ orte/mca/iof/base/iof_base_setup.c  (working copy)
@@ -47,6 +47,9 @@
 #  include 
 # endif
 #endif
+#ifdef HAVE_LIBUTIL_H
+#include <libutil.h>
+#endif
 
 #include "mca/iof/base/iof_base_setup.h"
 

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4



Re: [O-MPI devel] porting guide?

2005-10-11 Thread Jeff Squyres


On Oct 11, 2005, at 11:01 PM, Brooks Davis wrote:


>> Of course, the 1.0 nightly for tomorrow morning will not have the fix
>> just yet.
>
> Thanks.  With the following patches and passing
> --disable-pretty-print-stacktrace to configure I was able to get trunk
> at rev 7709 to build.  I haven't done any testing yet, but that's a good
> first step.


Can you elaborate on why you needed that?  If there's a problem with
the stacktrace stuff on BSD, I'd like to either disable it by
default or fix whatever is required for it to work properly on BSD.



> The libutil.h check is to get openpty().  The existing code checked for
> the library, but not the header.


Thanks!


> The f77_get_fortran_handle_max.m4 change is because FreeBSD's eval
> appears to use ints internally so they overflow to negative numbers and
> cause problems.  Fortunately, they roll back over once properly escaped.


I don't quite understand this -- are you saying that $ompi_fint_max 
becomes a negative number after all the *2's, and then when we escape 
it and subtract one, it becomes positive?  (ditto for ompi_cint_max)


Thanks for the patch!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/

Re: [O-MPI devel] porting guide?

2005-10-11 Thread Brooks Davis

On Tue, Oct 11, 2005 at 11:20:57PM -0400, Jeff Squyres wrote:
> On Oct 11, 2005, at 11:01 PM, Brooks Davis wrote:
> 
> >> Of course, the 1.0 nightly for tomorrow morning will not have the fix
> >> just yet.
> >
> > Thanks.  With the following patches and passing
> > --disable-pretty-print-stacktrace to configure I was able to get trunk
> > at rev 7709 to build.  I haven't done any testing yet, but that's a good
> > first step.
> 
> Can you elaborate on why you needed that?  If there's a problem with
> the stacktrace stuff on BSD, I'd like to either disable it by
> default or fix whatever is required for it to work properly on BSD.

There were a bunch of undefined symbols that I didn't track down.
Hopefully there's just a missing header file.  I need to dig into it
more.  I just disabled it because I was hoping that would be the only
issue.  It wasn't, but I had to stop working before I could try again
with stack traces enabled.

> > The f77_get_fortran_handle_max.m4 change is because FreeBSD's eval
> > appears to use ints internally so they overflow to negative numbers and
> > cause problems.  Fortunately, they roll back over once properly escaped.
> 
> I don't quite understand this -- are you saying that $ompi_fint_max 
> becomes a negative number after all the *2's, and then when we escape 
> it and subtract one, it becomes positive?  (ditto for ompi_cint_max)

On FreeBSD, eval is using 32-bit signed numbers internally on i386 (it
looks like it uses longs in general, but I haven't tested on a 64-bit
machine yet).  This means that when you compute 2^31 you get INT_MAX + 1
which is negative.  Subtracting one gives INT_MAX so you get the right
value despite the sign weirdness.  Assuming I'm correct about longs
being used, we'll also be OK on 64-bit machines even if this code is
used to compute the maximum value of a 64-bit signed integer.  It won't
work for unsigned numbers on either system though.
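
The wraparound described above can be reproduced by simulating 32-bit two's-complement arithmetic. This is a sketch under stated assumptions: `wrap32` is a hypothetical helper, the shell's `$(( ))` arithmetic is assumed to be at least 64 bits wide, and the loop simply doubles 1 thirty-one times to reach the 2^31 case discussed here. It is not the configure code itself.

```shell
#!/bin/sh
# Simulate 32-bit two's-complement arithmetic using the shell's wider
# integers.  Assumption: $(( )) is at least 64 bits wide.

# wrap32 N: reduce N into the signed 32-bit range [-2^31, 2^31 - 1].
wrap32() {
    n=$(( $1 % 4294967296 ))        # keep the low 32 bits (sign follows $1)
    if [ "$n" -ge 2147483648 ]; then
        n=$(( n - 4294967296 ))
    elif [ "$n" -lt -2147483648 ]; then
        n=$(( n + 4294967296 ))
    fi
    echo "$n"
}

# Double 1 thirty-one times, wrapping after each step, to reach 2^31.
max=1
i=0
while [ "$i" -lt 31 ]; do
    max=$(wrap32 $(( max * 2 )))
    i=$(( i + 1 ))
done
echo "after doubling: $max"          # prints -2147483648

# Subtracting one wraps again and lands on INT_MAX.
max=$(wrap32 $(( max - 1 )))
echo "after subtracting 1: $max"     # prints 2147483647
```

The intermediate value is negative, but the final result is the expected 32-bit INT_MAX, which matches the behavior described for i386.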

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

