That's a really weird backtrace; it seems to indicate that the
datatype engine is improperly calling free(). Can you try running
without openib (add "-mca btl_base_exclude openib" to the mpirun
arguments) and see if the problem goes away? Also, what platform was
this on?
Thanks,
Brian
On Oct 21, 2005, at 6:37 PM, Troy Benjegerdes wrote:
On Fri, Oct 21, 2005 at 06:26:32PM -0500, Troy Benjegerdes wrote:
On Fri, Oct 21, 2005 at 04:12:05PM -0500, Andrew Friedley wrote:
I just committed a fix to the trunk for your original segfault down
in opal_show_help(); this is the same problem Ken posted. The fix
should make it into the v1.0 branch eventually. Even so, you are going
to run into the real problem you were dealing with; this fix only
provides proper error handling/output.
The error below looks like a word size mismatch: one component is
compiled 64-bit and the other 32-bit. Make sure everything is compiled
consistently, either all 32-bit or all 64-bit.
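To check which word size a particular compiler and flag combination produces, a small probe like the one below can be compiled with the same CC and CFLAGS used for each component (for example, Open MPI itself and NetPIPE) and the reports compared. The function names are hypothetical helpers, not part of Open MPI. On a typical 64-bit Linux build (LP64), pointers and long are 64 bits; on a 32-bit build (ILP32), both are 32 bits.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdio.h>    /* FILE, fprintf */

/* Width of a data pointer, in bits, for the ABI this file is compiled for.
 * Hypothetical helper, not an Open MPI API. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Print the type widths that differ between typical 32-bit (ILP32) and
 * 64-bit (LP64) builds, so the output of separate builds can be compared. */
void print_abi_report(FILE *out)
{
    fprintf(out, "pointer: %u bits\n", pointer_bits());
    fprintf(out, "long:    %u bits\n", (unsigned)(sizeof(long) * CHAR_BIT));
    fprintf(out, "size_t:  %u bits\n", (unsigned)(sizeof(size_t) * CHAR_BIT));
}
```

Calling print_abi_report(stdout) from a main() built with each component's flags shows whether they agree; a difference in pointer width means the pieces were built for different word sizes.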
Another note: I think I may have had some problems because I built
with 'make -j16'. Has anyone else tried parallel make builds?
I have a working mpirun now, but I'm back to NetPIPE segfaulting when
I run it. Here's a backtrace:
0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
#0  0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
#1  0x00002aaaaaecb016 in opal_mem_free_free_hook () from /usr/local/lib/libopal.so.0
#2  0x00002aaaaac0c663 in ompi_convertor_cleanup () from /usr/local/lib/libmpi.so.0
#3  0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache () from /usr/local/lib/openmpi/mca_pml_ob1.so
#4  0x00002aaaaf179c7b in mca_btl_sm_component_progress () from /usr/local/lib/openmpi/mca_btl_sm.so
#5  0x00002aaaaee5eefe in mca_bml_r2_progress () from /usr/local/lib/openmpi/mca_bml_r2.so
#6  0x00002aaaaeb3dd4e in mca_pml_ob1_progress () from /usr/local/lib/openmpi/mca_pml_ob1.so
#7  0x00002aaaaaeb5c4a in opal_progress () from /usr/local/lib/libopal.so.0
#8  0x00002aaaaeb3c265 in mca_pml_ob1_recv () from /usr/local/lib/openmpi/mca_pml_ob1.so
#9  0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin () from /usr/local/lib/openmpi/mca_coll_basic.so
#10 0x00002aaaaac1f3b8 in PMPI_Barrier () from /usr/local/lib/libmpi.so.0
#11 0x0000000000403016 in Sync ()
#12 0x0000000000401ef8 in main ()
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel