That's a really weird backtrace - it seems to indicate that the datatype engine is improperly calling free(). Can you try running without openib (add "-mca btl_base_exclude openib" to the mpirun arguments) and see if the problem goes away? Also, what platform was this on?

Thanks,

Brian

On Oct 21, 2005, at 6:37 PM, Troy Benjegerdes wrote:

On Fri, Oct 21, 2005 at 06:26:32PM -0500, Troy Benjegerdes wrote:

On Fri, Oct 21, 2005 at 04:12:05PM -0500, Andrew Friedley wrote:

I just committed a fix to the trunk for your original segfault down in
opal_show_help() - this is the same problem Ken posted. The fix should
make it into the v1.0 branch eventually. Even so, you are still going to
run into the real problem you were handling - this fix is just for
proper error handling/output.

The error below looks like a word size mismatch - one thing is compiled
64bit, the other is compiled 32bit. Make sure everything is compiled
either 32bit or 64bit.
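
One way to check for such a mismatch is to compare the word size recorded in each binary and shared library. The following is a minimal sketch, not part of Open MPI: it assumes the files are ELF objects (typical on Linux), and the helper name elf_word_size is hypothetical. It reads only the EI_CLASS byte of the ELF header.

```python
import sys

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def elf_word_size(path):
    """Return 32 or 64 for an ELF file, or None if it is not recognizable ELF.

    Hypothetical helper for illustration; assumes an ELF platform.
    """
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # Byte 4 is EI_CLASS: 1 means 32-bit objects, 2 means 64-bit objects.
    return {1: 32, 2: 64}.get(header[4])


if __name__ == "__main__":
    # Print the word size of every file named on the command line.
    for path in sys.argv[1:]:
        print(path, elf_word_size(path))
```

Running it over the application binary and the installed libraries (for example /usr/local/lib/libmpi.so.0 and /usr/local/lib/libopal.so.0) should report the same value for every file; a mix of 32 and 64 would point to the mismatch described above.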


Another note.. I think I may have had some problems because I built with
'make -j16'.. has anyone else tried parallel make builds?

I have a working mpirun now.

Now I'm back to having NetPIPE segfault when I run it..


And here's a backtrace:

0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
#0  0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
#1  0x00002aaaaaecb016 in opal_mem_free_free_hook ()
   from /usr/local/lib/libopal.so.0
#2  0x00002aaaaac0c663 in ompi_convertor_cleanup ()
   from /usr/local/lib/libmpi.so.0
#3  0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache ()
   from /usr/local/lib/openmpi/mca_pml_ob1.so
#4  0x00002aaaaf179c7b in mca_btl_sm_component_progress ()
   from /usr/local/lib/openmpi/mca_btl_sm.so
#5  0x00002aaaaee5eefe in mca_bml_r2_progress ()
   from /usr/local/lib/openmpi/mca_bml_r2.so
#6  0x00002aaaaeb3dd4e in mca_pml_ob1_progress ()
   from /usr/local/lib/openmpi/mca_pml_ob1.so
#7  0x00002aaaaaeb5c4a in opal_progress ()
   from /usr/local/lib/libopal.so.0
#8  0x00002aaaaeb3c265 in mca_pml_ob1_recv ()
   from /usr/local/lib/openmpi/mca_pml_ob1.so
#9  0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin ()
   from /usr/local/lib/openmpi/mca_coll_basic.so
#10 0x00002aaaaac1f3b8 in PMPI_Barrier ()
   from /usr/local/lib/libmpi.so.0
#11 0x0000000000403016 in Sync ()
#12 0x0000000000401ef8 in main ()

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

