troy@opteron1:/usr/src/netpipe3-dev$ mpirun -np 2 -mca btl_base_exclude openib NPmpi
1: opteron1
0: opteron1
mpirun noticed that job rank 1 with PID 352 on node "localhost" exited on signal 11.
1 process killed (possibly by Open MPI)

This is debian-amd64 (from 
deb http://mirror.espri.arizona.edu/debian-amd64/debian/ etch main )
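
Since a 32bit/64bit word size mismatch was suggested further down the thread, here is a minimal sketch of one way to check which ELF class each binary and library was built as. It only reads the ELF identification bytes (the 4-byte magic plus the EI_CLASS byte). The function name elf_word_size and the script name check_elf.py are hypothetical, not part of Open MPI or NetPIPE.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS byte (offset 4 in the ELF header): 1 = ELFCLASS32, 2 = ELFCLASS64
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_word_size(path):
    """Return '32-bit', '64-bit', or None if path is not a recognizable ELF object."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


if __name__ == "__main__":
    # Hypothetical usage: pass the executable and the libraries it loads.
    for path in sys.argv[1:]:
        print(f"{path}: {elf_word_size(path) or 'not an ELF file'}")
```

Assuming the script is saved as check_elf.py, it could be run against the NPmpi executable and the libraries named in the backtrace, for example /usr/local/lib/libmpi.so.0 and /usr/local/lib/libopal.so.0. If they all report the same class, a word size mismatch is unlikely to be the cause.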

On Mon, Oct 24, 2005 at 10:36:29AM -0500, Brian Barrett wrote:
> That's a really weird backtrace - it seems to indicate that the  
> datatype engine is improperly calling free().  Can you try running  
> without openib (add "-mca btl_base_exclude openib" to the mpirun  
> arguments) and see if the problem goes away?  Also, what platform was  
> this on?
> 
> Thanks,
> 
> Brian
> 
> On Oct 21, 2005, at 6:37 PM, Troy Benjegerdes wrote:
> 
> > On Fri, Oct 21, 2005 at 06:26:32PM -0500, Troy Benjegerdes wrote:
> >
> >> On Fri, Oct 21, 2005 at 04:12:05PM -0500, Andrew Friedley wrote:
> >>
> >>> I just committed a fix to the trunk to fix your original segfault down
> >>> in opal_show_help() - this is the same problem Ken posted. This fix
> >>> should make it into the v1.0 branch eventually.  Even so, you are going
> >>> to run into the real problem you were handling - this fix is just for
> >>> proper error handling/output.
> >>>
> >>> The error below looks like a word size mismatch - one thing is compiled
> >>> 64bit, the other is compiled 32bit.  Make sure everything is compiled
> >>> either 32bit or 64bit.
> >>>
> >>
> >> Another note.. I think I may have had some problems because I built with
> >> 'make -j16'.. has anyone else tried parallel make builds?
> >>
> >> I have a working mpirun now.
> >>
> >> Now I'm back to having NetPIPE segfault when I run it..
> >>
> >
> > And here's a backtrace:
> >
> > 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> > #0  0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> > #1  0x00002aaaaaecb016 in opal_mem_free_free_hook ()
> >    from /usr/local/lib/libopal.so.0
> > #2  0x00002aaaaac0c663 in ompi_convertor_cleanup ()
> >    from /usr/local/lib/libmpi.so.0
> > #3  0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache ()
> >    from /usr/local/lib/openmpi/mca_pml_ob1.so
> > #4  0x00002aaaaf179c7b in mca_btl_sm_component_progress ()
> >    from /usr/local/lib/openmpi/mca_btl_sm.so
> > #5  0x00002aaaaee5eefe in mca_bml_r2_progress ()
> >    from /usr/local/lib/openmpi/mca_bml_r2.so
> > #6  0x00002aaaaeb3dd4e in mca_pml_ob1_progress ()
> >    from /usr/local/lib/openmpi/mca_pml_ob1.so
> > #7  0x00002aaaaaeb5c4a in opal_progress ()
> >    from /usr/local/lib/libopal.so.0
> > #8  0x00002aaaaeb3c265 in mca_pml_ob1_recv ()
> >    from /usr/local/lib/openmpi/mca_pml_ob1.so
> > #9  0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin ()
> >    from /usr/local/lib/openmpi/mca_coll_basic.so
> > #10 0x00002aaaaac1f3b8 in PMPI_Barrier ()
> >    from /usr/local/lib/libmpi.so.0
> > #11 0x0000000000403016 in Sync ()
> > #12 0x0000000000401ef8 in main ()
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> 

-- 
--------------------------------------------------------------------------
Troy Benjegerdes                'da hozer'                ho...@hozed.org  

Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Schulz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Schulz
