Okay, I tracked this silliness down. On odin, my platform file builds both 
shared and static. It appears that mpicc in that situation defaults to picking 
the static build, and so I wind up with a static executable. This behavior was 
unexpected - I thought we would default to dynamic, but support static if that 
flag was given to mpicc. Call me surprised, but at least now I know.

I found that the Lustre headers and libs are indeed on the system, and so your 
analysis of the problem is correct.

When I build with nothing on the configure line, we only build shared and so 
the executable is dynamic - and the problem goes away.
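
For anyone who wants to confirm which kind of executable mpicc actually produced, here is a minimal Python sketch (not part of Open MPI). It assumes a Linux ELF binary and treats the presence of a PT_INTERP program header as the sign of a dynamically linked executable. The function names are made up for illustration.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader (ld.so)


def program_header_types(data):
    """Return the p_type of every program header in an ELF image (bytes)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big
    # Fields that follow the 16-byte e_ident; only address-sized ones differ.
    fmt = endian + ("HHIQQQIHHHHHH" if is_64bit else "HHIIIIIHHHHHH")
    fields = struct.unpack_from(fmt, data, 16)
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
    # p_type is the first 32-bit word of each entry in both ELF classes.
    return [
        struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]


def requests_dynamic_loader(data):
    """True if the image asks for a dynamic loader (dynamically linked)."""
    return PT_INTERP in program_header_types(data)


def link_kind(path):
    """Classify the executable at `path` as 'dynamic' or 'static'."""
    with open(path, "rb") as f:
        return "dynamic" if requests_dynamic_loader(f.read()) else "static"
```

If the explanation above is right, link_kind("./hello") should report "static" for the platform-file build and "dynamic" for the default build.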

HTH
Ralph

On Oct 30, 2012, at 12:06 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Sure - I can do that.
> 
> On Oct 30, 2012, at 11:29 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
> 
>> glad to hear that. However, since we are also having the problem with
>> the lustre-fs module for static builds, I think it would still make
>> sense to disable fs/lustre/ for 1.7.0
>> 
>> Edgar
>> 
>> On 10/30/2012 12:34 PM, Ralph Castain wrote:
>>> I hate odin :-(
>>> 
>>> FWIW: it all works fine today, no matter how I configure it. No earthly 
>>> idea what happened.
>>> 
>>> Ignore these droids....
>>> 
>>> 
>>> On Oct 30, 2012, at 7:28 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>>> 
>>>> ok, so a couple of things.
>>>> 
>>>> I still think it is the same issue that I observed 1-2 days ago. Could
>>>> you try to remove the fs/lustre component from your compilation, e.g. by
>>>> adding an .ompi_ignore file into that directory, and see whether this
>>>> fixes the issue?
>>>> 
>>>> I tried on my machine (no lustre, no ib) compilations with
>>>> --disable-mpi-io *or* --disable-io-romio, and both worked correctly and
>>>> I could run things. Note that the two flags are now truly different:
>>>> the second is equivalent to --enable-mca-no-build=io:romio, while the
>>>> first disables the io, fcoll, fs and sharedfp frameworks (prior to
>>>> ompio they had basically the same effect).
>>>> 
>>>> In your particular case this means that you disabled romio, but the
>>>> entire ompio stack is still compiled, and the error must come from that
>>>> portion. If my suspicion is correct, it is still liblustre
>>>> messing around with the malloc hooks, and that causes the stack frame to
>>>> be completely broken. I thought I fixed that since we did not have the
>>>> issue on trunk, but we did observe that in the 1.7 branch 1-2 days back
>>>> as well, and I was looking into that.
>>>> 
>>>> That being said, there is another malloc-hooks issue that makes me a bit
>>>> nervous. The compilation of the otf stuff produced a ton of warnings on
>>>> my machine with gcc4.6.2 also with respect to the _malloc_hooks and
>>>> _realloc_hooks. Not sure whether this contributed to the problem as
>>>> well, just thought I'd bring it up since we seem to have a corrupted stack
>>>> frame problem.
>>>> 
>>>> Thanks
>>>> Edgar
>>>> 
>>>> 
>>>> On 10/30/2012 8:29 AM, Edgar Gabriel wrote:
>>>>> ok, I'll look into this. I noticed a problem with static builds on
>>>>> lustre file systems recently, and I was wondering whether it's the same
>>>>> issue or not. But I'll check what's going on.
>>>>> 
>>>>> Thanks
>>>>> Edgar
>>>>> 
>>>>> On 10/30/2012 7:22 AM, Ralph Castain wrote:
>>>>>> No to Lustre, and I didn't build static
>>>>>> 
>>>>>> I'm not sure what, if any, parallel file system might be present. In the 
>>>>>> case that works, I just built with no configure args other than prefix. 
>>>>>> ompi_info shows both romio and ompio built, but nothing more about what 
>>>>>> support they built internally.
>>>>>> 
>>>>>> 
>>>>>> On Oct 30, 2012, at 4:14 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>>>>>> 
>>>>>>> Ralph,
>>>>>>> 
>>>>>>> just out of curiosity: is there a lustre file system on the machine and is
>>>>>>> this a static build?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Edgar
>>>>>>> 
>>>>>>> On 10/29/2012 9:17 PM, Ralph Castain wrote:
>>>>>>>> Hmmm...I added that directory and tried this on odin (which is an 
>>>>>>>> IB-based machine). Any MPI proc segfaults:
>>>>>>>> 
>>>>>>>> Core was generated by `./hello'.
>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>> #0  _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>>>>>>> 574    src/inode.c: No such file or directory.
>>>>>>>>        in src/inode.c
>>>>>>>> (gdb) where
>>>>>>>> #0  _sysio_p_validate (pno=0x0, intnt=0x0, path=0x0) at src/inode.c:574
>>>>>>>> #1  0x00002aaaabd3f3e9 in _sysio_path_walk (parent=0x0, nd=0x7fffffffd8e0) at src/namei.c:216
>>>>>>>> #2  0x00002aaaabd3faad in _sysio_namei (parent=0x0, path=<value optimized out>, flags=0, intnt=0x7fffffffd950, pnop=0x7fffffffd970) at src/namei.c:505
>>>>>>>> #3  0x00002aaaabd3fd98 in open (path=0x2aaaac24280f "/sys/devices/system/node", flags=<value optimized out>) at src/open.c:179
>>>>>>>> #4  0x00002aaaabd43d5b in opendir (name=0x2aaaac24280f "/sys/devices/system/node") at src/stddir.c:60
>>>>>>>> #5  0x00002aaaac241825 in numa_max_node () from /usr/lib64/libnuma.so.1
>>>>>>>> #6  0x00002aaaac241d13 in numa_init () from /usr/lib64/libnuma.so.1
>>>>>>>> #7  0x00002aaaaaab845b in call_init () from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #8  0x00002aaaaaab8565 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #9  0x00002aaaaaaabaaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #10 0x0000000000000001 in ?? ()
>>>>>>>> #11 0x00007fffffffe03c in ?? ()
>>>>>>>> #12 0x0000000000000000 in ?? ()
>>>>>>>> 
>>>>>>>> I got the same thing whether I excluded openib or not. I then ran on 
>>>>>>>> my Linux cluster, which doesn't have IB at all - and it ran fine. Also 
>>>>>>>> runs clean on the Mac. However, in both those cases, I had left IO 
>>>>>>>> romio enabled.
>>>>>>>> 
>>>>>>>> Now on odin, I always disable-io-romio. So I tried deliberately 
>>>>>>>> enabling it, and everything works. So this appears to be something 
>>>>>>>> that the IO work has broken.
>>>>>>>> 
>>>>>>>> Edgar: can you please fix --disable-io-romio?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 29, 2012, at 11:55 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>>>>>>>> 
>>>>>>>>> I'm sorry to add one more thing to the list, but beyond this file, it
>>>>>>>>> looks like the entire ompi/mca/common/verbs/ directory is also
>>>>>>>>> missing in the 1.7 branch, but is required to compile the bcol
>>>>>>>>> framework.  It is there in the trunk, but missing in the 1.7 branch...
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Edgar
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 10/26/2012 5:31 PM, Ralph Castain wrote:
>>>>>>>>>> Okay, I'll fix it for tonight's tarball.
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> 
>>>>>>>>>> On Oct 26, 2012, at 3:28 PM, "Shamis, Pavel" <sham...@ornl.gov> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> There is a bug in the makefile. The file exists in svn, but it is not 
>>>>>>>>>>> listed in the Makefile.am. As a result, it wasn't pulled into the 
>>>>>>>>>>> tarball.
>>>>>>>>>>> 
>>>>>>>>>>> Pavel (Pasha) Shamis
>>>>>>>>>>> ---
>>>>>>>>>>> Computer Science Research Group
>>>>>>>>>>> Computer Science and Math Division
>>>>>>>>>>> Oak Ridge National Laboratory
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Oct 26, 2012, at 2:33 PM, Edgar Gabriel wrote:
>>>>>>>>>>> 
>>>>>>>>>>> we have trouble compiling the 1.7 series on a machine in Dresden.
>>>>>>>>>>> Specifically, we receive an error message when compiling the
>>>>>>>>>>> bcol/iboffload component (other infiniband components compile fine).
>>>>>>>>>>> 
>>>>>>>>>>> Any idea/suggestions what we might be doing wrong or what to look 
>>>>>>>>>>> for?
>>>>>>>>>>> 
>>>>>>>>>>> make[2]: Entering directory `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>>>>>>> CC       bcol_iboffload_module.lo
>>>>>>>>>>> CC       bcol_iboffload_mca.lo
>>>>>>>>>>> CC       bcol_iboffload_endpoint.lo
>>>>>>>>>>> CC       bcol_iboffload_frag.lo
>>>>>>>>>>> In file included from bcol_iboffload_frag.c:16:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_frag.lo] Error 1
>>>>>>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>>>>>>> In file included from bcol_iboffload_mca.c:18:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_mca.lo] Error 1
>>>>>>>>>>> In file included from bcol_iboffload_endpoint.c:23:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_endpoint.lo] Error 1
>>>>>>>>>>> In file included from bcol_iboffload_module.c:39:0:
>>>>>>>>>>> bcol_iboffload.h:46:36: fatal error: bcol_iboffload_qp_info.h: No such file or directory
>>>>>>>>>>> compilation terminated.
>>>>>>>>>>> make[2]: *** [bcol_iboffload_module.lo] Error 1
>>>>>>>>>>> make[2]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi/mca/bcol/iboffload'
>>>>>>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>>>>>>> make[1]: Leaving directory `/home/h2/gabriel/openmpi-1.7rc4/ompi'
>>>>>>>>>>> make: *** [all-recursive] Error 1
>>>>>>>>>>> 
>>>>>>>>>>> Thanks
>>>>>>>>>>> Edgar
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Edgar Gabriel
>>>>>>>>>>> Associate Professor
>>>>>>>>>>> Parallel Software Technologies Lab      http://pstl.cs.uh.edu
>>>>>>>>>>> Department of Computer Science          University of Houston
>>>>>>>>>>> Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
>>>>>>>>>>> Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

