[OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Dennis McRitchie
Hi,

I'm encountering a strange problem and can't find any previous discussion of it 
on this mailing list.

When building and running my parallel program using any recent Intel compiler 
and OpenMPI 1.2.8, TotalView behaves entirely correctly, displaying the 
"Process mpirun is a parallel job. Do you want to stop the job now?" dialog 
box, and stopping at the start of the program. The code displayed is the source 
code of my program's function main, and the stack trace window shows that we 
are stopped in the poll function many levels "up" from my main function's call 
to MPI_Init. I can then set breakpoints, single step, etc., and the code runs 
appropriately.

But when building and running using Intel compilers with OpenMPI 1.3.x or 
1.4.x, TotalView displays the usual dialog box, and stops at the start of the 
program; but my main program's source code is *not* displayed. The stack trace 
window again shows that we are stopped in the poll function several levels "up" 
from my main function's call to MPI_Init; but this time, the code displayed is 
the assembler code for the poll function itself.

If I click on 'main' in the stack trace window, the source code for my 
program's function main is then displayed, and I can now set breakpoints, 
single step, etc. as usual.

So why is the program's source code not displayed when using 1.3.x and 1.4.x, 
but displayed when using 1.2.8? This change in behavior is fairly confusing 
to our users, and it would be nice to have it work as it used to, if possible.

Thanks,
   Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University




Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Dennis McRitchie
Thanks Terry.

Unfortunately, -fno-omit-frame-pointer is the default for the Intel compiler 
when -g is used, which I am using since it is necessary for source-level 
debugging. So the compiler kindly tells me that it is ignoring your suggested 
option when I specify it.  :)

Also, since I can reproduce this problem by simply changing the OpenMPI 
version, without changing the compiler version, it strikes me as being more 
likely to be an OpenMPI-related issue: 1.2.8 works, but anything later does not 
(as described below).

I have tried different versions of TotalView from 8.1 to 8.9, but all behave 
the same.

I was wondering if a change to the openmpi-totalview.tcl script might be needed?

Dennis


From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Terry Dontje
Sent: Wednesday, February 09, 2011 5:02 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Totalview not showing main program on startup with 
OpenMPI 1.3.x and 1.4.x

This sounds like something I ran into some time ago that involved the compiler 
omitting frame pointers.  You may want to try compiling your code with 
-fno-omit-frame-pointer.  I am unsure whether you need to do the same when 
building MPI, though.

--td


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-11 Thread Dennis McRitchie
Hi Terry,

Someone else at the University builds the packages that I use, and we've been 
experimenting for the last few days with different openmpi build options to see 
what might be causing this.

Re the stack, I can always see the entire stack in the TV stack pane, and I can 
always click on 'main' in the stack pane and thereby make my main program's 
source code appear. I can then debug as usual. But, as you said, this is still 
no way to debug a program...

The only thing that points the finger at OpenMPI is that the same build 
options lead to different behavior when running with OpenMPI 1.2.8 vs. anything 
later. But I imagine it will turn out to hinge on whether OpenMPI symbols are 
available to TotalView, and that this is what decides whether it displays 
assembler or source.
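
For what it's worth, one way to check whether the installed OpenMPI libraries 
carry debug information at all is something like the following (install path 
hypothetical):

    objdump -h /usr/local/openmpi-1.4-intel/lib/libmpi.so | grep debug
    # lists .debug_* sections if DWARF debug info was compiled in
    file /usr/local/openmpi-1.4-intel/lib/libmpi.so
    # ends with "not stripped" if the symbol table is still present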

I'll keep you posted with our progress.

Thanks for the tips.

Dennis

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Terry Dontje
Sent: Friday, February 11, 2011 6:38 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Totalview not showing main program on startup with 
OpenMPI 1.3.x and 1.4.x

Sorry I have to ask this: did you build your latest OMPI version, not just the 
application, with the -g flag too?

IIRC, when I ran into this issue I was actually able to do stepi's and 
eventually pop up the stack; however, that is really no way to debug a program 
:-).

Unless OMPI is somehow trashing the stack, I don't see what OMPI could be doing 
to cause this type of issue.  Again, when I ran into this, known working 
programs still worked; I just was unable to get a full stack.  So it was 
definitely an interfacing issue between TotalView and the executable (or the 
result of how the executable and libraries were compiled).  Another thing I 
noticed was that with Solaris Studio dbx I was able to see the full stack 
where I could not with TotalView.  I am not sure whether gdb can see the 
full stack either, but it might be worth a try to attach gdb to a running 
program and check.
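
A minimal way to try that might look like the following (executable path and 
PID are hypothetical):

    gdb /path/to/uname_test.intel 12345   # attach to the running rank whose PID is 12345
    (gdb) bt                              # print the full backtrace
    (gdb) detach
    (gdb) quit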

--td


[OMPI users] Can't get TotalView to find main program

2007-07-05 Thread Dennis McRitchie
rom
'/lib/libdl.so.2'...done
Library /opt/intel/fc/9.1.040/lib/libimf.so, with 2 asects, was linked
at 0x, and initially loaded at 0x902a3c00
Mapping 38346 bytes of ELF string data from
'/opt/intel/fc/9.1.040/lib/libimf.so'...done
Library /opt/intel/fc/9.1.040/lib/libirc.so, with 2 asects, was linked
at 0x, and initially loaded at 0x904e0900
Mapping 12223 bytes of ELF string data from
'/opt/intel/fc/9.1.040/lib/libirc.so'...done
Library /lib/ld-linux.so.2, with 2 asects, was linked at 0x00a6f000, and
initially loaded at 0x9000d600
Mapping 390 bytes of ELF string data from '/lib/ld-linux.so.2'...done
Indexing 348 bytes of DWARF '.eh_frame' symbols from
'/lib/ld-linux.so.2'...done
**
Automatically starting orterun
**
Library /lib/libnss_nis.so.2, with 2 asects, was linked at 0x,
and initially loaded at 0x90520c00
Mapping 1974 bytes of ELF string data from '/lib/libnss_nis.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libnss_nis.so.2'...done
Library /lib/libnss_files.so.2, with 2 asects, was linked at 0x,
and initially loaded at 0x90528b00
Mapping 2020 bytes of ELF string data from
'/lib/libnss_files.so.2'...done
Indexing 4 bytes of DWARF '.eh_frame' symbols from
'/lib/libnss_files.so.2'...done

[the following is the output of my program]
rank = 0, size = 1, sysname = Linux, nodename = adroit, release =
2.6.9-42.0.10.ELsmp, version = #1 SMP Tue Feb 27 07:12:58 EST 2007,
machine = i686

Could not find the user's main function.
Check TV::Private::main_names in tvdinit.tvd
^^^

The output from the 64-bit system is essentially the same, except that
the checksum warnings don't appear, and the paths are different. 

tvdinit.tvd is located under the TotalView root directory in
linux-x86/lib (32-bit system) or linux-x86-64/lib (64-bit system), and
appears to contain the correct main program name ("main"); 'nm' on my
executable also shows a main symbol.

If I don't source openmpi-totalview.tcl in .tvdrc, then I still get the
last 2 lines above, but the program doesn't start. If I tell TV to run
it, then it runs to the end. In either case, I never see any source or
assembly code, so I can't put in any breakpoints.
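
For reference, the sourcing itself is just a one-line Tcl command in .tvdrc; 
the install prefix shown here is hypothetical, and the script sits wherever 
the OpenMPI installation put openmpi-totalview.tcl:

    source /usr/local/openmpi-1.2.3-intel/etc/openmpi-totalview.tcl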

Yet I can debug it under TotalView this way (i.e., via mpirun) when
building with and using MPICH, and I can debug it under TV with OpenMPI
if I run my program directly (i.e., not using mpirun or mpiexec).
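
For clarity, the two cases are started roughly like this (argument details 
hypothetical; TotalView's -a hands the remaining arguments to the program 
being launched):

    totalview mpirun -a -np 4 ./uname_test.intel   # via mpirun: main is not found
    totalview ./uname_test.intel                   # run directly: debugging works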

Any idea why the main program can't be found when running under mpirun?
Does openmpi need to be built with either --enable-debug or
--enable-mem-debug? The "configure --help" says the former is not for
general MPI users. Unclear about the latter.

Thanks,
   Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University



Re: [OMPI users] Can't get TotalView to find main program

2007-07-12 Thread Dennis McRitchie
Thanks for the reply Jeff.

Yes, I did compile my test app with -g, but unfortunately, our rpm build
process stripped the symbols from orterun, so that turned out to be the
culprit. Once we fixed that and used openmpi-totalview.tcl to start
things up, TotalView debugging started working.
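
For what it's worth, a quick way to spot that (install path hypothetical):

    file /usr/local/openmpi-1.2.3-intel/bin/orterun
    # the output ends with "stripped" when the symbol table has been removed,
    # and "not stripped" otherwise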

Unfortunately, I still can't get the TotalView message queue feature to
work. The option is greyed out, probably because I got the following
error, once for every process:

In process mpirun.N: Failed to find the global symbol
MPID_recvs

where uname_test.intel is my test app, and N is the process' rank. Note
that I get the same error whether I built openmpi and my test app with
the Intel compiler or the gcc compiler.

In looking in /ompi/debuggers, I see that the error is
coming out of ompi_dll.c, and it is caused by not finding either
"mca_pml_base_send_requests" or "mca_pml_base_recv_requests" in the
image. I presume that the image in question is either orterun or my test
app, and if I run the strings command against them, unsurprisingly I do
not find either of these strings.
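
A more direct check than strings might be something like this (install path 
hypothetical; I am assuming the PML base symbols live in libmpi):

    nm -D /usr/local/openmpi-1.2.3-intel/lib/libmpi.so | grep 'mca_pml_base_.*_requests'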

But if I compile the same test app against the MPICH library, I *can*
use TotalView's message queue feature with it. So I think the problem is
not with the test app itself.

Is there anything I need to do to enable the viewing of message queues
with TV when using openmpi 1.2.3?

Thanks,
   Dennis

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Monday, July 09, 2007 10:06 AM
To: Open MPI Users
Subject: Re: [OMPI users] Can't get TotalView to find main program

On Jul 5, 2007, at 4:02 PM, Dennis McRitchie wrote:

> Any idea why the main program can't be found when running under 
> mpirun?

Just to be sure: you compiled your test MPI application with -g, right?

> Does openmpi need to be built with either --enable-debug or 
> --enable-mem-debug? The "configure --help" says the former is not for 
> general MPI users. Unclear about the latter.

No, both of those should be just for OMPI developers; you should not
need them for user installations.  Indeed, OMPI should build itself with
-g as relevant for TV support (i.e., use -g to compile the relevant .c
files in libmpi); you shouldn't need to build OMPI itself with -g.

--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Can't get TotalView to find main program

2007-07-12 Thread Dennis McRitchie
Thanks George.

That will be very helpful.

Dennis 

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of George Bosilca
Sent: Thursday, July 12, 2007 4:35 PM
To: Open MPI Users
Subject: Re: [OMPI users] Can't get TotalView to find main program

Dennis,

The message queue feature is not yet available in 1.2.3. One should
use the latest version (from the svn trunk or from the nightly
builds) in order to get it. I'll make sure it gets included in 1.2.4 if
we release it before 1.3.

   Thanks,
 george.




[OMPI users] Multiple definition of `malloc' and other symbols when doing static builds with mpif90

2006-12-13 Thread Dennis McRitchie
When creating a static build of an MPI program, I get a number of fatal
error messages, as listed below. They all concern duplicate definitions,
with different sizes, of malloc, free, realloc, etc.: the RHEL4 versions
conflict with the openmpi versions of these functions. I could build
openmpi with --without-memory-manager, but we are using InfiniBand and
need the memory manager features.
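
A quick way to see where the competing definitions come from might be something 
like this (install and libc paths hypothetical; I am assuming the memory 
manager's malloc replacement is compiled into libopal):

    nm /usr/local/openmpi-1.1.2-intel/lib/libopal.a | grep ' T malloc$'
    nm /usr/lib/libc.a | grep ' T malloc$'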

I am using openmpi v1.1.2 built against the Intel fortran compiler v9.1
on RHEL4.

"/usr/local/openmpi-intel/bin/mpif90 -showme" returns:

ifort -I/usr/local/openmpi-1.1.2-intel/include -pthread
-I/usr/local/openmpi-1.1.2-intel/lib
-L/usr/local/openmpi-1.1.2-intel/lib -lmpi_f90 -lmpi -lorte -lopal -lrt
-lpbs -ldl -Wl,--export-dynamic -lnsl -lutil -ldl

The offending line is the one that links the program. I've added the -v
option so you can see all the utilities invoked below.

Is there any way to prevent this behavior?

Thanks,
   Dennis


/usr/local/openmpi-intel/bin/mpif90 -v -traceback -mp -warn all -module
mod -X -I. -static  -g   -o OFDFT  obj/dcstep.o  obj/dcsrch.o
obj/F77flush.o  obj/Constants.o  obj/Timer.o  obj/Fourier.o
obj/MathFunctions.o  obj/DataTypes.o  obj/GridUtilities.o  obj/System.o
obj/Output.o  obj/Ewald.o  obj/FunctionalDataStruct.o
obj/FunctionalPotential.o  obj/FunctionalKinetic.o  obj/Calculator.o
obj/DiscretizePDE.o  obj/DiscretizeOFDFT.o  obj/Multigrid.o
obj/MultigridOptimizers.o  obj/CellOptimizers.o  obj/IonOptimizers.o
obj/RhoOptimizers.o  obj/Optimizer.o  obj/ReadInputFiles.o
obj/InitializeInputs.o  obj/OFDFT.o -L/usr/local/intel/lib -lrfftw_mpi
-lfftw_mpi -lrfftw -lfftw
Version 9.1
/opt/intel/fc/9.1.039/bin/fortcom -mP1OPT_version=910
-mGLOB_source_language=GLOB_SOURCE_LANGUAGE_F90 -mGLOB_tune_for_fort
-mGLOB_use_fort_dope_vector -mP2OPT_static_promotion
-mP1OPT_print_version=FALSE -mP3OPT_use_mspp_call_convention
-mCG_use_gas_got_workaround=F -mP2OPT_align_option_used=TRUE
"-mGLOB_options_string=-I/usr/local/openmpi-1.1.2-intel/include
-I/usr/local/openmpi-1.1.2-intel/lib -I. -pthread -v -traceback -mp
-warn all -module mod -X -static -g -o OFDFT -L/usr/local/intel/lib
-lrfftw_mpi -lfftw_mpi -lrfftw -lfftw
-L/usr/local/openmpi-1.1.2-intel/lib -lmpi_f90 -lmpi -lorte -lopal -lrt
-lpbs -ldl -Wl,--export-dynamic -lnsl -lutil -ldl"
-mGLOB_cxx_limited_range=FALSE -mGLOB_traceback
-mP3OPT_emit_line_numbers -mGLOB_debug_target=GLOB_DEBUG_TARGET_ALL
-mGLOB_debug_format=GLOB_DEBUG_FORMAT_DWARF20
-mGLOB_as_output_backup_file_name=/tmp/ifortABDp1Pas_.s
-mGLOB_machine_model=GLOB_MACHINE_MODEL_IA32_NONE
-mGLOB_use_base_pointer -mGLOB_maintain_precision
-mGLOB_precision_mask=0x -mP2OPT_subs_out_of_bound=FALSE
-mIPOPT_ninl_user_level=2 -mIPOPT_activate -mP2OPT_hlo -mIPOPT_link
-mIPOPT_ipo_activate -mIPOPT_ipo_mo_activate -mIPOPT_ipo_mo_nfiles=1
-mIPOPT_source_files_list=/tmp/ifortighInHlst
-mIPOPT_short_data_info=/tmp/ifort0zku2Ysdata
-mIPOPT_link_script_file=/tmp/ifort4SghHgscript -mIPOPT_global_data
"-mIPOPT_link_version=2.15.92.0.220040927"
"-mIPOPT_cmdline_link="/usr/lib/gcc/i386-redhat-linux/3.4.6/../../../crt
1.o" "/usr/lib/gcc/i386-redhat-linux/3.4.6/../../../crti.o"
"/usr/lib/gcc/i386-redhat-linux/3.4.6/crtbeginT.o" "-static" "-m"
"elf_i386" "-o" "OFDFT" "/opt/intel/fc/9.1.039/lib/for_main.o"
"obj/dcstep.o" "obj/dcsrch.o" "obj/F77flush.o" "obj/Constants.o"
"obj/Timer.o" "obj/Fourier.o" "obj/MathFunctions.o" "obj/DataTypes.o"
"obj/GridUtilities.o" "obj/System.o" "obj/Output.o" "obj/Ewald.o"
"obj/FunctionalDataStruct.o" "obj/FunctionalPotential.o"
"obj/FunctionalKinetic.o" "obj/Calculator.o" "obj/DiscretizePDE.o"
"obj/DiscretizeOFDFT.o" "obj/Multigrid.o" "obj/MultigridOptimizers.o"
"obj/CellOptimizers.o" "obj/IonOptimizers.o" "obj/RhoOptimizers.o"
"obj/Optimizer.o" "obj/ReadInputFiles.o""obj/InitializeInputs.o"
"obj/OFDFT.o" "-L/usr/local/intel/lib" "-lrfftw_mpi" "-lfftw_mpi"
"-lrfftw" "-lfftw" "-L/usr/local/openmpi-1.1.2-intel/lib" "-lmpi_f90"
"-lmpi" "-lorte" "-lopal" "-lrt" "-lpbs" "-ldl" "--export-dynamic"
"-lnsl" "-lutil" "-ldl" "-L/opt/intel/fc/9.1.039/lib"
"-L/usr/lib/gcc/i386-redhat-linux/3.4.6/"
"-L/usr/lib/gcc/i386-redhat-linux/3.4.6/../../../" "-lifport"
"-lifcoremt" "-limf" "-lm" "-lipgo" "-lpthread" "-lirc" "-ldl" "-lc"
"-lgcc" "-lgcc_eh" "-lirc_s" "-ldl" "-lc"
"/usr/lib/gcc/i386-redhat-linux/3.4.6/crtend.o"
"/usr/lib/gcc/i386-redhat-linux/3.4.6/../../../crtn.o"" -mIPOPT_save_il0
-mIPOPT_il_in_obj -mIPOPT_ipo_activate_warn=FALSE
-mIPOPT_obj_output_file_name=/tmp/ipo_ifortobAc47.o
"-mGLOB_linker_version=2.15.92.0.2 20040927"
-mP3OPT_asm_target=P3OPT_ASM_TARGET_GAS
-mGLOB_obj_output_file=/tmp/ipo_ifortobAc47.o
-mP1OPT_source_file_name=/tmp/ipo_ifortobAc47.f obj/dcstep.o
obj/dcsrch.o obj/F77flush.o obj/Constants.o obj/Timer.o obj/Fourier.o
obj/MathFunctions.o obj/DataTypes.o obj/GridUtilities.o obj/System.o
obj/Output.o obj/Ewald.o obj/FunctionalDataStruct.o
obj/FunctionalPotential.o obj/FunctionalKinetic.o obj/Calculator.

[OMPI users] mpicc problems finding libraries (mostly)

2006-12-21 Thread Dennis McRitchie
I am trying to build openmpi so that mpicc does not require me to set up
the compiler's environment, and so that any executables built with mpicc
can run without my having to point LD_LIBRARY_PATH to the openmpi lib
directory. I made some unsuccessful attempts to accomplish this (which I
describe below), but after building openmpi using the Intel compiler, I
found the following:

1) When typing "/mpicc -showme" I get:
/mpicc: error while loading shared libraries: libsvml.so:
cannot open shared object file: No such file or directory

I then set LD_LIBRARY_PATH to point to the Intel compiler libraries, and
now "-showme" works, and returns:
icc -I/usr/local/openmpi-1.1.2-intel/include
-I/usr/local/openmpi-1.1.2-intel/include/openmpi -pthread
-L/usr/local/openmpi-1.1.2-intel/lib -L/usr/ofed/lib -L/usr/ofed/lib64
-lmpi -lorte -lopal -libverbs -lrt -lpbs -lnsl -lutil

However...

2) When typing "/mpicc hello.c" I now get:

--
The Open MPI wrapper compiler was unable to find the specified compiler
icc in your PATH.

Note that this compiler was either specified at configure time or in
one of several possible environment variables.

--

Of course, this is due to the fact that -showme indicates that mpicc
invokes "icc" instead of "/icc". If I now set up the PATH
to the Intel compiler, it works. However...

3) When I try to run the executable thus created, I get:
./a.out: error while loading shared libraries: libmpi.so.0: cannot open
shared object file: No such file or directory

I now need to set LD_LIBRARY_PATH to point to the openmpi lib directory.
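
For reference, a quick way to confirm whether an rpath actually made it into 
the resulting executable (the binary name is just whatever mpicc produced):

    readelf -d ./a.out | grep -iE 'rpath|runpath'
    ldd ./a.out | grep libmpi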
---
---

To avoid problems (1) and (2), I built openmpi with:
export CC=/opt/intel/cce/latest/bin/icc
export CXX=/opt/intel/cce/latest/bin/icpc
export F77=/opt/intel/fce/latest/bin/ifort
export FC=/opt/intel/fce/latest/bin/ifort
export LDFLAGS="-Wl,-rpath,/opt/intel/cce/latest/lib,-rpath,/opt/intel/fce/latest/lib"

But while this satisfied the configure script and all its tests, it did
not produce the results I hoped for.

To avoid problem (3), I added the following option to configure:
--with-wrapper-ldflags=-Wl,-rpath,/usr/local/openmpi-1.1.2-intel/lib

I was hoping "-showme" would add this to its parameters, but no such
luck. Looking at the build output, the --with-wrapper-ldflags parameter
seems to be parsed differently from how LDFLAGS gets parsed, and I get a
compilation line:
/opt/intel/cce/latest/bin/icc -O3 -DNDEBUG -fno-strict-aliasing -pthread
-Wl,-rpath -Wl,/opt/intel/cce/latest/lib -Wl,-rpath
-Wl,/opt/intel/fce/latest/lib -o .libs/opal_wrapper opal_wrapper.o
../../../opal/.libs/libopal.so -lnsl -lutil -Wl,--rpath
-Wl,/usr/local/openmpi-1.1.2-intel/lib

Notice that the rpath preceding the openmpi lib directory is specified
as "--rpath", which is probably why it is ignored. Is this perhaps a
bug?

Can you help me accomplish any or all of these goals?

Thanks.

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University



[OMPI users] Can't run simple job with openmpi using the Intel compiler

2007-02-02 Thread Dennis McRitchie
When I submit a simple job (described below) using PBS, I always get one
of the following two errors:
1) [adroit-28:03945] [0,0,1]-[0,0,0] mca_oob_tcp_peer_recv_blocking:
recv() failed with errno=104

2) [adroit-30:03770] [0,0,3]-[0,0,0] mca_oob_tcp_peer_complete_connect:
connection failed (errno=111) - retrying (pid=3770)

The program does a uname and prints out results to standard out. The
only MPI calls it makes are MPI_Init, MPI_Comm_size, MPI_Comm_rank, and
MPI_Finalize. I have tried it with both openmpi v 1.1.2 and 1.1.4, built
with Intel C compiler 9.1.045, and get the same results. But if I build
the same versions of openmpi using gcc, the test program always works
fine. The app itself is built with mpicc.
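
For concreteness, a minimal sketch of a test program of this kind (not the 
exact source used here, but the same calls and structure):

    #include <stdio.h>
    #include <sys/utsname.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        struct utsname u;

        MPI_Init(&argc, &argv);                /* the only MPI calls are these four */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        uname(&u);                             /* query the node's uname information */
        printf("rank = %d, size = %d, sysname = %s, nodename = %s, release = %s, "
               "version = %s, machine = %s\n",
               rank, size, u.sysname, u.nodename, u.release, u.version, u.machine);
        MPI_Finalize();
        return 0;
    }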

It runs successfully if run from the command line with "mpiexec -n X
", where X is 1 to 8, but if I wrap it in the
following qsub command file:
---
#PBS -l pmem=512mb,nodes=1:ppn=1,walltime=0:10:00
#PBS -m abe
# #PBS -o /home0/dmcr/my_mpi/curt/uname_test.gcc.stdout
# #PBS -e /home0/dmcr/my_mpi/curt/uname_test.gcc.stderr

cd /home/dmcr/my_mpi/openmpi
echo "About to call mpiexec"
module list
mpiexec -n 1 uname_test.intel
echo "After call to mpiexec"


it fails on any number of processors from 1 to 8, and the application
segfaults.

The complete standard error of an 8-processor job follows (note that
mpiexec ran on adroit-31, but usually there is no info about adroit-31
in standard error):
-
Currently Loaded Modulefiles:
  1) intel/9.1/32/C/9.1.045 4) intel/9.1/32/default
  2) intel/9.1/32/Fortran/9.1.040   5) openmpi/intel/1.1.2/32
  3) intel/9.1/32/Iidb/9.1.045
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x5
[0] func:/usr/local/openmpi/1.1.4/intel/i386/lib/libopal.so.0 [0xb72c5b]
*** End of error message ***
^@[adroit-29:03934] [0,0,2]-[0,0,0] mca_oob_tcp_peer_recv_blocking:
recv() failed with errno=104
[adroit-28:03945] [0,0,1]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv()
failed with errno=104
[adroit-30:03770] [0,0,3]-[0,0,0] mca_oob_tcp_peer_complete_connect:
connection failed (errno=111) - retrying (pid=3770)
--

The complete standard error of a 1-processor job follows:
--
Currently Loaded Modulefiles:
  1) intel/9.1/32/C/9.1.045 4) intel/9.1/32/default
  2) intel/9.1/32/Fortran/9.1.040   5) openmpi/intel/1.1.2/32
  3) intel/9.1/32/Iidb/9.1.045
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x2
[0] func:/usr/local/openmpi/1.1.2/intel/i386/lib/libopal.so.0 [0x27d847]
*** End of error message ***
^@[adroit-31:08840] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
connection failed (errno=111) - retrying (pid=8840)
---

Any thoughts as to why this might be failing?

Thanks,
   Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University



Re: [OMPI users] Can't run simple job with openmpi using the Intel compiler

2007-02-02 Thread Dennis McRitchie
Also, I see mention in your FAQ about config.log. My openmpi does not
appear to be generating it, at least not anywhere in the install tree.
How can I enable the creation of the log file?

Thanks,
   Dennis




Re: [OMPI users] Can't run simple job with openmpi using the Intel compiler

2007-02-03 Thread Dennis McRitchie
Sorry. I just realized that you must mean the log file created by the
configure script. I was looking for an OpenMPI runtime log file. Is there
such a thing?

Dennis




Re: [OMPI users] Can't run simple job with openmpi using the Intel compiler

2007-02-05 Thread Dennis McRitchie
Thanks for the suggestion, and I should have mentioned it, but there is
no firewall set up on any of the compute nodes. There are firewall
restrictions only on the head node, on the eth interface to the outside world.

Also, as I mentioned, I was able to build and run this test app
successfully using a gcc-built openmpi with a gcc-built application.
This only happens running the Intel-compiler-built openmpi with an
Intel-compiler-built application. 

Any other thoughts?

Dennis

> -Original Message-
> From: Gurhan Ozen [mailto:gurhan.o...@gmail.com] 
> Sent: Sunday, February 04, 2007 3:10 PM
> To: Open MPI Users; Dennis McRitchie
> Subject: Re: [OMPI users] Can't run simple job with openmpi 
> using the Intel compiler
> 
> On 2/2/07, Dennis McRitchie  wrote:
> > When I submit a simple job (described below) using PBS, I always get one
> > of the following two errors:
> > 1) [adroit-28:03945] [0,0,1]-[0,0,0] mca_oob_tcp_peer_recv_blocking:
> > recv() failed with errno=104
> >
> > 2) [adroit-30:03770] [0,0,3]-[0,0,0] mca_oob_tcp_peer_complete_connect:
> > connection failed (errno=111) - retrying (pid=3770)
> >
> 
>  Hi Dennis,
> Looks like you could be blocked by a firewall. Can you make sure that
> you disable firewalls on both nodes and try?
> 
> gurhan