Re: [OMPI users] segfault on finalize
You are right. An update fixes the problem. Sorry. Thomas Jeff Squyres wrote: It's a fairly strange place to get an error -- mca_base_param_finalize() is where we're tidying up command line parameters. There were some memory bugs that have been fixed since r21970. Can you update? On Sep 25, 2009, at 9:49 AM, Thomas Ropars wrote: Hi, I'm using r21970 of the trunk on Linux 2.6.18-3-amd64 and gcc version 4.2.3 (Debian 4.2.3-2). When I compile Open MPI with the default options, it works. But if I use the --with-platform=optimized option, then I get a segfault for every program I run. ==3073== Access not within mapped region at address 0x30 ==3073== at 0x535544D: mca_base_param_finalize (in /home/tropars/open-mpi/install/lib/libopen-pal.so.0.0.0) ==3073== by 0x5339D55: opal_finalize_util (in /home/tropars/open-mpi/install/lib/libopen-pal.so.0.0.0) ==3073== by 0x4E5A228: ompi_mpi_finalize (in /home/tropars/open-mpi/install/lib/libmpi.so.0.0.0) ==3073== by 0x400BF2: main (in /home/tropars/open-mpi/tests/ring) Regards, Thomas ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] Debugging OpenMPI calls
Hello, I am new to the Open MPI library and I am trying to step through common MPI communication calls using gdb. I attach gdb to one of the processes (using the steps mentioned on the Open MPI Debugging FAQ page) and set a breakpoint on 'MPI_Barrier', expecting gdb to jump into the definition of the MPI_Barrier function. I've manually added the -g3 compilation flag to the Makefiles in some of the directories that I thought relevant ({ROOT}/ompi/mpi/c etc.). I also specified the source file paths in gdb using the 'dir' command. However, gdb is unable to jump to the appropriate source location when it hits the breakpoint. Could anyone please let me know if I am missing something here? Thanks for looking into my post. Regards, Aniruddha
Re: [OMPI users] MPI_Irecv segmentation fault
Yes I did, forgot to mention that in my last. Most of the example code I've seen online passes the buffer variable by reference... I think I've gotten past the segfault at this point, but it looks like MPI_Isend is never completing. I have an MPI_Test() that sets a flag immediately following the MPI_Irecv call, but the process seems to hang before it gets to it. Not really sure why it wouldn't complete. Everette On Tue, Sep 22, 2009 at 9:24 AM, jody wrote: > Did you also change the "&buffer" to buffer in your MPI_Send call? > > Jody > > On Tue, Sep 22, 2009 at 1:38 PM, Everette Clemmer wrote: >> Hmm, tried changing MPI_Irecv( &buffer) to MPI_Irecv( buffer...) >> and still no luck. Stack trace follows if that's helpful: >> >> prompt$ mpirun -np 2 ./display_test_debug >> Sending 'q' from node 0 to node 1 >> [COMPUTER:50898] *** Process received signal *** >> [COMPUTER:50898] Signal: Segmentation fault (11) >> [COMPUTER:50898] Signal code: (0) >> [COMPUTER:50898] Failing at address: 0x0 >> [COMPUTER:50898] [ 0] 2 libSystem.B.dylib >> 0x7fff87e280aa _sigtramp + 26 >> [COMPUTER:50898] [ 1] 3 ??? >> 0x 0x0 + 0 >> [COMPUTER:50898] [ 2] 4 GLUT >> 0x000100024a21 glutMainLoop + 261 >> [COMPUTER:50898] [ 3] 5 display_test_debug >> 0x00011444 xsMainLoop + 67 >> [COMPUTER:50898] [ 4] 6 display_test_debug >> 0x00011335 main + 59 >> [COMPUTER:50898] [ 5] 7 display_test_debug >> 0x00010d9c start + 52 >> [COMPUTER:50898] [ 6] 8 ??? >> 0x0001 0x0 + 1 >> [COMPUTER:50898] *** End of error message *** >> mpirun noticed that job rank 0 with PID 50897 on node COMPUTER.local >> exited on signal 15 (Terminated). >> 1 additional process aborted (not shown) >> >> Thanks, >> Everette >> >> >> On Tue, Sep 22, 2009 at 2:28 AM, Ake Sandgren >> wrote: >>> On Mon, 2009-09-21 at 19:26 -0400, Everette Clemmer wrote: Hey all, I'm getting a segmentation fault when I attempt to receive a single character via MPI_Irecv. 
Code follows: void recv_func() { if( !MASTER ) { char buffer[ 1 ]; int flag; MPI_Request request; MPI_Status status; MPI_Irecv( &buffer, 1, MPI_CHAR, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &request); >>> >>> It should be MPI_Irecv(buffer, 1, ...) >>> The segfault disappears if I comment out the MPI_Irecv call in recv_func so I'm assuming that there's something wrong with the parameters that I'm passing to it. Thoughts? >>> >>> -- >>> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden >>> Internet: a...@hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 >>> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> >> >> -- >> - Everette >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- - Everette
Re: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?
On Sep 27, 2009, at 1:45 PM, guosong wrote: Hi Loh, I used MPI_Init_thread(&argc,&argv, MPI_THREAD_MULTIPLE, &provided); in my program and got provided = 0 which turns out to be the MPI_THREAD_SINGLE. Does this mean that I can not use MPI_THREAD_MULTIPLE model? Correct. To get Open MPI to support MPI_THREAD_MULTIPLE, you need to configure and build it with the --enable-mpi-threads switch to OMPI's ./ configure script. We don't build MPI_THREAD_MULTIPLE support by default because it does add some performance overhead. -- Jeff Squyres jsquy...@cisco.com
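For anyone following along, the rebuild Jeff describes might look like the following -- the install prefix is only an example, and --enable-mpi-threads is the flag name used in this (1.3-era) configure:

```
./configure --enable-mpi-threads --prefix=/opt/openmpi-mt
make all install
```

After rebuilding against the new install, MPI_Init_thread should be able to report provided == MPI_THREAD_MULTIPLE.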
Re: [OMPI users] Debugging OpenMPI calls
You might want to just configure Open MPI with: ./configure CFLAGS=-g3 ... That will pass "-g3" to every Makefile in Open MPI. FWIW: I do variants on this technique and gdb is always able to jump to the right source location if I "break MPI_Barrier" (for example). We actually have a "--enable-debug" option to OMPI's configure, but it does turn on a bunch of other debugging code that will definitely result in performance degradation at run-time (one of its side effects is to add "-g" to every Makefile). On Sep 28, 2009, at 5:57 AM, Aniruddha Marathe wrote: Hello, I am new to OpenMPI library and I am trying to step through common MPI communication calls using gdb. I attach gdb to one of the processes (using the steps mentioned on the OpenMPI Debugging FAQ page) and set a breakpoint on 'MPI_Barrier' and expect gdb to jump into the definition of MPI_Barrier function. I've manually added -g3 compilation flag to the Makefiles in some of the directories that I thought relevant ({ROOT}/ompi/mpi/c etc). I also specified the source file paths in gdb using the 'dir' command. However, gdb is unable to jump into the appropriate source location when it hits the breakpoint. Could anyone please let me know if I am missing something here? Thanks for looking into my post. Regards, Aniruddha ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres jsquy...@cisco.com
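A condensed sketch of the workflow Jeff describes (the app name, paths, and the PID are placeholders):

```
./configure CFLAGS=-g3 --prefix=$HOME/ompi-g3 && make all install
mpicc -g -o ring ring.c && mpirun -np 2 ./ring &
gdb -p <pid-of-one-rank>
(gdb) break MPI_Barrier
(gdb) continue     # should stop in ompi/mpi/c/barrier.c with source shown
```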
Re: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?
Oh, thanks. I found that mpich2/gnu supports MPI_THREAD_MULTIPLE by default on my server. So if it supports MPI_THREAD_MULTIPLE, does that mean that I can run the program without a segmentation fault (if there are no other bugs ^_^)? > From: jsquy...@cisco.com > To: us...@open-mpi.org > Date: Mon, 28 Sep 2009 11:28:31 -0400 > Subject: Re: [OMPI users] How to create multi-thread parallel program using > thread-safe send and recv? > > On Sep 27, 2009, at 1:45 PM, guosong wrote: > > > Hi Loh, > > I used MPI_Init_thread(&argc,&argv, MPI_THREAD_MULTIPLE, &provided); > > in my program and got provided = 0 which turns out to be the > > MPI_THREAD_SINGLE. Does this mean that I can not use > > MPI_THREAD_MULTIPLE model? > > Correct. > > To get Open MPI to support MPI_THREAD_MULTIPLE, you need to configure > and build it with the --enable-mpi-threads switch to OMPI's ./ > configure script. We don't build MPI_THREAD_MULTIPLE support by > default because it does add some performance overhead. > > -- > Jeff Squyres > jsquy...@cisco.com > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?
On Sep 28, 2009, at 11:48 AM, guosong wrote: Oh, thanks. I found that mpich2/gnu supports MPI_THREAD_MULTIPLE by default on my server. So if it supports MPI_THREAD_MULTIPLE, does it mean that I can run the program without segmentation fault (if there is no other bugs ^_^) Hypothetically, yes. :-) -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?
Thanks. > From: jsquy...@cisco.com > To: us...@open-mpi.org > Date: Mon, 28 Sep 2009 11:49:36 -0400 > Subject: Re: [OMPI users] How to create multi-thread parallel program using > thread-safe send and recv? > > On Sep 28, 2009, at 11:48 AM, guosong wrote: > > > Oh, thanks. I found that mpich2/gnu supports MPI_THREAD_MULTIPLE by > > default on my server. So if it supports MPI_THREAD_MULTIPLE, does it > > mean that I can run the program without segmentation fault (if there > > is no other bugs ^_^) > > Hypothetically, yes. :-) > > -- > Jeff Squyres > jsquy...@cisco.com > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] problem using openmpi with DMTCP
Dear All, I am trying to integrate DMTCP with Open MPI. If I run a C application, it works fine. But when I execute the program using mpirun, it checkpoints the application but gives errors when restarting it. # [31007] WARNING at connection.cpp:303 in restore; REASON='JWARNING((_sockDomain == AF_INET || _sockDomain == AF_UNIX ) && _sockType == SOCK_STREAM) failed' id() = 2ab3f248-30933-4ac0d75a(99007) _sockDomain = 10 _sockType = 1 _sockProtocol = 0 Message: socket type not yet [fully] supported [31007] WARNING at connection.cpp:303 in restore; REASON='JWARNING((_sockDomain == AF_INET || _sockDomain == AF_UNIX ) && _sockType == SOCK_STREAM) failed' id() = 2ab3f248-30943-4ac0d75c(99007) _sockDomain = 10 _sockType = 1 _sockProtocol = 0 Message: socket type not yet [fully] supported [31013] WARNING at connection.cpp:87 in restartDup2; REASON='JWARNING(_real_dup2 ( oldFd, fd ) == fd) failed' oldFd = 537 fd = 1 (strerror((*__errno_location ( = Bad file descriptor [31013] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed' i->second = 537 (strerror((*__errno_location ( = Bad file descriptor [31015] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed' i->second = 537 (strerror((*__errno_location ( = Bad file descriptor [31017] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed' i->second = 537 (strerror((*__errno_location ( = Bad file descriptor [31007] WARNING at connectionmanager.cpp:627 in closeAll; REASON='JWARNING(_real_close ( i->second ) ==0) failed' i->second = 537 (strerror((*__errno_location ( = Bad file descriptor MTCP: mtcp_restart_nolibc: mapping current version of /usr/lib/gconv/gconv-modules.cache into memory; _not_ file as it existed at time of checkpoint. Change mtcp_restart_nolibc.c:634 and re-compile, if you want different behavior.
[31015] ERROR at connection.cpp:372 in restoreOptions; REASON='JASSERT(ret == 0) failed' (strerror((*__errno_location ( = Invalid argument fds[0] = 6 opt->first = 26 opt->second.size() = 4 Message: restoring setsockopt failed Terminating... # Any suggestions are very welcome. regards, Raj
Re: [OMPI users] "Failed to find the following executable" problemunder Torque
Thanks for the reply. I looked harder at the command invocation and I think I stumbled across an answer. My actual mpirun command is invoked from a Python script using the subprocess module. When you create a subprocess, one of the options is "shell" and I had that set to False, causing the actual invocation to use spawn or exec (one of the variants) instead of system(). When I pass down the argument list as follows, mpirun fails with "cannot find executable named '--prefix /usr/mpi/intel/openmpi-1.2.8' " Command: ['mpirun', '--prefix /usr/mpi/intel/openmpi-1.2.8', '-np 8', '--mca btl ^tcp', ' --mca mpi_leave_pinned 1', '--mca mpool_base_use_mem_hooks 1', '-x LD_LIBRARY_PATH', '-x MPI_ENVIRONMENT=1', '/tmp/7852.fwnaeglingio/falconv4_ibm_openmpi', '-cycles', '10', '-ri', 'restart.5000', '-ro', '/tmp/7852.fwnaeglingio/restart.5000'] whereas if I take the additional step of removing spaces from the arguments, it works: Command: ['mpirun', '--prefix', '/usr/mpi/intel/openmpi-1.2.8', '--machinefile', '/var/spool/torque/aux/7854.fwnaeglingio', '-np', '8', '--mca', 'btl', '^tcp', '--mca', 'mpi_leave_pinned', '1', '--mca', 'mpool_base_use_mem_hooks', '1', '-x', 'LD_LIBRARY_PATH', '-x', 'MPI_ENVIRONMENT=1', '/tmp/7854.fwnaeglingio/falconv4_ibm_openmpi', '-cycles', '10', '-ri', 'restart.5010', '-ro', '/tmp/7854.fwnaeglingio/restart.5010'] Somehow the handling of the argv list by orterun has changed in 1.2.8 as compared to 1.2.2-1, as the spawned command used to execute just fine. I'm guessing the elements in argv used to be split on spaces first, before being parsed, whereas now they are not, resulting in the first string being reported as an unrecognized option. 
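For what it's worth, the behavior described above matches how Python's subprocess module works with shell=False: each list element becomes exactly one argv entry, with no word splitting. A small illustration (the mpirun option string is just an example; shlex.split does the POSIX-style splitting):

```python
import shlex

# One list element == one argv entry, so the embedded space survives --
# mpirun would see '--prefix /usr/mpi/intel/openmpi-1.2.8' as a single
# unrecognized token:
bad = ['mpirun', '--prefix /usr/mpi/intel/openmpi-1.2.8', '-np 8']

# Splitting the option string first yields properly separated tokens:
good = ['mpirun'] + shlex.split('--prefix /usr/mpi/intel/openmpi-1.2.8 -np 8')

assert good == ['mpirun', '--prefix', '/usr/mpi/intel/openmpi-1.2.8',
                '-np', '8']
assert bad[1] != good[1]  # the unsplit form stays one bogus argument
```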
> -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Jeff Squyres > Sent: Saturday, September 26, 2009 8:24 AM > To: Open MPI Users > Subject: Re: [OMPI users] "Failed to find the following executable" > problemunder Torque > > On Sep 25, 2009, at 7:55 AM, Blosch, Edwin L wrote: > > > I'm having a problem running OpenMPI under Torque. It complains > > like there is a command syntax problem, but the three variations > > below are all correct, best I can tell using mpirun -help. The > > environment in which the command executes, i.e. PATH and > > LD_LIBRARY_PATH, is correct. Torque is 2.3.x. OpenMPI is 1.2.8. > > OFED is 1.4. > > Is your mpirun a script, perchance? It's almost like the arguments > that end up being passed are getting munged / re-ordered, and Bad > Things happen such that the real mpirun under the covers gets confused. > > > /usr/mpi/intel/openmpi-1.2.8/bin/mpirun -np 28 /tmp/43.fwnaeglingio/ > > falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/ > > 43.fwnaeglingio/restart.0 > > > -- > > Failed to find the following executable: > > > > Host: n8n26 > > Executable: -p > > I don't even see -p in that argument list. Where is it coming from? > > A little background: OMPI's mpirun analyzes the command line tokens > that are passed to it. The first one that it doesn't recognize, it > assumes is the executable to invoke. In this case, OMPI's mpirun > found a "-p" on the command line (not sure how that happened; perhaps / > usr/mpi/intel/openmpi-1.2.8/bin/mpirun is not actually OMPI's mpirun, > as I mentioned above...?) and tried to execute it. But then there was > no executable named "-p" to be found in the filesystem, then OMPI > printed the error. 
> > > mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile /var/ > > spool/torque/aux/45.fwnaeglingio -np 28 --mca btl ^tcp --mca > > mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 -x > > LD_LIBRARY_PATH -x MPI_ENVIRONMENT /tmp/45.fwnaeglingio/ > > falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/ > > 45.fwnaeglingio/restart.0 > > > -- > > Failed to find or execute the following executable: > > > > Host: n8n27 > > Executable: --prefix /usr/mpi/intel/openmpi-1.2.8 > > Ditto on this one. --prefix is a valid mpirun command line argument, > so it should not have complained. > > But then again, I confess to not remembering all the 1.2.x command > line options; I don't remember if --prefix was introduced in the 1.2 > or 1.3 series... > > > /usr/mpi/intel/openmpi-1.2.8/bin/mpirun -x LD_LIBRARY_PATH -x > > MPI_ENVIRONMENT=1 /tmp/47.fwnaeglingio/falconv4_ibm_openmpi -cycles > > 100 -ri restart.0 -ro /tmp/47.fwnaeglingio/restart.0 > > > -- > > Failed to find the following executable: > > > > Host: n8n27 > > Executable: - > > > Ditto to #1. > > -- > Jeff Squyres > jsquy...@cisco.com > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] [btl_openib_component.c:1373:btl_openib_component_progress] error polling HP CQ with -2 errno says Success
I've verified that ulimit -l is unlimited everywhere. After further testing I think the errors are related to OFED, not Open MPI. I've uninstalled the OFED that comes with SLES (1.4.0) and installed OFED 1.4.2 and 1.5-beta, and I don't get the errors. I got the idea to swap out OFED after reading this: http://kerneltrap.org/mailarchive/openfabrics-general/2008/11/3/3903184 Under OFED 1.4.0 (from SLES 11) I had to set options mlx4_core msi_x=0 in /etc/modprobe.conf.local to even get the mlx4 module to load. I found that advice here: http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1254161827534+28353475&threadId=1361415 (Under 1.4.2 and 1.5-beta the modules load fine without mlx4_core msi_x=0 being set) Now my problem is that with OFED 1.4.2 and 1.5-beta the system hangs and the GigE network stops working, and I have to power cycle nodes to log in. I'm going to try to get some help from the OFED mailing list now. Pavel Shamis (Pasha) wrote: Very strange. MPI tries to access the CQ context and it gets an immediate error. Please make sure that your limits configuration is ok; take a look at this FAQ - http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages Pasha. Charles Wright wrote: Hello, I just got some new cluster hardware :) :( I can't seem to overcome an openib problem. I get this at run time: error polling HP CQ with -2 errno says Success I've tried 2 different IB switches and multiple sets of nodes all on one switch or the other to try to eliminate the hardware. (IPoIB pings work and IB switches ree I've tried both v1.3.3 and v1.2.9 and get the same errors. I'm not really sure what these errors mean or how to get rid of them. My MPI application works if all the CPUs are on the same node (self btl only probably) Any advice would be appreciated. Thanks.
asnrcw@dmc:~> qsub -I -l nodes=32,partition=dmc,feature=qc226 -q sysadm qsub: waiting for job 232035.mds1.asc.edu to start qsub: job 232035.mds1.asc.edu ready # Alabama Supercomputer Center - PBS Prologue # Your job id is : 232035 # Your job name is : STDIN # Your job's queue is : sysadm # Your username for this job is : asnrcw # Your groupfor this job is : analyst # Your job used : # 8 CPUs on dmc101 # 8 CPUs on dmc102 # 8 CPUs on dmc103 # 8 CPUs on dmc104 # Your job started at : Fri Sep 25 10:20:05 CDT 2009 asnrcw@dmc101:~> asnrcw@dmc101:~> asnrcw@dmc101:~> asnrcw@dmc101:~> asnrcw@dmc101:~> cd mpiprintrank asnrcw@dmc101:~/mpiprintrank> which mpirun /apps/openmpi-1.3.3-intel/bin/mpirun asnrcw@dmc101:~/mpiprintrank> mpirun ./mpiprintrank-dmc-1.3.3-intel [dmc103][[46071,1],19][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],16][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],17][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],18][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],20][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],21][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],23][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc101][[46071,1],6][btl_openib_component.c:3047:poll_device] [dmc102][[46071,1],14][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success error polling HP CQ with -2 errno says Success [dmc101][[46071,1],7][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc103][[46071,1],22][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success 
[dmc102][[46071,1],15][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc102][[46071,1],11][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc102][[46071,1],11][btl_openib_component.c:3047:poll_device] [dmc102][[46071,1],12][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc102][[46071,1],12][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success error polling HP CQ with -2 errno says Success [dmc101][[46071,1],3][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc101][[46071,1],4][btl_openib_component.c:3047:poll_device] [dmc102][[46071,1],8][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success [dmc101][[46071,1],0][btl_openib_component.c:3047:poll_device] error polling HP CQ with -2 errno says Success error
Re: [OMPI users] Debugging OpenMPI calls
Hi Jeff, Thanks for the pointers. I tried with both CFLAGS=-g3 and --enable-debug (separately), however, I am still unable to jump into the MPI source. It seems I am missing a small step(s) somewhere. I compiled my MPI application with the new library built with above flags, ran it and attached gdb to one of the processes. Following are the steps that I performed with gdb: ... ... 0x00110416 in __kernel_vsyscall () Missing separate debuginfos, use: debuginfo-install glibc.i686 (gdb) dir /home/amarathe/mpi/svn_openmpi/ompi-trunk/ompi/mpi/c Source directories searched: /home/amarathe/mpi/svn_openmpi/ompi-trunk/ompi/mpi/c:$cdir:$cwd (gdb) break MPI_Barrier Breakpoint 1 at 0x155596 When gdb hits breakpoint 1, it jumps at the address but cannot find the source file for 'MPI_Barrier' definition. Breakpoint 1, 0x00155596 in PMPI_Barrier () from /home/amarathe/mpi/openmpi/openmpi-1.3.3_install/lib/libmpi.so.0 (gdb) s Single stepping until exit from function PMPI_Barrier, which has no line number information. main (argc=1, argv=0xbf9a1484) at smg2000.c:114 114 P = num_procs; (gdb) Is this the right approach? Thanks, Aniruddha On Mon, Sep 28, 2009 at 8:40 AM, Jeff Squyres wrote: > You might want to just configure Open MPI with: > > ./configure CFLAGS=-g3 ... > > That will pass "-g3" to every Makefile in Open MPI. > > FWIW: I do variants on this technique and gdb is always able to jump to the > right source location if I "break MPI_Barrier" (for example). We actually > have a "--enable-debug" option to OMPI's configure, but it does turn on a > bunch of other debugging code that will definitely result in performance > degradation at run-time (one of its side effects is to add "-g" to every > Makefile). > > > > On Sep 28, 2009, at 5:57 AM, Aniruddha Marathe wrote: > > Hello, >> >> I am new to OpenMPI library and I am trying to step through common MPI >> communication calls using gdb. 
I attach gdb to one of the processes >> (using the steps mentioned on the OpenMPI Debugging FAQ page) and set >> a breakpoint on 'MPI_Barrier' and expect gdb to jump into the >> definition of MPI_Barrier function. >> >> I've manually added -g3 compilation flag to the Makefiles in some of >> the directories that I thought relevant ({ROOT}/ompi/mpi/c etc). I >> also specified the source file paths in gdb using the 'dir' command. >> However, gdb is unable to jump into the appropriate source location >> when it hits the breakpoint. >> >> Could anyone please let me know if I am missing something here? >> >> Thanks for looking into my post. >> >> Regards, >> Aniruddha >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> > > -- > Jeff Squyres > jsquy...@cisco.com > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
[OMPI users] use additional interface for openmpi
Hi folks, I want to use additional Ethernet interfaces on the nodes and the head node for Open MPI communication: it is eth1 on the nodes and eth4 on the head node. How can I configure Open MPI for this? If I add btl_base_include=tcp,sm,self and btl_tcp_if_include=eth1 in the config file, will it work or not? And how does this work with the Torque batch system (the daemons listen on eth0 on all nodes)? Thanx.
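One common way to set this is via the system-wide MCA parameter file; a sketch follows (the file path assumes a default install prefix, and listing both interface names is an assumption about handling the mixed naming, on the basis that each process only uses interfaces that actually exist on its own node):

```
# <prefix>/etc/openmpi-mca-params.conf
btl = tcp,sm,self
# compute nodes have eth1, the head node has eth4
btl_tcp_if_include = eth1,eth4
```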
[OMPI users] Openmpi - Mac OS X SnowLeopard linking error
Hi, when compiling openmpi-1.3.3 with GNU or PGI compilers, the following occurs : libtool: link: gcc-4.2 -O3 -DNDEBUG -m64 -finline-functions -fno-strict-aliasing -fvisibility=hidden -o orte-iof orte-iof.o ../../../orte/.libs/libopen-rte.a /Users/podallaire/Downloads/openmpi-1.3.3/opal/.libs/libopen-pal.a -lutil Undefined symbols: "_orte_iof", referenced from: _main in orte-iof.o _abort_exit_callback in orte-iof.o "_orte_routed", referenced from: _orte_read_hnp_contact_file in libopen-rte.a(hnp_contact.o) _orte_rml_base_update_contact_info in libopen-rte.a(rml_base_contact.o) _orte_rml_base_update_contact_info in libopen-rte.a(rml_base_contact.o) ld: symbol(s) not found collect2: ld returned 1 exit status make[2]: *** [orte-iof] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all-recursive] Error 1 From the following thread, it seems that an extra linking flag should be added : -all_load See : http://www.pgroup.com/userforum/viewtopic.php?t=1594&sid=a9139f8d260d438afc74b5243e06679a Anybody else had this problem ? Thanks PO
Re: [OMPI users] Openmpi - Mac OS X SnowLeopard linking error
Nope - I've been running on Snow Leopard almost since the day it came out without problem. Key was that I had to re-install all my 3rd party software (e.g., compilers) from Macports or wherever as none of the stuff I had installed on Leopard would work properly after the upgrade. Didn't realize that until I found a thread on the Macports list where it was pointed out that you have to completely reinstall all such software after every major Mac OSX upgrade (i.e., from Tiger to Leopard to Snow Leopard). On Sep 28, 2009, at 4:42 PM, Pierre-Olivier Dallaire wrote: Hi, when compiling openmpi-1.3.3 with GNU or PGI compilers, the following occurs : ibtool: link: gcc-4.2 -O3 -DNDEBUG -m64 -finline-functions -fno- strict-aliasing -fvisibility=hidden -o orte-iof orte-iof.o ../../../ orte/.libs/libopen-rte.a /Users/podallaire/Downloads/openmpi-1.3.3/ opal/.libs/libopen-pal.a -lutil Undefined symbols: "_orte_iof", referenced from: _main in orte-iof.o _abort_exit_callback in orte-iof.o "_orte_routed", referenced from: _orte_read_hnp_contact_file in libopen-rte.a(hnp_contact.o) _orte_rml_base_update_contact_info in libopen-rte.a (rml_base_contact.o) _orte_rml_base_update_contact_info in libopen-rte.a (rml_base_contact.o) ld: symbol(s) not found collect2: ld returned 1 exit status make[2]: *** [orte-iof] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all-recursive] Error 1 From the following thread, it seems that an extra linking flag shoud be added : -all_load See : http://www.pgroup.com/userforum/viewtopic.php?t=1594&sid=a9139f8d260d438afc74b5243e06679a Anybody else had this problem ? Thanks PO ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Openmpi - Mac OS X SnowLeopard linking error
This error only comes out when I try to build the Fortran wrappers / it will not fail if only building with gcc/g++. I had to include -all_load in several individual Makefiles / using the env variable LIBS with ./configure does not work. Thanks ! PO On 2009-09-28, at 6:57 PM, Ralph Castain wrote: Nope - I've been running on Snow Leopard almost since the day it came out without problem. Key was that I had to re-install all my 3rd party software (e.g., compilers) from Macports or wherever as none of the stuff I had installed on Leopard would work properly after the upgrade. Didn't realize that until I found a thread on the Macports list where it was pointed out that you have to completely reinstall all such software after every major Mac OSX upgrade (i.e., from Tiger to Leopard to Snow Leopard).
Re: [OMPI users] Openmpi - Mac OS X SnowLeopard linking error
That may explain it - I never build Fortran (thank goodness). On Sep 28, 2009, at 5:06 PM, Pierre-Olivier Dallaire wrote: This error only comes out when I try to build the Fortran wrappers / will not fail if only building with gcc/g++. I had to include -all_load in several individual Makefiles / using the env variable LIBS with ./configure does not work. Thanks ! PO
Re: [OMPI users] Debugging OpenMPI calls
OK, it turned out to be a really stupid mistake. Sorry for spamming and thanks for the help! Regards, Aniruddha On Mon, Sep 28, 2009 at 11:28 AM, Aniruddha Marathe <marathe.anirud...@gmail.com> wrote: > Hi Jeff, > > Thanks for the pointers. I tried with both CFLAGS=-g3 and --enable-debug > (separately), however, I am still unable to jump into the MPI source. It > seems I am missing a small step(s) somewhere.