Re: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg

2006-07-06 Thread Jeff Squyres (jsquyres)
Thanks for looking into this!

I'm going to file a feature enhancement for OMPI to add this option once
the PGI debugger works with Open MPI (I don't want to add it before
then, because it may be misleading to users).


> -----Original Message-----
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Andrew J Caird
> Sent: Wednesday, July 05, 2006 9:16 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg
> 
> 
> This took a long time for me to get to, but once I did, what I found was
> that the closest thing to working for the PGI compilers with OpenMPI is
> this command:
>     mpirun --debugger "pgdbg @mpirun@ @mpirun_args@" --debug -np 2 ./cpi
> 
> It appears to work; that is, you can select a process with the "proc"
> command in pgdbg and set breakpoints and all, but pgdbg prints a lot of
> error messages that are all the same:
>     db_set_code_brk : DiBreakpointSet fails
> which is sort of annoying, but didn't impede my debugging of my 100-line
> MPI test program.
> 
> I posted this to the PGI Debugger Forum:
>     http://www.pgroup.com/userforum/viewtopic.php?p=1969
> and got a response saying (hopefully Mat doesn't mind me quoting him):
> 
> >  Hi Andy,
> >  Actually I'm pleasantly surprised that PGDBG works at all with OpenMPI
> >  since PGDBG currently only supports MPICH. While we're planning on
> >  adding OpenMPI and MPICH-2 support later this year, in the immediate
> >  future, there isn't a work around this problem, other than to use
> >  MPICH.
> >  Thanks,
> >  Mat
> 
> So I guess the short answer is that it might sort of work if you really
> need it; otherwise it's best to wait a little while.
> 
> --andy
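
For anyone hitting the same flood of identical messages, a small filter can collapse consecutive duplicates once the debugger output has been captured to a file or pipe. This is only a sketch and is not part of pgdbg or Open MPI; the function name collapse_repeats is made up for illustration, and the second sample line is a placeholder rather than real pgdbg output.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Yield each run of identical consecutive lines once, with a repeat count."""
    stripped = (line.rstrip("\n") for line in lines)
    for text, run in groupby(stripped):
        count = sum(1 for _ in run)
        yield text if count == 1 else f"{text}  [repeated {count} times]"


if __name__ == "__main__":
    # The first message is the one quoted above; the last line is a placeholder.
    sample = [
        "db_set_code_brk : DiBreakpointSet fails",
        "db_set_code_brk : DiBreakpointSet fails",
        "db_set_code_brk : DiBreakpointSet fails",
        "(some other line of output)",
    ]
    for line in collapse_repeats(sample):
        print(line)
```

Only adjacent duplicates are merged, so the relative order of everything else in the log is preserved.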



Re: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg

2006-06-16 Thread Jeff Squyres (jsquyres)
I'm afraid that I'm not familiar with the PG debugger, so I don't know
how it is supposed to be launched.

The intent with --debugger / --debug is that a single invocation of
some command both launches the parallel debugger and tells that
debugger to launch your parallel MPI process (presumably allowing the
parallel debugger to attach to your parallel MPI process).
This is what fx2 and TotalView allow, for example.

As such, the "--debug" option is simply syntactic sugar for invoking
another [perhaps non-obvious] command.  We figured it was simpler for
users to add "--debug" to the already-familiar mpirun command line than
to learn a new syntax for invoking a debugger (although both would
certainly work equally well).

As such, when OMPI's mpirun sees "--debug", it ends up exec'ing
something else -- the parallel debugger command.  In the example that I
gave in http://www.open-mpi.org/community/lists/users/2005/11/0370.php,
mpirun looked for two things in your path: totalview and fx2.

For example, suppose you did this:

mpirun --debug -np 4 a.out

If it found totalview, it would end up exec'ing:

totalview @mpirun@ -a @mpirun_args@ 
which would get substituted to
totalview mpirun -a -np 4 a.out

(Note the additional "-a".)  This is the totalview command line syntax to
launch their debugger and tell it to launch your parallel process.  If
totalview is not found in your path, it'll look for fx2.  If fx2 is
found, it'll invoke:

fx2 @mpirun@ -a @mpirun_args@ 
which would get substituted to
fx2 mpirun -a -np 4 a.out

You can see that fx2's syntax was probably influenced by totalview's.  
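
As a rough illustration of the lookup and token substitution described above, here is a minimal Python sketch. It is not Open MPI's actual implementation: the function names expand_template and pick_debugger are made up for this example, and the two templates are simply copied from this message. It assumes the "--debug" option has already been stripped from the mpirun arguments.

```python
import shlex
import shutil

# Templates quoted from this message; the list order mirrors the
# "totalview first, then fx2" search described above.
DEBUGGER_TEMPLATES = [
    "totalview @mpirun@ -a @mpirun_args@",
    "fx2 @mpirun@ -a @mpirun_args@",
]


def expand_template(template, mpirun, mpirun_args):
    """Replace the @mpirun@ and @mpirun_args@ tokens and return an argv list."""
    argv = []
    for word in shlex.split(template):
        if word == "@mpirun@":
            argv.append(mpirun)
        elif word == "@mpirun_args@":
            argv.extend(mpirun_args)  # one token expands to many arguments
        else:
            argv.append(word)
    return argv


def pick_debugger(templates, which=shutil.which):
    """Return the first template whose debugger executable is on the PATH."""
    for template in templates:
        if which(shlex.split(template)[0]) is not None:
            return template
    return None


if __name__ == "__main__":
    args = ["-np", "4", "a.out"]  # "--debug" already removed (assumption)
    print(expand_template(DEBUGGER_TEMPLATES[0], "mpirun", args))
    print(pick_debugger(DEBUGGER_TEMPLATES))
```

The `which` parameter exists only so the PATH search can be replaced in a test; by default it uses the standard library's shutil.which.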

So what you need is the command line that tells pgdbg to do the same
thing -- launch your app and attach to it.  You can then substitute that
into the "--debugger" option (using the @mpirun@ and @mpirun_args@
tokens), or set the MCA parameter "orte_base_user_debugger", and then
use --debug.  For example, if the pgdbg syntax is similar to that of
totalview and fx2, then you could do the following:

mpirun --debugger "pgdbg @mpirun@ -a @mpirun_args@" --debug -np 4 a.out
or (assuming tcsh)
shell% setenv OMPI_MCA_orte_base_user_debugger "pgdbg @mpirun@ -a @mpirun_args@"
shell% mpirun --debug -np 4 a.out

Make sense?

If you find a fixed format for pgdbg, we'd be happy to add it to the
default value of the orte_base_user_debugger MCA parameter.

Note that OMPI currently only supports the Totalview API for attaching
to MPI processes -- I don't know if pgdbg requires something else.
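
The two ways of supplying a debugger template mentioned above (the --debugger option and the OMPI_MCA_orte_base_user_debugger environment variable) can be sketched in Python as follows. This is only an illustration of how the command lines are put together: the helper names are hypothetical, nothing is executed, and whether pgdbg actually accepts "-a" is an unverified assumption carried over from the totalview/fx2 syntax.

```python
import os
import shlex

# Template from this message; "-a" support in pgdbg is not confirmed.
PGDBG_TEMPLATE = "pgdbg @mpirun@ -a @mpirun_args@"


def with_debugger_flag(template, mpirun_args):
    """Form 1: pass the whole template as ONE argument to --debugger."""
    return ["mpirun", "--debugger", template, "--debug", *mpirun_args]


def with_mca_env(template, mpirun_args, base_env=None):
    """Form 2: set the MCA parameter in the environment, then use --debug."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_orte_base_user_debugger"] = template
    return ["mpirun", "--debug", *mpirun_args], env


if __name__ == "__main__":
    args = ["-np", "4", "a.out"]
    # shlex.join shows the quoting a POSIX shell would need for form 1.
    print(shlex.join(with_debugger_flag(PGDBG_TEMPLATE, args)))
    argv, env = with_mca_env(PGDBG_TEMPLATE, args, base_env={})
    print(env, shlex.join(argv))
```

Keeping the template as a single list element is the point of the first helper: on a shell command line the same effect requires quoting the template so that mpirun receives it as one argument.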



[OMPI users] OpenMPI, debugging, and Portland Group's pgdbg

2006-06-13 Thread Caird, Andrew J
Hello all,

I've read the thread "OpenMPI debugging support"
(http://www.open-mpi.org/community/lists/users/2005/11/0370.php) and it
looks like there is improved debugging support for debuggers other than
TV in the 1.1 series.

I'd like to use Portland Group's pgdbg.  It's a parallel debugger;
there's more information at http://www.pgroup.com/resources/docs.htm.

From the previous thread on this topic, it looks to me like the plan for
1.1 and forward is to support the ability to launch the debugger
"alongside" the application.  I don't know enough about either pgdbg or
OpenMPI to know if this is the best plan, but assuming that it is, is
there a way to see if it is happening?

I've tried this two ways.  The first way doesn't seem to attach to
anything:


[acaird@nyx-login ~]$ ompi_info | head -2
Open MPI: 1.1a9r10177
   Open MPI SVN revision: r10177
[acaird@nyx-login ~]$ mpirun --debugger pgdbg --debug  -np 2 cpi
PGDBG 6.1-3 x86-64 (Cluster, 64 CPU)
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2005, STMicroelectronics, Inc. All Rights Reserved.
PGDBG cannot open a window; check the DISPLAY environment variable.
Entering text mode.

pgdbg> list
ERROR: No current thread.

pgdbg> quit



and I've tried running the whole thing under pgdbg:


[acaird@nyx-login ~]$ pgdbg mpirun -np 2 cpi -s pgdbgscript
  { lots of mca_* loaded by ld-linux messages }
pgserv 8726: attach : attach 8720 fails
ERROR: New Process (PID 8720, HOST localhost) ATTACH FAILED.
ERROR: New Process (PID 8720, HOST localhost) IGNORED.
ERROR: cannot read value at address 0x59BFE8.
ERROR: cannot read value at address 0x59BFF0.
ERROR: cannot read value at address 0x59BFF8.
ERROR: New Process (PID 0, HOST unknown) IGNORED.
ERROR: cannot read value at address 0x2A959BBEC8.


and it hangs right there until I kill it.  The two variables in this
scenario are the environment setting PGRSH=ssh and the contents of
pgdbgscript, which are:


pgienv exe force
pgienv mode process
ignore 12
run



So, the short list of questions is:

1. Has anyone done this successfully before?
2. Am I making the right assumptions about how the debugger attaches to
the processes?
3. Is this the expected behavior for this set of options to mpirun?
4. Does anyone have any suggestions for other things I might try?

Thanks a lot.
--andy