On Wednesday 02 April 2008 08:04:10 pm Ralph Castain wrote:
> Hmmm...something isn't making sense. Can I see the command line you used to
> generate this?

mpirun --n 2 --host vic12,vic20 -mca btl openib,self --mca 
btl_openib_receive_queues P,65536,256,128,128 -d xterm -e 
gdb /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1  

> I'll tell you why I'm puzzled. If orte_debug_flag is set, then the
> "--daemonize" should NOT be there, and you should see "--debug" on that
> command line. What I see is the reverse, which implies to me that
> orte_debug_flag is NOT being set to "true".
>
> When I tested here and on odin, though, I found that the -d option
> correctly set the flag and everything works just fine.
>
> So there is something in your environment or setup that is messing up that
> orte_debug_flag. I have no idea what it could be - the command line should
> override anything in your environment, but you could check. Otherwise, if
> this diagnostic output came from a command line that included -d or
> --debug-devel, or had OMPI_MCA_orte_debug=1 in the environment, then I am
> at a loss - everywhere I've tried it, it works fine.

I'll double check and do a completely fresh svn pull and install and see where 
that gets me.
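
For anyone repeating the check Ralph describes, here is a minimal sketch. It assumes you have captured the "plm:rsh: executing:" line from a run with -mca plm_base_verbose 1. The helper name classify_orted_cmdline is made up for illustration; it only looks for the two orted flags mentioned in this thread and knows nothing else about Open MPI. The last line lists whatever OMPI_MCA_ variables are already set in the current shell.

```shell
# Hypothetical helper (not part of Open MPI): classify a captured
# "plm:rsh: executing:" line by the two orted flags discussed here.
classify_orted_cmdline() {
    case "$1" in
        *--daemonize*) echo "daemonize" ;;  # orted detaches, so the ssh closes
        *--debug*)     echo "debug" ;;      # what Ralph expects when -d is given
        *)             echo "unknown" ;;    # neither flag is on the line
    esac
}

# Show any MCA parameters that are being set through the environment.
env | grep '^OMPI_MCA_' || echo "no OMPI_MCA_ variables set"
```

Passing the orted line quoted below (the one from vic20) to the helper should report "daemonize", which matches what Ralph reads off it by eye.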

Thanks for the help,
Jon


> Ralph
>
> On 4/2/08 5:41 PM, "Jon Mason" <j...@opengridcomputing.com> wrote:
> > On Wednesday 02 April 2008 05:04:47 pm Ralph Castain wrote:
> >> Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1
> >> and look at the cmd line being executed (send it here). It will look
> >> like:
> >>
> >> [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj;
> >>
> >> If the cmd line has --daemonize on it, then the ssh will close and xterm
> >> won't work.
> >
> > [vic20:01863] [[40388,0],0] plm:rsh: executing: (//usr/bin/ssh)
> > [/usr/bin/ssh vic12 orted --daemonize -mca ess env -mca orte_ess_jobid
> > 2646867968 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
> > 2 --hnp-uri
> > "2646867968.0;tcp://192.168.70.150:39057;tcp://10.10.0.150:39057;tcp://86.75.30.10:39057" --nodename
> > vic12 -mca btl openib,self --mca btl_openib_receive_queues
> > P,65536,256,128,128 -mca plm_base_verbose 1 -mca
> > mca_base_param_file_path
> > /usr/mpi/gcc/ompi-trunk/share/openmpi/amca-param-sets:/root -mca
> > mca_base_param_file_path_force /root]
> >
> >
> > It looks like what you say is happening.  Is this configured somewhere,
> > so that I can remove it?
> >
> > Thanks,
> > Jon
> >
> >> Ralph
> >>
> >> On 4/2/08 3:14 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
> >>> Can you diagnose a little further:
> >>>
> >>> 1. in the case where it works, can you verify that the ssh to launch
> >>> the orteds is still running?
> >>>
> >>> 2. in the case where it doesn't work, can you verify that the ssh to
> >>> launch the orteds has actually died?
> >>>
> >>> On Apr 2, 2008, at 4:58 PM, Jon Mason wrote:
> >>>> On Wednesday 02 April 2008 01:21:31 pm Jon Mason wrote:
> >>>>> On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote:
> >>>>>> I remember that someone had found a bug that caused
> >>>>>> orte_debug_flag to not
> >>>>>> get properly set (local var covering over a global one) - could be
> >>>>>> that
> >>>>>> your tmp-public branch doesn't have that patch in it.
> >>>>>>
> >>>>>> You might try updating to the latest trunk
> >>>>>
> >>>>> I updated my ompi-trunk tree, did a clean build, and I still see
> >>>>> the same
> >>>>> problem.  I regressed trunk to rev 17589 and everything works as I
> >>>>> expect.
> >>>>> So I think the problem is still there in the top of trunk.
> >>>>
> >>>> I stepped through the revs of trunk and found the first failing rev
> >>>> to be
> >>>> 17632.  It's a big patch, so I'll defer to those more in the know to
> >>>> determine
> >>>> what is breaking in there.
> >>>>
> >>>>> I don't discount user error, but I don't think I am doing anything
> >>>>> different.
> >>>>> Did some setting change that perhaps I did not modify?
> >>>>>
> >>>>> Thanks,
> >>>>> Jon
> >>>>>
> >>>>>> On 4/2/08 10:41 AM, "George Bosilca" <bosi...@eecs.utk.edu> wrote:
> >>>>>>> I'm using this feature on the trunk with the version from
> >>>>>>> yesterday.
> >>>>>>> It works without problems ...
> >>>>>>>
> >>>>>>>   george.
> >>>>>>>
> >>>>>>> On Apr 2, 2008, at 12:14 PM, Jon Mason wrote:
> >>>>>>>> On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote:
> >>>>>>>>> Are these r numbers relevant on the /tmp-public branch, or the
> >>>>>>>>> trunk?
> >>>>>>>>
> >>>>>>>> I pulled it out of the command used to update the branch, which
> >>>>>>>> was:
> >>>>>>>> svn merge -r 17590:17917 https://svn.open-mpi.org/svn/ompi/trunk .
> >>>>>>>>
> >>>>>>>> In the cpc tmp branch, it happened at r17920.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jon
> >>>>>>>>
> >>>>>>>>> On Apr 2, 2008, at 11:59 AM, Jon Mason wrote:
> >>>>>>>>>> I regressed my tree and it looks like it happened between
> >>>>>>>>>> 17590:17917
> >>>>>>>>>>
> >>>>>>>>>> On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote:
> >>>>>>>>>>> I am noticing that ssh seems to be broken on trunk (and my cpc
> >>>>>>>>>>> branch, as
> >>>>>>>>>>> it is based on trunk).  When I try to use xterm and gdb to
> >>>>>>>>>>> debug, I
> >>>>>>>>>>> only
> >>>>>>>>>>> successfully get 1 xterm.  I have tried this on 2 different
> >>>>>>>>>>> setups.  I can
> >>>>>>>>>>> successfully get the xterms on the 1.2 svn branch.
> >>>>>>>>>>>
> >>>>>>>>>>> I am running the following command:
> >>>>>>>>>>> mpirun --n 2 --host vic12,vic20 -mca btl tcp,self -d xterm -e
> >>>>>>>>>>> gdb /usr/mpi/gcc/openmpi-1.2-svn/tests/IMB-3.0/IMB-MPI1
> >>>>>>>>>>>
> >>>>>>>>>>> Is anyone else seeing this problem?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Jon
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> devel mailing list
> >>>>>>>>>>> de...@open-mpi.org
> >>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

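The thread mentions stepping through trunk revisions to find the first failing one. Jon does not say how he stepped, so the following is only one possible approach: a bisection sketch that assumes the failure, once introduced, persists in every later revision. Both first_bad_rev and is_bad are hypothetical names; is_bad stands in for whatever "check out revision N, rebuild, run the xterm test" procedure is used, and must return success when the revision shows the problem.

```shell
# Hypothetical bisection over svn revision numbers.
# $1 = a revision known to work, $2 = a revision known to fail.
# Relies on a user-supplied is_bad function (assumption, see above).
first_bad_rev() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$mid"; then
            bad=$mid      # failure already present at mid
        else
            good=$mid     # mid still works, look later
        fi
    done
    echo "$bad"           # first revision that shows the failure
}
```

With the numbers from this thread, the call would be first_bad_rev 17589 17917, since 17589 was reported to work and the merge range ended at 17917.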