On Wednesday 02 April 2008 08:04:10 pm Ralph Castain wrote:
> Hmmm...something isn't making sense. Can I see the command line you used to
> generate this?

mpirun --n 2 --host vic12,vic20 -mca btl openib,self --mca btl_openib_receive_queues P,65536,256,128,128 -d xterm -e gdb /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1

> I'll tell you why I'm puzzled. If orte_debug_flag is set, then the
> "--daemonize" should NOT be there, and you should see "--debug" on that
> command line. What I see is the reverse, which implies to me that
> orte_debug_flag is NOT being set to "true".
>
> When I tested here and on odin, though, I found that the -d option
> correctly set the flag and everything works just fine.
>
> So there is something in your environment or setup that is messing up that
> orte_debug_flag. I have no idea what it could be - the command line should
> override anything in your environment, but you could check. Otherwise, if
> this diagnostic output came from a command line that included -d or
> --debug-devel, or had OMPI_MCA_orte_debug=1 in the environment, then I am
> at a loss - everywhere I've tried it, it works fine.

I'll double check and do a completely fresh svn pull and install and see
where that gets me.

Thanks for the help,
Jon

> Ralph
>
> On 4/2/08 5:41 PM, "Jon Mason" <j...@opengridcomputing.com> wrote:
> > On Wednesday 02 April 2008 05:04:47 pm Ralph Castain wrote:
> >> Here's a real simple diagnostic you can do: set -mca plm_base_verbose 1
> >> and look at the cmd line being executed (send it here). It will look
> >> like:
> >>
> >> [[xxx,1],0] plm:rsh: executing: jjkljks;jldfsaj;
> >>
> >> If the cmd line has --daemonize on it, then the ssh will close and xterm
> >> won't work.
> >
> > [vic20:01863] [[40388,0],0] plm:rsh: executing: (//usr/bin/ssh)
> > [/usr/bin/ssh vic12 orted --daemonize -mca ess env -mca orte_ess_jobid
> > 2646867968 -mca orte_ess_vpid 1 -mca orte_ess_num_procs
> > 2 --hnp-uri
> > "2646867968.0;tcp://192.168.70.150:39057;tcp://10.10.0.150:39057;tcp://86.75.30.10:39057"
> > --nodename
> > vic12 -mca btl openib,self --mca btl_openib_receive_queues
> > P,65536,256,128,128 -mca plm_base_verbose 1 -mca
> > mca_base_param_file_path
> > /usr/mpi/gcc/ompi-trunk/share/openmpi/amca-param-sets:/root -mca
> > mca_base_param_file_path_force /root]
> >
> >
> > It looks like what you say is happening. Is this configured somewhere,
> > so that I can remove it?
> >
> > Thanks,
> > Jon
> >
> >> Ralph
> >>
> >> On 4/2/08 3:14 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
> >>> Can you diagnose a little further:
> >>>
> >>> 1. in the case where it works, can you verify that the ssh to launch
> >>> the orteds is still running?
> >>>
> >>> 2. in the case where it doesn't work, can you verify that the ssh to
> >>> launch the orteds has actually died?
> >>>
> >>> On Apr 2, 2008, at 4:58 PM, Jon Mason wrote:
> >>>> On Wednesday 02 April 2008 01:21:31 pm Jon Mason wrote:
> >>>>> On Wednesday 02 April 2008 11:54:50 am Ralph H Castain wrote:
> >>>>>> I remember that someone had found a bug that caused
> >>>>>> orte_debug_flag to not get properly set (local var covering over a
> >>>>>> global one) - could be that your tmp-public branch doesn't have
> >>>>>> that patch in it.
> >>>>>>
> >>>>>> You might try updating to the latest trunk
> >>>>>
> >>>>> I updated my ompi-trunk tree, did a clean build, and I still see
> >>>>> the same problem. I regressed trunk to rev 17589 and everything
> >>>>> works as I expect.
> >>>>> So I think the problem is still there in the top of trunk.
> >>>>
> >>>> I stepped through the revs of trunk and found the first failing rev
> >>>> to be 17632.
> >>>> It's a big patch, so I'll defer to those more in the know to
> >>>> determine what is breaking in there.
> >>>>
> >>>>> I don't discount user error, but I don't think I am doing anything
> >>>>> different.
> >>>>> Did some setting change that perhaps I did not modify?
> >>>>>
> >>>>> Thanks,
> >>>>> Jon
> >>>>>
> >>>>>> On 4/2/08 10:41 AM, "George Bosilca" <bosi...@eecs.utk.edu> wrote:
> >>>>>>> I'm using this feature on the trunk with the version from
> >>>>>>> yesterday.
> >>>>>>> It works without problems ...
> >>>>>>>
> >>>>>>> george.
> >>>>>>>
> >>>>>>> On Apr 2, 2008, at 12:14 PM, Jon Mason wrote:
> >>>>>>>> On Wednesday 02 April 2008 11:07:18 am Jeff Squyres wrote:
> >>>>>>>>> Are these r numbers relevant on the /tmp-public branch, or the
> >>>>>>>>> trunk?
> >>>>>>>>
> >>>>>>>> I pulled it out of the command used to update the branch, which
> >>>>>>>> was:
> >>>>>>>> svn merge -r 17590:17917 https://svn.open-mpi.org/svn/ompi/trunk .
> >>>>>>>>
> >>>>>>>> In the cpc tmp branch, it happened at r17920.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jon
> >>>>>>>>
> >>>>>>>>> On Apr 2, 2008, at 11:59 AM, Jon Mason wrote:
> >>>>>>>>>> I regressed my tree and it looks like it happened between
> >>>>>>>>>> 17590:17917
> >>>>>>>>>>
> >>>>>>>>>> On Wednesday 02 April 2008 10:22:52 am Jon Mason wrote:
> >>>>>>>>>>> I am noticing that ssh seems to be broken on trunk (and my cpc
> >>>>>>>>>>> branch, as it is based on trunk). When I try to use xterm and
> >>>>>>>>>>> gdb to debug, I only successfully get 1 xterm. I have tried
> >>>>>>>>>>> this on 2 different setups. I can successfully get the xterms
> >>>>>>>>>>> on the 1.2 svn branch.
> >>>>>>>>>>>
> >>>>>>>>>>> I am running the following command:
> >>>>>>>>>>> mpirun --n 2 --host vic12,vic20 -mca btl tcp,self -d xterm -e
> >>>>>>>>>>> gdb /usr/mpi/gcc/openmpi-1.2-svn/tests/IMB-3.0/IMB-MPI1
> >>>>>>>>>>>
> >>>>>>>>>>> Is anyone else seeing this problem?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Jon
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> devel mailing list
> >>>>>>>>>>> de...@open-mpi.org
> >>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
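
For anyone repeating the plm_base_verbose diagnostic from this thread, here is a rough sketch of how the check could be scripted. It only automates what Ralph described: find the "plm:rsh: executing" output and see whether the orted command line carries --daemonize (ssh closes, so no remote xterm) or --debug (what he expects when orte_debug_flag is set). The function name check_orted_launch and the four labels it prints are my own inventions for illustration; they are not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): reads mpirun output on stdin
# and classifies the orted launch command reported by plm_base_verbose.
#   daemonize - --daemonize is present (ssh will close; xterm won't appear)
#   debug     - --debug is present and --daemonize is not
#   unknown   - a plm:rsh line was found, but neither flag was on it
#   none      - no "plm:rsh: executing" line was seen at all
check_orted_launch() {
    awk '
        # The launch command wraps over several lines, so once the marker
        # is seen, keep appending every following line to one string.
        /plm:rsh: executing/ { seen = 1 }
        seen { cmd = cmd " " $0 }
        END {
            if (!seen)                    { print "none" }
            else if (cmd ~ /--daemonize/) { print "daemonize" }
            else if (cmd ~ /--debug/)     { print "debug" }
            else                          { print "unknown" }
        }
    '
}
```

To use it, add -mca plm_base_verbose 1 to the mpirun command line and pipe both stdout and stderr (2>&1) into check_orted_launch. I am assuming the verbose line shows up on one of those two streams. The label is printed only once mpirun exits, since the check runs at end of input.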