Ralph,

Configuring with a proper --with-tm=..., I find that I *can* run a singleton in an allocation ("qsub -I -l nodes=1 ....").
The case of a singleton on the front end is still failing.
The verbose output using "-mca state_base_verbose 5 -mca plm_base_verbose 5 -mca odls_base_verbose 5" is attached.

-Paul

On Fri, Jan 10, 2014 at 12:12 PM, Ralph Castain <r...@open-mpi.org> wrote:

>
> On Jan 10, 2014, at 11:04 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> On Fri, Jan 10, 2014 at 10:41 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>>
>> On Fri, Jan 10, 2014 at 10:08 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> ??? that was it? Was this built with --enable-debug?
>>
>> Nope, I missed --enable-debug. Will try again.
>>
> OK, Take-2 below.
> There is an obvious "recipient list is empty!" in the output.
>
> That one is correct and expected - all it means is that you are running on
> only one node, so mpirun doesn't need to relay messages to another daemon
>
> -Paul
>
> $ mpirun -mca btl sm,self -np 2 -mca grpcomm_base_verbose 5 -mca orte_nidmap_verbose 10 examples/ring_c'
> [cvrsvc01:21200] mca:base:select:(grpcomm) Querying component [bad]
> [cvrsvc01:21200] mca:base:select:(grpcomm) Query of component [bad] set priority to 10
> [cvrsvc01:21200] mca:base:select:(grpcomm) Selected component [bad]
> [cvrsvc01:21200] [[45961,0],0] grpcomm:base:receive start comm
> [cvrsvc01:21200] [[45961,0],0] orte:util:encode_nidmap
> [cvrsvc01:21200] [[45961,0],0] grpcomm:bad:xcast sent to job [45961,0] tag 1
> [cvrsvc01:21200] [[45961,0],0] grpcomm:xcast:recv: with 1135 bytes
> [cvrsvc01:21200] [[45961,0],0] orte:daemon:send_relay - recipient list is empty!
> [cvrsvc01:21200] [[45961,0],0] orte:util:encode_nidmap
> [cvrsvc01:21200] [[45961,0],0] orte:util:build:daemon:nidmap packed 55 bytes
> [cvrsvc01:21200] [[45961,0],0] PROGRESSING COLL id 0
> [cvrsvc01:21200] [[45961,0],0] ALL LOCAL PROCS FOR JOB [45961,1] CONTRIBUTE 2
> [cvrsvc01:21200] [[45961,0],0] PROGRESSING COLL id 1
> [cvrsvc01:21200] [[45961,0],0] ALL LOCAL PROCS FOR JOB [45961,1] CONTRIBUTE 2
> [cvrsvc01:21200] [[45961,0],0] PROGRESSING COLL id 2
> [cvrsvc01:21200] [[45961,0],0] ALL LOCAL PROCS FOR JOB [45961,1] CONTRIBUTE 2
> [cvrsvc01:21202] mca:base:select:(grpcomm) Querying component [bad]
> [cvrsvc01:21202] mca:base:select:(grpcomm) Query of component [bad] set priority to 10
> [cvrsvc01:21202] mca:base:select:(grpcomm) Selected component [bad]
> [cvrsvc01:21202] [[45961,1],0] grpcomm:base:receive start comm
> [cvrsvc01:21202] [[45961,1],0] ORTE_ERROR_LOG: Data for specified key not found in file /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-trunk-linux-x86_64-gcc/openmpi-1.9a1r30215/orte/runtime/orte_globals.c at line 503
> [cvrsvc01:21203] mca:base:select:(grpcomm) Querying component [bad]
> [cvrsvc01:21203] mca:base:select:(grpcomm) Query of component [bad] set priority to 10
> [cvrsvc01:21203] mca:base:select:(grpcomm) Selected component [bad]
> [cvrsvc01:21203] [[45961,1],1] grpcomm:base:receive start comm
> [cvrsvc01:21203] [[45961,1],1] ORTE_ERROR_LOG: Data for specified key not found in file /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-trunk-linux-x86_64-gcc/openmpi-1.9a1r30215/orte/runtime/orte_globals.c at line 503
>
> This is very weird - it appears that your procs are looking for hostname
> data prior to receiving the necessary data. Let's try jacking up the debug,
> I guess - add "-mca state_base_verbose 5 -mca plm_base_verbose 5 -mca
> odls_base_verbose 5"
>
> Sorry that will be rather wordy, but I don't understand the ordering you
> show above. It's like your procs are skipping a bunch of steps in the
> startup procedure.
>
> Out of curiosity, if you do have an allocation and run on it, does it work?
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
log-fe.bz2
Description: BZip2 compressed data