[OMPI devel] orte-dvm startup fails on HEAD

2015-08-21 Thread Mark Santcroos
Hi all,

I see the errors below on startup of orte-dvm on a Cray XE/XK hybrid.
Didn't track the commit that caused it yet, but maybe somebody has a clue from 
the error already.
Last known to work was on July 14. The 2.x branch works fine.

Please let me know if this should be a ticket.

Thanks

Mark


marksant@nid25254:~> orte-dvm 
VMURI: 2210136064.0;usock;tcp://10.128.99.109:52334
[nid25254:32107] OPAL dss:unpack: got type 110 when expecting type 9
[nid25254:32107] [[33724,0],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../orte/mca/odls/base/odls_base_default_fns.c at line 261
marksant@nid25254:~> orte-dvm -d
[nid25254:32172] procdir: /tmp/openmpi-sessions-45504@nid25254_0/33659/0/0
[nid25254:32172] jobdir: /tmp/openmpi-sessions-45504@nid25254_0/33659/0
[nid25254:32172] top: openmpi-sessions-45504@nid25254_0
[nid25254:32172] tmp: /tmp
[nid25254:32172] sess_dir_cleanup: job session dir does not exist
[nid25254:32172] procdir: /tmp/openmpi-sessions-45504@nid25254_0/33659/0/0
[nid25254:32172] jobdir: /tmp/openmpi-sessions-45504@nid25254_0/33659/0
[nid25254:32172] top: openmpi-sessions-45504@nid25254_0
[nid25254:32172] tmp: /tmp
VMURI: 2205876224.0;usock;tcp://10.128.99.109:39208
[nid25254:32172] plm:alps: final top-level argv:
[nid25254:32172] plm:alps: aprun -n 1 -N 1 -cc none -e PMI_NO_PREINITIALIZE=1 -e PMI_NO_FORK=1 -L 21959 orted -mca orte_debug 1 --hnp-topo-sig 4N:2S:4L3:16L2:32L1:32C:32H:x86_64 -mca ess_base_jobid 2205876224 -mca ess_base_vpid 1 -mca ess_base_num_procs 2 -mca orte_hnp_uri 2205876224.0;usock;tcp://10.128.99.109:39208
[nid25254:32172] plm:alps: Set prefix:/u/sciteam/marksant/openmpi/installed/HEAD
[nid25254:32172] plm:alps: reset PATH: 
/u/sciteam/marksant/openmpi/installed/HEAD/bin:/u/sciteam/marksant/openmpi/installed/HEAD/bin:/u/sciteam/marksant/openmpi/tools/bin:/opt/cray/pmi/5.0.6-1..10439.140.3.gem/bin:/opt/gcc/4.8.2/bin:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/bin:/sw/admin/scripts:/sw/user/scripts:/sw/xe/altd/bin:/opt/moab/8.1/bin:/opt/moab/8.1/sbin:/opt/torque/5.0.2/sbin:/opt/torque/5.0.2/bin:/opt/cray/mpt/7.2.0/gni/bin:/opt/cray/craype/2.3.0/bin:/opt/cray/llm/default/bin:/opt/cray/llm/default/etc:/opt/cray/xpmem/0.1-2.0502.55507.3.2.gem/bin:/opt/cray/dmapp/7.0.1-1.0502.9501.5.211.gem/bin:/opt/cray/ugni/5.0-1.0502.9685.4.24.gem/bin:/opt/cray/udreg/2.3.2-1.0502.9275.1.25.gem/bin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.31.1_1.0502.8394.15.1-1.0502.19859.16.1/sbin:/opt/cray/lustre-cray_gem_s/2.5_3.0.101_0.31.1_1.0502.8394.15.1-1.0502.19859.16.1/bin:/opt/cray/alps/5.2.1-2.0502.9649.23.1.gem/sbin:/opt/cray/alps/5.2.1-2.0502.9649.23.1.gem/bin:/opt/cray/sdb/1.0-1.0502.55976.5.27.gem/bin:/opt/cray/nodestat/2.2-1.0502.53712.3.109.gem/bin:/opt/modules/3.2.10.3/bin:/u/sciteam/marksant/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/cray/bin
[nid25254:32172] plm:alps: reset LD_LIBRARY_PATH: 
/u/sciteam/marksant/openmpi/installed/HEAD/lib:/u/sciteam/marksant/openmpi/installed/HEAD/lib:/opt/gcc/mpc/0.8.1/lib:/opt/gcc/mpfr/2.4.2/lib:/opt/gcc/gmp/4.3.2/lib:/opt/gcc/4.8.2/snos/lib64:/sw/xe/darshan/2.3.0/darshan-2.3.0_cle52/lib
[nid21959:01177] procdir: /tmp/openmpi-sessions-45504@nid21959_0/33659/0/1
[nid21959:01177] jobdir: /tmp/openmpi-sessions-45504@nid21959_0/33659/0
[nid21959:01177] top: openmpi-sessions-45504@nid21959_0
[nid21959:01177] tmp: /tmp
[nid21959:01177] sess_dir_cleanup: job session dir does not exist
[nid21959:01177] procdir: /tmp/openmpi-sessions-45504@nid21959_0/33659/0/1
[nid21959:01177] jobdir: /tmp/openmpi-sessions-45504@nid21959_0/33659/0
[nid21959:01177] top: openmpi-sessions-45504@nid21959_0
[nid21959:01177] tmp: /tmp
[nid25254:32172] [[33659,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
[nid25254:32172] OPAL dss:unpack: got type 110 when expecting type 9
[nid25254:32172] [[33659,0],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../orte/mca/odls/base/odls_base_default_fns.c at line 261
[nid25254:32172] [[33659,0],0] orted:comm:add_procs failed to launch on error Pack data mismatch
[nid25254:32172] [[33659,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_EXIT_CMD
[nid21959:01177] [[33659,0],1]:../../../../../orte/mca/errmgr/default_orted/errmgr_default_orted.c(251) updating exit status to 1
[nid25254:32172] sess_dir_finalize: proc session dir does not exist
[nid25254:32172] sess_dir_cleanup: job session dir does not exist
exiting with status 0
marksant@nid25254:~> [nid21959:01177] sess_dir_finalize: proc session dir does not exist
[nid21959:01177] sess_dir_cleanup: job session dir does not exist
exiting with status 1
Application 25938733 exit codes: 1
Application 25938733 resources: utime ~0s, stime ~1s, Rss ~21456, inblocks ~4629, outblocks ~104
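
For readers unfamiliar with this class of error: a message of the form "got type X when expecting type Y" is what one would expect from a buffer that stores a type tag ahead of each packed value, when the receiver unpacks a different type than the sender packed at that position. The following is a minimal Python sketch of that idea only. It is not the OPAL DSS code; the class name, the tag values, and the wire layout are all assumptions made for illustration.

```python
import struct

# Hypothetical type tags, chosen for this sketch only (not OPAL's values).
TYPE_INT32 = 1
TYPE_STRING = 2


class PackMismatch(Exception):
    """Raised when the next tag in the buffer is not the one requested."""


class TypedBuffer:
    """Byte buffer that writes a one-byte type tag before every value."""

    def __init__(self):
        self.data = bytearray()
        self.offset = 0  # read position used by the unpack_* methods

    def pack_int32(self, value):
        self.data += struct.pack("!Bi", TYPE_INT32, value)

    def pack_string(self, text):
        raw = text.encode("utf-8")
        self.data += struct.pack("!BI", TYPE_STRING, len(raw)) + raw

    def _expect(self, wanted):
        # Compare the stored tag with the type the caller asked for.
        got = self.data[self.offset]
        if got != wanted:
            raise PackMismatch(f"got type {got} when expecting type {wanted}")
        self.offset += 1

    def unpack_int32(self):
        self._expect(TYPE_INT32)
        (value,) = struct.unpack_from("!i", self.data, self.offset)
        self.offset += 4
        return value

    def unpack_string(self):
        self._expect(TYPE_STRING)
        (length,) = struct.unpack_from("!I", self.data, self.offset)
        self.offset += 4
        raw = bytes(self.data[self.offset:self.offset + length])
        self.offset += length
        return raw.decode("utf-8")


def demo():
    """Pack a string first, then try to unpack an int32 first."""
    buf = TypedBuffer()
    buf.pack_string("example")
    buf.pack_int32(42)
    try:
        buf.unpack_int32()  # receiver disagrees with the sender's layout
    except PackMismatch as err:
        return str(err)
    return "no error"
```

In this sketch the failure is detected at the first field where sender and receiver disagree, which is why such an error tends to point at the unpacking code rather than at whatever caused the two sides to disagree.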



Re: [OMPI devel] orte-dvm startup fails on HEAD

2015-08-21 Thread Ralph Castain
I’ll take a look at it

> On Aug 20, 2015, at 11:34 PM, Mark Santcroos  
> wrote:
> 
> Hi all,
> 
> I see the errors below on startup of orte-dvm on a Cray XE/XK hybrid.
> Didn't track the commit that caused it yet, but maybe somebody has a clue 
> from the error already.
> Last known to work was on July 14. The 2.x branch works fine.
> 
> Please let me know if this should be a ticket.
> 
> Thanks
> 
> Mark

Re: [OMPI devel] orte-dvm startup fails on HEAD

2015-08-21 Thread Howard Pritchard
I will check if I can reproduce on NERSC systems.

--

sent from my smart phonr so no good type.

Howard
On Aug 21, 2015 7:51 AM, "Ralph Castain"  wrote:

> I’ll take a look at it
>
> > On Aug 20, 2015, at 11:34 PM, Mark Santcroos wrote:
> >
> > Hi all,
> >
> > I see the errors below on startup of orte-dvm on a Cray XE/XK hybrid.
> > Didn't track the commit that caused it yet, but maybe somebody has a clue
> > from the error already.
> > Last known to work was on July 14. The 2.x branch works fine.
> >
> > Please let me know if this should be a ticket.
> >
> > Thanks
> >
> > Mark

Re: [OMPI devel] orte-dvm startup fails on HEAD

2015-08-21 Thread Ralph Castain
I found the problem, Howard. It has nothing to do with the Cray; it is a 
selection issue in the state framework.


> On Aug 21, 2015, at 7:37 AM, Howard Pritchard  wrote:
> 
> I will check if I can reproduce on NERSC systems.
> 
> --
> 
> sent from my smart phonr so no good type.
> 
> Howard
> 
> On Aug 21, 2015 7:51 AM, "Ralph Castain" wrote:
> I’ll take a look at it
> 
> > On Aug 20, 2015, at 11:34 PM, Mark Santcroos wrote:
> >
> > Hi all,
> >
> > I see the errors below on startup of orte-dvm on a Cray XE/XK hybrid.
> > Didn't track the commit that caused it yet, but maybe somebody has a clue 
> > from the error already.
> > Last known to work was on July 14. The 2.x branch works fine.
> >
> > Please let me know if this should be a ticket.
> >
> > Thanks
> >
> > Mark

Re: [OMPI devel] orte-dvm startup fails on HEAD

2015-08-21 Thread Ralph Castain
Okay Mark, I just pushed a fix. Sorry for the problem


> On Aug 21, 2015, at 7:39 AM, Ralph Castain  wrote:
> 
> I found the problem, Howard. It has nothing to do with the Cray; it is a 
> selection issue in the state framework.
> 
> 
>> On Aug 21, 2015, at 7:37 AM, Howard Pritchard wrote:
>> 
>> I will check if I can reproduce on NERSC systems.
>> 
>> --
>> 
>> sent from my smart phonr so no good type.
>> 
>> Howard
>> 
>> On Aug 21, 2015 7:51 AM, "Ralph Castain" wrote:
>> I’ll take a look at it
>> 
>> > On Aug 20, 2015, at 11:34 PM, Mark Santcroos wrote:
>> >
>> > Hi all,
>> >
>> > I see the errors below on startup of orte-dvm on a Cray XE/XK hybrid.
>> > Didn't track the commit that caused it yet, but maybe somebody has a clue 
>> > from the error already.
>> > Last known to work was on July 14. The 2.x branch works fine.
>> >
>> > Please let me know if this should be a ticket.
>> >
>> > Thanks
>> >
>> > Mark

Re: [OMPI devel] 1.10.0rc3 - second Solaris build failure

2015-08-21 Thread Jeff Squyres (jsquyres)
Followups are occurring on https://github.com/open-mpi/ompi-release/pull/529

> On Aug 20, 2015, at 6:50 PM, Paul Hargrove  wrote:
> 
> 
> On Thu, Aug 20, 2015 at 1:14 PM, Jeff Squyres (jsquyres)  
> wrote:
> > And therefore it didn't generate libmpi_mpifh_sizeof.a (gfortran  > generate an effectively "empty" libmpi_mpifh_sizeof.a).  Hence, a 
> > subsequent link that depended on that library failed.
> 
> Paul: can you verify my theory?
> 
> Do this in your existing build:
> 
> -
> rm -f ompi/mpi/fortran/base/gen-mpi-sizeof.pl
> wget \
>   https://raw.githubusercontent.com/open-mpi/ompi/master/ompi/mpi/fortran/base/gen-mpi-sizeof.pl \
>   -O ompi/mpi/fortran/base/gen-mpi-sizeof.pl
> chmod +x ompi/mpi/fortran/base/gen-mpi-sizeof.pl
> rm ompi/mpi/fortran/mpif-h/profile/psizeof_f.f90
> make -j 32
> 
> I made changes to your instruction appropriate to my VPATH build (cd $BLDDIR 
> after the chmod).
> Solaris make has no '-j' option, but since I am running in a VM on a 
> dual-core laptop I chose to omit "-j 32" even after switching to gmake.
> 
> Good-natured-nit-picking aside, the solution does NOT work (it may be 
> necessary, but is not sufficient).
> There is a new generated psizeof_f.f90, containing a dummy subroutine, but my 
> pandas are still sad.
> In fact, these pandas are so despondent that they started chewing on your 
> .gitconfig file (but I asked them to be --quiet about it).
> 
> A log from "gmake clean all V=1" in the mpif-h directory is (again) attached.
> 
> I direct your attention to the following line:
> /bin/sh ../../../../libtool  --tag=FC   --mode=link f90  -m32 -g   -o libmpi_mpifh_sizeof.la -lm -lsocket -lnsl
> Somebody appears to have specified no linker inputs!
> On other platforms I see a "sizeof_f.lo" immediately before the -l options.
> I am pretty sure this is a contributing factor. ;-)
> 
> -Paul 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
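
The check Paul describes above (a libtool link line with an output but no object inputs) can be automated when scanning a verbose build log. The following is a rough Python sketch under stated assumptions: the function name `link_inputs` and the suffix list are invented for illustration, and it only understands the simple command shape seen in this thread, not libtool's full option syntax.

```python
import shlex

# File suffixes treated as linker inputs in this sketch (an assumption).
OBJECT_SUFFIXES = (".lo", ".o", ".la", ".a")


def link_inputs(command):
    """Return the object/library files passed as inputs on a link line.

    The token following "-o" is the output file and is skipped, as are
    option tokens such as "-lm" or "--mode=link".
    """
    inputs = []
    skip_next = False
    for token in shlex.split(command):
        if skip_next:            # this token is the -o output, not an input
            skip_next = False
            continue
        if token == "-o":
            skip_next = True
            continue
        if not token.startswith("-") and token.endswith(OBJECT_SUFFIXES):
            inputs.append(token)
    return inputs
```

Applied to the link line quoted above, `link_inputs` returns an empty list. On a hypothetical line with `sizeof_f.lo` placed before the `-l` options, as Paul reports seeing on other platforms, it returns `["sizeof_f.lo"]`.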


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] orte-dvm startup fails on HEAD

2015-08-21 Thread Mark Santcroos
Thanks Ralph.
The machine in question is currently in maintenance, so I can't check right now.
I will get back to you as soon as I can.

> On 21 Aug 2015, at 16:51 , Ralph Castain  wrote:
> 
> Okay Mark, I just pushed a fix. Sorry for the problem
> 
> 
>> On Aug 21, 2015, at 7:39 AM, Ralph Castain  wrote:
>> 
>> I found the problem, Howard. It has nothing to do with the Cray; it is a 
>> selection issue in the state framework.
>> 
>> 
>>> On Aug 21, 2015, at 7:37 AM, Howard Pritchard  wrote:
>>> 
>>> I will check if I can reproduce on NERSC systems.
>>> 
>>> --
>>> 
>>> sent from my smart phonr so no good type.
>>> 
>>> Howard
>>> 
>>> On Aug 21, 2015 7:51 AM, "Ralph Castain"  wrote:
>>> I’ll take a look at it
>>> 
>>> > On Aug 20, 2015, at 11:34 PM, Mark Santcroos  
>>> > wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I see the errors below on startup of orte-dvm on a Cray XE/XK hybrid.
>>> > Didn't track the commit that caused it yet, but maybe somebody has a clue 
>>> > from the error already.
>>> > Last known to work was on July 14. The 2.x branch works fine.
>>> >
>>> > Please let me know if this should be a ticket.
>>> >
>>> > Thanks
>>> >
>>> > Mark