Re: [OMPI devel] [OMPI svn] svn:open-mpi r20740
And reverted both r20739 and r20740.

With best regards,
Rainer

On Thursday 05 March 2009 04:25:18 pm Ralph Castain wrote:
> This is what we expressly said NOT to do in Louisville

--
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab              Fax: +1 (865) 241-4811
PO Box 2008 MS 6164                 Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008            AIM/Skype: rusraink
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20739
Hi Jeff,

On Thursday 05 March 2009 04:06:53 pm Jeff Squyres wrote:
> Can you please explain this change? It seems like a very large code
> change for such a trivial name change. Why was it necessary to change
> orte_process_info to orte_proc_info and change all these files?

Yes, I know: like the previous patches, this one touches a lot of files, as does the second one, as we discussed in Louisville.

> It feels like we're getting into "I like this name better than that
> name" kinds of changes... :-(

That's not what it is supposed to be about ;-)
This patch was meant to ease the handling within the redefinition files and scripts that we need for the transition.

Of course, I can revert the change.

With best regards,
Rainer

--
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab              Fax: +1 (865) 241-4811
PO Box 2008 MS 6164                 Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008            AIM/Skype: rusraink
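For readers unfamiliar with "redefinition files": one possible reading is a header that maps an old identifier onto a new one so that code written against either name keeps compiling during a transition. The sketch below is only an illustration under that assumption. The identifiers orte_process_info and orte_proc_info come from the thread; the structure layout, field names, values, and demo_* helpers are hypothetical and are not taken from orte/util/proc_info.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the renamed structure; the real fields
 * are not shown anywhere in this thread. */
struct demo_proc_info_t {
    const char *nodename;
    int num_procs;
};

/* The new name introduced by the rename (contents are made up). */
struct demo_proc_info_t orte_proc_info = { "node0", 4 };

/* Assumed "redefinition" line: the old spelling becomes an alias,
 * so call sites that were not yet converted still compile. */
#define orte_process_info orte_proc_info

/* Code still written against the old name. */
int demo_num_procs_old_style(void)
{
    return orte_process_info.num_procs;
}

/* Compares the node name through the old spelling. */
int demo_nodename_is(const char *expected)
{
    assert(expected != NULL);
    return strcmp(orte_process_info.nodename, expected) == 0;
}
```

With the alias in place, demo_num_procs_old_style() reads the same object as code that uses orte_proc_info directly. Whether the actual transition scripts worked this way is not stated in the thread.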
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20740
This is what we expressly said NOT to do in Louisville

Please remit this

Ralph

On Mar 5, 2009, at 2:14 PM, rusra...@osl.iu.edu wrote:

Author: rusraink
Date: 2009-03-05 16:14:18 EST (Thu, 05 Mar 2009)
New Revision: 20740
URL: https://svn.open-mpi.org/trac/ompi/changeset/20740

Log:
 - Second patch, as discussed in Louisville.
   Replace short macros in orte/util/name_fns.h to the actual fct. call.
 - Compiles on linux/x86-64

Text files modified:
   trunk/ompi/mca/bml/r2/bml_r2.c | 4
   trunk/ompi/mca/btl/base/btl_base_error.c | 2
   trunk/ompi/mca/btl/base/btl_base_error.h | 8
   trunk/ompi/mca/btl/gm/btl_gm_component.c | 2
   trunk/ompi/mca/btl/gm/btl_gm_proc.c | 6
   trunk/ompi/mca/btl/mx/btl_mx_proc.c | 4
   trunk/ompi/mca/btl/ofud/btl_ofud_proc.c | 4
   trunk/ompi/mca/btl/openib/btl_openib_proc.c | 2
   trunk/ompi/mca/btl/pcie/btl_pcie_proc.c | 2
   trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c | 2
   trunk/ompi/mca/btl/udapl/btl_udapl_proc.c | 4
   trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c | 196
   trunk/ompi/mca/dpm/orte/dpm_orte.c | 46
   trunk/ompi/mca/mpool/base/mpool_base_lookup.c | 2
   trunk/ompi/mca/mpool/base/mpool_base_tree.c | 4
   trunk/ompi/mca/mpool/rdma/mpool_rdma_module.c | 2
   trunk/ompi/mca/pml/base/pml_base_select.c | 8
   trunk/ompi/mca/pubsub/orte/pubsub_orte.c | 12 +-
   trunk/ompi/tools/ompi-server/ompi-server.c | 4
   trunk/orte/mca/errmgr/base/errmgr_base_fns.c | 2
   trunk/orte/mca/errmgr/default/errmgr_default.c | 16 +-
   trunk/orte/mca/ess/alps/ess_alps_module.c | 36 +++---
   trunk/orte/mca/ess/base/ess_base_std_app.c | 4
   trunk/orte/mca/ess/base/ess_base_std_orted.c | 2
   trunk/orte/mca/ess/bproc/ess_bproc_module.c | 28 ++--
   trunk/orte/mca/ess/env/ess_env_module.c | 38 +++---
   trunk/orte/mca/ess/hnp/ess_hnp_module.c | 42
   trunk/orte/mca/ess/lsf/ess_lsf_module.c | 34 +++---
   trunk/orte/mca/ess/singleton/ess_singleton_module.c | 34 +++---
   trunk/orte/mca/ess/slave/ess_slave_module.c | 6
   trunk/orte/mca/ess/slurm/ess_slurm_module.c | 40
   trunk/orte/mca/ess/slurmd/ess_slurmd_module.c | 42
   trunk/orte/mca/filem/base/filem_base_receive.c | 14 +-
   trunk/orte/mca/filem/rsh/filem_rsh_module.c | 52 +-
   trunk/orte/mca/grpcomm/bad/grpcomm_bad_module.c | 46
   trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c | 22 ++--
   trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c | 32 +++---
   trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c | 78 +++---
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c | 76 +++---
   trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c | 36 +++---
   trunk/orte/mca/iof/base/base.h | 4
   trunk/orte/mca/iof/base/iof_base_open.c | 10 +-
   trunk/orte/mca/iof/base/iof_base_output.c | 20 ++--
   trunk/orte/mca/iof/hnp/iof_hnp.c | 18 +-
   trunk/orte/mca/iof/hnp/iof_hnp_component.c | 2
   trunk/orte/mca/iof/hnp/iof_hnp_read.c | 20 ++--
   trunk/orte/mca/iof/hnp/iof_hnp_receive.c | 20 ++--
   trunk/orte/mca/iof/orted/iof_orted.c | 18 +-
   trunk/orte/mca/iof/orted/iof_orted_read.c | 10 +-
   trunk/orte/mca/iof/orted/iof_orted_receive.c | 14 +-
   trunk/orte/mca/iof/tool/iof_tool.c | 8
   trunk/orte/mca/iof/tool/iof_tool_receive.c | 12 +-
   trunk/orte/mca/notifier/syslog/notifier_syslog_module.c | 4
   trunk/orte/mca/odls/base/odls_base_default_fns.c | 148 +++---
   trunk/orte/mca/odls/base/odls_base_state.c | 4
   trunk/orte/mca/odls/bproc/odls_bproc.c | 4
   trunk/orte/mca/odls/default/odls_default_module.c | 10 +-
   trunk/orte/mca/odls/process/odls_process_module.c | 4
   trunk/orte/mca/oob/tcp/oob_tcp.c | 46
   trunk/orte/mca/oob/tcp/oob_tcp_msg.c | 26 ++--
   trunk/orte/mca/oob/tcp/oob_tcp_peer.c
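The r20740 log entry describes replacing short macros with "the actual fct. call". The thread does not show the macros themselves, so the following is only a generic sketch of that kind of change. Every name in it (demo_name_t, demo_print_name, DEMO_NAME_PRINT) is hypothetical and does not come from orte/util/name_fns.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical process name; the field names are assumptions. */
typedef struct {
    unsigned jobid;
    unsigned vpid;
} demo_name_t;

/* The "actual function": formats a name into a caller-supplied buffer. */
const char *demo_print_name(char *buf, size_t len, const demo_name_t *name)
{
    assert(buf != NULL && len > 0 && name != NULL);
    snprintf(buf, len, "[%u,%u]", name->jobid, name->vpid);
    return buf;
}

/* The "short macro" style: a thin alias that hides the call.
 * It only works when buf is a true array, because of sizeof. */
#define DEMO_NAME_PRINT(buf, name) demo_print_name((buf), sizeof(buf), (name))

/* After a change like the one described, call sites would spell out
 *     demo_print_name(buf, sizeof(buf), &name)
 * instead of
 *     DEMO_NAME_PRINT(buf, &name)
 * Both produce the same string. */
```

A change of this shape is mechanical but touches every call site, which would be consistent with the long list of modified files above.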
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20739
Rainer --

Can you please explain this change? It seems like a very large code change for such a trivial name change. Why was it necessary to change orte_process_info to orte_proc_info and change all these files?

It feels like we're getting into "I like this name better than that name" kinds of changes... :-(

On Mar 5, 2009, at 3:36 PM, wrote:

Author: rusraink
Date: 2009-03-05 15:36:44 EST (Thu, 05 Mar 2009)
New Revision: 20739
URL: https://svn.open-mpi.org/trac/ompi/changeset/20739

Log:
 - First of two or three patches, in orte/util/proc_info.h:
   Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

Text files modified:
   trunk/ompi/attribute/attribute_predefined.c | 4
   trunk/ompi/errhandler/errhandler_predefined.c | 4
   trunk/ompi/mca/btl/base/btl_base_error.c | 2
   trunk/ompi/mca/btl/base/btl_base_error.h | 14 ++--
   trunk/ompi/mca/btl/elan/btl_elan.c | 2
   trunk/ompi/mca/btl/openib/btl_openib.c | 4
   trunk/ompi/mca/btl/openib/btl_openib_async.c | 4
   trunk/ompi/mca/btl/openib/btl_openib_component.c | 56 +++---
   trunk/ompi/mca/btl/openib/btl_openib_endpoint.c | 2
   trunk/ompi/mca/btl/openib/btl_openib_mca.c | 6 +-
   trunk/ompi/mca/btl/openib/btl_openib_xrc.c | 2
   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c | 6 +-
   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c | 10 ++--
   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c | 2
   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c | 10 ++--
   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c | 2
   trunk/ompi/mca/btl/sm/btl_sm.c | 10 ++--
   trunk/ompi/mca/btl/sm/btl_sm_component.c | 2
   trunk/ompi/mca/btl/udapl/btl_udapl.c | 6 +-
   trunk/ompi/mca/btl/udapl/btl_udapl_component.c | 2
   trunk/ompi/mca/btl/udapl/btl_udapl_proc.c | 4
   trunk/ompi/mca/coll/sm/coll_sm_module.c | 4
   trunk/ompi/mca/coll/sm2/coll_sm2_module.c | 8 +-
   trunk/ompi/mca/coll/sync/coll_sync_module.c | 2
   trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c | 4
   trunk/ompi/mca/dpm/orte/dpm_orte.c | 6 +-
   trunk/ompi/mca/mpool/base/mpool_base_lookup.c | 2
   trunk/ompi/mca/mpool/base/mpool_base_tree.c | 8 +-
   trunk/ompi/mca/mpool/sm/mpool_sm_component.c | 4
   trunk/ompi/mca/mpool/sm/mpool_sm_module.c | 4
   trunk/ompi/mca/pml/v/mca/vprotocol/pessimist/vprotocol_pessimist_sender_based.c | 2
   trunk/ompi/proc/proc.c | 22
   trunk/ompi/runtime/ompi_mpi_abort.c | 2
   trunk/ompi/runtime/ompi_mpi_init.c | 6 +-
   trunk/ompi/tools/ompi_info/components.cc | 4
   trunk/orte/mca/errmgr/default/errmgr_default_component.c | 2
   trunk/orte/mca/ess/alps/ess_alps_module.c | 12 ++--
   trunk/orte/mca/ess/base/ess_base_get.c | 2
   trunk/orte/mca/ess/base/ess_base_std_app.c | 14 ++--
   trunk/orte/mca/ess/base/ess_base_std_orted.c | 10 ++--
   trunk/orte/mca/ess/base/ess_base_std_tool.c | 8 +-
   trunk/orte/mca/ess/bproc/ess_bproc_module.c | 16 +++---
   trunk/orte/mca/ess/cnos/ess_cnos_module.c | 4
   trunk/orte/mca/ess/env/ess_env_component.c | 2
   trunk/orte/mca/ess/env/
Re: [OMPI devel] trunk problem for large-SMP startup?
Ralph Castain wrote:
> I just ran a 64ppn job without problem. Couple of possibilities come to mind:
>
> 1. you might have some stale lib around - try blowing things away and rebuilding
>
> 2. there may be a problem in your specific situation. Can you provide some info on what you are doing (e.g., what environment)?

I think it was indeed something in the trunk. Rolf vandevaart had the same problem. But, I think it's resolved:

(long ago)  works
...
20655       broken
20669       broken
20687       works
20728       works
20738       works

So, something broke awhile back and got fixed between 20687 and 20728.

Okay, I'm back in business and will charge off into the next concrete wall.
Re: [OMPI devel] VT compile error: Fwd: [ofa-general] OFED 1.4.1 (rc1) is available
Adding pointer to OFED bugzilla ticket for more information:

https://bugs.openfabrics.org/show_bug.cgi?id=1537

Jeff Squyres wrote:

VT guys --

It looks like we still have a compile bug in OMPI 1.3.1rc4... See below. Do you think you can get a fix ASAP for OMPI 1.3.1final?

Begin forwarded message:

From: "PN"
Date: March 5, 2009 12:51:28 AM EST
To: "Tziporet Koren"
Subject: Re: [ofa-general] OFED 1.4.1 is available

Hi,

I have a build error of OFED-1.4.1-rc1 under CentOS 5.2:

.
Build openmpi_gcc RPM
Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_name openmpi_gcc' --define 'mpi_selector /usr/bin/mpi-selector' --define 'use_mpi_selector 1' --define 'install_shell_scripts 1' --define 'shell_scripts_basename mpivars' --define '_usr /usr' --define 'ofed 0' --define '_prefix /usr/mpi/gcc/openmpi-1.3.1rc4' --define '_defaultdocdir /usr/mpi/gcc/openmpi-1.3.1rc4' --define '_mandir %{_prefix}/share/man' --define 'mflags -j 4' --define 'configure_options --with-openib=/usr --with-openib-libdir=/usr/lib64 CC=gcc CXX=g++ F77=gfortran FC=gfortran --enable-mpirun-prefix-by-default' --define 'use_default_rpm_opt_flags 1' /opt/software/packages/ofed/OFED-1.4.1-rc1/OFED-1.4.1-rc1/SRPMS/openmpi-1.3.1rc4-1.src.rpm
Failed to build openmpi RPM
See /tmp/OFED.28377.logs/openmpi.rpmbuild.log

In /tmp/OFED.28377.logs/openmpi.rpmbuild.log:

.
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/bin\" -DDATADIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG -DVT_MEMHOOK -DVT_IOWRAP -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT vt_iowrap_helper.o -MD -MP -MF .deps/vt_iowrap_helper.Tpo -c -o vt_iowrap_helper.o vt_iowrap_helper.c
mv -f .deps/vt_memhook.Tpo .deps/vt_memhook.Po
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/bin\" -DDATADIR=\"/usr/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG -DVT_MEMHOOK -DVT_IOWRAP -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -MT rfg_regions.o -MD -MP -MF .deps/rfg_regions.Tpo -c -o rfg_regions.o rfg_regions.c
vt_iowrap.c:1242: error: expected declaration specifiers or '...' before numeric constant
vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
mv -f .deps/vt_iowrap_helper.Tpo .deps/vt_iowrap_helper.Po
make[5]: *** [vt_iowrap.o] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/vt_comp_gnu.Tpo .deps/vt_comp_gnu.Po
mv -f .deps/rfg_regions.Tpo .deps/rfg_regions.Po
make[5]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt/vtlib'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.40739 (%build)

RPM build errors:
    user jsquyres does not exist - using root
    group eng10 does not exist - using root
    user jsquyres does not exist - using root
    group eng10 does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.40739 (%build)

The error seems similar to http://www.open-mpi.org/community/lists/devel/2009/01/5230.php

Regards,
PN

2009/3/5 Tziporet Koren

Hi,

OFED-1.4.1-rc1 release is available on
http://www.openfabrics.org/downloads/OFED/ofed-1.4.1/OFED-1.4.1-rc1.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.4.1

Vladimir & Tziporet

Release information:
--
Linux Operating Systems:
- RedHat EL4 up4: 2.6.9-42.ELsmp *
- RedHat EL4 up5: 2.6.9-55.ELsmp
- RedHat EL4 up6: 2.6.9-67.ELsmp
- RedHat EL4 up7: 2.6.9-78.ELsmp
- RedHat EL5: 2.6.18-8.el5
- RedHat EL5 up1: 2.6.18-53.el5
- RedHat EL5 up2:
Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out
I tried to build the latest OFED with the new ompi rc4, but it looks like the VT code is broken again?

gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" -DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG -DVT_MEMHOOK -DVT_IOWRAP -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -MT vt_iowrap_helper.o -MD -MP -MF .deps/vt_iowrap_helper.Tpo -c -o vt_iowrap_helper.o vt_iowrap_helper.c
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" -DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG -DVT_MEMHOOK -DVT_IOWRAP -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -MT rfg_regions.o -MD -MP -MF .deps/rfg_regions.Tpo -c -o rfg_regions.o rfg_regions.c
vt_iowrap.c:1242: error: expected declaration specifiers or '...' before numeric constant
vt_iowrap.c:1243: error: conflicting types for '__fprintf_chk'
gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\" -DDATADIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/share\" -DRFG -DVT_MEMHOOK -DVT_IOWRAP -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic -fasynchronous-unwind-tables -MT rfg_filter.o -MD -MP -MF .deps/rfg_filter.Tpo -c -o rfg_filter.o rfg_filter.c
make[5]: *** [vt_iowrap.o] Error 1
make[5]: *** Waiting for unfinished jobs
mv -f .deps/vt_iowrap_helper.Tpo .deps/vt_iowrap_helper.Po
mv -f .deps/rfg_filter.Tpo .deps/rfg_filter.Po
mv -f .deps/rfg_regions.Tpo .deps/rfg_regions.Po
make[5]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt/vtlib'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/tmp/OFED_topdir/BUILD/openmpi-1.3.1rc4/ompi'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.43206 (%build)

Ralph H. Castain wrote:

Looks okay to me Brian - I went ahead and filed the CMR and sent it on to Brad for approval.

Ralph

On Tue, 3 Mar 2009, Brian W. Barrett wrote:

On Tue, 3 Mar 2009, Jeff Squyres wrote:

1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only difference between rc3 and rc4 was a fix for that race condition. Please test ASAP:

http://www.open-mpi.org/software/ompi/v1.3/

I'm sorry, I've failed to test rc1 & rc2 on Catamount. I'm getting a compile failure in the ORTE code. I'll do a bit more testing and send Ralph an e-mail this afternoon.

Attached is a patch against v1.3 branch that makes it work on Red Storm. I'm not sure it's right, so I'm just e-mailing it rather than committing.. Sorry Ralph, but can you take a look? :(

Brian

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
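
The two vt_iowrap.c errors reported in this thread ("expected declaration specifiers or '...' before numeric constant" and "conflicting types for '__fprintf_chk'") are what one would expect if a wrapper named fprintf is declared while the C library headers, built with -D_FORTIFY_SOURCE=2, define fprintf as a function-like macro that inserts a numeric flag argument. That diagnosis is an assumption; the thread does not show vt_iowrap.c or the eventual fix. The sketch below reproduces the mechanism with hypothetical names only (demo_log, demo_log_chk, demo_wrapper_calls) and shows one standard C way around it: parenthesizing the function name so the macro is not expanded.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical "checked" variant, standing in for a libc-internal
 * function that takes an extra integer flag. */
int demo_log_chk(FILE *stream, int flag, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(flag >= 0);
    va_start(ap, fmt);
    n = vfprintf(stream, fmt, ap);
    va_end(ap);
    return n;
}

/* Function-like macro that reroutes calls and injects a numeric
 * argument, as a fortified header might (assumption). */
#define demo_log(stream, ...) demo_log_chk(stream, 2 - 1, __VA_ARGS__)

/* Counts how often the wrapper below actually runs. */
int demo_wrapper_calls = 0;

/* Writing
 *     int demo_log(FILE *stream, const char *fmt, ...)
 * here would be macro-expanded into
 *     int demo_log_chk(FILE *stream, 2 - 1, const char *fmt, ...)
 * which is not a valid declaration. Parenthesizing the name keeps
 * the preprocessor from treating it as a macro invocation. */
int (demo_log)(FILE *stream, const char *fmt, ...)
{
    va_list ap;
    int n;

    demo_wrapper_calls++;
    va_start(ap, fmt);
    n = vfprintf(stream, fmt, ap);
    va_end(ap);
    return n;
}
```

A call written as (demo_log)(stdout, "x=%d\n", 1) reaches the wrapper, while demo_log(stdout, "x=%d\n", 1) still goes through the macro to demo_log_chk. An alternative with the same effect is to #undef the macro before defining the wrapper.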