Re: [ofa-general] RE: [ewg] RFC: Do we wish to take MPI out of OFED?
I can see from the mails, and from my personal experience, that most end users do not need or use the MPI that comes as part of OFED (they have many different MPIs installed in their clusters); as we can see, the distros are not shipping it, and some (if not all) of the OFED binary package providers (i.e., companies) do not use it either. We think the simple, clear answer is: take the MPI packages out of OFED.

It is not so simple and clear for me after all the discussion on this thread. Some OFED members want to remove the MPIs and some are strongly against it (and the same is true of the OFED user community). As I mentioned previously, I am sure this step would push users away from the OFED distributions and towards vendor packages, as it was before the first OFED release. What will you answer to the user's question: "How can I install MPI with OFED support?" Will you send the user to read 10 pages of an MPI installation FAQ? :-)

My $0.02,
Pasha

___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] RFC: Do we wish to take MPI out of OFED?
- Need to synchronize between different projects

The synchronization between OFED and the MPIs is minimal. In most cases OFED just takes the latest available MPI version. Occasionally OFED discovers some really critical bug and we ask the MPI project to provide a new version, but for critical bugs the MPI team releases a new bugfix version ASAP anyway.

- MPI is an important RDMA ULP and although it is not developed in OFA it is widely used by OFED customers

I would guess that MPI users are the largest OFED user community, and including MPI as part of OFED definitely simplifies end users' lives. I'm personally against excluding MPI from the OFED package.

Pasha.
Re: [ewg] Bug in IMB included in OFED
Jeff, I'm the maintainer of the mpitests package, but I only collect code from different vendors: Intel, OSU, Presta. Maybe Alexander can help forward this bugfix to the right person at Intel. Thanks, Pasha

Jeff Squyres wrote: There is a bug in the IMB MPI test suite that is included in OFED (the mpitests SRPM). Who is the maintainer for that? I would file a bug on bugzilla, but there's no category for the MPI tests. It needs the following patch:

--- IMB-3.1/src/IMB_window.c.~1~    2009-05-13 14:44:42.0 -0700
+++ IMB-3.1/src/IMB_window.c        2009-05-13 14:45:44.0 -0700
@@ -140,6 +140,10 @@
                 c_info->rank, 0, 1, c_info->r_data_type, c_info->WIN);
             MPI_ERRHAND(ierr);
         }
+        /* JMS Added a call to MPI_WIN_FENCE here, per MPI-2.1
+           section 11.2.1 */
+        ierr = MPI_Win_fence(0, c_info->WIN);
+        MPI_ERRHAND(ierr);
         ierr = MPI_Win_free(&c_info->WIN);
         MPI_ERRHAND(ierr);
     }

I'm attaching the patch as well because IMB_window.c has DOS-style EOLs; hopefully the attached patch will handle them properly.
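Since IMB_window.c carries DOS-style (CRLF) line endings, a patch generated on a Unix box may not apply cleanly. One common workaround, sketched below on a toy file (the file names are illustrative, not from the thread), is to strip the carriage returns before diffing or patching:

```shell
# Create a stand-in CRLF file (illustrative; the real target is IMB_window.c).
printf 'int a;\r\nint b;\r\n' > demo.c

# Strip the DOS carriage returns so Unix diff/patch see plain LF endings.
tr -d '\r' < demo.c > demo.unix.c

# Verify: no \r bytes remain in the normalized copy.
if grep -q "$(printf '\r')" demo.unix.c; then
    echo "still has CRLF endings"
else
    echo "normalized to LF"    # this branch is taken
fi
```

If the upstream project expects CRLF files, the endings can be restored after patching (e.g. with `sed 's/$/\r/'`), but sending a patch whose context lines match the file's real endings, as Jeff did, avoids the round trip.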
Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4
I think your problem is related to this bug: https://svn.open-mpi.org/trac/ompi/ticket/1823 And it is resolved on the ompi trunk. Pasha.

Steve Wise wrote: When this happens, that node also logs this type of message in /var/log/messages:

IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp 7fffb1021330 error 4

Steve Wise wrote: Hey Jeff, have you seen this? I'm hitting this regularly running on ofed-1.4.1-rc2. Test:

[o...@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
    mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
        --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
        /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 \
        -npmin 16 bcast scatter sendrecv exchange > /dev/null
done

Seg fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
[vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
[vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
[vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
[vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
[vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
[vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
[vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
[vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
[vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]
[vic21:04047] *** End of error message ***

___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4
Steve, if you compile the OMPI code with CFLAGS=-g, generate a segfault core file, and send me the core plus the IMB-MPI1 binary, I will be able to understand the problem better. Regards, Pasha

Steve Wise wrote: Hey Pasha, I just applied r20872 and retested, and I still hit this seg fault. So I think this is a new bug. Let me pull the trunk and try that.

Pavel Shamis (Pasha) wrote: I think your problem is related to this bug: https://svn.open-mpi.org/trac/ompi/ticket/1823 And it is resolved on the ompi trunk. Pasha.

Steve Wise wrote: [quoted test script and seg fault output trimmed; identical to the previous message]
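Pasha's request above (a debug build plus a core file) can be sketched as a small shell workflow. The configure invocation and paths below are illustrative assumptions, not the exact commands used in this thread; adjust them to your own Open MPI source tree.

```shell
# Sketch (assumed workflow): rebuild Open MPI with debug symbols and make
# sure a segfault actually leaves a core file behind.
#
# In the Open MPI source tree (prefix is illustrative):
#   ./configure CFLAGS=-g --prefix=/opt/openmpi-dbg
#   make -j4 install
#
# In the shell that will launch the job, allow core dumps to be written:
ulimit -c unlimited 2>/dev/null || true
ulimit -c    # prints the current core-file size limit
#
# After the crash, load the core into gdb for a symbolic backtrace:
#   gdb /path/to/IMB-MPI1 core.<pid> -batch -ex bt
```

The key detail is that `ulimit -c` must be set in the same shell (or job script) that launches `mpirun`, since the limit is inherited by child processes; a default limit of 0 silently suppresses core files.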
[ewg] MVAPICH1 1.1.0 SRPM available
A new srpm (mvapich-1.1.0-3048.src.rpm) for MVAPICH1 was uploaded. Please check ~pasha/ofed_1_4/ (see latest.txt for the build number). It includes bug fixes for #1241 and #1244.
[ewg] MVAPICH1 1.0.0 SRPM available
A new srpm for MVAPICH1 was uploaded. Please check ~pasha/ofed_1_3/ (see latest.txt for the build number). It includes bug fixes for #916 and #922. -- Pavel Shamis (Pasha) Mellanox Technologies
[ewg] MVAPICH1 1.0.0 SRPM available
A new srpm for MVAPICH1 was uploaded. Please check ~pasha/ofed_1_3/ (see latest.txt for the build number). It includes bug fixes for #883, #884, #888, #887, #889, and #893. -- Pavel Shamis (Pasha) Mellanox Technologies
[ewg] MVAPICH1 1.0.0 SRPM available
A new srpm for MVAPICH1 was uploaded. Please check ~pasha/ofed_1_3/ (see latest.txt for the build number). The new build includes documentation and tuning fixes. -- Pavel Shamis (Pasha) Mellanox Technologies
Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2readiness
On Friday 18 January 2008 03:25, Roland Dreier wrote: I guess you mean just implementing XRC without allowing multiple processes to share an XRC domain? That actually seems like a sensible thing to implement as well...

This is part of the current XRC implementation: just pass -1 as the fd value to ibv_open_xrc_domain(). I guess Gleb was talking about one of the possible XRC usages described in this paper: http://www.cs.sandia.gov/~rbbrigh/papers/ompi-ib-pvmmpi07.pdf -- Pavel Shamis (Pasha) Mellanox Technologies
[ewg] MVAPICH1 1.0.0 SRPM available
A new srpm for MVAPICH1 was uploaded. Please check ~pasha/ofed_1_3/ (see latest.txt for the build number). It includes a bug fix for an f90 problem on ppc platforms. -- Pavel Shamis (Pasha) Mellanox Technologies
Re: [ewg] Problems building mvapich on ia64 on RedHat EL5.1
Hi, I don't have access to such a platform at Mellanox, and from the log I don't really understand the problem. Can I get remote access to the machine? Regards, Pasha.

Woodruff, Robert J wrote: I get the following build error trying to build mvapich on ia64 on RedHat EL5.1 using today's OFED daily build.

gcc -DHAVE_CONFIG_H -I. -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -O3 -fno-strict-aliasing -g -D_GNU_SOURCE -DCH_GEN2 -DMEMORY_SCALE -D_AFFINITY_ -DCOMPAT_MODE -Wall -D_SMP_ -D_SMP_RNDV_ -DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER -D_IA64_ -I/usr/include -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639 -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I. -c viainit.c

viainit.c: In function 'viainit_setaffinity':
viainit.c:140: warning: passing argument 3 of 'sched_setaffinity' from incompatible pointer type
viainit.c: In function 'viainit_exchange':
viainit.c:752: warning: unused variable 'other_qp_list'
viainit.c: In function 'MPID_VIA_Init':
viainit.c:952: warning: implicit declaration of function 'init_apm_lock'
viainit.c:788: warning: unused variable 'smpi_ptr'
viainit.c:784: warning: unused variable 'j'
viainit.c:784: warning: unused variable 'i'
viainit.c: In function 'ib_qp_enable':
viainit.c:1379: warning: implicit declaration of function 'reload_alternate_path'
viainit.c: In function 'ib_rank_lid_table':
viainit.c:1917: warning: pointer targets in assignment differ in signedness
viainit.c: At top level:
viainit.c:1907: warning: 'ib_rank_lid_table' defined but not used

gcc -DHAVE_CONFIG_H -I. -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/util -DMPID_DEVICE_CODE -DHAVE_UNAME=1 -DHAVE_NETDB_H=1 -DHAVE_GETHOSTBYNAME=1 -DMPID_DEBUG_NONE -DMPID_STAT_NONE -fPIC -O3 -fno-strict-aliasing -g -D_GNU_SOURCE -DCH_GEN2 -DMEMORY_SCALE -D_AFFINITY_ -DCOMPAT_MODE -Wall -D_SMP_ -D_SMP_RNDV_ -DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER -D_IA64_ -I/usr/include -DHAVE_MPICHCONF_H -D_GNU_SOURCE -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639 -I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I. -c viasend.c

In file included from /usr/include/netdb.h:28,
                 from cm.h:25,
                 from viapriv.h:44,
                 from viasend.c:29:
/usr/include/netinet/in.h:355: error: expected ')' before '__netshort'
/usr/include/netinet/in.h:355: error: expected ')' before '' token
/usr/include/netinet/in.h:355: error: expected ')' before '' token
[ewg] MVAPICH SRPM Update
I updated the MVAPICH 0.9.9 SRPM. The new version is mvapich-0.9.9-1315.src.rpm. It includes a bug fix for #642. Regards, Pasha (Pavel Shamis)