Re: [ofa-general] RE: [ewg] RFC: Do we wish to take MPI out of OFED?

2009-06-15 Thread Pavel Shamis (Pasha)




I can see from the mails and from my personal experience that most
end users do not need or use the MPI that ships with OFED (they
have many different MPIs installed on their clusters). As we can see,
the distros are not using it, and some (if not all) OFED binary
package providers (i.e. companies) do not use it either.

We think that the simple and clear answer is: take the MPI packages out
of OFED.
  


It is not so simple and clear to me after all the discussion on this
thread.
Some OFED members want to remove the MPIs and some are strongly against
it (the same is true of the OFED user community too).

As I mentioned previously, I am sure that this step will push users away
from OFED distributions and towards vendor packages, as it was before
the first OFED release.

What will you answer to a user's question: How can I install MPI with
OFED support?

Will you send the user to read 10 pages of an MPI installation FAQ? :-)

My $0.02

Pasha




___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] RFC: Do we wish to take MPI out of OFED?

2009-06-07 Thread Pavel Shamis (Pasha)



- Need to synchronize between different projects

The synchronization between OFED and the MPIs is minimal.
In most cases OFED just takes the latest available MPI version.

Occasionally OFED discovers a really critical bug
and we ask the MPI team to provide a new version. But for critical
bugs, the MPI team releases a new bugfix version ASAP anyway.



- MPI is an important RDMA ULP and although it is not developed in OFA 
it is widely used by OFED customers

I guess that MPI users are the largest OFED user community, and including 
MPI as part of OFED definitely simplifies end users' lives.

Personally, I'm against excluding MPI from the OFED package.

Pasha.


Re: [ewg] Bug in IMB included in OFED

2009-05-17 Thread Pavel Shamis (Pasha)

Jeff,
I'm the maintainer of the mpitests package, but I only collect code from 
different vendors: Intel, OSU, Presta.

Maybe Alexander can help forward this bugfix to the right person at Intel.

Thanks.
Pasha
Jeff Squyres wrote:
There is a bug in the IMB MPI test suite that is included in OFED (the 
mpitests SRPM).  Who is the maintainer for that?  I would file a bug 
on bugzilla, but there's no category for the MPI tests.


It needs the following patch:

--- IMB-3.1/src/IMB_window.c.~1~	2009-05-13 14:44:42.0 -0700
+++ IMB-3.1/src/IMB_window.c	2009-05-13 14:45:44.0 -0700
@@ -140,6 +140,10 @@
                   c_info->rank, 0, 1, c_info->r_data_type, c_info->WIN);
   MPI_ERRHAND(ierr);
   }
+  /* JMS Added a call to MPI_WIN_FENCE here, per MPI-2.1
+     11.2.1 */
+  ierr = MPI_Win_fence(0, c_info->WIN);
+  MPI_ERRHAND(ierr);
   ierr = MPI_Win_free(&c_info->WIN);
   MPI_ERRHAND(ierr);
 }

I'm attaching the patch as well because IMB_window.c has DOS-style 
EOLs; hopefully the attached patch should handle it properly.
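
A note on the line endings mentioned above: if an inline diff does not apply because the target file uses CRLF line endings, one option is to check the file and normalize a copy before patching. The sketch below assumes a POSIX shell; the helper names has_crlf and strip_cr are made up for illustration and are not part of the OFED or IMB tooling.

```shell
#!/bin/sh
# Hypothetical helpers for dealing with DOS-style (CRLF) line endings.

# has_crlf FILE: succeeds if FILE contains at least one carriage return.
has_crlf() {
    # Deleting every CR changes the byte count only if a CR was present.
    [ $(tr -d '\r' < "$1" | wc -c) -ne $(wc -c < "$1") ]
}

# strip_cr SRC DST: write a copy of SRC to DST with every CR removed.
# Note: this drops all CR bytes, not only those directly before a newline.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example use (the output file name is arbitrary):
#   has_crlf IMB-3.1/src/IMB_window.c && \
#       strip_cr IMB-3.1/src/IMB_window.c IMB_window.c.unix
```

Working on a copy keeps the original file untouched, so the attached patch can still be tried against it as is.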






Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)
I think your problem is related to this bug: 
https://svn.open-mpi.org/trac/ompi/ticket/1823

It is resolved on the ompi trunk.

Pasha.

Steve Wise wrote:
When this happens, the node also logs this type of message in 
/var/log/messages:

IMB-MPI1[8859]: segfault at 0018 rip 2b7bfc880800 rsp 7fffb1021330 error 4
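
For readers unfamiliar with the kernel message above: on x86 the trailing number is the page-fault error code, which is a bit mask. The following sketch decodes its three low bits (present, write, user). The function name decode_pf_error is made up here, and the sketch only interprets the number; it does not diagnose the Open MPI problem itself.

```shell
#!/bin/sh
# decode_pf_error CODE: describe the low three bits of an x86 page-fault
# error code as printed in "segfault at ... error N" kernel messages.
decode_pf_error() {
    code=$1
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then
        cause="protection violation"
    else
        cause="page not present"
    fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    echo "$mode $access, $cause"
}

decode_pf_error 4   # prints: user-mode read, page not present
```

A user-mode read of an unmapped page at a small address such as 0x18 matches the "Address not mapped" signal code in the backtrace, and is the kind of pattern usually seen when a field is read through a NULL pointer.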


Steve Wise wrote:

Hey Jeff,

Have you seen this?  I'm hitting this regularly running on 
ofed-1.4.1-rc2.


Test:
[o...@vic12 ~]$ cat doit-ompi
#!/bin/sh
while : ; do
   mpirun -np 16 --host vic12-10g,vic20-10g,vic9-10g,vic21-10g \
       --mca btl openib,self,sm --mca btl_openib_max_btls 1 \
       /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 -npmin 16 \
       bcast scatter sendrecv exchange > /dev/null

done
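
The loop above runs forever, so the failing iteration has to be spotted by eye. A minimal variation, assuming only a POSIX shell and that the command under test exits non-zero when a run fails, stops at the first failure and reports how many runs it took. The function name repeat_until_failure and its run limit are illustrative additions, not part of the original script.

```shell
#!/bin/sh
# repeat_until_failure MAX CMD [ARGS...]
# Run CMD up to MAX times and stop at the first failing run.
# Prints which run failed and returns 1, or returns 0 if all MAX runs
# succeeded.
repeat_until_failure() {
    max=$1
    shift
    run=1
    while [ "$run" -le "$max" ]; do
        if ! "$@"; then
            echo "run $run failed"
            return 1
        fi
        run=$((run + 1))
    done
    echo "all $max runs passed"
    return 0
}

# Example use, with the single mpirun command line kept in a separate
# (hypothetical) script called one-run.sh:
#   repeat_until_failure 100 sh ./one-run.sh
```

Knowing the run count gives a rough idea of how often the fault triggers, which is useful when comparing builds.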


Seg Fault output:

[vic21:04047] *** Process received signal ***
[vic21:04047] Signal: Segmentation fault (11)
[vic21:04047] Signal code: Address not mapped (1)
[vic21:04047] Failing at address: 0x18
[vic21:04047] [ 0] /lib64/libpthread.so.0 [0x3dde20e4c0]
[vic21:04047] [ 1] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33800]
[vic21:04047] [ 2] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc38c2d]
[vic21:04047] [ 3] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc33fcb]
[vic21:04047] [ 4] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_btl_openib.so [0x2b911bc22af8]
[vic21:04047] [ 5] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libopen-pal.so.0(mca_base_components_close+0x83) [0x2b911933da33]
[vic21:04047] [ 6] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0(mca_btl_base_close+0xe0) [0x2b9118ea3fb0]
[vic21:04047] [ 7] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_bml_r2.so [0x2b911ba1938f]
[vic21:04047] [ 8] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/openmpi/mca_pml_ob1.so [0x2b911b601cde]
[vic21:04047] [ 9] /usr/mpi/gcc/openmpi-1.3.1rc4/lib64/libmpi.so.0 [0x2b9118e7241b]
[vic21:04047] [10] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1(main+0x178) [0x403498]
[vic21:04047] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ddd61d974]
[vic21:04047] [12] /usr/mpi/gcc/openmpi-1.3.1rc4/tests/IMB-3.1/IMB-MPI1 [0x403269]

[vic21:04047] *** End of error message ***



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)

Steve,
if you compile the OMPI code with CFLAGS=-g, generate a core file 
from the segfault, and send me the core plus the IMB-MPI1 binary, I 
will be able to understand the problem better.
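
In practice, the request above comes down to three steps: rebuild with debug symbols, allow core files to be written, and open the core in gdb. The sketch below wraps these steps in shell functions. The function names and all paths passed to them are placeholders invented here, and the name and location of the resulting core file depend on the system's core_pattern setting.

```shell
#!/bin/sh
# Sketch only: every path passed to these helpers is a placeholder.

# build_debug_ompi SRCDIR PREFIX: configure and build with debug symbols.
build_debug_ompi() {
    (cd "$1" && ./configure CFLAGS=-g --prefix="$2" && make all install)
}

# enable_core_dumps: lift the core-file size limit for this shell and
# the processes started from it.
enable_core_dumps() {
    ulimit -c unlimited
}

# backtrace_cmd EXE CORE: print the gdb command line that dumps a
# backtrace from CORE without starting an interactive session.
backtrace_cmd() {
    echo "gdb -batch -ex bt $1 $2"
}
```

For a multi-node job the core limit has to be raised on every node where ranks run, not only in the shell that calls mpirun.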


Regards,
Pasha

Steve Wise wrote:


Hey Pasha,


I just applied r20872 and retested, and I still hit this seg fault.  
So I think this is a new bug.


Lemme pull the trunk and try that.






[ewg] MVAPICH1 1.1.0 SRPM available

2008-10-06 Thread Pavel Shamis (Pasha)

A new SRPM (mvapich-1.1.0-3048.src.rpm) for MVAPICH1 was uploaded.
Please check ~pasha/ofed_1_4/ (see latest.txt for the build number).
It includes bug fixes for: #1241, #1244


[ewg] MVAPICH1 1.0.0 SRPM available

2008-02-14 Thread Pavel Shamis (Pasha)

A new SRPM for MVAPICH1 was uploaded.
Please check ~pasha/ofed_1_3/ (see latest.txt for the build number).
It includes bug fixes for: 916, 922.

--
Pavel Shamis (Pasha)
Mellanox Technologies



[ewg] MVAPICH1 1.0.0 SRPM available

2008-02-07 Thread Pavel Shamis (Pasha)

A new SRPM for MVAPICH1 was uploaded.
Please check ~pasha/ofed_1_3/ (see latest.txt for the build number).
Bug fixes for: 883, 884, 888, 887, 889, 893

--
Pavel Shamis (Pasha)
Mellanox Technologies



[ewg] MVAPICH1 1.0.0 SRPM available

2008-01-29 Thread Pavel Shamis (Pasha)

A new SRPM for MVAPICH1 was uploaded.
Please check ~pasha/ofed_1_3/ (see latest.txt for the build number).
The new build includes documentation and tuning fixes.

--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [ewg] RE: [ofa-general] OFED Jan 14 meeting summary on RC2 readiness

2008-01-20 Thread Pavel Shamis (Pasha)




On Friday 18 January 2008 03:25, Roland Dreier wrote:
  

I guess you mean just implement XRC without allowing multiple
processes to share an XRC domain?  That actually seems like a sensible
thing to implement as well...



This is part of the current XRC implementation -- just give -1 as the fd value
in ibv_open_xrc_domain().
  
I guess Gleb was talking about one of the possible XRC usages described in 
the paper: http://www.cs.sandia.gov/~rbbrigh/papers/ompi-ib-pvmmpi07.pdf


--
Pavel Shamis (Pasha)
Mellanox Technologies



[ewg] MVAPICH1 1.0.0 SRPM available

2008-01-20 Thread Pavel Shamis (Pasha)

A new SRPM for MVAPICH1 was uploaded.
Please check ~pasha/ofed_1_3/ (see latest.txt for the build number).
It includes a bug fix for the f90 problem on ppc platforms.

--
Pavel Shamis (Pasha)
Mellanox Technologies



Re: [ewg] Problems building mvapich on ia64 on RedHat EL5.1

2007-12-04 Thread Pavel Shamis (Pasha)

Hi,
I don't have access to such a platform at Mellanox, and from the log I 
don't really understand the problem.

Can I get remote access to the machine?

Regards,
Pasha.

Woodruff, Robert J wrote:

I get the following build error trying to build
mvapich on ia64 on RedHat EL5.1 using today's OFED daily build.

gcc -DHAVE_CONFIG_H -I.
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/util
-DMPID_DEVICE_CODE  -DHAVE_UNAME=1 -DHAVE_NETDB_H=1
-DHAVE_GETHOSTBYNAME=1  -DMPID_DEBUG_NONE -DMPID_STAT_NONE  -fPIC -O3
-fno-strict-aliasing -g -D_GNU_SOURCE -DCH_GEN2 -DMEMORY_SCALE
-D_AFFINITY_ -DCOMPAT_MODE -Wall -D_SMP_ -D_SMP_RNDV_
-DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER
-D_IA64_ -I/usr/include -DHAVE_MPICHCONF_H -D_GNU_SOURCE
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I.-c
viainit.c
viainit.c: In function 'viainit_setaffinity':
viainit.c:140: warning: passing argument 3 of 'sched_setaffinity' from incompatible pointer type
viainit.c: In function 'viainit_exchange':
viainit.c:752: warning: unused variable 'other_qp_list'
viainit.c: In function 'MPID_VIA_Init':
viainit.c:952: warning: implicit declaration of function 'init_apm_lock'
viainit.c:788: warning: unused variable 'smpi_ptr'
viainit.c:784: warning: unused variable 'j'
viainit.c:784: warning: unused variable 'i'
viainit.c: In function 'ib_qp_enable':
viainit.c:1379: warning: implicit declaration of function
'reload_alternate_path'
viainit.c: In function 'ib_rank_lid_table':
viainit.c:1917: warning: pointer targets in assignment differ in signedness
viainit.c: At top level:
viainit.c:1907: warning: 'ib_rank_lid_table' defined but not used

gcc -DHAVE_CONFIG_H -I.
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/util
-DMPID_DEVICE_CODE  -DHAVE_UNAME=1 -DHAVE_NETDB_H=1
-DHAVE_GETHOSTBYNAME=1  -DMPID_DEBUG_NONE -DMPID_STAT_NONE  -fPIC -O3
-fno-strict-aliasing -g -D_GNU_SOURCE -DCH_GEN2 -DMEMORY_SCALE
-D_AFFINITY_ -DCOMPAT_MODE -Wall -D_SMP_ -D_SMP_RNDV_
-DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER
-D_IA64_ -I/usr/include -DHAVE_MPICHCONF_H -D_GNU_SOURCE
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I.-c
viasend.c 
In file included from /usr/include/netdb.h:28,

 from cm.h:25,
 from viapriv.h:44,
 from viasend.c:29:
/usr/include/netinet/in.h:355: error: expected ')' before '__netshort'
/usr/include/netinet/in.h:355: error: expected ')' before '' token
/usr/include/netinet/in.h:355: error: expected ')' before '' token

  




[ewg] MVAPICH SRPM Update

2007-06-13 Thread Pavel Shamis (Pasha)

I updated the MVAPICH 0.9.9 SRPM.
The new version is mvapich-0.9.9-1315.src.rpm

Bug fixes:
#642

Regards,
Pasha (Pavel Shamis)