Re: [ewg] OFED 1.2.5.3

2007-12-04 Thread Or Gerlitz

Tziporet Koren wrote:

Or Gerlitz wrote:
In one of our tests that brings the driver up and down, we see a 
deadlock when this patch is used; it does not happen when the patch is 
removed.
Thus we decided to remove this patch until we get to the root cause of 
the problem.
Note that we do not think the issue is in the patch itself, but it seems 
to bring the issue to the surface.


I see. Note that even if internal review makes you think the patch is 
not the cause of the hang, it's still possible that external reviewers 
will come to a different conclusion; that's a big part of the story 
behind open source development and processes.


I understand that this patch does not exist in OFED 1.3, correct?


___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] OFED 1.2.5.4 is ready on the ofa server

2007-12-04 Thread Tziporet Koren
OFED-1.2.5.4 is ready: 
http://www.openfabrics.org/downloads/OFED/ofed-1.2.5/OFED-1.2.5.4.tgz

Changes since OFED 1.2.5

- RDS:
  - Performance enhancements
  - GA for Oracle 11
- IPoIB:
  - Use NAPI by default
  - For small received packets, allocate a new, smaller SKB to relieve
    accounting on the socket.
- mlx4:
  - Enable changing default max HCA resource limits using module
    options.
  - Support opening more resources than the default by increasing the
    command timeout for INIT_HCA to 10 seconds
- PPC64 support:
  - Fixed compilation problems on SLES10 SP1

Changes from OFED 1.2.5.3:
==========================
- Low level drivers update:
  - cxgb3: Pull in latest fixes.
  - ipath: Pull in latest fixes.
- OSes support:
  - Added support for SLES9 SP4 (no QA was done)
  - Added support for RHEL5 up1 (no QA was done)
- IPoIB:
  - Removed the usage of unsignalled QP in Tx due to deadlock.
- RDS:
  - Relax the header consistency check on fragment reassembly


Tziporet & Vlad 




Tziporet Koren
Software Director
Mellanox Technologies
mailto: [EMAIL PROTECTED]
Tel +972-4-9097200, ext 380



Re: [ewg] OFED 1.2.5.3

2007-12-04 Thread Tziporet Koren

Or Gerlitz wrote:


I understand that this patch does not exist in OFED 1.3, correct?

Correct - but don't worry about the bugs; we see new oopses even with 
vanilla IPoIB from 2.6.24-rc3 :-(

Tziporet


[ewg] OFED Dec 3 meeting summary on beta release status

2007-12-04 Thread Tziporet Koren

Meeting Summary:
1. We must get bugzilla fixed ASAP to track OFED 1.3 bugs - Jeff B.
2. RC1 is delayed to next week since there were no builds for almost a week
3. Beta release testing status:

   * Voltaire: Sees issues with:
       o TCP performance of ConnectX (30% worse than Arbel in UD mode)
         Note: I verified that interrupt moderation was not activated
         on ConnectX as it should be. This should be fixed today in the
         daily build.
         Also, LRO is not yet enabled - to be done by RC1.
       o Performance is harmed when working with IPoIB partitioning
         (different PKey)
       o iSCSI over IPoIB
   * Cisco:
       o Tested so far only on x86_64; RHEL4, RHEL5, SLES10
       o Focus on MPI testing: Intel MPI 3.1; HP MPI 2.5.1
       o Will test new compilers: Intel 10.1 and PGI 7.1
   * Intel:
       o Beta is working fine on a 16-node cluster
       o Also tests a small ia64 cluster - sees a problem with MVAPICH
         compilation
   * Qlogic:
       o Sees SDP issues
       o Mainly tests basic verbs (libibverbs)
       o Will have a code update in 1-2 weeks
   * Mellanox:
       o Covers x86, x86_64, PPC, all OSes on the matrix
       o SDP is still broken - should be fixed by the end of this week
       o Tests Open MPI and MVAPICH
       o Sees issues with IPoIB performance - being worked on


Tziporet


Re: [ewg] OFED Dec 3 meeting summary on beta release status

2007-12-04 Thread Hoang-Nam Nguyen
Hello Tziporet!
This is our test status:
* Tested ehca, ehca2 on SLES10 ppc64 and upstream kernel
* RHEL4.5, RHEL5.1 and other backport tests will be next
* Build process works great (basic, custom, 32/64-bit libs)
Thanks
Nam



Re: [ewg] Problems building mvapich on ia64 on RedHat EL5.1

2007-12-04 Thread Pavel Shamis (Pasha)

Hi,
I don't have access to such a platform at Mellanox, and from the log I 
don't really understand the problem.


Can I get remote access to the machine?

Regards,
Pasha.

Woodruff, Robert J wrote:

I get the following build error trying to build
mvapich on ia64 on RedHat EL5.1 using today's OFED daily build.

gcc -DHAVE_CONFIG_H -I.
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/util
-DMPID_DEVICE_CODE  -DHAVE_UNAME=1 -DHAVE_NETDB_H=1
-DHAVE_GETHOSTBYNAME=1  -DMPID_DEBUG_NONE -DMPID_STAT_NONE  -fPIC -O3
-fno-strict-aliasing -g -D_GNU_SOURCE -DCH_GEN2 -DMEMORY_SCALE
-D_AFFINITY_ -DCOMPAT_MODE -Wall -D_SMP_ -D_SMP_RNDV_
-DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER
-D_IA64_ -I/usr/include -DHAVE_MPICHCONF_H -D_GNU_SOURCE
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I. -c
viainit.c
viainit.c: In function 'viainit_setaffinity':
viainit.c:140: warning: passing argument 3 of 'sched_setaffinity' from incompatible pointer type
viainit.c: In function 'viainit_exchange':
viainit.c:752: warning: unused variable 'other_qp_list'
viainit.c: In function 'MPID_VIA_Init':
viainit.c:952: warning: implicit declaration of function 'init_apm_lock'
viainit.c:788: warning: unused variable 'smpi_ptr'
viainit.c:784: warning: unused variable 'j'
viainit.c:784: warning: unused variable 'i'
viainit.c: In function 'ib_qp_enable':
viainit.c:1379: warning: implicit declaration of function 'reload_alternate_path'
viainit.c: In function 'ib_rank_lid_table':
viainit.c:1917: warning: pointer targets in assignment differ in signedness
viainit.c: At top level:
viainit.c:1907: warning: 'ib_rank_lid_table' defined but not used

gcc -DHAVE_CONFIG_H -I.
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/include
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/util
-DMPID_DEVICE_CODE  -DHAVE_UNAME=1 -DHAVE_NETDB_H=1
-DHAVE_GETHOSTBYNAME=1  -DMPID_DEBUG_NONE -DMPID_STAT_NONE  -fPIC -O3
-fno-strict-aliasing -g -D_GNU_SOURCE -DCH_GEN2 -DMEMORY_SCALE
-D_AFFINITY_ -DCOMPAT_MODE -Wall -D_SMP_ -D_SMP_RNDV_
-DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER
-D_IA64_ -I/usr/include -DHAVE_MPICHCONF_H -D_GNU_SOURCE
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639
-I/var/tmp/OFED_topdir/BUILD/mvapich-1.0.0-1639/mpid/ch_gen2 -I. -c
viasend.c
In file included from /usr/include/netdb.h:28,
                 from cm.h:25,
                 from viapriv.h:44,
                 from viasend.c:29:
/usr/include/netinet/in.h:355: error: expected ')' before '__netshort'
/usr/include/netinet/in.h:355: error: expected ')' before '>>' token
/usr/include/netinet/in.h:355: error: expected ')' before '&' token


Re: [ewg] OFED 1.2.5.3

2007-12-04 Thread Or Gerlitz

Tziporet Koren wrote:
> Or Gerlitz wrote:
>> I understand that this patch does not exist in OFED 1.3, correct?
>
> Correct - but don't worry about the bugs; we see new oopses even with
> vanilla IPoIB from 2.6.24-rc3 :-(

I see, how about sending the oops trace to the general list?

Or.




Re: [ewg] Problems building mvapich on ia64 on RedHat EL5.1

2007-12-04 Thread Jonathan L. Perkins
It also might be useful to see if a simple program including netdb.h can 
compile on the system.  We could find out if the error is really in the 
system header files or due to some other interaction.


> #include <netdb.h>
>
> int foo(int i) {
>     return i > 1 ? i * foo(i - 1) : 1;
> }


--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


[ewg] Re: [ofa-general] [PATCH 0/6] nes: Cosmetic changes; support virtual WQs and PPC

2007-12-04 Thread Roland Dreier
OK, I just pushed out a few more small cleanups (running unifdef,
fixing signedness warnings, and fixing a locking bug on an error
path).  One question: what is the point of the monkeying with
SPIN_BUG_ON in nes.c?

 - R.


[ewg] Re: [ofa-general] [PATCH 5/5] nes: napi interface fix

2007-12-04 Thread Roland Dreier
 > > #ifdef NES_NAPI
 > 
 > Is #ifdef napi sprinkled throughout the code common for most drivers?
 > Is there a better way to handle this?  (Is this OFED only for
 > backports, or for upstream?)

Is there any reason why we want the upstream kernel to have both NAPI
and non-NAPI support?  If so, then this should probably be settable
through Kconfig rather than having to edit the Makefile to change the
NES_NAPI define.  However, what almost always seems to happen is that
no one uses the non-default code and it ends up bitrotting to the
point of not compiling.  So I would strongly suggest just having the
NAPI code and getting rid of the NES_NAPI tests entirely.  Is there
any reason not to do that?

 - R.


[ewg] RE: [ofa-general] [PATCH 0/6] nes: Cosmetic changes; support virtual WQs and PPC

2007-12-04 Thread Glenn Grundstrom
> 
> OK, I just pushed out a few more small cleanups (running unifdef,
> fixing signedness warnings, and fixing a locking bug on an error
> path).  One question: what is the point of the monkeying with
> SPIN_BUG_ON in nes.c?
>

Probably just some leftover debugging.  I can remove it.

Thanks for letting me know that you've pushed content
to your branch.  I'll pick it up.  Btw, I'll be preparing
another set of patches that should be ready very soon.

Glenn.
 
>  - R.
> 


[ewg] Re: [ofa-general] OFED 1.2.5.4 is ready on the ofa server

2007-12-04 Thread Ira Weiny
It looks like the tarball is corrupt.

13:29:50 > tar xzf OFED-1.2.5.4.tgz 

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now


Ira




[ewg] RE: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Tang, Changqing
Here is an issue we have:

struct ibv_context {
        struct ibv_device      *device;
        struct ibv_context_ops  ops;
        int                     cmd_fd;
        int                     async_fd;
        int                     num_comp_vectors;
        pthread_mutex_t         mutex;
        void                   *abi_compat;
};

The binary is compiled with OFED 1.2 header files. When it tries to set
async_fd to non-blocking, I get the error "Bad file descriptor". If I
compile the binary with OFED-1.3-beta header files (with XRC changes),
it works fine.


Is this the expected behavior, or will there be a fix?
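
For reference, putting a descriptor into non-blocking mode is normally
done with fcntl(), and a descriptor value that does not refer to an open
file makes that call fail with EBADF ("Bad file descriptor"). The sketch
below is not the application's code: set_nonblocking and
check_set_nonblocking are hypothetical helper names, and a pipe stands in
for the async_fd that libibverbs would provide.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of libibverbs): put a descriptor into
 * non-blocking mode. Returns 0 on success, -1 with errno set on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;                      /* errno is EBADF for a bogus fd */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Self-check: a real descriptor succeeds, an invalid one fails with EBADF. */
static void check_set_nonblocking(void)
{
    int p[2];

    assert(pipe(p) == 0);
    assert(set_nonblocking(p[0]) == 0);
    assert(fcntl(p[0], F_GETFL) & O_NONBLOCK);
    close(p[0]);
    close(p[1]);

    errno = 0;
    assert(set_nonblocking(-1) == -1 && errno == EBADF);
}
```

If the application reads async_fd from the wrong place in the structure,
the value it passes to fcntl() is effectively arbitrary, which would be
consistent with the EBADF reported above.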


Thanks.
--CQ Tang



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tziporet Koren
Sent: Thursday, November 22, 2007 9:46 AM
To: ewg@lists.openfabrics.org
Cc: [EMAIL PROTECTED]
Subject: [ofa-general] OFED 1.3 Beta release is available


Hi,

OFED 1.3 Beta release is available on
http://www.openfabrics.org/downloads/OFED/ofed-1.3/OFED-1.3-beta2.tgz
To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/

The RC1 release is expected on December 5

Tziporet & Vlad



Release information:

OS support:
Novell:
- SLES10
- SLES10 SP1 and up1
Redhat:
- Redhat EL4 up4 and up5
- Redhat EL5 and up1
kernel.org:
- 2.6.23 and 2.6.24-rc2

Systems:
* x86_64
* x86
* ia64
* ppc64*

Main Changes from OFED 1.3-alpha


 *   Kernel code based on 2.6.24-rc2
 *   New packages:
     *   SRP target
     *   qperf test from Qlogic
     *   ibsim package
     *   uDAPL 2.0 library (1.0 and 2.0 coexist)
 *   New OSes Support:
     *   RHEL 5 up1
     *   SLES10 SP1 up1
 *   Compilation issues resolved:
     *   Open MPI compilation on SLES10 SP1
     *   ibutils compiles on SLES10 PPC64 (64 bits)
     *   Apply patches that fix warnings from backport patches
     *   Prefix is now supported properly
 *   RDS implementation for API version 2 was updated from the 1.2.5 branch
 *   Fix binary compatibility of libibverbs caused by XRC implementation
 *   Uninstall is now working properly
 *   ib-bonding update to release 19
 *   MPI packages update:
     *   mvapich-1.0.0-1625.src.rpm
     *   mvapich2-1.0.1-1.src.rpm
     *   openmpi-1.2.4-1.src.rpm

Mlx4 driver specific changes:

 *   Enable changing the default of HCA resource limits with module parameters
 *   Default number of maximum QPs is now 128K (was 64K)
 *   Fixing max_cqe's (not adding an extra cqe)
 *   Fix state check in mlx4_qp_modify
 *   Sanity check userspace send queue sizes
 *   Several bug fixes in XRC


Tasks that should be completed for the beta release:

1. 32-bit libraries to be supported on SLES10 SP1 Update1.
2. Fix SDP stability issues
3. IPoIB performance improvements for small messages
4. Fix bugs

[ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
 > Here is an issue we have:
 > 
 > struct ibv_context {
 >         struct ibv_device      *device;
 >         struct ibv_context_ops  ops;
 >         int                     cmd_fd;
 >         int                     async_fd;
 >         int                     num_comp_vectors;
 >         pthread_mutex_t         mutex;
 >         void                   *abi_compat;
 > };
 > 
 > The binary is compiled with OFED 1.2 header files,  it tries to set
 > async_fd to non-blocking, I get error: Bad file descriptor.   If I
 > compile the binary with OFED-1.3-beta header files (with XRC changes),
 > it works fine.
 > 
 > Is this the expected behavior, or there will be a fix ?

Unfortunately the XRC patches were put into OFED 1.3 before they went
into the upstream libibverbs tree, so I have not reviewed them in
detail.  If XRC support requires an ABI change, then we'll have to
create a new ABI and provide versioned symbols for backwards compatibility.

However your problem seems quite strange: I don't see any change 
to struct ibv_context caused by the XRC patches.  So I don't
understand exactly what is causing the problem you see.  Can you debug
further to see which structure layout change is the real issue?

 - R.


Re: [ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
BTW, sifting through the OFED 1.3 libibverbs tree, I do see that the
commit to add max_xrc_domains to struct ibv_device_attr did break
things by adding the member in the middle of the structure (so that an
app compiled against the old header will see bogus values for
local_ca_ack_delay and phys_port_count).

Actually looking at the commit again, it's worse than that... anything
compiled against the old header that calls ibv_query_device() may get
memory corrupted, because the new ibv_query_device() writes to a
bigger structure than the app passes in.
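
Both effects can be sketched with cut-down structures. The layouts below
are simplified, hypothetical stand-ins and not the real struct
ibv_device_attr (which has many more members); only the position of the
inserted member matters. tail_offsets_moved and overflow_bytes are
illustrative helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the structure as the old header declares it. */
struct attr_old {
    int           max_pkeys;
    unsigned char local_ca_ack_delay;
    unsigned char phys_port_cnt;
};

/* Same structure after a member is inserted in the middle. */
struct attr_new {
    int           max_pkeys;
    int           max_xrc_domains;      /* the inserted member */
    unsigned char local_ca_ack_delay;
    unsigned char phys_port_cnt;
};

/* Nonzero if the trailing fields no longer sit where an old binary
 * expects them, so that binary reads bogus values. */
static int tail_offsets_moved(void)
{
    return offsetof(struct attr_new, local_ca_ack_delay) !=
           offsetof(struct attr_old, local_ca_ack_delay);
}

/* Bytes a library built against attr_new writes past the end of a
 * buffer that the caller sized as attr_old. */
static size_t overflow_bytes(void)
{
    return sizeof(struct attr_new) - sizeof(struct attr_old);
}
```

Any nonzero result from overflow_bytes() corresponds to the memory
corruption case: the caller allocated the smaller structure and the
library fills in the larger one.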

The perils of not reviewing properly I guess...

 - R.


Re: [ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
oops, sorry... I see that the very next OFED 1.3 commit reverted that
change, so things aren't as bad as I thought.

Never mind.

 - R.


[ewg] RE: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Tang, Changqing

I think the problem is that the size of "struct ibv_context_ops" has
changed, so the new driver returns a bigger "struct ibv_context". An app
compiled with the older header file has a smaller "struct ibv_context"
and uses the old offsets to find the fields after "ops".
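
A minimal sketch of that failure mode, using cut-down hypothetical
layouts rather than the real libibverbs structures (xrc_hook is an
invented member name standing in for whatever the XRC patches added to
the ops table):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ops table as the old header declares it, and after growing by one entry. */
struct ops_old { void *query_device; void *poll_cq; };
struct ops_new { void *query_device; void *poll_cq; void *xrc_hook; };

/* Context with the ops table embedded by value, old and new layout. */
struct ctx_old { void *device; struct ops_old ops; int cmd_fd; int async_fd; };
struct ctx_new { void *device; struct ops_new ops; int cmd_fd; int async_fd; };

/* What an application built against the old header reads as async_fd
 * when it is handed a context laid out by the new library. */
static int old_app_reads_async_fd(const struct ctx_new *ctx)
{
    int fd;

    memcpy(&fd, (const char *)ctx + offsetof(struct ctx_old, async_fd),
           sizeof fd);
    return fd;
}
```

Because the embedded table grew, every field after "ops" moved to a
higher offset, so the old binary picks up unrelated bytes instead of the
real descriptor.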

--CQ




[ewg] Re: [ofa-general] OFED 1.3 Beta release is available

2007-12-04 Thread Roland Dreier
 > I think the problem is that sizeof "struct ibv_context_ops" has
 > changed, so the new driver returns a bigger "struct ibv_context"; an
 > app compiled with the older header file has a smaller "struct
 > ibv_context" and uses the old offsets to find fields after "ops".

Oh crud, you're obviously right.  For some reason I kept missing that
when I looked over the code.

I think the only alternative we have to preserve backwards
compatibility is to leave struct ibv_context_ops alone and change the
structure to:

struct ibv_context {
        struct ibv_device      *device;
        struct ibv_context_ops  ops;
        int                     cmd_fd;
        int                     async_fd;
        int                     num_comp_vectors;
        pthread_mutex_t         mutex;
        void                   *abi_compat;
        struct ibv_xrc_op      *xrc_ops;
};

with xrc_ops added at the end.  It's my fault for not making the ops
member a pointer I guess.
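
Why appending at the end is safe for old binaries can be checked with
offsetof. The structures below are cut-down, hypothetical stand-ins for
the real ones, and old_offsets_preserved is an illustrative helper name.
The sketch assumes, as in libibverbs, that the library allocates the
context and the application only holds a pointer to it.

```c
#include <assert.h>
#include <stddef.h>

struct ops { void *query_device; void *poll_cq; };  /* left untouched */
struct xrc_ops;                                     /* opaque, reached by pointer */

/* Context as old binaries know it. */
struct ctx_v1 {
    void       *device;
    struct ops  ops;
    int         cmd_fd;
    int         async_fd;
    void       *abi_compat;
};

/* Same context with the new pointer appended after the last old member. */
struct ctx_v2 {
    void           *device;
    struct ops      ops;
    int             cmd_fd;
    int             async_fd;
    void           *abi_compat;
    struct xrc_ops *xrc_ops;
};

/* Returns 1 if every pre-existing field kept its offset. */
static int old_offsets_preserved(void)
{
    return offsetof(struct ctx_v1, cmd_fd)     == offsetof(struct ctx_v2, cmd_fd)
        && offsetof(struct ctx_v1, async_fd)   == offsetof(struct ctx_v2, async_fd)
        && offsetof(struct ctx_v1, abi_compat) == offsetof(struct ctx_v2, abi_compat);
}
```

An old binary never looks past abi_compat, so the extra pointer is
invisible to it, while new code can follow xrc_ops to a table that is
free to grow later.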

Tziporet/Jack/whoever -- please fix up the libibverbs you ship for
OFED 1.3 to resolve this.

We can clean this up for libibverbs 1.2 when the ABI can change,
if/when we have something worth breaking the ABI for.

 - R.