Re: [ewg] [GIT PULL] RDMA/nes: OFED 1.5.1 rc3 update

2010-03-04 Thread Vladimir Sokolovsky
Tung, Chien Tin wrote:
 Vlad,
 
 Please pull from:
 
git://sofa.openfabrics.org/~ctung/ofed-1.5.git ofed_kernel_1_5
 
 for the following commits:
 
 
 Chien Tung (3):
   RDMA/nes: fix CX4 link detection in back-to-back configuration
   RDMA/nes: clear stall bit befor destroying nic qp
   RDMA/nes: update backports for OFED 1.5.1 rc3
 
 Faisal Latif (2):
   RDMA/nes: Set assume_alligned_header bit
   RDMA/nes: enhance ethtool stats
 
 Thanks,
 
 Chien
 
 
 --
 Chien Tung | chien.tin.t...@intel.com
 
 
 

Done,

Regards,
Vladimir
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [ANNOUNCE] libnes-1.0.1 release

2010-03-04 Thread Vladimir Sokolovsky
Tung, Chien Tin wrote:
 New release of libnes library v1.0.1 is available at:
 
 http://www.openfabrics.org/downloads/nes/
 
 sha1sum: d0d123d477a7a55f3f9b861e66ebcde0fc09bbff  libnes-1.0.1.tar.gz
 
 Changes since last release:
 
 Chien Tung (2):
   libnes: add support for new device id 0x0110
   libnes: update license and copyright
 
 Vlad, please pull this in for OFED 1.5.1 rc3.
 
 Thanks,
 
 Chien
 
 --
 Chien Tung | chien.tin.t...@intel.com
 
 

Done,

Regards,
Vladimir
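
A downloaded tarball can be checked against the sha1sum quoted in the announcement above. The following is a minimal sketch in Python using only the standard `hashlib` module; the helper names `sha1_of_file` and `matches_announced` are illustrative and not part of libnes or OFED, and the local file path is whatever the reader saved the download as.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announced(path, announced_hex):
    """Compare the file's digest with the digest from the announcement."""
    return sha1_of_file(path) == announced_hex.strip().lower()
```

For the release above, one would call `matches_announced("libnes-1.0.1.tar.gz", "d0d123d477a7a55f3f9b861e66ebcde0fc09bbff")` on the downloaded file and proceed only if it returns `True`.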



[ewg] [PATCH OFED-151] ehca: bump version number

2010-03-04 Thread Alexander Schmidt
Hi Vlad,

please add this for OFED-1.5.1.

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com

diff --git a/kernel_patches/fixes/ehca-0140-bump_version.patch b/kernel_patches/fixes/ehca-0140-bump_version.patch
new file mode 100644
index 0000000..d28e073
--- /dev/null
+++ b/kernel_patches/fixes/ehca-0140-bump_version.patch
@@ -0,0 +1,13 @@
+Index: ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_main.c
+===================================================================
+--- ofa_kernel-1.5.1.orig/drivers/infiniband/hw/ehca/ehca_main.c
++++ ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_main.c
+@@ -52,7 +52,7 @@
+ #include "ehca_tools.h"
+ #include "hcp_if.h"
+ 
+-#define HCAD_VERSION "0029"
++#define HCAD_VERSION "0030"
+ 
+ MODULE_LICENSE("Dual BSD/GPL");
+ MODULE_AUTHOR("Christoph Raisch <rai...@de.ibm.com>");


[ewg] ofa_1_5_kernel 20100304-0200 daily build status

2010-03-04 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
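
A report in this shape can be tallied per architecture with a few lines of Python. This is only a sketch that assumes each status line has the form `Passed on ARCH with KERNEL` (or `Failed on ...`), as in the list above; the `summarize` helper is illustrative and is not part of the OFED build scripts.

```python
import re
from collections import Counter

# Matches lines such as "Passed on x86_64 with linux-2.6.18-128.el5".
STATUS_LINE = re.compile(r"^(Passed|Failed) on (\S+) with (\S+)$")


def summarize(report_text):
    """Count build results per (status, architecture) pair."""
    counts = Counter()
    for line in report_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            status, arch, _kernel = match.groups()
            counts[(status, arch)] += 1
    return counts
```

Section headers such as `Passed:` and `Failed:` do not match the pattern and are ignored, so an empty `Failed:` section simply contributes no entries.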


[ewg] Build Broken?

2010-03-04 Thread Tom Tucker

I'm having an issue with cma.c when running makedist.sh. It looks like 
EL5.5 is broken.

Does anyone else have this problem?

Thanks,
Tom



Re: [ewg] Build Broken?

2010-03-04 Thread Steve Wise
Tom Tucker wrote:
 I'm having an issue with cma.c when running makedist.sh. It looks like 
 EL5.5 is broken.

 Does anyone else have this problem?
   
Looks like the 5.5 backport stuff is broken:

git clone -q -s -n /home/swise/newgit/linux-2.6 /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5
pushd /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5
/tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5 ~/newgit/linux-2.6
/home/swise/newgit/linux-2.6/ofed_scripts/ofed_checkout.sh e0006c0b7cc5e378a210d1f00023a3be60d3c27c eaa5eec739637f32f8733d528ff0b94fd62b1214  /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
/home/swise/newgit/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.18-EL5.5  /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
Failed executing /home/swise/newgit/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.18-EL5.5  /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
patching file drivers/infiniband/core/user_mad.c
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 802 (offset 1 line).

/tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5/kernel_patches/backport/2.6.18-EL5.5/core__v2_6_23_dev_get_by_index.patch
patching file drivers/infiniband/core/addr.c
patching file drivers/infiniband/core/cma.c
Hunk #1 FAILED at 1798.
Hunk #2 FAILED at 3009.
2 out of 2 hunks FAILED -- saving rejects to file drivers/infiniband/core/cma.c.rej
Failed to apply patch: /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5/kernel_patches/backport/2.6.18-EL5.5/core__v2_6_23_dev_get_by_index.patch
Build failed in /tmp/build-ofed_kernel-a13499
See log file /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log



[ewg] [GIT PULL] RDMA/nes: update backports for RHEL 5.5

2010-03-04 Thread Tung, Chien Tin
Vlad,

Please pull from:

 git://sofa.openfabrics.org/home/ctung/scm/ofed-1.5.git ofed_kernel_1_5

for the following commit:

commit 1ab03e6a07826382aa59f5a1b919e303ad92979d
Author: Chien Tung chien.tin.t...@intel.com
Date:   Thu Mar 4 10:24:23 2010 -0600

RDMA/nes: update backports for RHEL 5.5

Signed-off-by: Chien Tung chien.tin.t...@intel.com


I missed the RHEL 5.5 backports in the last commit.  Thanks for the build-broken
message, which alerted me to check and pull the RHEL 5.5 backport into my git tree.

Thanks,

Chien

--
Chien Tung | chien.tin.t...@intel.com




Re: [ewg] nfsrdma fails to write big file,

2010-03-04 Thread Mahesh Siddheshwar
Tom Tucker wrote:
 Mahesh Siddheshwar wrote:
 Hi Tom, Vu,

 Tom Tucker wrote:
 Roland Dreier wrote:
   +   /*
   +    * Add room for frmr register and invalidate WRs
   +    * Requests sometimes have two chunks, each chunk
   +    * requires to have different frmr. The safest
   +    * WRs required are max_send_wr * 6; however, we
   +    * get send completions and poll fast enough, it
   +    * is pretty safe to have max_send_wr * 4.
   +    */
   +   ep->rep_attr.cap.max_send_wr *= 4;

 Seems like a bad design if there is a possibility of work queue
 overflow; if you're counting on events occurring in a particular order
 or completions being handled fast enough, then your design is going to
 fail in some high load situations, which I don't think you want.

 Vu,

 Would you please try the following:

 - Set the multiplier to 5
 While trying to test this between a Linux client and Solaris server,
 I made the following changes in :
 /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c

 diff verbs.c.org verbs.c
 653c653
 < ep->rep_attr.cap.max_send_wr *= 3;
 ---
 > ep->rep_attr.cap.max_send_wr *= 8;
 685c685
 < ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
 ---
 > ep->rep_cqinit = ep->rep_attr.cap.max

 (I bumped it to 8)

 did make install.
 On reboot I see the errors on NFS READs as opposed to WRITEs
 as seen before, when I try to read a 10G file from the server.

 The client is running RHEL 5.3 (2.6.18-128.el5PAE) with
 OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
 HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
 The server is running Solaris based on snv_128.

 rpcdebug output from the client:

 ==
 RPC:85 call_bind (status 0)
 RPC:85 call_connect xprt ec78d800 is connected
 RPC:85 call_transmit (status 0)
 RPC:85 xprt_prepare_transmit
 RPC:85 xprt_cwnd_limited cong = 0 cwnd = 8192
 RPC:85 rpc_xdr_encode (status 0)
 RPC:85 marshaling UNIX cred eddb4dc0
 RPC:85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
 RPC:85 xprt_transmit(164)
 RPC:   rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
 RPC:   rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
 RPC:   rpcrdma_create_chunks: write chunk elem 16...@0x38536d000:0xa601 (more)
 RPC:   rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
 RPC:   rpcrdma_create_chunks: write chunk elem 1...@0x31dd153c:0xaa01 (last)
 RPC:   rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
 RPC:85 xmit complete
 RPC:85 sleep_on(queue xprt_pending time 4683109)
 RPC:85 added to queue ec78d994 xprt_pending
 RPC:85 setting alarm for 6 ms
 RPC:   wake_up_next(ec78d944 xprt_resend)
 RPC:   wake_up_next(ec78d8f4 xprt_sending)
 RPC:   rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
 RPC:85 __rpc_wake_up_task (now 4683110)
 RPC:85 disabling timer
 RPC:85 removed from queue ec78d994 xprt_pending
 RPC:   __rpc_wake_up_task done
 RPC:85 __rpc_execute flags=0x1
 RPC:85 call_status (status -107)
 RPC:85 call_bind (status 0)
 RPC:85 call_connect xprt ec78d800 is not connected
 RPC:85 xprt_connect xprt ec78d800 is not connected
 RPC:85 sleep_on(queue xprt_pending time 4683110)
 RPC:85 added to queue ec78d994 xprt_pending
 RPC:85 setting alarm for 6 ms
 RPC:   rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
 RPC:   rpcrdma_event_process: recv WC status 5, connection lost
 RPC:   rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
 RPC:   rpcrdma_conn_upcall: disconnected
 rpcrdma: connection to ec78dbccI4:20049 closed (-103)
 RPC:   xprt_rdma_connect_worker: reconnect
 ==

 On the server I see:

 Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
 Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
 Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
 Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply

 The remote access error is actually seen on RDMA_WRITE.
 Doing some more debug on the server with DTrace, I see that
 the destination address and length matches the write chunk
 element in the Linux debug output above.


  0   9385  rib_write:entry daddr 38536d000, len 4000, hdl a601
  0   9358 rib_init_sendwait:return ff44a715d308
  1   9296   rib_svc_scq_handler:return 1f7
  1   9356  rib_sendwait:return 14
  1   9386 rib_write:return 14

 ^^^ that is RDMA_FAILED in
  1  63295
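
The send-queue sizing debated earlier in this thread (multipliers of 3, 4, 5, 6 and 8 on max_send_wr) comes down to whether the queue is sized for the worst case or relies on completions being reaped quickly. The sketch below is a hypothetical illustration in Python, not the xprtrdma code: `send_queue_budget`, the request count, and the device limit are all assumptions made for the example, and only the per-request factor of 6 is taken from the quoted patch comment.

```python
def send_queue_budget(max_requests, wrs_per_request, device_max_wr):
    """Return (depth, covers_worst_case) for a send queue.

    max_requests    -- outstanding requests the transport allows (assumed)
    wrs_per_request -- work requests one request may post in the worst case
    device_max_wr   -- largest queue depth the device accepts (assumed)
    """
    if max_requests <= 0 or wrs_per_request <= 0 or device_max_wr <= 0:
        raise ValueError("all arguments must be positive")
    worst_case = max_requests * wrs_per_request
    # Clamp to what the device can provide and report whether that is enough.
    depth = min(worst_case, device_max_wr)
    return depth, depth >= worst_case
```

If the second element of the result is `False`, the queue can overflow under load unless the number of outstanding requests is throttled, which is the design concern raised in the quoted review comment.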

[ewg] [GIT PULL ofed-1.5.1] cxgb3 fixes

2010-03-04 Thread Steve Wise
Hey Vlad,

Please pull these upstream fixes from:

ssh://v...@sofa.openfabrics.org/~swise/scm/ofed_kernel.git ofed_1_5

Steve.

-


commit 1e49d727e4a6ac012625ceb4f550783c632c11a9
Author: Steve Wise sw...@opengridcomputing.com
Date:   Thu Mar 4 10:38:48 2010 -0600

RDMA/cxgb3: wait at least one schedule cycle during device removal.

During a hot-plug LLD removal event or an EEH error event, iw_cxgb3
must ensure that any/all threads that might be in a cxgb3 exported
function concurrently must return from the function before iw_cxgb3
returns from its event processing. Do this by calling synchronize_net().

Signed-off-by: Steve Wise sw...@opengridcomputing.com

commit 87581e84b1efa06c41e05b5865ca3b6430f5cac5
Author: Steve Wise sw...@opengridcomputing.com
Date:   Thu Mar 4 10:35:45 2010 -0600

cxgb3: fix hot plug removal crash

From: Divy Le Ray d...@chelsio.com

queue restart tasklets need to be stopped after napi handlers are
stopped since the latter can restart them.  So stop them after
stopping napi.

Signed-off-by: Divy Le Ray d...@chelsio.com
