Re: [ewg] [GIT PULL] RDMA/nes: OFED 1.5.1 rc3 update
Tung, Chien Tin wrote:
Vlad,
Please pull from:
git://sofa.openfabrics.org/~ctung/ofed-1.5.git ofed_kernel_1_5
for the following commits:
Chien Tung (3):
RDMA/nes: fix CX4 link detection in back-to-back configuration
RDMA/nes: clear stall bit befor destroying nic qp
RDMA/nes: update backports for OFED 1.5.1 rc3
Faisal Latif (2):
RDMA/nes: Set assume_alligned_header bit
RDMA/nes: enhance ethtool stats
Thanks,
Chien
--
Chien Tung | chien.tin.t...@intel.com

Done,
Regards,
Vladimir
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] [ANNOUNCE] libnes-1.0.1 release
Tung, Chien Tin wrote:
New release of libnes library v1.0.1 is available at:
http://www.openfabrics.org/downloads/nes/
sha1sum: d0d123d477a7a55f3f9b861e66ebcde0fc09bbff libnes-1.0.1.tar.gz
Changes since last release:
Chien Tung (2):
libnes: add support for new device id 0x0110
libnes: update license and copyright
Vlad, please pull this in for OFED 1.5.1 rc3.
Thanks,
Chien
--
Chien Tung | chien.tin.t...@intel.com

Done,
Regards,
Vladimir
[ewg] [PATCH OFED-151] ehca: bump version number
Hi Vlad,
please add this for OFED-1.5.1.

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com

diff --git a/kernel_patches/fixes/ehca-0140-bump_version.patch b/kernel_patches/fixes/ehca-0140-bump_version.patch
new file mode 100644
index 000..d28e073
--- /dev/null
+++ b/kernel_patches/fixes/ehca-0140-bump_version.patch
@@ -0,0 +1,13 @@
+Index: ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_main.c
+===
+--- ofa_kernel-1.5.1.orig/drivers/infiniband/hw/ehca/ehca_main.c
++++ ofa_kernel-1.5.1/drivers/infiniband/hw/ehca/ehca_main.c
+@@ -52,7 +52,7 @@
+ #include "ehca_tools.h"
+ #include "hcp_if.h"
+ 
+-#define HCAD_VERSION "0029"
++#define HCAD_VERSION "0030"
+ 
+ MODULE_LICENSE("Dual BSD/GPL");
+ MODULE_AUTHOR("Christoph Raisch <rai...@de.ibm.com>");
[ewg] ofa_1_5_kernel 20100304-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
[ewg] Build Broken?
I'm having an issue with cma.c when running makedist.sh. It looks like EL5.5 is broken. Does anyone else have this problem?
Thanks,
Tom
Re: [ewg] Build Broken?
Tom Tucker wrote:
I'm having an issue with cma.c when running makedist.sh. It looks like EL5.5 is broken. Does anyone else have this problem?

Looks like the 5.5 backport stuff is broken:

git clone -q -s -n /home/swise/newgit/linux-2.6 /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5
pushd /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5
/tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5 ~/newgit/linux-2.6
/home/swise/newgit/linux-2.6/ofed_scripts/ofed_checkout.sh e0006c0b7cc5e378a210d1f00023a3be60d3c27c eaa5eec739637f32f8733d528ff0b94fd62b1214 /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
/home/swise/newgit/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.18-EL5.5 /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
Failed executing /home/swise/newgit/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.18-EL5.5 /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
patching file drivers/infiniband/core/user_mad.c
patching file drivers/infiniband/core/uverbs_main.c
Hunk #1 succeeded at 802 (offset 1 line).
/tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5/kernel_patches/backport/2.6.18-EL5.5/core__v2_6_23_dev_get_by_index.patch
patching file drivers/infiniband/core/addr.c
patching file drivers/infiniband/core/cma.c
Hunk #1 FAILED at 1798.
Hunk #2 FAILED at 3009.
2 out of 2 hunks FAILED -- saving rejects to file drivers/infiniband/core/cma.c.rej
Failed to apply patch: /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5/kernel_patches/backport/2.6.18-EL5.5/core__v2_6_23_dev_get_by_index.patch
Build failed in /tmp/build-ofed_kernel-a13499
See log file /tmp/build-ofed_kernel-a13499/ofed_kernel-2.6.18-EL5.5.log
[ewg] [GIT PULL] RDMA/nes: update backports for RHEL 5.5
Vlad,
Please pull from:
git://sofa.openfabrics.org/home/ctung/scm/ofed-1.5.git ofed_kernel_1_5
for the following commit:

commit 1ab03e6a07826382aa59f5a1b919e303ad92979d
Author: Chien Tung chien.tin.t...@intel.com
Date: Thu Mar 4 10:24:23 2010 -0600

RDMA/nes: update backports for RHEL 5.5

Signed-off-by: Chien Tung chien.tin.t...@intel.com

I missed the RHEL 5.5 backports in the last commit. Thanks for the build-broken message, which alerted me to check and pull the RHEL 5.5 backport into my git.
Thanks,
Chien
--
Chien Tung | chien.tin.t...@intel.com
Re: [ewg] nfsrdma fails to write big file,
Tom Tucker wrote:
Mahesh Siddheshwar wrote:
Hi Tom, Vu,
Tom Tucker wrote:
Roland Dreier wrote:
+ /*
+ * Add room for frmr register and invalidate WRs
+ * Requests sometimes have two chunks, each chunk
+ * requires to have different frmr. The safest
+ * WRs required are max_send_wr * 6; however, we
+ * get send completions and poll fast enough, it
+ * is pretty safe to have max_send_wr * 4.
+ */
+ ep->rep_attr.cap.max_send_wr *= 4;

Seems like a bad design if there is a possibility of work queue overflow; if you're counting on events occurring in a particular order or completions being handled fast enough, then your design is going to fail in some high load situations, which I don't think you want.

Vu,
Would you please try the following:
- Set the multiplier to 5

While trying to test this between a Linux client and Solaris server, I made the following changes in:
/usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c

diff verbs.c.org verbs.c
653c653
< ep->rep_attr.cap.max_send_wr *= 3;
---
> ep->rep_attr.cap.max_send_wr *= 8;
685c685
< ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /* - 1*/;
---
> ep->rep_cqinit = ep->rep_attr.cap.max

(I bumped it to 8)

did make install. On reboot I see the errors on NFS READs as opposed to WRITEs as seen before, when I try to read a 10G file from the server.

The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with OFED-1.5.1-20100223-0740 bits. The client has an Sun IB HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0. The server is running Solaris based on snv_128.

rpcdebug output from the client:
==
RPC:85 call_bind (status 0)
RPC:85 call_connect xprt ec78d800 is connected
RPC:85 call_transmit (status 0)
RPC:85 xprt_prepare_transmit
RPC:85 xprt_cwnd_limited cong = 0 cwnd = 8192
RPC:85 rpc_xdr_encode (status 0)
RPC:85 marshaling UNIX cred eddb4dc0
RPC:85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
RPC:85 xprt_transmit(164)
RPC: rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
RPC: rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
RPC: rpcrdma_create_chunks: write chunk elem 16...@0x38536d000:0xa601 (more)
RPC: rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
RPC: rpcrdma_create_chunks: write chunk elem 1...@0x31dd153c:0xaa01 (last)
RPC: rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
RPC:85 xmit complete
RPC:85 sleep_on(queue xprt_pending time 4683109)
RPC:85 added to queue ec78d994 xprt_pending
RPC:85 setting alarm for 6 ms
RPC: wake_up_next(ec78d944 xprt_resend)
RPC: wake_up_next(ec78d8f4 xprt_sending)
RPC: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
RPC:85 __rpc_wake_up_task (now 4683110)
RPC:85 disabling timer
RPC:85 removed from queue ec78d994 xprt_pending
RPC: __rpc_wake_up_task done
RPC:85 __rpc_execute flags=0x1
RPC:85 call_status (status -107)
RPC:85 call_bind (status 0)
RPC:85 call_connect xprt ec78d800 is not connected
RPC:85 xprt_connect xprt ec78d800 is not connected
RPC:85 sleep_on(queue xprt_pending time 4683110)
RPC:85 added to queue ec78d994 xprt_pending
RPC:85 setting alarm for 6 ms
RPC: rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
RPC: rpcrdma_event_process: recv WC status 5, connection lost
RPC: rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
RPC: rpcrdma_conn_upcall: disconnected
rpcrdma: connection to ec78dbccI4:20049 closed (-103)
RPC: xprt_rdma_connect_worker: reconnect
==

On the server I see:
Mar 3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
Mar 3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
Mar 3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
Mar 3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply

The remote access error is actually seen on RDMA_WRITE. Doing some more debugging on the server with DTrace, I see that the destination address and length match the write chunk element in the Linux debug output above.

0 9385 rib_write:entry daddr 38536d000, len 4000, hdl a601
0 9358 rib_init_sendwait:return ff44a715d308
1 9296 rib_svc_scq_handler:return 1f7
1 9356 rib_sendwait:return 14
1 9386 rib_write:return 14
^^^ that is RDMA_FAILED in
1 63295
[ewg] [GIT PULL ofed-1.5.1] cxgb3 fixes
Hey Vlad,
Please pull these upstream fixes from:
ssh://v...@sofa.openfabrics.org/~swise/scm/ofed_kernel.git ofed_1_5
Steve.
-
commit 1e49d727e4a6ac012625ceb4f550783c632c11a9
Author: Steve Wise sw...@opengridcomputing.com
Date: Thu Mar 4 10:38:48 2010 -0600

RDMA/cxgb3: wait at least one schedule cycle during device removal.

During a hot-plug LLD removal event or an EEH error event, iw_cxgb3 must ensure that any/all threads that might be in a cxgb3 exported function concurrently must return from the function before iw_cxgb3 returns from its event processing. Do this by calling synchronize_net().

Signed-off-by: Steve Wise sw...@opengridcomputing.com

commit 87581e84b1efa06c41e05b5865ca3b6430f5cac5
Author: Steve Wise sw...@opengridcomputing.com
Date: Thu Mar 4 10:35:45 2010 -0600

cxgb3: fix hot plug removal crash

From: Divy Le Ray d...@chelsio.com

queue restart tasklets need to be stopped after napi handlers are stopped since the latter can restart them. So stop them after stopping napi.

Signed-off-by: Divy Le Ray d...@chelsio.com