[ewg] ofa_1_5_kernel 20100222-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] nfsrdma fails to write big file,
Vu Pham wrote:
> Setup:
> 1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2 QDR HCAs fw 2.7.8-6, RHEL 5.2.
> 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.
>
> Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M count=1*, the operation fails, the connection gets dropped, and the client cannot re-establish the connection to the server. After rebooting only the client, I can mount again.
>
> It happens with both Solaris and Linux nfsrdma servers.
>
> For the Linux client/server, I run memreg=5 (FRMR). I don't see the problem with memreg=6 (global DMA key).

Awesome. This is the key I think.

Thanks for the info Vu,
Tom

> On the Solaris server (snv 130), we see a problem decoding a write request of 32K. The client sends two read chunks (32K and 16-byte), and the server fails to do the RDMA read on the 16-byte chunk (cqe.status = 10, i.e. IB_WC_REM_ACCESS_ERR); therefore, the server terminates the connection. We don't see this problem on NFS version 3 on Solaris. The Solaris server runs in normal memory registration mode.
>
> On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.
>
> I added these notes in bug #1919 (bugs.openfabrics.org) to track the issue.
>
> thanks,
> -vu
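The numeric completion statuses quoted in this message (10 and 12) can be checked against the order of enum ib_wc_status in the Linux kernel's include/rdma/ib_verbs.h. The following is a minimal standalone sketch of that lookup. The table copies only the first 14 names, and wc_status_name() is a hypothetical helper written for illustration, not a kernel or OFED API.

```c
#include <stddef.h>

/* Names indexed by numeric completion status. The order is assumed to
 * mirror enum ib_wc_status in include/rdma/ib_verbs.h; only the first
 * 14 values are listed here. */
static const char *const wc_status_names[] = {
	"IB_WC_SUCCESS",           /* 0 */
	"IB_WC_LOC_LEN_ERR",       /* 1 */
	"IB_WC_LOC_QP_OP_ERR",     /* 2 */
	"IB_WC_LOC_EEC_OP_ERR",    /* 3 */
	"IB_WC_LOC_PROT_ERR",      /* 4 */
	"IB_WC_WR_FLUSH_ERR",      /* 5 */
	"IB_WC_MW_BIND_ERR",       /* 6 */
	"IB_WC_BAD_RESP_ERR",      /* 7 */
	"IB_WC_LOC_ACCESS_ERR",    /* 8 */
	"IB_WC_REM_INV_REQ_ERR",   /* 9 */
	"IB_WC_REM_ACCESS_ERR",    /* 10 */
	"IB_WC_REM_OP_ERR",        /* 11 */
	"IB_WC_RETRY_EXC_ERR",     /* 12 */
	"IB_WC_RNR_RETRY_EXC_ERR"  /* 13 */
};

/* Hypothetical helper: map a numeric status to its name, or "unknown"
 * when the value falls outside the table above. */
const char *wc_status_name(int status)
{
	size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

	if (status < 0 || (size_t)status >= n)
		return "unknown";
	return wc_status_names[status];
}
```

With this table, status 10 maps to the remote access error seen by the Solaris server and status 12 to the retry-exceeded error seen on the Linux client.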
Re: [ewg] [PATCH] sdp: Prevent kernel crash if device init fails (plus bonus fix)
Thanks Joachim. I will add it to OFED.

- Amir

On 02/22/2010 07:06 PM, Joachim Fenkes wrote:
> If sdp_add_device() fails, there is no client data stored in the IB device,
> leading to a kernel crash when a connection is being established. Fix this
> by rejecting connections when the device is not initialized.
>
> Also, fix a bad goto target in an error case early in sdp_init_qp().
>
> Signed-off-by: Joachim Fenkes <fen...@de.ibm.com>
> ---
>  kernel_patches/fixes/sdp-0001-fix-error-path.patch |   38
>  1 files changed, 38 insertions(+), 0 deletions(-)
>  create mode 100644 kernel_patches/fixes/sdp-0001-fix-error-path.patch
>
> diff --git a/kernel_patches/fixes/sdp-0001-fix-error-path.patch b/kernel_patches/fixes/sdp-0001-fix-error-path.patch
> new file mode 100644
> index 000..5a5f784
> --- /dev/null
> +++ b/kernel_patches/fixes/sdp-0001-fix-error-path.patch
> @@ -0,0 +1,38 @@
> +[PATCH] sdp: Prevent kernel crash if device init fails (plus bonus fix)
> +
> +If sdp_add_device() fails, there is no client data stored in the IB device,
> +leading to a kernel crash when a connection is being established. Fix this
> +by rejecting connections when the device is not initialized.
> +
> +Also, fix a bad goto target in an error case early in sdp_init_qp().
> +
> +Signed-off-by: Joachim Fenkes <fen...@de.ibm.com>
> +
> +---
> +
> + sdp_cma.c |    7 ++-
> + 1 file changed, 6 insertions(+), 1 deletion(-)
> +
> +diff -urp a/drivers/infiniband/ulp/sdp/sdp_cma.c b/drivers/infiniband/ulp/sdp/sdp_cma.c
> +--- a/drivers/infiniband/ulp/sdp/sdp_cma.c	2010-02-19 15:39:32.0 +0100
> ++++ b/drivers/infiniband/ulp/sdp/sdp_cma.c	2010-02-19 15:38:13.0 +0100
> +@@ -94,13 +94,18 @@ static int sdp_init_qp(struct sock *sk,
> + 		sdp_warn(sk, "recv sge's. capability: %d needed: %ld\n",
> + 			sdp_sk(sk)->max_sge, SDP_MAX_RECV_SKB_FRAGS + 1);
> + 		rc = -ENOMEM;
> +-		goto err_tx;
> ++		goto err_rx;
> + 	}
> +
> + 	qp_init_attr.cap.max_send_sge = sdp_sk(sk)->max_sge;
> + 	sdp_dbg(sk, "Setting max send sge to: %d\n", sdp_sk(sk)->max_sge);
> +
> + 	sdp_sk(sk)->sdp_dev = ib_get_client_data(device, &sdp_client);
> ++	if (!sdp_sk(sk)->sdp_dev) {
> ++		sdp_warn(sk, "SDP not available on device %s", device->name);
> ++		rc = -ENODEV;
> ++		goto err_rx;
> ++	}
> +
> + 	rc = sdp_rx_ring_create(sdp_sk(sk), device);
> + 	if (rc)
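The two fixes in this patch both concern which cleanup label an early failure jumps to. The following is a minimal userspace sketch of that goto-unwind pattern, under the assumption that the later label (err_rx here) performs no cleanup and the earlier one (err_tx) undoes the RX ring. struct fake_dev, struct fake_sock and fake_init_qp() are hypothetical stand-ins written for illustration; they are not the SDP driver's types or functions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct fake_dev { int id; };

struct fake_sock {
	struct fake_dev *dev;   /* per-device data, NULL if device init failed */
	int rx_ring_ready;
	int tx_ring_ready;
};

/* Set fail_tx to non-zero to simulate a TX ring allocation failure. */
int fake_init_qp(struct fake_sock *sk, struct fake_dev *client_data, int fail_tx)
{
	int rc;

	sk->dev = client_data;
	if (!sk->dev) {
		/* Nothing has been allocated yet, so skip all cleanup
		 * instead of dereferencing missing device data later. */
		rc = -ENODEV;
		goto err_rx;
	}

	sk->rx_ring_ready = 1;          /* step 1: RX ring */

	if (fail_tx) {
		rc = -ENOMEM;
		goto err_tx;            /* undo step 1 only */
	}
	sk->tx_ring_ready = 1;          /* step 2: TX ring */
	return 0;

err_tx:
	sk->rx_ring_ready = 0;          /* tear down the RX ring */
err_rx:
	return rc;
}
```

Jumping to err_tx before the RX ring exists would tear down something that was never created, which is the kind of bad goto target the patch description mentions.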
Re: [ewg] nfsrdma fails to write big file,
Tom,

Some more info on the problem:
1. Running with memreg=4 (FMR), I cannot reproduce the problem.
2. I also see a different error on the client:

Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody' does not map into domain 'localdomain'
Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send returned -12 cq_init 48 cq_count 32
Feb 22 12:17:00 mellanox-2 kernel: RPC: rpcrdma_event_process: send WC status 5, vend_err F5
Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to 13.20.1.9:20049 closed (-103)

-vu

-----Original Message-----
From: Tom Tucker [mailto:t...@opengridcomputing.com]
Sent: Monday, February 22, 2010 10:49 AM
To: Vu Pham
Cc: linux-r...@vger.kernel.org; Mahesh Siddheshwar; ewg@lists.openfabrics.org
Subject: Re: [ewg] nfsrdma fails to write big file,

Vu Pham wrote:
> Setup:
> 1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2 QDR HCAs fw 2.7.8-6, RHEL 5.2.
> 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.
>
> Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M count=1*, the operation fails, the connection gets dropped, and the client cannot re-establish the connection to the server. After rebooting only the client, I can mount again.
>
> It happens with both Solaris and Linux nfsrdma servers.
>
> For the Linux client/server, I run memreg=5 (FRMR). I don't see the problem with memreg=6 (global DMA key).

Awesome. This is the key I think.

Thanks for the info Vu,
Tom

> On the Solaris server (snv 130), we see a problem decoding a write request of 32K. The client sends two read chunks (32K and 16-byte), and the server fails to do the RDMA read on the 16-byte chunk (cqe.status = 10, i.e. IB_WC_REM_ACCESS_ERR); therefore, the server terminates the connection. We don't see this problem on NFS version 3 on Solaris. The Solaris server runs in normal memory registration mode.
>
> On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.
>
> I added these notes in bug #1919 (bugs.openfabrics.org) to track the issue.
>
> thanks,
> -vu
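The client log in this message reports two kernel-style negative return codes: ib_post_send returned -12 and the connection closed with -103. The following is a small sketch for decoding them, assuming the generic Linux errno numbering (12 = ENOMEM, 19 = ENODEV, 103 = ECONNABORTED). The table is deliberately partial, and kernel_rc_name() is a hypothetical helper, not an existing API.

```c
#include <stddef.h>

struct errno_entry {
	int value;
	const char *name;
};

/* Partial table, assuming the generic Linux errno numbering. */
static const struct errno_entry errno_table[] = {
	{ 12,  "ENOMEM" },
	{ 19,  "ENODEV" },
	{ 103, "ECONNABORTED" },
};

/* Hypothetical helper: accept either sign convention (-12 or 12) and
 * return the symbolic name, or "unknown" if it is not in the table. */
const char *kernel_rc_name(int rc)
{
	int value = rc < 0 ? -rc : rc;
	size_t i;

	for (i = 0; i < sizeof(errno_table) / sizeof(errno_table[0]); i++) {
		if (errno_table[i].value == value)
			return errno_table[i].name;
	}
	return "unknown";
}
```

Under that assumption, the -12 from ib_post_send decodes to ENOMEM, which lines up with the "WQE overflow" messages just before it, and the -103 on close decodes to ECONNABORTED.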