[ewg] ofa_1_5_kernel 20100222-0200 daily build status

2010-02-22 Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
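The build report above lists one "Passed on ARCH with KERNEL" line per build: 7 i686, 18 x86_64, 8 ia64 and 2 ppc64 builds, with an empty "Failed:" section. The following is a minimal sketch for tallying such a report per architecture. It assumes the exact line shape shown above. `tally_passed` and `struct arch_count` are hypothetical names and are not part of any OFED tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-architecture tally for a daily build report. */
struct arch_count {
    char arch[16];
    int passed;
};

/*
 * Scan a report held in memory and count "Passed on <arch> with <kernel>"
 * lines per architecture, in order of first appearance. Returns the number
 * of distinct architectures, or -1 if more than `max` appear. Lines of any
 * other shape ("Passed:", "Failed:", list footers) are ignored.
 */
int tally_passed(const char *report, struct arch_count *out, int max)
{
    const char *line = report;
    int n = 0;

    assert(report != NULL && out != NULL && max > 0);

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[256];
        char arch[16];

        if (len < sizeof(buf)) {          /* skip absurdly long lines */
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (strstr(buf, " with ") != NULL &&
                sscanf(buf, "Passed on %15s", arch) == 1) {
                int i;

                for (i = 0; i < n; i++)
                    if (strcmp(out[i].arch, arch) == 0)
                        break;
                if (i == n) {             /* first build for this arch */
                    if (n == max)
                        return -1;
                    strcpy(out[n].arch, arch);
                    out[n].passed = 0;
                    n++;
                }
                out[i].passed++;
            }
        }
        line = end ? end + 1 : NULL;
    }
    return n;
}
```

If you feed it the text of this report, it yields one entry per architecture in the order the architectures first appear. A non-empty "Failed:" section would need the same scan with "Failed on" as the prefix.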


Re: [ewg] nfsrdma fails to write big file,

2010-02-22 Tom Tucker
Vu Pham wrote:
 Setup: 
 1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX-2
 QDR HCAs (fw 2.7.8-6), RHEL 5.2.
 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.


 Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M
 count=1*, the operation fails, the connection drops, and the client
 cannot re-establish a connection to the server.
 After rebooting only the client, I can mount again.

 It happens with both Solaris and Linux nfsrdma servers.

 For the Linux client/server, I run memreg=5 (FRMR); I don't see the
 problem with memreg=6 (global DMA key).

   

Awesome. This is the key, I think.

Thanks for the info, Vu,
Tom


 On the Solaris server (snv 130), we see a problem decoding a 32K write
 request. The client sends two read chunks (32K and 16-byte), and the
 server fails to do an RDMA read on the 16-byte chunk (cqe.status = 10,
 i.e., IB_WC_REM_ACCESS_ERR); therefore, the server terminates the
 connection. We don't see this problem with NFS version 3 on Solaris.
 The Solaris server runs in normal memory registration mode.

 On the Linux client, I see cqe.status = 12, i.e., IB_WC_RETRY_EXC_ERR.

 I added these notes to bug #1919 (bugs.openfabrics.org) to track the
 issue.

 thanks,
 -vu
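The cqe.status numbers in this report are plain integers. On Linux they index `enum ib_wc_status` in `include/rdma/ib_verbs.h`, and libibverbs' `enum ibv_wc_status` uses the same order. Under that numbering, 10 is a remote access error and 12 means the transport retry counter was exceeded. The Solaris value is taken as reported, and the table below follows the Linux numbering. A small lookup makes such numbers readable. The table is hand-copied for statuses 0 to 13 only, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Completion status names 0..13, in the order of the Linux
 * enum ib_wc_status. Later values exist but are omitted here. */
static const char *const wc_status_names[] = {
    "IB_WC_SUCCESS",           /* 0 */
    "IB_WC_LOC_LEN_ERR",       /* 1 */
    "IB_WC_LOC_QP_OP_ERR",     /* 2 */
    "IB_WC_LOC_EEC_OP_ERR",    /* 3 */
    "IB_WC_LOC_PROT_ERR",      /* 4 */
    "IB_WC_WR_FLUSH_ERR",      /* 5 */
    "IB_WC_MW_BIND_ERR",       /* 6 */
    "IB_WC_BAD_RESP_ERR",      /* 7 */
    "IB_WC_LOC_ACCESS_ERR",    /* 8 */
    "IB_WC_REM_INV_REQ_ERR",   /* 9 */
    "IB_WC_REM_ACCESS_ERR",    /* 10 */
    "IB_WC_REM_OP_ERR",        /* 11 */
    "IB_WC_RETRY_EXC_ERR",     /* 12 */
    "IB_WC_RNR_RETRY_EXC_ERR", /* 13 */
};

#define WC_STATUS_COUNT (sizeof(wc_status_names) / sizeof(wc_status_names[0]))

static_assert(WC_STATUS_COUNT == 14, "table covers statuses 0..13");

/* Number -> name; statuses outside the table are not guessed. */
const char *wc_status_name(int status)
{
    if (status < 0 || (size_t)status >= WC_STATUS_COUNT)
        return "(not in this table)";
    return wc_status_names[status];
}

/* Name -> number, or -1 if the name is not in the table. */
int wc_status_from_name(const char *name)
{
    size_t i;

    for (i = 0; i < WC_STATUS_COUNT; i++)
        if (strcmp(wc_status_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

For example, `wc_status_name(12)` returns "IB_WC_RETRY_EXC_ERR", which matches the client-side report. `wc_status_from_name("IB_WC_REM_ACCESS_ERR")` returns 10.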



Re: [ewg] [PATCH] sdp: Prevent kernel crash if device init fails (plus bonus fix)

2010-02-22 Amir Vadai
Thanks Joachim.

I will add it to OFED.

- Amir

On 02/22/2010 07:06 PM, Joachim Fenkes wrote:
 If sdp_add_device() fails, there is no client data stored in the IB device,
 leading to a kernel crash when a connection is being established. Fix this
 by rejecting connections when the device is not initialized.

 Also, fix a bad goto target in an error case early in sdp_init_qp().

 Signed-off-by: Joachim Fenkes <fen...@de.ibm.com>
 ---
  kernel_patches/fixes/sdp-0001-fix-error-path.patch |   38
  1 files changed, 38 insertions(+), 0 deletions(-)
  create mode 100644 kernel_patches/fixes/sdp-0001-fix-error-path.patch

 diff --git a/kernel_patches/fixes/sdp-0001-fix-error-path.patch b/kernel_patches/fixes/sdp-0001-fix-error-path.patch
 new file mode 100644
 index 000..5a5f784
 --- /dev/null
 +++ b/kernel_patches/fixes/sdp-0001-fix-error-path.patch
 @@ -0,0 +1,38 @@
 +[PATCH] sdp: Prevent kernel crash if device init fails (plus bonus fix)
 +
 +If sdp_add_device() fails, there is no client data stored in the IB device,
 +leading to a kernel crash when a connection is being established. Fix this
 +by rejecting connections when the device is not initialized.
 +
 +Also, fix a bad goto target in an error case early in sdp_init_qp().
 +
 +Signed-off-by: Joachim Fenkes <fen...@de.ibm.com>
 +
 +---
 +
 + sdp_cma.c |    7 ++++++-
 + 1 file changed, 6 insertions(+), 1 deletion(-)
 +
 +diff -urp a/drivers/infiniband/ulp/sdp/sdp_cma.c b/drivers/infiniband/ulp/sdp/sdp_cma.c
 +--- a/drivers/infiniband/ulp/sdp/sdp_cma.c   2010-02-19 15:39:32.0 +0100
 ++++ b/drivers/infiniband/ulp/sdp/sdp_cma.c   2010-02-19 15:38:13.0 +0100
 +@@ -94,13 +94,18 @@ static int sdp_init_qp(struct sock *sk, 
 + sdp_warn(sk, "recv sge's. capability: %d needed: %ld\n",
 + sdp_sk(sk)->max_sge, SDP_MAX_RECV_SKB_FRAGS + 1);
 + rc = -ENOMEM;
 +-goto err_tx;
 ++goto err_rx;
 + }
 + 
 + qp_init_attr.cap.max_send_sge = sdp_sk(sk)->max_sge;
 + sdp_dbg(sk, "Setting max send sge to: %d\n", sdp_sk(sk)->max_sge);
 + 
 + sdp_sk(sk)->sdp_dev = ib_get_client_data(device, &sdp_client);
 ++if (!sdp_sk(sk)->sdp_dev) {
 ++sdp_warn(sk, "SDP not available on device %s", device->name);
 ++rc = -ENODEV;
 ++goto err_rx;
 ++}
 + 
 + rc = sdp_rx_ring_create(sdp_sk(sk), device);
 + if (rc)
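Both fixes in this patch follow a common kernel pattern. The function checks for missing per-device client data before using it, and it unwinds partial setup through goto labels that each undo only what already exists. The patch shows the jump sites but not the bodies of `err_tx` and `err_rx`. The sketch below assumes, as the label names suggest, that `err_tx` tears down the receive ring and `err_rx` tears down nothing. Every type and function in it (`struct fake_ib_device`, `init_conn`, and so on) is a hypothetical userspace stand-in. None of it is SDP or IB core code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real SDP or IB core structures. */
struct fake_ib_device {
    const char *name;
    void *client_data;   /* NULL if the client's per-device init failed */
};

struct fake_conn {
    void *dev_ctx;
    int rx_ring_live;
    int tx_ring_live;
};

static int rx_destroyed; /* counts rx ring teardowns, for the demo */

static int rx_ring_create(struct fake_conn *c) { c->rx_ring_live = 1; return 0; }
static int tx_ring_create(struct fake_conn *c) { c->tx_ring_live = 1; return 0; }

static void rx_ring_destroy(struct fake_conn *c)
{
    /* Tearing down a ring that was never created stands in for the crash
     * a wrong goto target could cause. */
    assert(c->rx_ring_live);
    c->rx_ring_live = 0;
    rx_destroyed++;
}

int init_conn(struct fake_conn *c, struct fake_ib_device *dev, int enough_sges)
{
    int rc;

    if (!enough_sges) {
        rc = -ENOMEM;
        goto err_rx;     /* nothing allocated yet: skip all teardown */
    }

    c->dev_ctx = dev->client_data;
    if (!c->dev_ctx) {   /* device init failed earlier: reject */
        rc = -ENODEV;
        goto err_rx;
    }

    rc = rx_ring_create(c);
    if (rc)
        goto err_rx;

    rc = tx_ring_create(c);
    if (rc)
        goto err_tx;     /* rx ring exists and must be undone */

    return 0;

err_tx:
    rx_ring_destroy(c);
err_rx:
    return rc;
}
```

When a device's `client_data` is NULL, `init_conn` returns -ENODEV without touching either ring, which is the behavior the first fix aims for. Jumping to `err_tx` from the early SGE check would instead hit the assertion in `rx_ring_destroy`.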
   


Re: [ewg] nfsrdma fails to write big file,

2010-02-22 Vu Pham
Tom,

Some more info on the problem:
1. Running with memreg=4 (FMR), I cannot reproduce the problem.
2. I also see a different error on the client:

Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody' does not map into domain 'localdomain'
Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send returned -12 cq_init 48 cq_count 32
Feb 22 12:17:00 mellanox-2 kernel: RPC:   rpcrdma_event_process: send WC status 5, vend_err F5
Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to 13.20.1.9:20049 closed (-103)

-vu
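The log shows "WQE overflow" lines from the HCA driver followed by `ib_post_send` returning -12. On Linux, that is -ENOMEM, the error a driver such as mlx4 returns when a send is posted to a queue with no free slots. Also on Linux, -103 is -ECONNABORTED. Completion status 5 is `IB_WC_WR_FLUSH_ERR`, which is reported for work requests that were flushed after the queue pair entered the error state. The toy model below only illustrates the slot accounting behind that -ENOMEM. `struct toy_sq` and its functions are hypothetical, not the verbs API. `wr_per_req` is a free parameter, not a measured count for any memreg mode in this thread.

```c
#include <assert.h>
#include <errno.h>

/*
 * Toy send queue (not the verbs API): `depth` slots, and each posted
 * work request (WR) holds one slot until its completion is reaped.
 */
struct toy_sq {
    int depth;        /* slots the queue was created with */
    int outstanding;  /* WRs posted but not yet completed */
};

/* Post nwr WRs at once; fail like a full hardware queue would. */
int toy_post(struct toy_sq *sq, int nwr)
{
    assert(nwr > 0);
    if (sq->outstanding + nwr > sq->depth)
        return -ENOMEM;   /* no free slots: nothing is posted */
    sq->outstanding += nwr;
    return 0;
}

/* Reap nwr completions, freeing their slots. */
void toy_complete(struct toy_sq *sq, int nwr)
{
    assert(nwr >= 0 && nwr <= sq->outstanding);
    sq->outstanding -= nwr;
}

/*
 * How many requests, each needing wr_per_req WRs, can be posted before
 * the first failure if no completions are reaped in between.
 */
int requests_until_full(int depth, int wr_per_req)
{
    struct toy_sq sq = { depth, 0 };
    int n = 0;

    while (toy_post(&sq, wr_per_req) == 0)
        n++;
    return n;
}
```

A 32-slot queue fills after 32 requests at one WR each, but after only 10 requests at three WRs each. If a design posts more WRs per request, it needs a correspondingly deeper queue, or it must stop posting until completions free slots.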

 -----Original Message-----
 From: Tom Tucker [mailto:t...@opengridcomputing.com]
 Sent: Monday, February 22, 2010 10:49 AM
 To: Vu Pham
 Cc: linux-r...@vger.kernel.org; Mahesh Siddheshwar; ewg@lists.openfabrics.org
 Subject: Re: [ewg] nfsrdma fails to write big file,
 
