Re: [ewg] nfsrdma fails to write big file,

2010-03-04 Thread Mahesh Siddheshwar
Tom Tucker wrote:
 Mahesh Siddheshwar wrote:
 Hi Tom, Vu,

 Tom Tucker wrote:
 Roland Dreier wrote:
   +   /*
   +    * Add room for frmr register and invalidate WRs
   +    * Requests sometimes have two chunks, each chunk
   +    * requires to have different frmr. The safest
   +    * WRs required are max_send_wr * 6; however, we
   +    * get send completions and poll fast enough, it
   +    * is pretty safe to have max_send_wr * 4.
   +    */
   +   ep->rep_attr.cap.max_send_wr *= 4;

 Seems like a bad design if there is a possibility of work queue
 overflow; if you're counting on events occurring in a particular order
 or completions being handled fast enough, then your design is going to
 fail in some high load situations, which I don't think you want.
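
The multipliers under discussion can be read as a per-request WR budget. The sketch below assumes a cost model that the patch does not state: each RPC posts one SEND, plus one FRMR register WR and one invalidate WR for every chunk. `worst_case_send_wrs` is a hypothetical helper for illustration and is not a function in verbs.c.

```c
/*
 * Hypothetical helper (not from verbs.c): worst-case number of send-queue
 * WRs needed if nothing completes until every request has been posted.
 * Assumed cost model: 1 SEND + (1 register + 1 invalidate) per chunk.
 */
static unsigned int worst_case_send_wrs(unsigned int max_requests,
                                        unsigned int chunks_per_request)
{
    unsigned int wrs_per_request = 1 + 2 * chunks_per_request;

    return max_requests * wrs_per_request;
}
```

Under this assumption a two-chunk request costs 5 WRs, so a send queue sized at max_send_wr * 4 only holds if completions are reaped quickly enough, which is the concern raised above. For example, `worst_case_send_wrs(32, 2)` evaluates to 160.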

 Vu,

 Would you please try the following:

 - Set the multiplier to 5

 While trying to test this between a Linux client and Solaris server,
 I made the following changes in:
 /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c

 diff verbs.c.org verbs.c
 653c653
 < ep->rep_attr.cap.max_send_wr *= 3;
 ---
 > ep->rep_attr.cap.max_send_wr *= 8;
 685c685
 < ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
 ---
 > ep->rep_cqinit = ep->rep_attr.cap.max

 (I bumped it to 8)

 I did make install.
 On reboot I see the errors on NFS READs, as opposed to WRITEs
 as seen before, when I try to read a 10G file from the server.

 The client is running RHEL 5.3 (2.6.18-128.el5PAE) with
 OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
 HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
 The server is running Solaris based on snv_128.

 rpcdebug output from the client:

 ==
 RPC:85 call_bind (status 0)
 RPC:85 call_connect xprt ec78d800 is connected
 RPC:85 call_transmit (status 0)
 RPC:85 xprt_prepare_transmit
 RPC:85 xprt_cwnd_limited cong = 0 cwnd = 8192
 RPC:85 rpc_xdr_encode (status 0)
 RPC:85 marshaling UNIX cred eddb4dc0
 RPC:85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
 RPC:85 xprt_transmit(164)
 RPC:   rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
 RPC:   rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
 RPC:   rpcrdma_create_chunks: write chunk elem 16...@0x38536d000:0xa601 (more)
 RPC:   rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
 RPC:   rpcrdma_create_chunks: write chunk elem 1...@0x31dd153c:0xaa01 (last)
 RPC:   rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
 RPC:85 xmit complete
 RPC:85 sleep_on(queue xprt_pending time 4683109)
 RPC:85 added to queue ec78d994 xprt_pending
 RPC:85 setting alarm for 6 ms
 RPC:   wake_up_next(ec78d944 xprt_resend)
 RPC:   wake_up_next(ec78d8f4 xprt_sending)
 RPC:   rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
 RPC:85 __rpc_wake_up_task (now 4683110)
 RPC:85 disabling timer
 RPC:85 removed from queue ec78d994 xprt_pending
 RPC:   __rpc_wake_up_task done
 RPC:85 __rpc_execute flags=0x1
 RPC:85 call_status (status -107)
 RPC:85 call_bind (status 0)
 RPC:85 call_connect xprt ec78d800 is not connected
 RPC:85 xprt_connect xprt ec78d800 is not connected
 RPC:85 sleep_on(queue xprt_pending time 4683110)
 RPC:85 added to queue ec78d994 xprt_pending
 RPC:85 setting alarm for 6 ms
 RPC:   rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
 RPC:   rpcrdma_event_process: recv WC status 5, connection lost
 RPC:   rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
 RPC:   rpcrdma_conn_upcall: disconnected
 rpcrdma: connection to ec78dbccI4:20049 closed (-103)
 RPC:   xprt_rdma_connect_worker: reconnect
 ==

 On the server I see:

 Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
 Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
 Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
 Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply

 The remote access error is actually seen on RDMA_WRITE.
 Doing some more debugging on the server with DTrace, I see that
 the destination address and length match the write chunk
 element in the Linux debug output above.


  0   9385  rib_write:entry daddr 38536d000, len 4000, hdl a601
  0   9358 rib_init_sendwait:return ff44a715d308
  1   9296   rib_svc_scq_handler:return 1f7
  1   9356  rib_sendwait:return 14
  1   9386 rib_write:return 14

 ^^^ that is RDMA_FAILED in
  1  63295

Re: [ewg] nfsrdma fails to write big file,

2010-03-03 Thread Tom Tucker
Mahesh Siddheshwar wrote:
 Hi Tom, Vu,

 Tom Tucker wrote:
 Roland Dreier wrote:
   +   /*
   +    * Add room for frmr register and invalidate WRs
   +    * Requests sometimes have two chunks, each chunk
   +    * requires to have different frmr. The safest
   +    * WRs required are max_send_wr * 6; however, we
   +    * get send completions and poll fast enough, it
   +    * is pretty safe to have max_send_wr * 4.
   +    */
   +   ep->rep_attr.cap.max_send_wr *= 4;

 Seems like a bad design if there is a possibility of work queue
 overflow; if you're counting on events occurring in a particular order
 or completions being handled fast enough, then your design is going to
 fail in some high load situations, which I don't think you want.

 Vu,

 Would you please try the following:

 - Set the multiplier to 5

 While trying to test this between a Linux client and Solaris server,
 I made the following changes in:
 /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c

 diff verbs.c.org verbs.c
 653c653
 < ep->rep_attr.cap.max_send_wr *= 3;
 ---
 > ep->rep_attr.cap.max_send_wr *= 8;
 685c685
 < ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
 ---
 > ep->rep_cqinit = ep->rep_attr.cap.max

 (I bumped it to 8)

 I did make install.
 On reboot I see the errors on NFS READs, as opposed to WRITEs
 as seen before, when I try to read a 10G file from the server.

 The client is running RHEL 5.3 (2.6.18-128.el5PAE) with
 OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
 HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
 The server is running Solaris based on snv_128.

 rpcdebug output from the client:

 ==
 RPC:85 call_bind (status 0)
 RPC:85 call_connect xprt ec78d800 is connected
 RPC:85 call_transmit (status 0)
 RPC:85 xprt_prepare_transmit
 RPC:85 xprt_cwnd_limited cong = 0 cwnd = 8192
 RPC:85 rpc_xdr_encode (status 0)
 RPC:85 marshaling UNIX cred eddb4dc0
 RPC:85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
 RPC:85 xprt_transmit(164)
 RPC:   rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
 RPC:   rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
 RPC:   rpcrdma_create_chunks: write chunk elem 16...@0x38536d000:0xa601 (more)
 RPC:   rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
 RPC:   rpcrdma_create_chunks: write chunk elem 1...@0x31dd153c:0xaa01 (last)
 RPC:   rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
 RPC:85 xmit complete
 RPC:85 sleep_on(queue xprt_pending time 4683109)
 RPC:85 added to queue ec78d994 xprt_pending
 RPC:85 setting alarm for 6 ms
 RPC:   wake_up_next(ec78d944 xprt_resend)
 RPC:   wake_up_next(ec78d8f4 xprt_sending)
 RPC:   rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
 RPC:85 __rpc_wake_up_task (now 4683110)
 RPC:85 disabling timer
 RPC:85 removed from queue ec78d994 xprt_pending
 RPC:   __rpc_wake_up_task done
 RPC:85 __rpc_execute flags=0x1
 RPC:85 call_status (status -107)
 RPC:85 call_bind (status 0)
 RPC:85 call_connect xprt ec78d800 is not connected
 RPC:85 xprt_connect xprt ec78d800 is not connected
 RPC:85 sleep_on(queue xprt_pending time 4683110)
 RPC:85 added to queue ec78d994 xprt_pending
 RPC:85 setting alarm for 6 ms
 RPC:   rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
 RPC:   rpcrdma_event_process: recv WC status 5, connection lost
 RPC:   rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
 RPC:   rpcrdma_conn_upcall: disconnected
 rpcrdma: connection to ec78dbccI4:20049 closed (-103)
 RPC:   xprt_rdma_connect_worker: reconnect
 ==

 On the server I see:

 Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
 Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
 Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
 Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply

 The remote access error is actually seen on RDMA_WRITE.
 Doing some more debugging on the server with DTrace, I see that
 the destination address and length match the write chunk
 element in the Linux debug output above.


  0   9385  rib_write:entry daddr 38536d000, len 4000, hdl a601
  0   9358 rib_init_sendwait:return ff44a715d308
  1   9296   rib_svc_scq_handler:return 1f7
  1   9356  rib_sendwait:return 14
  1   9386 rib_write:return 14

 ^^^ that is RDMA_FAILED in
  1  63295 xdrrdma_send_read_data:return 0
  1  

Re: [ewg] nfsrdma fails to write big file,

2010-02-22 Thread Tom Tucker
Vu Pham wrote:
 Setup:
 1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
 QDR HCAs fw 2.7.8-6, RHEL 5.2.
 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.


 Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M
 count=1*, the operation fails, the connection gets dropped, and the
 client cannot re-establish the connection to the server.
 After rebooting only the client, I can mount again.

 It happens with both Solaris and Linux nfsrdma servers.

 For the Linux client/server, I run memreg=5 (FRMR); I don't see the
 problem with memreg=6 (global DMA key).

   

Awesome. This is the key I think.

Thanks for the info Vu,
Tom


 On Solaris server snv 130, we see a problem decoding a write request of
 32K. The client sends two read chunks (32K and 16-byte), and the server
 fails to do the RDMA read on the 16-byte chunk (cqe.status = 10, i.e.
 IB_WC_REM_ACCESS_ERR); therefore, the server terminates the connection.
 We don't see this problem on NFS version 3 on Solaris. The Solaris
 server runs in normal memory registration mode.

 On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.

 I added these notes in bug #1919 (bugs.openfabrics.org) to track the
 issue.
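
The numeric completion statuses quoted in this thread can be mapped back to names with a small lookup. This is a minimal sketch that assumes the numbering of `enum ib_wc_status` in the Linux `<rdma/ib_verbs.h>`; `wc_status_name` is a hypothetical helper, and only the values that appear in this thread are listed.

```c
/*
 * Hypothetical helper: translate a work-completion status number into the
 * name used by enum ib_wc_status on Linux. Only the statuses mentioned in
 * this thread are covered; everything else is reported as "unknown".
 */
static const char *wc_status_name(int status)
{
    switch (status) {
    case 0:
        return "IB_WC_SUCCESS";
    case 5:
        return "IB_WC_WR_FLUSH_ERR";    /* WR flushed after the QP entered the error state */
    case 10:
        return "IB_WC_REM_ACCESS_ERR";  /* remote side rejected the memory access */
    case 12:
        return "IB_WC_RETRY_EXC_ERR";   /* transport retry counter exceeded */
    default:
        return "unknown";
    }
}
```

With this mapping, the server-side status 10 is a remote access error on the RDMA read, and the client-side status 12 means the retry counter was exceeded.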

 thanks,
 -vu
 ___
 ewg mailing list
 ewg@lists.openfabrics.org
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
   



Re: [ewg] nfsrdma fails to write big file,

2010-02-22 Thread Vu Pham
Tom,

Some more info on the problem:
1. Running with memreg=4 (FMR), I cannot reproduce the problem.
2. I also see a different error on the client:

Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody' does not map into domain 'localdomain'
Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send returned -12 cq_init 48 cq_count 32
Feb 22 12:17:00 mellanox-2 kernel: RPC:   rpcrdma_event_process: send WC status 5, vend_err F5
Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to 13.20.1.9:20049 closed (-103)
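
The negative return codes in these logs are kernel errno values. As a rough aid, the sketch below names the three that appear in this thread, assuming Linux errno numbering; `errno_name` is a hypothetical helper, not a kernel or libc function.

```c
#include <errno.h>

/*
 * Hypothetical helper: name the negative return codes seen in this thread.
 * Assumes Linux errno numbering (ENOMEM = 12, ECONNABORTED = 103,
 * ENOTCONN = 107).
 */
static const char *errno_name(int rc)
{
    switch (-rc) {
    case ENOMEM:
        return "ENOMEM";        /* ib_post_send returned -12 */
    case ECONNABORTED:
        return "ECONNABORTED";  /* connection closed (-103) */
    case ENOTCONN:
        return "ENOTCONN";      /* call_status (status -107) */
    default:
        return "other";
    }
}
```

Read this way, the -12 from ib_post_send is ENOMEM, reported alongside the "WQE overflow" messages, and the -103 on close is ECONNABORTED.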

-vu

 -----Original Message-----
 From: Tom Tucker [mailto:t...@opengridcomputing.com]
 Sent: Monday, February 22, 2010 10:49 AM
 To: Vu Pham
 Cc: linux-r...@vger.kernel.org; Mahesh Siddheshwar;
 ewg@lists.openfabrics.org
 Subject: Re: [ewg] nfsrdma fails to write big file,
 
 Vu Pham wrote:
  Setup:
  1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
  QDR HCAs fw 2.7.8-6, RHEL 5.2.
  2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.
 
 
  Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M
  count=1*, the operation fails, the connection gets dropped, and the
  client cannot re-establish the connection to the server.
  After rebooting only the client, I can mount again.
 
  It happens with both Solaris and Linux nfsrdma servers.
 
  For the Linux client/server, I run memreg=5 (FRMR); I don't see the
  problem with memreg=6 (global DMA key).
 
 
 
 Awesome. This is the key I think.
 
 Thanks for the info Vu,
 Tom
 
 
  On Solaris server snv 130, we see a problem decoding a write request of
  32K. The client sends two read chunks (32K and 16-byte), and the server
  fails to do the RDMA read on the 16-byte chunk (cqe.status = 10, i.e.
  IB_WC_REM_ACCESS_ERR); therefore, the server terminates the connection.
  We don't see this problem on NFS version 3 on Solaris. The Solaris
  server runs in normal memory registration mode.
 
  On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.
 
  I added these notes in bug #1919 (bugs.openfabrics.org) to track the
  issue.
 
  thanks,
  -vu
 
