RE: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
While testing with the ocrdma driver, I am finding the server-side SQ full. The log follows; I have yet to identify why it is happening. Once this is reported, the client side crashes for some reason. My kdump is not working properly, so I am not able to analyze the situation properly.

May 19 23:47:02 neo01-el64 kernel: svcrdma: RDMA_WRITE rmr=8008b12, to=45a2d790c, xdr_off=0, write_len=68, vec-sge=88086cb4a0c8, vec-count=2
May 19 23:47:02 neo01-el64 kernel: svcrdma: send_reply returns 0
May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000 waiting for data (to = 360)
May 19 23:47:02 neo01-el64 kernel: svc: transport 88087dfa2400 served by daemon 88086409a000
May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000, pool 0, transport 88087dfa2400, inuse=18
May 19 23:47:02 neo01-el64 kernel: svcrdma: rqstp=88086409a000
May 19 23:47:02 neo01-el64 kernel: svcrdma: processing ctxt=880866754540 on xprt=88087dfa2400, rqstp=88086409a000, status=0
May 19 23:47:02 neo01-el64 kernel: svcrdma: failed to post SQ WR rc=-22, sc_sq_count=0, sc_sq_depth=128
May 19 23:47:02 neo01-el64 kernel: svcrdma: Error -22 posting RDMA_READ
May 19 23:47:02 neo01-el64 kernel: svc: got len=0
May 19 23:47:02 neo01-el64 kernel: svc: transport 88087dfa2400 served by daemon 88086e782000
May 19 23:47:02 neo01-el64 kernel: svc: transport 88087dfa2400 busy, not enqueued
May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000 waiting for data (to = 360)
May 19 23:47:02 neo01-el64 kernel: svc_recv: found XPT_CLOSE
May 19 23:47:02 neo01-el64 kernel: svc: svc_delete_xprt(88087dfa2400)
May 19 23:47:02 neo01-el64 kernel: svc: svc_rdma_detach(88087dfa2400)

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Steve Wise
Sent: Wednesday, May 07, 2014 2:39 AM
To: J. Bruce Fields
Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org; t...@opengridcomputing.com
Subject: Re: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic

On 5/6/2014 2:27 PM, J. Bruce Fields wrote:
> On Tue, May 06, 2014 at 12:46:21PM -0500, Steve Wise wrote:
>> This patch series refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations.
>>
>> I've also made a git repo available with these patches on top of 3.15-rc4:
>>
>> git://git.openfabrics.org/~swise/linux svcrdma-refactor
>>
>> Changes since V1:
>>
>> - fixed regression for devices that don't support FRMRs (see rdma_read_chunk_lcl())
>>
>> - split patch up for closer review. However I request it be squashed before merging as they are not bisectable, and I think these changes should all be a single commit anyway.
>
> If it's not split up in a way that's bisectable, then yes, just don't bother.

I didn't see a good way to split it up, have it bisectable, and not have all the big stuff in one patch. I think it's a little more reviewable in these 3 patches, but when I post V3, I'll put it back as an uber patch. Hopefully folks can have a look at these 3 patches ignoring the bisect issue. Having said that, the rdma read logic really is better reviewed by looking at the code after applying the patches. That's why I published a git branch.

Thanks!

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
> -----Original Message-----
> From: linux-nfs-ow...@vger.kernel.org [mailto:linux-nfs-ow...@vger.kernel.org] On Behalf Of Devesh Sharma
> Sent: Monday, May 19, 2014 2:07 PM
> To: Steve Wise; J. Bruce Fields
> Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org; t...@opengridcomputing.com
> Subject: RE: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
>
> While testing with the ocrdma driver, I am finding the server-side SQ full. The log follows; I have yet to identify why it is happening. Once this is reported, the client side crashes for some reason. My kdump is not working properly, so I am not able to analyze the situation properly.
>
> May 19 23:47:02 neo01-el64 kernel: svcrdma: RDMA_WRITE rmr=8008b12, to=45a2d790c, xdr_off=0, write_len=68, vec-sge=88086cb4a0c8, vec-count=2
> May 19 23:47:02 neo01-el64 kernel: svcrdma: send_reply returns 0
> May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000 waiting for data (to = 360)
> May 19 23:47:02 neo01-el64 kernel: svc: transport 88087dfa2400 served by daemon 88086409a000
> May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000, pool 0, transport 88087dfa2400, inuse=18
> May 19 23:47:02 neo01-el64 kernel: svcrdma: rqstp=88086409a000
> May 19 23:47:02 neo01-el64 kernel: svcrdma: processing ctxt=880866754540 on xprt=88087dfa2400, rqstp=88086409a000, status=0
> May 19 23:47:02 neo01-el64 kernel: svcrdma: failed to post SQ WR rc=-22, sc_sq_count=0, sc_sq_depth=128
> May 19 23:47:02 neo01-el64 kernel: svcrdma: Error -22 posting RDMA_READ

Hey Devesh,

Looking at ocrdma_post_send(), -22 (-EINVAL) is returned when the QP is not in RTS. If the SQ is full, -ENOMEM is returned. So I think the send error is a downstream error because the connection got knocked down. You should try and figure out what kicked the QP out of RTS.

Steve.
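The distinction drawn in this reply can be written down as a small triage helper. This is only a sketch under stated assumptions: the function name `diagnose_post_send` is hypothetical, and the mapping reflects what the reply says about ocrdma_post_send(); other drivers may return different codes for the same conditions.

```python
import errno


def diagnose_post_send(rc: int) -> str:
    """Map the return code of a failed SQ post to a likely cause.

    Hypothetical helper; the mapping follows the reply's description of
    ocrdma_post_send() and is an assumption for any other driver.
    """
    if rc == 0:
        return "posted"
    if rc == -errno.EINVAL:
        # Per the reply: the QP was not in RTS when the WR was posted.
        return "QP not in RTS"
    if rc == -errno.ENOMEM:
        # Per the reply: this is what a genuinely full SQ would return.
        return "SQ full"
    return "other error"


if __name__ == "__main__":
    # rc=-22 is the value reported in the log above.
    print(diagnose_post_send(-22))
```

Read this way, the rc=-22 in the log points at the connection state rather than at SQ exhaustion, which is the conclusion the reply reaches.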
RE: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
Hi Steve,

> -----Original Message-----
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Tuesday, May 20, 2014 12:44 AM
> To: Devesh Sharma; 'J. Bruce Fields'
> Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org; t...@opengridcomputing.com
> Subject: RE: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
>
> > -----Original Message-----
> > From: linux-nfs-ow...@vger.kernel.org [mailto:linux-nfs-ow...@vger.kernel.org] On Behalf Of Devesh Sharma
> > Sent: Monday, May 19, 2014 2:07 PM
> > To: Steve Wise; J. Bruce Fields
> > Cc: linux-...@vger.kernel.org; linux-rdma@vger.kernel.org; t...@opengridcomputing.com
> > Subject: RE: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
> >
> > While testing with the ocrdma driver, I am finding the server-side SQ full. The log follows; I have yet to identify why it is happening. Once this is reported, the client side crashes for some reason. My kdump is not working properly, so I am not able to analyze the situation properly.
> >
> > May 19 23:47:02 neo01-el64 kernel: svcrdma: RDMA_WRITE rmr=8008b12, to=45a2d790c, xdr_off=0, write_len=68, vec-sge=88086cb4a0c8, vec-count=2
> > May 19 23:47:02 neo01-el64 kernel: svcrdma: send_reply returns 0
> > May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000 waiting for data (to = 360)
> > May 19 23:47:02 neo01-el64 kernel: svc: transport 88087dfa2400 served by daemon 88086409a000
> > May 19 23:47:02 neo01-el64 kernel: svc: server 88086409a000, pool 0, transport 88087dfa2400, inuse=18
> > May 19 23:47:02 neo01-el64 kernel: svcrdma: rqstp=88086409a000
> > May 19 23:47:02 neo01-el64 kernel: svcrdma: processing ctxt=880866754540 on xprt=88087dfa2400, rqstp=88086409a000, status=0
> > May 19 23:47:02 neo01-el64 kernel: svcrdma: failed to post SQ WR rc=-22, sc_sq_count=0, sc_sq_depth=128
> > May 19 23:47:02 neo01-el64 kernel: svcrdma: Error -22 posting RDMA_READ
>
> Hey Devesh,
>
> Looking at ocrdma_post_send(), -22 (-EINVAL) is returned when the QP is not in RTS. If the SQ is full, -ENOMEM is returned. So I think the send error is a downstream error because the connection got knocked down. You should try and figure out what kicked the QP out of RTS.

Oh wow! I completely missed it. Let me go through the logs once again and update you.

> Steve.
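For going back through the logs, a short script can pull the fields out of the "failed to post SQ WR" line and check whether the counters themselves suggest a full SQ. This is a sketch, not part of the patches: the names `SQ_FAIL_RE`, `parse_sq_failure` and `sq_looks_full` are hypothetical, and it assumes that sc_sq_count is the number of SQ work requests currently outstanding and sc_sq_depth is the SQ capacity.

```python
import re

# Matches the svcrdma failure line exactly as it appears in the log above.
SQ_FAIL_RE = re.compile(
    r"svcrdma: failed to post SQ WR rc=(?P<rc>-?\d+), "
    r"sc_sq_count=(?P<count>\d+), sc_sq_depth=(?P<depth>\d+)"
)


def parse_sq_failure(line):
    """Return rc, count and depth as ints, or None if the line does not match."""
    m = SQ_FAIL_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def sq_looks_full(fields):
    """Assumption: count is outstanding WRs and depth is the SQ capacity."""
    return fields["count"] >= fields["depth"]


if __name__ == "__main__":
    line = (
        "May 19 23:47:02 neo01-el64 kernel: svcrdma: failed to post SQ WR "
        "rc=-22, sc_sq_count=0, sc_sq_depth=128"
    )
    fields = parse_sq_failure(line)
    print(fields, "full" if sq_looks_full(fields) else "not full")
```

Under that reading of the counters, the logged line (count 0 against depth 128) does not look like a full SQ, which is consistent with the -EINVAL explanation given earlier in the thread.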
Re: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
On Tue, May 06, 2014 at 12:46:21PM -0500, Steve Wise wrote:
> This patch series refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations.
>
> I've also made a git repo available with these patches on top of 3.15-rc4:
>
> git://git.openfabrics.org/~swise/linux svcrdma-refactor
>
> Changes since V1:
>
> - fixed regression for devices that don't support FRMRs (see rdma_read_chunk_lcl())
>
> - split patch up for closer review. However I request it be squashed before merging as they are not bisectable, and I think these changes should all be a single commit anyway.

If it's not split up in a way that's bisectable, then yes, just don't bother.

--b.

> Please review, and test if you can.
>
> Signed-off-by: Tom Tucker <t...@opengridcomputing.com>
> Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
> ---
>
> Tom Tucker (3):
>       svcrdma: Sendto changes
>       svcrdma: Recvfrom changes
>       svcrdma: Transport and header file changes
>
>  include/linux/sunrpc/svc_rdma.h          |    3
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |  633 --
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    |  230 +--
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   62 ++-
>  4 files changed, 318 insertions(+), 610 deletions(-)
>
> --
>
> Steve / Tom
Re: [PATCH V2 RFC 0/3] svcrdma: refactor marshalling logic
On 5/6/2014 2:27 PM, J. Bruce Fields wrote:
> On Tue, May 06, 2014 at 12:46:21PM -0500, Steve Wise wrote:
>> This patch series refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations.
>>
>> I've also made a git repo available with these patches on top of 3.15-rc4:
>>
>> git://git.openfabrics.org/~swise/linux svcrdma-refactor
>>
>> Changes since V1:
>>
>> - fixed regression for devices that don't support FRMRs (see rdma_read_chunk_lcl())
>>
>> - split patch up for closer review. However I request it be squashed before merging as they are not bisectable, and I think these changes should all be a single commit anyway.
>
> If it's not split up in a way that's bisectable, then yes, just don't bother.

I didn't see a good way to split it up, have it bisectable, and not have all the big stuff in one patch. I think it's a little more reviewable in these 3 patches, but when I post V3, I'll put it back as an uber patch. Hopefully folks can have a look at these 3 patches ignoring the bisect issue. Having said that, the rdma read logic really is better reviewed by looking at the code after applying the patches. That's why I published a git branch.

Thanks!

Steve.