RE: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer to send data
> -----Original Message-----
> From: Tom Talpey
> Sent: Monday, August 14, 2017 1:44 PM
> To: Long Li ; Steve French ;
> linux-c...@vger.kernel.org; samba-techni...@lists.samba.org;
> linux-ker...@vger.kernel.org; linux-r...@vger.kernel.org
> Subject: RE: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer
> to send data
>
> > -----Original Message-----
> > From: linux-cifs-ow...@vger.kernel.org
> > [mailto:linux-cifs-ow...@vger.kernel.org] On Behalf Of Long Li
> > Sent: Wednesday, August 2, 2017 4:10 PM
> > To: Steve French ; linux-c...@vger.kernel.org;
> > samba-techni...@lists.samba.org; linux-kernel@vger.kernel.org
> > Cc: Long Li
> > Subject: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer
> > to send data
> >
> > +/*
> > + * Write data to transport
> > + * Each rqst is transported as a SMBDirect payload
> > + * rqst: the data to write
> > + * return value: 0 if successfully written, otherwise error code
> > + */
> > +int cifs_rdma_write(struct cifs_rdma_info *info, struct smb_rqst *rqst)
> > +{
>
> !!!
> This is a VERY confusing name. It is not sending an RDMA Write, which
> will confuse any RDMA-enlightened reader. It's performing an RDMA Send,
> so that name is perhaps one possibility.

I have fixed that in v3.

> > +	if (info->transport_status != CIFS_RDMA_CONNECTED) {
> > +		log_cifs_write("disconnected returning -EIO\n");
> > +		return -EIO;
> > +	}
>
> Isn't this optimizing the error case? There's no guarantee it's still
> connected by the time the following request construction occurs. Why not
> just proceed without the check?

I rearranged the shutdown logic in v3. Checking the transport status is
still needed, but it is now done after bumping the counters that track
pending activity. For example, in the sending code:

	info->smbd_send_pending++;
	if (info->transport_status != SMBD_CONNECTED) {
		info->smbd_send_pending--;
		wake_up(&info->wait_smbd_send_pending);
	}

and in the transport shutdown code:

	info->transport_status = SMBD_DISCONNECTING;
	...
	...
	...
	log_rdma_event(INFO, "wait for all send to finish\n");
	wait_event(info->wait_smbd_send_pending,
		info->smbd_send_pending == 0);

This guarantees that no sender can enter the transport after shutdown has
finished. Shutdown runs on a separate work queue, so the check is needed.

> > +	/* Strip the first 4 bytes MS-SMB2 section 2.1
> > +	 * they are used only for TCP transport */
> > +	iov[0].iov_base = (char*)rqst->rq_iov[0].iov_base + 4;
> > +	iov[0].iov_len = rqst->rq_iov[0].iov_len - 4;
> > +	buflen += iov[0].iov_len;
>
> Ok, that layering choice in the cifs.ko client code needs to be
> corrected. After all, it will need to be RDMA-aware to build the SMB3
> read/write channel structures.
> And, the code in cifs_post_send_data() is allocating and building a
> structure that could have been accounted for much earlier, avoiding the
> extra overhead.
>
> That change could happen later, the hack is mostly ok for now. But
> something needs to be said in a comment.
>
> Tom.
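For readers unfamiliar with this drain pattern, a minimal userspace C
sketch of the same idea follows, with a pthread mutex and condition
variable standing in for the kernel wait queue. All names here (struct
transport, xport_send, xport_shutdown) are illustrative, not taken from
the patch:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct transport {
	pthread_mutex_t lock;
	pthread_cond_t drained;
	bool connected;
	int send_pending;
};

/* Sender: register as pending *before* testing the status, so a racing
 * shutdown either sees our count or we see its status flip. */
static int xport_send(struct transport *t)
{
	pthread_mutex_lock(&t->lock);
	t->send_pending++;
	if (!t->connected) {
		t->send_pending--;
		pthread_cond_broadcast(&t->drained); /* wake a waiting shutdown */
		pthread_mutex_unlock(&t->lock);
		return -1;
	}
	pthread_mutex_unlock(&t->lock);

	/* ... post the send to the transport here ... */

	pthread_mutex_lock(&t->lock);
	if (--t->send_pending == 0)
		pthread_cond_broadcast(&t->drained);
	pthread_mutex_unlock(&t->lock);
	return 0;
}

/* Shutdown: flip the status first, then wait until in-flight senders
 * drain; this mirrors wait_event(info->wait_smbd_send_pending, ...). */
static void xport_shutdown(struct transport *t)
{
	pthread_mutex_lock(&t->lock);
	t->connected = false;
	while (t->send_pending != 0)
		pthread_cond_wait(&t->drained, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

int main(void)
{
	struct transport t = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.drained = PTHREAD_COND_INITIALIZER,
		.connected = true,
		.send_pending = 0,
	};

	printf("send: %d\n", xport_send(&t));	/* succeeds */
	xport_shutdown(&t);
	printf("send: %d\n", xport_send(&t));	/* refused: disconnected */
	return 0;
}

The essential ordering is increment-then-check on the send side and
flip-then-wait on the shutdown side; because both run under one lock
(the kernel version instead relies on the barriers implied by wake_up()
and wait_event()), a late sender either backs out and wakes the waiter
or completes before the drain is observed.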
RE: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer to send data
> -----Original Message-----
> From: Tom Talpey
> Sent: Monday, August 14, 2017 1:44 PM
> To: Long Li ; Steve French ;
> linux-c...@vger.kernel.org; samba-techni...@lists.samba.org;
> linux-ker...@vger.kernel.org; linux-r...@vger.kernel.org
> Subject: RE: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer
> to send data
>
> > -----Original Message-----
> > From: linux-cifs-ow...@vger.kernel.org
> > [mailto:linux-cifs-ow...@vger.kernel.org] On Behalf Of Long Li
> > Sent: Wednesday, August 2, 2017 4:10 PM
> > To: Steve French ; linux-c...@vger.kernel.org;
> > samba-techni...@lists.samba.org; linux-kernel@vger.kernel.org
> > Cc: Long Li
> > Subject: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer
> > to send data
> >
> > +/*
> > + * Write data to transport
> > + * Each rqst is transported as a SMBDirect payload
> > + * rqst: the data to write
> > + * return value: 0 if successfully written, otherwise error code
> > + */
> > +int cifs_rdma_write(struct cifs_rdma_info *info, struct smb_rqst *rqst)
> > +{
>
> !!!
> This is a VERY confusing name. It is not sending an RDMA Write, which
> will confuse any RDMA-enlightened reader. It's performing an RDMA Send,
> so that name is perhaps one possibility.
>
> > +	if (info->transport_status != CIFS_RDMA_CONNECTED) {
> > +		log_cifs_write("disconnected returning -EIO\n");
> > +		return -EIO;
> > +	}
>
> Isn't this optimizing the error case? There's no guarantee it's still
> connected by the time the following request construction occurs. Why not
> just proceed without the check?
>
> > +	/* Strip the first 4 bytes MS-SMB2 section 2.1
> > +	 * they are used only for TCP transport */
> > +	iov[0].iov_base = (char*)rqst->rq_iov[0].iov_base + 4;
> > +	iov[0].iov_len = rqst->rq_iov[0].iov_len - 4;
> > +	buflen += iov[0].iov_len;
>
> Ok, that layering choice in the cifs.ko client code needs to be
> corrected. After all, it will need to be RDMA-aware to build the SMB3
> read/write channel structures.
> And, the code in cifs_post_send_data() is allocating and building a
> structure that could have been accounted for much earlier, avoiding the
> extra overhead.
>
> That change could happen later, the hack is mostly ok for now. But
> something needs to be said in a comment.

Will address those in v2.

> Tom.
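To make the naming point above concrete: in the verbs API both
operations are posted with ib_post_send(), and only the work-request
opcode distinguishes a Send, which consumes a posted receive at the
peer, from an RDMA Write, which places data directly into
peer-advertised memory and consumes nothing. A kernel-side fragment,
not a standalone program; the variable names and zero placeholders are
illustrative:

#include <rdma/ib_verbs.h>

/* What this patch actually posts: a two-sided Send. */
struct ib_send_wr send_wr = {
	.opcode     = IB_WR_SEND,
	.send_flags = IB_SEND_SIGNALED,
};

/* A true one-sided RDMA Write targets memory the peer advertised. */
struct ib_rdma_wr write_wr = {
	.wr.opcode     = IB_WR_RDMA_WRITE,
	.wr.send_flags = IB_SEND_SIGNALED,
	.remote_addr   = 0,	/* peer-advertised address goes here */
	.rkey          = 0,	/* along with the peer's rkey */
};

Since cifs_rdma_write() posts IB_WR_SEND, a send-flavored name avoids
misleading RDMA-aware readers; the v3 snippets quoted in the later reply
already use smbd_* naming.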
RE: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer to send data
> -----Original Message-----
> From: linux-cifs-ow...@vger.kernel.org
> [mailto:linux-cifs-ow...@vger.kernel.org] On Behalf Of Long Li
> Sent: Wednesday, August 2, 2017 4:10 PM
> To: Steve French ; linux-c...@vger.kernel.org;
> samba-techni...@lists.samba.org; linux-kernel@vger.kernel.org
> Cc: Long Li
> Subject: [[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer to
> send data
>
> +/*
> + * Write data to transport
> + * Each rqst is transported as a SMBDirect payload
> + * rqst: the data to write
> + * return value: 0 if successfully written, otherwise error code
> + */
> +int cifs_rdma_write(struct cifs_rdma_info *info, struct smb_rqst *rqst)
> +{

!!!
This is a VERY confusing name. It is not sending an RDMA Write, which will
confuse any RDMA-enlightened reader. It's performing an RDMA Send, so that
name is perhaps one possibility.

> +	if (info->transport_status != CIFS_RDMA_CONNECTED) {
> +		log_cifs_write("disconnected returning -EIO\n");
> +		return -EIO;
> +	}

Isn't this optimizing the error case? There's no guarantee it's still
connected by the time the following request construction occurs. Why not
just proceed without the check?

> +	/* Strip the first 4 bytes MS-SMB2 section 2.1
> +	 * they are used only for TCP transport */
> +	iov[0].iov_base = (char*)rqst->rq_iov[0].iov_base + 4;
> +	iov[0].iov_len = rqst->rq_iov[0].iov_len - 4;
> +	buflen += iov[0].iov_len;

Ok, that layering choice in the cifs.ko client code needs to be corrected.
After all, it will need to be RDMA-aware to build the SMB3 read/write
channel structures.
And, the code in cifs_post_send_data() is allocating and building a
structure that could have been accounted for much earlier, avoiding the
extra overhead.

That change could happen later, the hack is mostly ok for now. But
something needs to be said in a comment.

Tom.
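Background on the 4 bytes being stripped: over TCP, every SMB2 message is
preceded by a 4-byte stream protocol header (MS-SMB2 section 2.1), a zero
byte followed by the message length in the remaining 3 bytes, big-endian.
SMB Direct does its own framing, so the prefix is dead weight there. A
small self-contained sketch; the helper name is illustrative:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decode the 24-bit big-endian length field of the stream header. */
static uint32_t stream_header_len(const uint8_t *hdr)
{
	return ((uint32_t)hdr[1] << 16) | ((uint32_t)hdr[2] << 8) | hdr[3];
}

int main(void)
{
	/* 4-byte transport header, then the SMB2 ProtocolId "\xfeSMB" */
	uint8_t msg[] = { 0x00, 0x00, 0x00, 0x04, 0xfe, 'S', 'M', 'B' };

	assert(stream_header_len(msg) == sizeof(msg) - 4);

	/* What the patch does: point past the header, shrink the length. */
	const uint8_t *payload = msg + 4;
	size_t payload_len = sizeof(msg) - 4;
	printf("payload %zu bytes, starts 0x%02x '%c%c%c'\n",
	       payload_len, payload[0], payload[1], payload[2], payload[3]);
	return 0;
}

The patch's iov_base + 4 / iov_len - 4 adjustment is exactly the
payload/payload_len step above.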
[[PATCH v1] 18/37] [CIFS] SMBD: Implement API for upper layer to send data
From: Long Li

Implement cifs_rdma_write for sending upper-layer data. The upper layer
uses this function to do an RDMA send. This function is also used to pass
SMB packets for doing an RDMA read/write via memory registration.

Signed-off-by: Long Li
---
 fs/cifs/cifsrdma.c | 177 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/cifsrdma.h |   5 ++
 2 files changed, 182 insertions(+)

diff --git a/fs/cifs/cifsrdma.c b/fs/cifs/cifsrdma.c
index ef21f1c..eb48651 100644
--- a/fs/cifs/cifsrdma.c
+++ b/fs/cifs/cifsrdma.c
@@ -229,6 +229,10 @@ static void send_done(struct ib_cq *cq, struct ib_wc *wc)
 			request->sge[i].length,
 			DMA_TO_DEVICE);
 
+	if (atomic_dec_and_test(&request->info->send_pending)) {
+		wake_up(&request->info->wait_send_pending);
+	}
+
 	kfree(request->sge);
 	mempool_free(request, request->info->request_mempool);
 }
@@ -551,12 +555,14 @@ static int cifs_rdma_post_send_negotiate_req(struct cifs_rdma_info *info)
 		request->sge[0].addr,
 		request->sge[0].length, request->sge[0].lkey);
 
+	atomic_inc(&info->send_pending);
 	rc = ib_post_send(info->id->qp, &send_wr, &send_wr_fail);
 	if (!rc)
 		return 0;
 
 	// if we reach here, post send failed
 	log_rdma_send("ib_post_send failed rc=%d\n", rc);
+	atomic_dec(&info->send_pending);
 	ib_dma_unmap_single(info->id->device, request->sge[0].addr,
 		request->sge[0].length, DMA_TO_DEVICE);
@@ -662,12 +668,14 @@ static int cifs_rdma_post_send_page(struct cifs_rdma_info *info, struct page *pa
 	send_wr.opcode = IB_WR_SEND;
 	send_wr.send_flags = IB_SEND_SIGNALED;
 
+	atomic_inc(&info->send_pending);
 	rc = ib_post_send(info->id->qp, &send_wr, &send_wr_fail);
 	if (!rc)
 		return 0;
 
 	// post send failed
 	log_rdma_send("ib_post_send failed rc=%d\n", rc);
+	atomic_dec(&info->send_pending);
 
 dma_mapping_failed:
 	for (i=0; i<2; i++)
@@ -768,11 +776,13 @@ static int cifs_rdma_post_send_empty(struct cifs_rdma_info *info)
 	send_wr.opcode = IB_WR_SEND;
 	send_wr.send_flags = IB_SEND_SIGNALED;
 
+	atomic_inc(&info->send_pending);
 	rc = ib_post_send(info->id->qp, &send_wr, &send_wr_fail);
 	if (!rc)
 		return 0;
 
 	log_rdma_send("ib_post_send failed rc=%d\n", rc);
+	atomic_dec(&info->send_pending);
 	ib_dma_unmap_single(info->id->device, request->sge[0].addr,
 		request->sge[0].length, DMA_TO_DEVICE);
@@ -885,12 +895,14 @@ static int cifs_rdma_post_send_data(
 	send_wr.opcode = IB_WR_SEND;
 	send_wr.send_flags = IB_SEND_SIGNALED;
 
+	atomic_inc(&info->send_pending);
 	rc = ib_post_send(info->id->qp, &send_wr, &send_wr_fail);
 	if (!rc)
 		return 0;
 
 	// post send failed
 	log_rdma_send("ib_post_send failed rc=%d\n", rc);
+	atomic_dec(&info->send_pending);
 
 dma_mapping_failure:
 	for (i=0; i<[...]

@@ [...] @@ struct cifs_rdma_info* cifs_create_rdma_session(
 		info->receive_credit_max);
 	init_waitqueue_head(&info->wait_send_queue);
 
+	init_waitqueue_head(&info->wait_send_pending);
+	atomic_set(&info->send_pending, 0);
+
 	init_waitqueue_head(&info->wait_recv_pending);
 	atomic_set(&info->recv_pending, 0);
@@ -1202,3 +1217,165 @@ struct cifs_rdma_info* cifs_create_rdma_session(
 	kfree(info);
 	return NULL;
 }
+
+/*
+ * Write data to transport
+ * Each rqst is transported as a SMBDirect payload
+ * rqst: the data to write
+ * return value: 0 if successfully written, otherwise error code
+ */
+int cifs_rdma_write(struct cifs_rdma_info *info, struct smb_rqst *rqst)
+{
+	struct kvec vec;
+	int nvecs;
+	int size;
+	int buflen=0, remaining_data_length;
+	int start, i, j;
+	int max_iov_size =
+		info->max_send_size - sizeof(struct smbd_data_transfer);
+	struct kvec *iov;
+	int rc;
+
+	if (info->transport_status != CIFS_RDMA_CONNECTED) {
+		log_cifs_write("disconnected returning -EIO\n");
+		return -EIO;
+	}
+
+	iov = kzalloc(sizeof(struct kvec)*rqst->rq_nvec, GFP_KERNEL);
+	if (!iov) {
+		log_cifs_write("failed to allocate iov returning -ENOMEM\n");
+		return -ENOMEM;
+	}
+
+	/* Strip the first 4 bytes MS-SMB2 section 2.1
+	 * they are used only for TCP transport */
+	iov[0].iov_base = (char*)rqst->rq_iov[0].iov_base + 4;
+	iov[0].iov_len = rqst->rq_iov[0].iov_len - 4;
+	buflen += iov[0].iov_len;
+
+	/* total up iov array first */
+	for (i = 1; i < rqst->rq_nvec; i++) {
+		iov[i].iov_base = rqst->rq_iov[i].iov_base;
+		iov[i].iov_len = rqst->rq_iov[i].iov_len;
+		buflen += iov[i].iov_len;
+	}
+
+	/* add in the page array if there is one */
+	if (rqst->rq_npages) {
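The quoted patch is cut off mid-function, but the declarations already
show where it is going: the assembled buffer is carved into SMB Direct
data-transfer packets of at most max_iov_size payload bytes, each
carrying a RemainingDataLength field so the receiver knows how much of
the payload is still to come. A minimal illustrative chunker, under
assumed sizes, not the patch's code:

#include <stdio.h>

/* Hypothetical payload cap: max_send_size minus the SMB Direct
 * data-transfer header, per the max_iov_size computation above. */
#define MAX_IOV_SIZE (1364 - 24)

static void send_in_chunks(size_t buflen)
{
	size_t remaining = buflen;

	while (remaining > 0) {
		size_t chunk = remaining > MAX_IOV_SIZE ? MAX_IOV_SIZE
							: remaining;
		remaining -= chunk;
		/* each packet carries its payload plus a count of the
		 * bytes still to come, so the peer can reassemble */
		printf("post %zu bytes, remaining_data_length=%zu\n",
		       chunk, remaining);
	}
}

int main(void)
{
	send_in_chunks(4000);	/* e.g. a write larger than one send */
	return 0;
}

Payloads too large even for this are instead described via memory
registration, which the commit message alludes to, letting the peer move
the bulk data with one-sided RDMA reads and writes.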