On Mar 16, 2015, at 1:13 PM, Steve Wise <sw...@opengridcomputing.com> wrote:
> On 3/16/2015 1:15 PM, Sagi Grimberg wrote: >> On 3/16/2015 6:48 PM, Chuck Lever wrote: >>> >>> On Mar 16, 2015, at 3:28 AM, Sagi Grimberg <sa...@dev.mellanox.co.il> wrote: >>> >>>> On 3/13/2015 11:27 PM, Chuck Lever wrote: >>>>> There is very little common processing among the different external >>>>> memory registration functions. Have rpcrdma_create_chunks() call >>>>> the registration method directly. This removes a stack frame and a >>>>> switch statement from the external registration path. >>>>> >>>>> Signed-off-by: Chuck Lever <chuck.le...@oracle.com> >>>>> --- >>>>> net/sunrpc/xprtrdma/fmr_ops.c | 51 +++++++++++ >>>>> net/sunrpc/xprtrdma/frwr_ops.c | 88 ++++++++++++++++++ >>>>> net/sunrpc/xprtrdma/physical_ops.c | 17 ++++ >>>>> net/sunrpc/xprtrdma/rpc_rdma.c | 5 + >>>>> net/sunrpc/xprtrdma/verbs.c | 172 >>>>> +----------------------------------- >>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 6 + >>>>> 6 files changed, 166 insertions(+), 173 deletions(-) >>>>> >>>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c >>>>> index eec2660..45fb646 100644 >>>>> --- a/net/sunrpc/xprtrdma/fmr_ops.c >>>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c >>>>> @@ -29,7 +29,58 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) >>>>> rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES); >>>>> } >>>>> >>>>> +/* Use the ib_map_phys_fmr() verb to register a memory region >>>>> + * for remote access via RDMA READ or RDMA WRITE. >>>>> + */ >>>>> +static int >>>>> +fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, >>>>> + int nsegs, bool writing) >>>>> +{ >>>>> + struct rpcrdma_ia *ia = &r_xprt->rx_ia; >>>>> + struct rpcrdma_mr_seg *seg1 = seg; >>>>> + struct rpcrdma_mw *mw = seg1->rl_mw; >>>>> + u64 physaddrs[RPCRDMA_MAX_DATA_SEGS]; >>>>> + int len, pageoff, i, rc; >>>>> + >>>>> + pageoff = offset_in_page(seg1->mr_offset); >>>>> + seg1->mr_offset -= pageoff; /* start of page */ >>>>> + seg1->mr_len += pageoff; >>>>> + len = -pageoff; >>>>> + if (nsegs > RPCRDMA_MAX_FMR_SGES) >>>>> + nsegs = RPCRDMA_MAX_FMR_SGES; >>>>> + for (i = 0; i < nsegs;) { >>>>> + rpcrdma_map_one(ia, seg, writing); >>>>> + physaddrs[i] = seg->mr_dma; >>>>> + len += seg->mr_len; >>>>> + ++seg; >>>>> + ++i; >>>>> + /* Check for holes */ >>>>> + if ((i < nsegs && offset_in_page(seg->mr_offset)) || >>>>> + offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len)) >>>>> + break; >>>>> + } >>>>> + >>>>> + rc = ib_map_phys_fmr(mw->r.fmr, physaddrs, i, seg1->mr_dma); >>>>> + if (rc) >>>>> + goto out_maperr; >>>>> + >>>>> + seg1->mr_rkey = mw->r.fmr->rkey; >>>>> + seg1->mr_base = seg1->mr_dma + pageoff; >>>>> + seg1->mr_nsegs = i; >>>>> + seg1->mr_len = len; >>>>> + return i; >>>>> + >>>>> +out_maperr: >>>>> + dprintk("RPC: %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status >>>>> %i\n", >>>>> + __func__, len, (unsigned long long)seg1->mr_dma, >>>>> + pageoff, i, rc); >>>>> + while (i--) >>>>> + rpcrdma_unmap_one(ia, --seg); >>>>> + return rc; >>>>> +} >>>>> + >>>>> const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { >>>>> + .ro_map = fmr_op_map, >>>>> .ro_maxpages = fmr_op_maxpages, >>>>> .ro_displayname = "fmr", >>>>> }; >>>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c >>>>> b/net/sunrpc/xprtrdma/frwr_ops.c >>>>> index 73a5ac8..2b5ccb0 100644 >>>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c >>>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c >>>>> @@ -29,7 +29,95 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt) >>>>> rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth); >>>>> } >>>>> >>>>> +/* Post a FAST_REG Work Request to register a memory region >>>>> + * for remote access via RDMA READ or RDMA WRITE. >>>>> + */ >>>>> +static int >>>>> +frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, >>>>> + int nsegs, bool writing) >>>>> +{ >>>>> + struct rpcrdma_ia *ia = &r_xprt->rx_ia; >>>>> + struct rpcrdma_mr_seg *seg1 = seg; >>>>> + struct rpcrdma_mw *mw = seg1->rl_mw; >>>>> + struct rpcrdma_frmr *frmr = &mw->r.frmr; >>>>> + struct ib_mr *mr = frmr->fr_mr; >>>>> + struct ib_send_wr fastreg_wr, *bad_wr; >>>>> + u8 key; >>>>> + int len, pageoff; >>>>> + int i, rc; >>>>> + int seg_len; >>>>> + u64 pa; >>>>> + int page_no; >>>>> + >>>>> + pageoff = offset_in_page(seg1->mr_offset); >>>>> + seg1->mr_offset -= pageoff; /* start of page */ >>>>> + seg1->mr_len += pageoff; >>>>> + len = -pageoff; >>>>> + if (nsegs > ia->ri_max_frmr_depth) >>>>> + nsegs = ia->ri_max_frmr_depth; >>>>> + for (page_no = i = 0; i < nsegs;) { >>>>> + rpcrdma_map_one(ia, seg, writing); >>>>> + pa = seg->mr_dma; >>>>> + for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) { >>>>> + frmr->fr_pgl->page_list[page_no++] = pa; >>>>> + pa += PAGE_SIZE; >>>>> + } >>>>> + len += seg->mr_len; >>>>> + ++seg; >>>>> + ++i; >>>>> + /* Check for holes */ >>>>> + if ((i < nsegs && offset_in_page(seg->mr_offset)) || >>>>> + offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len)) >>>>> + break; >>>>> + } >>>>> + dprintk("RPC: %s: Using frmr %p to map %d segments\n", >>>>> + __func__, mw, i); >>>>> + >>>>> + frmr->fr_state = FRMR_IS_VALID; >>>>> + >>>>> + memset(&fastreg_wr, 0, sizeof(fastreg_wr)); >>>>> + fastreg_wr.wr_id = (unsigned long)(void *)mw; >>>>> + fastreg_wr.opcode = IB_WR_FAST_REG_MR; >>>>> + fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma; >>>>> + fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl; >>>>> + fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT; >>>>> + fastreg_wr.wr.fast_reg.page_list_len = page_no; >>>>> + fastreg_wr.wr.fast_reg.length = page_no << PAGE_SHIFT; >>>> >>>> Umm, is this necessarily true? will the region length be full pages? >>>> Why don't you assign fast_reg.length = len? >>>> I might be missing something here… >>> >>> Don’t know. This goes back to when FRWR was introduced in this code, >>> with commit 3197d309 in 2008. >> >> I see. >> >> So from inspecting at the code it seems that if you had a buffer of >> length PAGE_SIZE + 512, the client would register two pages. So in case >> the server attempts a write that exceeds this length the client might >> get a data corruption instead of a much nicer remote access error >> completion... >> >> But then again, I may be missing something here... >> > > I think you're correct. The length does not need to be page aligned… Anyone want to send a patch on top of this series? If not, I can look into this after I get back home next week. We need to confirm that there are no pervasive server bugs that rely on the protection slop. If there are, a fix here could break interoperability until the server is fixed. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html