On Mar 16, 2015, at 1:13 PM, Steve Wise <sw...@opengridcomputing.com> wrote:

> On 3/16/2015 1:15 PM, Sagi Grimberg wrote:
>> On 3/16/2015 6:48 PM, Chuck Lever wrote:
>>> 
>>> On Mar 16, 2015, at 3:28 AM, Sagi Grimberg <sa...@dev.mellanox.co.il> wrote:
>>> 
>>>> On 3/13/2015 11:27 PM, Chuck Lever wrote:
>>>>> There is very little common processing among the different external
>>>>> memory registration functions. Have rpcrdma_create_chunks() call
>>>>> the registration method directly. This removes a stack frame and a
>>>>> switch statement from the external registration path.
>>>>> 
>>>>> Signed-off-by: Chuck Lever <chuck.le...@oracle.com>
>>>>> ---
>>>>>  net/sunrpc/xprtrdma/fmr_ops.c      |   51 +++++++++++
>>>>>  net/sunrpc/xprtrdma/frwr_ops.c     |   88 ++++++++++++++++++
>>>>>  net/sunrpc/xprtrdma/physical_ops.c |   17 ++++
>>>>>  net/sunrpc/xprtrdma/rpc_rdma.c     |    5 +
>>>>>  net/sunrpc/xprtrdma/verbs.c        |  172 
>>>>> +-----------------------------------
>>>>>  net/sunrpc/xprtrdma/xprt_rdma.h    |    6 +
>>>>>  6 files changed, 166 insertions(+), 173 deletions(-)
>>>>> 
>>>>> diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
>>>>> index eec2660..45fb646 100644
>>>>> --- a/net/sunrpc/xprtrdma/fmr_ops.c
>>>>> +++ b/net/sunrpc/xprtrdma/fmr_ops.c
>>>>> @@ -29,7 +29,58 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
>>>>>               rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
>>>>>  }
>>>>> 
>>>>> +/* Use the ib_map_phys_fmr() verb to register a memory region
>>>>> + * for remote access via RDMA READ or RDMA WRITE.
>>>>> + */
>>>>> +static int
>>>>> +fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>>>>> +       int nsegs, bool writing)
>>>>> +{
>>>>> +    struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>> +    struct rpcrdma_mr_seg *seg1 = seg;
>>>>> +    struct rpcrdma_mw *mw = seg1->rl_mw;
>>>>> +    u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
>>>>> +    int len, pageoff, i, rc;
>>>>> +
>>>>> +    pageoff = offset_in_page(seg1->mr_offset);
>>>>> +    seg1->mr_offset -= pageoff;    /* start of page */
>>>>> +    seg1->mr_len += pageoff;
>>>>> +    len = -pageoff;
>>>>> +    if (nsegs > RPCRDMA_MAX_FMR_SGES)
>>>>> +        nsegs = RPCRDMA_MAX_FMR_SGES;
>>>>> +    for (i = 0; i < nsegs;) {
>>>>> +        rpcrdma_map_one(ia, seg, writing);
>>>>> +        physaddrs[i] = seg->mr_dma;
>>>>> +        len += seg->mr_len;
>>>>> +        ++seg;
>>>>> +        ++i;
>>>>> +        /* Check for holes */
>>>>> +        if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
>>>>> +            offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
>>>>> +            break;
>>>>> +    }
>>>>> +
>>>>> +    rc = ib_map_phys_fmr(mw->r.fmr, physaddrs, i, seg1->mr_dma);
>>>>> +    if (rc)
>>>>> +        goto out_maperr;
>>>>> +
>>>>> +    seg1->mr_rkey = mw->r.fmr->rkey;
>>>>> +    seg1->mr_base = seg1->mr_dma + pageoff;
>>>>> +    seg1->mr_nsegs = i;
>>>>> +    seg1->mr_len = len;
>>>>> +    return i;
>>>>> +
>>>>> +out_maperr:
>>>>> +    dprintk("RPC:       %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status 
>>>>> %i\n",
>>>>> +        __func__, len, (unsigned long long)seg1->mr_dma,
>>>>> +        pageoff, i, rc);
>>>>> +    while (i--)
>>>>> +        rpcrdma_unmap_one(ia, --seg);
>>>>> +    return rc;
>>>>> +}
>>>>> +
>>>>>  const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
>>>>> +    .ro_map                = fmr_op_map,
>>>>>      .ro_maxpages            = fmr_op_maxpages,
>>>>>      .ro_displayname            = "fmr",
>>>>>  };
>>>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c 
>>>>> b/net/sunrpc/xprtrdma/frwr_ops.c
>>>>> index 73a5ac8..2b5ccb0 100644
>>>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>>>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>>>>> @@ -29,7 +29,95 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
>>>>>               rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
>>>>>  }
>>>>> 
>>>>> +/* Post a FAST_REG Work Request to register a memory region
>>>>> + * for remote access via RDMA READ or RDMA WRITE.
>>>>> + */
>>>>> +static int
>>>>> +frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>>>>> +        int nsegs, bool writing)
>>>>> +{
>>>>> +    struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>> +    struct rpcrdma_mr_seg *seg1 = seg;
>>>>> +    struct rpcrdma_mw *mw = seg1->rl_mw;
>>>>> +    struct rpcrdma_frmr *frmr = &mw->r.frmr;
>>>>> +    struct ib_mr *mr = frmr->fr_mr;
>>>>> +    struct ib_send_wr fastreg_wr, *bad_wr;
>>>>> +    u8 key;
>>>>> +    int len, pageoff;
>>>>> +    int i, rc;
>>>>> +    int seg_len;
>>>>> +    u64 pa;
>>>>> +    int page_no;
>>>>> +
>>>>> +    pageoff = offset_in_page(seg1->mr_offset);
>>>>> +    seg1->mr_offset -= pageoff;    /* start of page */
>>>>> +    seg1->mr_len += pageoff;
>>>>> +    len = -pageoff;
>>>>> +    if (nsegs > ia->ri_max_frmr_depth)
>>>>> +        nsegs = ia->ri_max_frmr_depth;
>>>>> +    for (page_no = i = 0; i < nsegs;) {
>>>>> +        rpcrdma_map_one(ia, seg, writing);
>>>>> +        pa = seg->mr_dma;
>>>>> +        for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
>>>>> +            frmr->fr_pgl->page_list[page_no++] = pa;
>>>>> +            pa += PAGE_SIZE;
>>>>> +        }
>>>>> +        len += seg->mr_len;
>>>>> +        ++seg;
>>>>> +        ++i;
>>>>> +        /* Check for holes */
>>>>> +        if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
>>>>> +            offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
>>>>> +            break;
>>>>> +    }
>>>>> +    dprintk("RPC:       %s: Using frmr %p to map %d segments\n",
>>>>> +        __func__, mw, i);
>>>>> +
>>>>> +    frmr->fr_state = FRMR_IS_VALID;
>>>>> +
>>>>> +    memset(&fastreg_wr, 0, sizeof(fastreg_wr));
>>>>> +    fastreg_wr.wr_id = (unsigned long)(void *)mw;
>>>>> +    fastreg_wr.opcode = IB_WR_FAST_REG_MR;
>>>>> +    fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma;
>>>>> +    fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
>>>>> +    fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
>>>>> +    fastreg_wr.wr.fast_reg.page_list_len = page_no;
>>>>> +    fastreg_wr.wr.fast_reg.length = page_no << PAGE_SHIFT;
>>>> 
>>>> Umm, is this necessarily true? will the region length be full pages?
>>>> Why don't you assign fast_reg.length = len?
>>>> I might be missing something here…
>>> 
>>> Don’t know. This goes back to when FRWR was introduced in this code,
>>> with commit 3197d309 in 2008.
>> 
>> I see.
>> 
>> So from inspecting at the code it seems that if you had a buffer of
>> length PAGE_SIZE + 512, the client would register two pages. So in case
>> the server attempts a write that exceeds this length the client might
>> get a data corruption instead of a much nicer remote access error
>> completion...
>> 
>> But then again, I may be missing something here...
>> 
> 
> I think you're correct.   The length does not need to be page aligned…

Anyone want to send a patch on top of this series? If not, I can look
into this after I get back home next week.

We need to confirm that there are no pervasive server bugs that
rely on the protection slop. If there are, a fix here could break
interoperability until the server is fixed.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to